Comparison Between TreeValue and DM-Tree¶
In this section, we will compare the feature and performance of the dm-tree library, which is developed by deepmind.
Before starting the comparison, let us define some thing.
[1]:
origin = {'a': 1, 'b': 2, 'c': {'x': 3, 'y': 4}}
Mapping Operation¶
Mapping operation is quite common in the processing of trees. A mapping function should be provided to create a new tree based on the mapped tree.
TreeValue’s mapping¶
In TreeValue, mapping is provided to simply create a new tree.
[2]:
from treevalue import FastTreeValue, mapping
tv = FastTreeValue(origin)
tv
[2]:
<FastTreeValue 0x7f7f60ecd2e0>
├── 'a' --> 1
├── 'b' --> 2
└── 'c' --> <FastTreeValue 0x7f7f60ecd940>
├── 'x' --> 3
└── 'y' --> 4
[3]:
mapping(tv, lambda x: x * 2)
[3]:
<FastTreeValue 0x7f7f60ecdf70>
├── 'a' --> 2
├── 'b' --> 4
└── 'c' --> <FastTreeValue 0x7f7f60ecd700>
├── 'x' --> 6
└── 'y' --> 8
Here is the performance test.
[4]:
%timeit mapping(tv, lambda x: x * 2)
1.69 µs ± 2.32 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In order to support the cased that the mapped value is related to both path and value of each node, we can use the ‘path mapping mode’ by simply use the second parameter of the mapping function.
[5]:
mapping(tv, lambda x, p: ('path', p, 'value', x))
[5]:
<FastTreeValue 0x7f7f60acf640>
├── 'a' --> ('path', ('a',), 'value', 1)
├── 'b' --> ('path', ('b',), 'value', 2)
└── 'c' --> <FastTreeValue 0x7f7f60acf220>
├── 'x' --> ('path', ('c', 'x'), 'value', 3)
└── 'y' --> ('path', ('c', 'y'), 'value', 4)
And here is the performance
[6]:
%timeit mapping(tv, lambda x, p: ('path', p, 'value', x))
1.73 µs ± 10.9 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
DM-Tree’s mapping¶
In DM-Tree, mapping operation is supported by map_structure function.
[7]:
from tree import map_structure
[8]:
map_structure(lambda x: x * 2, origin)
[8]:
{'a': 2, 'b': 4, 'c': {'x': 6, 'y': 8}}
This is the performance of map_structure
, obviously much slower than mapping
in TreeValue.
[9]:
%timeit map_structure(lambda x: x * 2, origin)
12.4 µs ± 20.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
To supported the second situation in the last section, map_structure_with_path can be used.
[10]:
from tree import map_structure_with_path
[11]:
map_structure_with_path(lambda path, x: ('path', path, 'value', x), origin)
[11]:
{'a': ('path', ('a',), 'value', 1),
'b': ('path', ('b',), 'value', 2),
'c': {'x': ('path', ('c', 'x'), 'value', 3),
'y': ('path', ('c', 'y'), 'value', 4)}}
Here is the performance.
[12]:
%timeit map_structure_with_path(lambda path, x: ('path', path, 'value', x), origin)
26.7 µs ± 164 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Flatten and Unflatten¶
In tree operations, flatten is often used to linearly expand the tree structure for operations such as parallel processing. Based on flatten, unflatten is its inverse operation, which can recover the tree structure from the linear data.
TreeValue’s Performance¶
In TreeValue, flatten and unflatten are provided, which usage are simple.
[13]:
from treevalue import FastTreeValue, flatten, unflatten
origin_tree = FastTreeValue(origin)
origin_tree
[13]:
<FastTreeValue 0x7f7f60a4dc10>
├── 'a' --> 1
├── 'b' --> 2
└── 'c' --> <FastTreeValue 0x7f7f60a4d7f0>
├── 'x' --> 3
└── 'y' --> 4
[14]:
flatted = flatten(origin_tree)
flatted
[14]:
[(('a',), 1), (('b',), 2), (('c', 'x'), 3), (('c', 'y'), 4)]
Here is the performance of flatten
[15]:
%timeit flatten(origin_tree)
515 ns ± 1.93 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
The tree can be re-created from flatted
with function unflatten
.
[16]:
unflatten(flatted)
[16]:
<TreeValue 0x7f7f60a4dca0>
├── 'a' --> 1
├── 'b' --> 2
└── 'c' --> <TreeValue 0x7f7f60a4d6d0>
├── 'x' --> 3
└── 'y' --> 4
And here is the performance.
[17]:
%timeit unflatten(flatted)
577 ns ± 0.624 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
DM-Tree’s Performance¶
Flatten is provided in DM-Tree as well, but it differs from that in TreeValue, as the following
[18]:
from tree import flatten
flatten(origin)
[18]:
[1, 2, 3, 4]
Here is the performance
[19]:
%timeit flatten(origin)
753 ns ± 5.86 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
The structure of the tree is dropped, only the data is extracted from the tree. This means the flatten
function in DM-Tree is irreversible, we can not recover the original tree with the result above.
The reversible resolution provided in DM-Tree is flatten_with_path
[20]:
from tree import flatten_with_path
flatten_with_path(origin)
[20]:
[(('a',), 1), (('b',), 2), (('c', 'x'), 3), (('c', 'y'), 4)]
Here is the performance, much slower than flatten
in TreeValue.
[21]:
%timeit flatten_with_path(origin)
13 µs ± 119 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
To re-create the original tree, we need a tree structure with any objects filled inside. Use the unflatten_as to archive this goal.
[22]:
from tree import unflatten_as
unflatten_as({'a': None, 'b': None, 'c': {'x': None, 'y': None}}, [1, 2, 3, 4])
[22]:
{'a': 1, 'b': 2, 'c': {'x': 3, 'y': 4}}
Here is the performance.
[23]:
%timeit unflatten_as({'a': None, 'b': None, 'c': {'x': None, 'y': None}}, [1, 2, 3, 4])
10.2 µs ± 9.32 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Positional Replacement¶
It is obvious that the unflatten_as
in DM-Tree is quite different from unflatten
in TreeValue, for the former one is replacing
and the latter one is constructing
. This means the unflatten_as
in DM-Tree may supported some more features, such as creating a new tree with another tree’s structure and the given values. So we need an experiment on this, called ‘positional replacement’.
First, in TreeValue, we can build a function named replace
to realize this.
[24]:
from treevalue import flatten, unflatten
def replace(t, v):
pairs = flatten(t)
return unflatten([(path, vi) for (path, _), vi in zip(pairs, v)])
Create a new tree based on origin_tree
’s structure and new values.
[25]:
replace(origin_tree, [3, 5, 7, 9])
[25]:
<TreeValue 0x7f7f60a4de50>
├── 'a' --> 3
├── 'b' --> 5
└── 'c' --> <TreeValue 0x7f7f60a4ddf0>
├── 'x' --> 7
└── 'y' --> 9
Here is the performance.
[26]:
%timeit replace(origin_tree, [3, 5, 7, 9])
1.89 µs ± 8.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In DM-Tree, unflatten_as
can be directly used.
[27]:
from tree import unflatten_as
unflatten_as(origin, [3, 5, 7, 9])
[27]:
{'a': 3, 'b': 5, 'c': {'x': 7, 'y': 9}}
Here is the performance, even much slower than the replace
function for integration.
[28]:
%timeit unflatten_as(origin, [3, 5, 7, 9])
10 µs ± 67.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Conclusion¶
The mapping operation is supported by both library, and mapping
in TreeValue’s performance is significantly higher than the map_structure
and map_structure_with_path
in DM-Tree.
The flatten
and unflatten
in TreeValue is reversible, but in DM-Tree not. DM-Tree’s performance on flatten and unflatten operation is lower than that in TreeValue in all aspects.