Main Idea

Why TreeValue is designed?

When we build programs, we often encounter some structured data with complicated nested components, and the usual practice is to model them with tree-like data structures. In Python, native dict/list are commonly used containers.

However, it takes a lot of codes and some complex and non-intuitive calculation logics. It could be inconvenient to modify and extend related code and data. And parallelization is more impossible.

In fact, the root reason of this problem is that the native python dictionary data type is a basic data structure which is biased towards functional generalization, and it does not fully meet the various data processing and operations based on the tree data structure.

Therefore, we need a kind of more proper data container —— TreeValue.

For example, if we need to add all the integers of the 2 dictionaries together with native python, the code will be like below,

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
# native plus between dictionaries
def plus(a, b):
    _result = {}
    for key in set(a.keys()) | set(b.keys()):
        if isinstance(a[key], int) and isinstance(b[key], int):
            _result[key] = a[key] + b[key]
        else:
            _result[key] = plus(a[key], b[key])

    return _result


if __name__ == "__main__":
    d1 = {'a': 1, 'b': 2, 'x': {'c': 3, 'd': 4}}
    d2 = {'a': 11, 'b': 22, 'x': {'c': 30, 'd': 47}}

    print('plus(d1, d2):')
    print(plus(d1, d2))

with the following output.

1
2
plus(d1, d2):
{'x': {'c': 33, 'd': 51}, 'a': 12, 'b': 24}

Although in fact, as python coders, we may have been accustomed to such a native recursive writing method, but from the perspective of actual engineering construction, such a writing method is not intuitive enough:

  • Ease of Use: First of all, this way of writing has been quite different from the original simple addition in terms of look and feel and engineering structure. In order to support dictionary-based operations, the original simple addition operations have become completely unrecognizable.

  • Diversity of Data: Secondly, in the actual tree structure operation, there may be various exception handling situations (such as the missing of some key values, the corresponding positions on the two trees, one is an integer and the other is a child node, etc.), and if you want to In the native python code, these things that can often be expected with a high probability are more complete and satisfactory processing, and the code needs to be further complicated.

  • Scalability and Parallelization: In addition, the above code shows only operations based on two trees. If we need to extend the calculation itself, such as extending to operations with more than two trees, or operations with less than two trees, we will either continue to scale-enlarged change the logic of the plus function, or use the reduce function. In the former case, the above problems will be further aggravated, while in the latter case, another function needs to be encapsulated. No matter which kind of extension method, the code will be greatly complicated, and the quality for programmers will eventually be difficult to guarantee.

So the TreeValue is designed to solve the above problems and provided sufficient secondary development support for developers, and some more complex and comprehensive applications can be developed based on this framework.

This is the example code to implement the functions of the code above.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
from treevalue import TreeValue, func_treelize


@func_treelize()
def plus(a, b):
    return a + b


if __name__ == "__main__":
    d1 = TreeValue({'a': 1, 'b': 2, 'x': {'c': 3, 'd': 4}})
    d2 = TreeValue({'a': 11, 'b': 22, 'x': {'c': 30, 'd': 47}})

    print("plus(d1, d2):")
    print(plus(d1, d2))

The result should be like below.

1
2
3
4
5
6
7
plus(d1, d2):
<TreeValue 0x7fcee412de50>
├── 'a' --> 12
├── 'b' --> 24
└── 'x' --> <TreeValue 0x7fcee4594df0>
    ├── 'c' --> 33
    └── 'd' --> 51

Besides, based on the framework provided by us, another way of code building can be like the following code.

1
2
3
4
5
6
7
8
from treevalue import FastTreeValue

if __name__ == "__main__":
    d1 = FastTreeValue({'a': 1, 'b': 2, 'x': {'c': 3, 'd': 4}})
    d2 = FastTreeValue({'a': 11, 'b': 22, 'x': {'c': 30, 'd': 47}})

    print("d1 + d2:")
    print(d1 + d2)

This code will have the same function with the codes above.

1
2
3
4
5
6
7
d1 + d2:
<FastTreeValue 0x7f6552ebddf0>
├── 'a' --> 12
├── 'b' --> 24
└── 'x' --> <FastTreeValue 0x7f6552bf7a30>
    ├── 'c' --> 33
    └── 'd' --> 51

Definitions for Key Conceptions

In this section, the following key conceptions will be introduced:

  • TreeValue

  • Tree Calculation

  • Tree Function Wrapper

TreeValue

TreeValue is a data structure which is based on tree structure. Nested features of tree structure are natively supported in TreeValue. In this project, TreeValue is the basic calculation unit of any operations.

From a data structure perspective, TreeValue is a rooted, directed tree, and their nodes can be divided into two categories:

  • Non-leaf node, represents the subtree structure.

  • Leaf node, which stores the value.

Also, each edge of the tree has one string key which length is no less than 1, and all edges from the same parent node have different keys with each other. The following graph shows a simple example, the TreeValue t is a TreeValue instance, for the diamond-shaped node is the root node, the circle-shaped node is the non-leaf nodes and the rectangle-shaped node is the leaf nodes with their values.

../../_images/treevalue_demo.gv.svg

Treelize

Base on the definition of TreeValue, the calculation based on it is named Tree Calculation.

Usually, the native calculation, operations in python are all able to be abstracted to functions, for the values can be mapped as the arguments of the function, and the function’s return value is the calculation’s result. For example, native + operator can be transformed to __add__ function (actually it has already been define in operators module and can be imported in your code), the expression \(1 + 2 = 3\) can be transformed to function-based expression of \(\_\_add\_\_(1, 2) = 3\). In mathematical form, the transformed functions can be tagged as \(f\).

../../_images/treelize_demo.gv.svg

And if we need to expand these function to support the similar calculation on the TreeValue instances (just like the graph shows above), we will need to expand the function \(f\) to a new tree-supported function which is named \(f_T\). For example, the __add__ function for native integers can be seen as function \(f\), while the _add__ function on the TreeValue in the example above is the expanded function \(f_T\).

Note

Actually, the function \(f_T\) will not change any mathematical properties when all the arguments are native values. Expressed in symbolic expression is (all the \(a_i \left(0 \leq i \leq n\right)\), satisfies that \(a_i\) is a native value).

\[\forall \left\{ a_i \right\} \left( 0 \leq i \leq n\right), f_T\left(a_0, a_1, \cdots, a_n\right) = f\left(a_0, a_1, \cdots, a_n\right)\]

Based on this mathematical property, treelize operation to the native functions will not change its original property, when you are using it as a native function, you will feel native special than usual.

So we can define this kind of mappings from function \(f\) to \(f_T\) as mapping \(T\), so the treelize process can be expressed in the following symbolic form:

\[f_T = T\left( f \right)\]

Based on this, the actual tree-based calculation’s process can be expressed as:

\[f_T\left(a_0, a_1, \cdots, a_n\right) \ = T\left(f\right)\left(a_0, a_1, \cdots, a_n\right)\]

The TreeValue and treelize functions are the core features that supported this project, and all of the calculation. The TreeValue is implemented as class TreeValue in this project (the extended one is called FastTreeValue), and treelize is implemented as function func_treelize in this project (the treelize function for instance methods and class methods are called method_treelize and classmethod_treelize.