Apply to Scikit-Learn

TreeValue is not limited to numpy or torch; it can also be used in practice with other libraries, such as scikit-learn. This section demonstrates applying PCA to tree-structured arrays.

In traditional machine learning, PCA (Principal Component Analysis) is often used to preprocess data: it re-expresses the data along its directions of greatest variance and reduces its dimensionality, which lowers the complexity of the input and can improve the efficiency and quality of later learning. The following image illustrates the idea.

PCA Principle

PCA in a nutshell. Source: Lavrenko and Sutton 2011, slide 13.
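To make the projection in the figure concrete, here is a minimal NumPy sketch of the core idea: center the data, find the principal axes with an SVD, and project onto the leading ones. It is an illustration only, not scikit-learn's implementation; the helper name pca_project is hypothetical, and scikit-learn's PCA also normalizes the sign of each component, so individual columns may differ in sign from this sketch.

```python
import numpy as np


def pca_project(x, n_components):
    """Project the rows of x onto their first n_components principal axes.

    Hypothetical helper for illustration; not part of scikit-learn or TreeValue.
    """
    x = np.asarray(x, dtype=float)
    # PCA works on centered data: subtract the per-feature mean.
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Keep only the leading axes and project the centered data onto them.
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
points = rng.integers(-5, 15, size=(6, 3))
reduced = pca_project(points, 2)
print(reduced.shape)
```

Each column of the result has zero mean, and the columns are ordered by decreasing variance, which is why dropping the trailing columns loses the least information.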

In scikit-learn, the PCA class provides this functionality, and its fit_transform method fits the model and returns the reduced data in one step. For a set of np.array data arranged in a tree structure, tree support can be added simply by wrapping a call to fit_transform. The code is as follows:

import numpy as np
from sklearn.decomposition import PCA

from treevalue import FastTreeValue

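# Wrap PCA so that it is applied to every array in the tree. Each leaf keeps
# min(n_samples, n_features) components, the largest number PCA can return.
fit_transform = FastTreeValue.func()(lambda x: PCA(min(*x.shape)).fit_transform(x))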

if __name__ == '__main__':
    data = FastTreeValue({
        'a': np.random.randint(-5, 15, (4, 3)),
        'x': {
            'c': np.random.randint(-15, 5, (5, 4)),
        }
    })
    print("Original int data:")
    print(data)

    pdata = fit_transform(data)
    print("Fit transformed data:")
    print(pdata)

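The output should look like the following (the exact values and object addresses differ between runs, because the input data is random):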

Original int data:
<FastTreeValue 0x7f2cb55edeb0>
├── 'a' --> array([[ 7, 11,  3],
│                  [ 2, 13, 13],
│                  [ 3, 12, 10],
│                  [-1,  3,  5]])
└── 'x' --> <FastTreeValue 0x7f2ca8a76e20>
    └── 'c' --> array([[ -5, -11,   1,   0],
                       [-14,   2,  -1,  -3],
                       [ -6,  -4,  -2, -12],
                       [  2,   0,  -1,  -1],
                       [ -8,  -2,   0,  -4]])

Fit transformed data:
<FastTreeValue 0x7f2ca3cbaac0>
├── 'a' --> array([[ 1.24110345e+00,  6.37548620e+00, -1.83816599e-02],
│                  [-5.61178357e+00, -2.68193439e+00, -5.11186535e-02],
│                  [-3.15855077e+00, -4.53464649e-01,  7.36670349e-02],
│                  [ 7.52923089e+00, -3.24008717e+00, -4.16672148e-03]])
└── 'x' --> <FastTreeValue 0x7f2cb5cbcee0>
    └── 'c' --> array([[-6.15062694,  6.78899524, -0.2519523 , -0.12590445],
                       [ 8.3686889 ,  0.65037037, -4.05769525, -0.27780155],
                       [ 2.19200512, -1.55928584,  7.7296949 , -0.12473178],
                       [-6.29228588, -6.21436906, -2.67926586, -0.10143986],
                       [ 1.8822188 ,  0.3342893 , -0.74078149,  0.62987763]])
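Conceptually, the wrapped function in the example above visits every leaf of the tree and applies the original function to it, keeping the tree's shape. The sketch below imitates that behavior on plain nested dicts. It is only an analogy under that assumption, not how TreeValue is implemented, and the name tree_map is hypothetical. To stay dependency-free, it maps the array shapes from the demo to the number of components that min(*x.shape) would select for each leaf.

```python
def tree_map(func, tree):
    """Apply func to every leaf of a nested dict, preserving the keys.

    Hypothetical helper for illustration; not part of TreeValue.
    """
    if isinstance(tree, dict):
        # Inner node: recurse into each child and rebuild the same structure.
        return {key: tree_map(func, child) for key, child in tree.items()}
    # Leaf: apply the function directly.
    return func(tree)


# Shapes of the two arrays used in the demo, arranged in the same tree layout.
shapes = {'a': (4, 3), 'x': {'c': (5, 4)}}
components = tree_map(lambda shape: min(*shape), shapes)
print(components)  # {'a': 3, 'x': {'c': 4}}
```

This matches the output above: the array under 'a' keeps 3 columns and the array under 'x.c' keeps 4.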

For further information, see the links below: