Apply into Scikit-Learn
In practice, TreeValue can be used not only with numpy or torch, but also with other libraries such as scikit-learn.
The following part shows a demo of applying PCA to tree-structured arrays.
In traditional machine learning, PCA (Principal Component Analysis) is often used to preprocess data. It centers the data and projects it onto a smaller number of principal components, reducing the dimensionality and complexity of the input and thereby improving the efficiency and quality of subsequent learning. The accompanying image illustrates this.
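To make the idea concrete, here is a minimal sketch of PCA written with plain numpy (centering followed by an SVD). The helper name `pca_project` and the sample data are assumptions made for illustration only; this is not scikit-learn's implementation, and the signs of the resulting columns may differ from what scikit-learn returns.

```python
import numpy as np


def pca_project(data, n_components):
    """Project `data` (samples x features) onto its top principal components."""
    # Center each feature (column) at zero.
    centered = data - data.mean(axis=0)
    # Rows of `vt` are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Keep only the first `n_components` directions and project onto them.
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
samples = rng.integers(-5, 15, size=(6, 3)).astype(float)  # hypothetical data
reduced = pca_project(samples, n_components=2)
print(samples.shape, '->', reduced.shape)
```

Each row of `reduced` describes the same sample as the corresponding row of `samples`, but with two coordinates instead of three.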
In the scikit-learn library, the PCA class provides this functionality, and its fit_transform method can be used to reduce the data. For a set of np.array data arranged in a tree structure, we can add support for the tree structure simply by wrapping the fit_transform function.
The specific code is as follows:
```python
import numpy as np
from sklearn.decomposition import PCA
from treevalue import FastTreeValue

# Wrap PCA's fit_transform so that it is applied to every array in a tree.
# The number of components is the smaller of the array's two dimensions.
fit_transform = FastTreeValue.func()(lambda x: PCA(min(*x.shape)).fit_transform(x))

if __name__ == '__main__':
    # A tree whose leaves are random integer arrays of different shapes.
    data = FastTreeValue({
        'a': np.random.randint(-5, 15, (4, 3)),
        'x': {
            'c': np.random.randint(-15, 5, (5, 4)),
        }
    })
    print("Original int data:")
    print(data)

    # The result is a tree with the same structure as the input.
    pdata = fit_transform(data)
    print("Fit transformed data:")
    print(pdata)
```
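The output should look like the following (the arrays are generated randomly, so the exact values and object addresses will differ between runs):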
```
Original int data:
<FastTreeValue 0x7ffa9d479b20>
├── 'a' --> array([[ 3,  7, -5],
│                  [ 4, -4,  7],
│                  [ 4,  7, -4],
│                  [ 3,  1,  3]])
└── 'x' --> <FastTreeValue 0x7ffa9d4fcee0>
    └── 'c' --> array([[ -3, -11,  -2,  -6],
                       [  0,  -3, -11, -14],
                       [ -2, -13, -13,  -2],
                       [ -3,   1, -11,  -2],
                       [-12, -15, -15,  -6]])

Fit transformed data:
<FastTreeValue 0x7ffa902a2e20>
├── 'a' --> array([[-6.7481839 ,  0.12751534,  0.56635135],
│                  [ 9.54678897, -0.4463037 ,  0.18610041],
│                  [-5.99511323, -0.4463037 , -0.48418007],
│                  [ 3.19650816,  0.76509206, -0.26827169]])
└── 'x' --> <FastTreeValue 0x7ffa9d12deb0>
    └── 'c' --> array([[ 1.23592547e-01,  8.54548392e+00, -1.90108441e+00,
                        -1.65689697e+00],
                       [-7.81069860e+00, -2.35608679e-03,  6.72021810e+00,
                         1.10979749e+00],
                       [ 4.45032136e+00, -4.83881457e-01, -2.66695517e+00,
                         4.75897559e+00],
                       [-7.11753760e+00, -4.83788801e+00, -4.97243660e+00,
                        -1.79175119e+00],
                       [ 1.03543223e+01, -3.22135836e+00,  2.82025808e+00,
                        -2.42012492e+00]])
```
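Conceptually, wrapping a function for tree support means applying it to every leaf while keeping the shape of the tree. The sketch below imitates that behaviour on a plain nested dict. It is only an illustration and not how TreeValue is implemented; `map_leaves` and `center_columns` are hypothetical helpers, and `center_columns` merely stands in for the PCA call so that the example needs nothing beyond numpy.

```python
import numpy as np


def map_leaves(func, tree):
    """Apply `func` to every non-dict leaf of a nested dict, keeping the shape."""
    if isinstance(tree, dict):
        return {key: map_leaves(func, value) for key, value in tree.items()}
    return func(tree)


def center_columns(x):
    # Stand-in for the PCA call in the demo: subtract each column's mean.
    return x - x.mean(axis=0)


# Same tree layout as the demo, with deterministic hypothetical values.
data = {
    'a': np.arange(12.0).reshape(4, 3),
    'x': {'c': np.arange(20.0).reshape(5, 4)},
}
result = map_leaves(center_columns, data)
print(result['a'].shape, result['x']['c'].shape)
```

The wrapped fit_transform in the demo plays the role of `map_leaves(center_columns, ...)` here: the caller passes a whole tree and receives a tree of the same structure back.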
For further information, see the links below: