grl.neural_network
ConcatenateLayer
MultiLayerPerceptron
- class grl.neural_network.MultiLayerPerceptron(hidden_sizes, output_size, activation, dropout=None, layernorm=False, final_activation=None, scale=None, shrink=None)[source]
- Overview:
  Multi-layer perceptron built from fully-connected layers with optional activation, dropout, and layer normalization: x -> fc1 -> act1 -> dropout -> layernorm -> … -> fcn -> actn -> out.
- Interface:
  __init__, forward
- __init__(hidden_sizes, output_size, activation, dropout=None, layernorm=False, final_activation=None, scale=None, shrink=None)[source]
- Overview:
  Initialize the multi-layer perceptron.
- Parameters:
  - hidden_sizes – The list of hidden sizes.
  - output_size – The number of channels in the output tensor.
  - activation – The optional activation function.
  - dropout – The probability that an element is zeroed by dropout. Default is None.
  - layernorm – Whether to use layer normalization in the fully-connected block. Default is False.
  - final_activation – The optional activation function of the final layer. Default is None.
  - scale – The scale of the output tensor. Default is None.
  - shrink – The shrinkage factor of the output tensor. Default is None.
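A minimal usage sketch, assuming the package is installed. Whether hidden_sizes begins with the input width, and the exact form accepted by activation (e.g. a name string versus a module), are assumptions to verify against the library:

```python
import torch
from grl.neural_network import MultiLayerPerceptron

# Hypothetical configuration: hidden_sizes is assumed to start with
# the input width, and activation to accept an activation name.
mlp = MultiLayerPerceptron(
    hidden_sizes=[64, 128, 128],
    output_size=8,
    activation="relu",
    dropout=0.1,      # probability of zeroing an element
    layernorm=True,   # layer normalization inside each block
)

x = torch.randn(32, 64)
y = mlp(x)  # expected shape: (32, 8)
```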
ConcatenateMLP
- class grl.neural_network.ConcatenateMLP(**kwargs)[source]
- Overview:
  Concatenate the input tensors along the last dimension and then pass through a multi-layer perceptron.
- Interface:
  __init__, forward
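A hedged sketch: the **kwargs are assumed to be forwarded to an internal MultiLayerPerceptron, and forward is assumed to take several tensors to concatenate, e.g. a state and an action in a Q-network:

```python
import torch
from grl.neural_network import ConcatenateMLP

# The first hidden size is assumed to match the concatenated input
# width (64 + 16 = 80 here); kwargs mirror MultiLayerPerceptron.
net = ConcatenateMLP(
    hidden_sizes=[80, 256],
    output_size=1,
    activation="relu",
)

state = torch.randn(32, 64)
action = torch.randn(32, 16)
q = net(state, action)  # concatenated along the last dim -> (32, 1)
```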
TemporalSpatialResidualNet
- class grl.neural_network.TemporalSpatialResidualNet(hidden_sizes, output_dim, t_dim, input_dim=None, condition_dim=None, condition_hidden_dim=None, t_condition_hidden_dim=None)[source]
- Overview:
  Temporal-spatial residual network composed of multiple TemporalSpatialResBlock modules.
- Interface:
  __init__, forward
- __init__(hidden_sizes, output_dim, t_dim, input_dim=None, condition_dim=None, condition_hidden_dim=None, t_condition_hidden_dim=None)[source]
- Overview:
  Initialize the temporal-spatial residual network.
- Parameters:
  - hidden_sizes – The list of hidden sizes.
  - output_dim – The number of channels in the output tensor.
  - t_dim – The dimension of the temporal input.
  - condition_dim – The number of channels in the condition tensor. Default is None.
  - condition_hidden_dim – The number of channels in the hidden condition tensor. Default is None.
  - t_condition_hidden_dim – The number of channels in the hidden temporal-condition tensor. Default is None.
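A hedged construction sketch; the forward signature is assumed to mirror the other networks in this module, i.e. (t, x, condition):

```python
import torch
from grl.neural_network import TemporalSpatialResidualNet

# Hypothetical configuration: 8-d samples, 32-d time embedding,
# 11-d condition vector.
net = TemporalSpatialResidualNet(
    hidden_sizes=[512, 512, 512],
    output_dim=8,
    t_dim=32,
    condition_dim=11,
    condition_hidden_dim=64,
    t_condition_hidden_dim=128,
)

t = torch.rand(16)       # diffusion timesteps
x = torch.randn(16, 8)   # noisy samples
c = torch.randn(16, 11)  # conditions
out = net(t, x, c)       # assumed signature; expected shape (16, 8)
```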
DiT
- class grl.neural_network.DiT(input_size=32, patch_size=2, in_channels=4, hidden_size=1152, depth=28, num_heads=16, mlp_ratio=4.0, class_dropout_prob=0.1, num_classes=1000, learn_sigma=True, condition=True)[source]
- Overview:
  Diffusion model with a Transformer backbone, following the official implementation: https://github.com/facebookresearch/DiT/blob/main/models.py
- Interfaces:
  __init__, forward
- __init__(input_size=32, patch_size=2, in_channels=4, hidden_size=1152, depth=28, num_heads=16, mlp_ratio=4.0, class_dropout_prob=0.1, num_classes=1000, learn_sigma=True, condition=True)[source]
- Overview:
  Initialize the DiT model.
- Parameters:
  - input_size (int, defaults to 32) – The input size.
  - patch_size (int, defaults to 2) – The patch size.
  - in_channels (int, defaults to 4) – The number of input channels.
  - hidden_size (int, defaults to 1152) – The hidden size.
  - depth (int, defaults to 28) – The depth.
  - num_heads (int, defaults to 16) – The number of attention heads.
  - mlp_ratio (float, defaults to 4.0) – The hidden size of the MLP with respect to the hidden size of Attention.
  - class_dropout_prob (float, defaults to 0.1) – The class dropout probability.
  - num_classes (int, defaults to 1000) – The number of classes.
  - learn_sigma (bool, defaults to True) – Whether to learn sigma.
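A construction sketch using the documented parameters; the defaults correspond to a DiT-XL/2-scale model, and the smaller values here are purely illustrative:

```python
from grl.neural_network import DiT

# Hypothetical lighter configuration; unspecified arguments keep the
# documented defaults (hidden_size=1152, depth=28, num_heads=16, ...).
model = DiT(
    input_size=32,
    patch_size=2,
    in_channels=4,
    hidden_size=384,  # reduced from 1152
    depth=12,         # reduced from 28
    num_heads=6,      # must divide hidden_size (384 / 6 = 64)
    num_classes=10,   # e.g. a 10-class label condition
)
```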
- forward(t, x, condition=None)[source]
- Overview:
  Forward pass of DiT.
- Parameters:
  - t (torch.Tensor) – Tensor of diffusion timesteps.
  - x (torch.Tensor) – Tensor of spatial inputs (images or latent representations of images).
  - condition (Union[torch.Tensor, TensorDict], optional) – The input condition, such as class labels.
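A self-contained usage sketch with the default configuration; the label tensor and the timestep range are assumptions carried over from the official DiT:

```python
import torch
from grl.neural_network import DiT

model = DiT()  # documented defaults

t = torch.randint(0, 1000, (8,))       # one diffusion timestep per sample
x = torch.randn(8, 4, 32, 32)          # latent images, (N, C, H, W)
labels = torch.randint(0, 1000, (8,))  # assumed label-tensor condition
out = model(t, x, condition=labels)
# With learn_sigma=True the official DiT predicts 2 * in_channels
# channels (noise and sigma), i.e. shape (8, 8, 32, 32) here.
```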
- forward_with_cfg(t, x, condition=None, cfg_scale=1.0)[source]
- Overview:
  Forward pass of DiT, but also batches the unconditional forward pass for classifier-free guidance.
- Parameters:
  - t (torch.Tensor) – Tensor of diffusion timesteps.
  - x (torch.Tensor) – Tensor of spatial inputs (images or latent representations of images).
  - condition (Union[torch.Tensor, TensorDict], optional) – The input condition, such as class labels.
  - cfg_scale (float, defaults to 1.0) – The scale for classifier-free guidance.
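For reference, classifier-free guidance blends the conditional and unconditional predictions; a generic sketch of the standard scaling rule, not necessarily the library's exact code:

```python
import torch

def cfg_combine(cond_out: torch.Tensor,
                uncond_out: torch.Tensor,
                cfg_scale: float) -> torch.Tensor:
    # Standard classifier-free guidance: move the unconditional
    # prediction toward the conditional one, scaled by cfg_scale.
    # cfg_scale = 1.0 recovers the purely conditional output.
    return uncond_out + cfg_scale * (cond_out - uncond_out)
```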
DiT1D
- class grl.neural_network.DiT1D(token_size, in_channels=4, hidden_size=1152, depth=28, num_heads=16, mlp_ratio=4.0, condition_embedder=None)[source]
- Overview:
  Transformer backbone for diffusion models over 1D data.
- Interfaces:
  __init__, forward
- __init__(token_size, in_channels=4, hidden_size=1152, depth=28, num_heads=16, mlp_ratio=4.0, condition_embedder=None)[source]
- Overview:
  Initialize the DiT model.
- Parameters:
  - in_channels (Union[int, List[int], Tuple[int]]) – The number of input channels, defaults to 4.
  - hidden_size (int) – The hidden size of the attention layer, defaults to 1152.
  - depth (int) – The depth of the transformer, defaults to 28.
  - num_heads (int) – The number of attention heads, defaults to 16.
  - mlp_ratio (float) – The hidden size of the MLP with respect to the hidden size of Attention, defaults to 4.0.
- forward(t, x, condition=None)[source]
- Overview:
  Forward pass of DiT for 1D data.
- Parameters:
  - t (torch.Tensor) – Tensor of diffusion timesteps.
  - x (torch.Tensor) – Tensor of 1D inputs (originally, at t=0, the data or its latent representation).
  - condition (Union[torch.Tensor, TensorDict], optional) – The input condition, such as class labels.
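A hedged construction sketch; the (N, C, token) input layout is an assumption to verify against the library:

```python
import torch
from grl.neural_network import DiT1D

# Hypothetical 1D configuration: 16 tokens with 4 channels each.
model = DiT1D(
    token_size=16,
    in_channels=4,
    hidden_size=384,
    depth=12,
    num_heads=6,
)

t = torch.rand(8)          # diffusion timesteps; scale depends on the setup
x = torch.randn(8, 4, 16)  # assumed (N, C, token) layout
out = model(t, x)
```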
DiT2D
DiT3D
- class grl.neural_network.DiT3D(patch_block_size=[10, 32, 32], patch_size=2, in_channels=4, hidden_size=1152, depth=28, num_heads=16, mlp_ratio=4.0, learn_sigma=True, convolved=False)[source]
- Overview:
  Transformer backbone for diffusion models over 3D data.
- Interfaces:
  __init__, forward
- __init__(patch_block_size=[10, 32, 32], patch_size=2, in_channels=4, hidden_size=1152, depth=28, num_heads=16, mlp_ratio=4.0, learn_sigma=True, convolved=False)[source]
- Overview:
  Initialize the DiT model.
- Parameters:
  - patch_block_size (Union[List[int], Tuple[int]]) – The size of the patch block, defaults to [10, 32, 32].
  - patch_size (Union[int, List[int], Tuple[int]]) – The patch size of each token in the attention layer, defaults to 2.
  - in_channels (Union[int, List[int], Tuple[int]]) – The number of input channels, defaults to 4.
  - hidden_size (int) – The hidden size of the attention layer, defaults to 1152.
  - depth (int) – The depth of the transformer, defaults to 28.
  - num_heads (int) – The number of attention heads, defaults to 16.
  - mlp_ratio (float) – The hidden size of the MLP with respect to the hidden size of Attention, defaults to 4.0.
  - learn_sigma (bool) – Whether to learn sigma, defaults to True.
  - convolved (bool) – Whether to use a fully-connected layer across all channels, defaults to False.
- forward(t, x, condition=None)[source]
- Overview:
  Forward pass of DiT for 3D data.
- Parameters:
  - t (torch.Tensor) – Tensor of diffusion timesteps.
  - x (torch.Tensor) – Tensor of inputs with spatial information (originally, at t=0, a tensor of videos or latent representations of videos).
  - condition (Union[torch.Tensor, TensorDict], optional) – The input condition, such as class labels.
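A hedged construction sketch; the (N, T, C, H, W) input layout is inferred from the return shape of unpatchify documented below:

```python
import torch
from grl.neural_network import DiT3D

# Default-sized model over 10x32x32 latent blocks with 4 channels.
model = DiT3D(
    patch_block_size=[10, 32, 32],  # (T, H, W)
    patch_size=2,
    in_channels=4,
)

t = torch.rand(2)                  # diffusion timesteps
x = torch.randn(2, 10, 4, 32, 32)  # assumed (N, T, C, H, W) layout
out = model(t, x)
```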
- unpatchify(x)[source]
- Overview:
  Unpatchify the output tensor of the attention layer.
- Parameters:
  - x (torch.Tensor) – The input tensor of shape (N, total_patches = T' * H' * W', patch_size[0] * patch_size[1] * patch_size[2] * C).
- Returns:
  The output tensor of shape (N, T, C, H, W).
- Return type:
  x (torch.Tensor)
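The inverse patching is a reshape plus an axis permutation. A self-contained sketch of the idea, not the library's exact code, assuming a uniform patch size p on each axis (the library also allows per-axis patch sizes):

```python
import torch

def unpatchify_sketch(x: torch.Tensor, grid, p: int, c: int) -> torch.Tensor:
    # x: (N, T'*H'*W', p*p*p*c) -> (N, T, C, H, W), with T = T'*p, etc.
    n = x.shape[0]
    tp, hp, wp = grid  # patches per temporal/height/width axis
    x = x.reshape(n, tp, hp, wp, p, p, p, c)
    # Interleave each patch-grid index with its intra-patch offset and
    # move the channel axis in front of the spatial axes.
    x = torch.einsum("nthwabdc->ntachbwd", x)
    return x.reshape(n, tp * p, c, hp * p, wp * p)

# e.g. grid=(5, 16, 16), p=2, c=4 maps (N, 1280, 32) to (N, 10, 4, 32, 32)
```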