ding.torch_utils¶
loss¶
Please refer to ding/torch_utils/loss
for more details.
ContrastiveLoss¶
- class ding.torch_utils.loss.ContrastiveLoss(x_size: int | SequenceType, y_size: int | SequenceType, heads: SequenceType = [1, 1], encode_shape: int = 64, loss_type: str = 'infoNCE', temperature: float = 1.0)[source]¶
- Overview:
The class for contrastive learning losses. Only InfoNCE loss is supported currently. Code Reference: https://github.com/rdevon/DIM. Paper Reference: https://arxiv.org/abs/1808.06670.
- Interfaces:
__init__
,forward
.
- __init__(x_size: int | SequenceType, y_size: int | SequenceType, heads: SequenceType = [1, 1], encode_shape: int = 64, loss_type: str = 'infoNCE', temperature: float = 1.0) None [source]¶
- Overview:
Initialize the ContrastiveLoss object using the given arguments.
- Arguments:
    - x_size (Union[int, SequenceType]): Input shape for x; both the obs shape and the encoding shape are supported.
    - y_size (Union[int, SequenceType]): Input shape for y; both the obs shape and the encoding shape are supported.
    - heads (SequenceType): A list of 2 int elements, heads[0] for x and heads[1] for y. Used in the multi-head, global-local, and local-local MI maximization process.
    - encode_shape (int): The dimension of the encoder hidden state.
    - loss_type (str): Only the InfoNCE loss is available now.
    - temperature (float): The parameter to adjust the log_softmax.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _create_encoder(obs_size: int | SequenceType, heads: int) Module [source]¶
- Overview:
Create the encoder for the input obs.
- Arguments:
    - obs_size (Union[int, SequenceType]): Input shape for x; both the obs shape and the encoding shape are supported. If obs_size is an int, the obs is a 1D vector. If obs_size is a list such as [1, 16, 16], the obs is a 3D image with shape [1, 16, 16].
    - heads (int): The number of heads.
- Returns:
    - encoder (nn.Module): The encoder module.
- Examples:
    >>> obs_size = 16  # or obs_size = [1, 16, 16]
    >>> heads = 1
    >>> encoder = self._create_encoder(obs_size, heads)
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x: Tensor, y: Tensor) Tensor [source]¶
- Overview:
Computes the noise contrastive estimation-based loss, a.k.a. infoNCE.
- Arguments:
    - x (torch.Tensor): The input x; both raw obs and encoding are supported.
    - y (torch.Tensor): The input y; both raw obs and encoding are supported.
- Returns:
    - loss (torch.Tensor): The calculated loss value.
- Examples:
    >>> x_dim = [3, 16]
    >>> encode_shape = 16
    >>> x = np.random.normal(0, 1, size=x_dim)
    >>> y = x ** 2 + 0.01 * np.random.normal(0, 1, size=x_dim)
    >>> estimator = ContrastiveLoss(x_dim[1:], x_dim[1:], encode_shape=encode_shape)
    >>> loss = estimator.forward(torch.Tensor(x), torch.Tensor(y))
- Examples:
    >>> x_dim = [3, 1, 16, 16]
    >>> encode_shape = 16
    >>> x = np.random.normal(0, 1, size=x_dim)
    >>> y = x ** 2 + 0.01 * np.random.normal(0, 1, size=x_dim)
    >>> estimator = ContrastiveLoss(x_dim[1:], x_dim[1:], encode_shape=encode_shape)
    >>> loss = estimator.forward(torch.Tensor(x), torch.Tensor(y))
- training: bool¶
LabelSmoothCELoss¶
- class ding.torch_utils.loss.LabelSmoothCELoss(ratio: float)[source]¶
- Overview:
Label smooth cross entropy loss.
- Interfaces:
__init__
,forward
.
- __init__(ratio: float) None [source]¶
- Overview:
Initialize the LabelSmoothCELoss object using the given arguments.
- Arguments:
    - ratio (float): The ratio of label smoothing (a value in [0, 1]). The larger the ratio, the stronger the label smoothing.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(logits: Tensor, labels: LongTensor) Tensor [source]¶
- Overview:
Calculate label smooth cross entropy loss.
- Arguments:
    - logits (torch.Tensor): Predicted logits.
    - labels (torch.LongTensor): Ground truth.
- Returns:
    - loss (torch.Tensor): Calculated loss.
- training: bool¶
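A minimal usage sketch for LabelSmoothCELoss; the shapes are illustrative only (a batch of 4 samples over 10 classes is assumed):
    >>> import torch
    >>> from ding.torch_utils.loss import LabelSmoothCELoss
    >>> criterion = LabelSmoothCELoss(ratio=0.1)
    >>> logits = torch.randn(4, 10, requires_grad=True)  # predicted logits
    >>> labels = torch.randint(0, 10, (4, ))  # ground-truth class indices (LongTensor)
    >>> loss = criterion(logits, labels)  # scalar loss
    >>> loss.backward()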
SoftFocalLoss¶
- class ding.torch_utils.loss.SoftFocalLoss(gamma: int = 2, weight: Any | None = None, size_average: bool = True, reduce: bool | None = None)[source]¶
- Overview:
Soft focal loss.
- Interfaces:
__init__
,forward
.
- __init__(gamma: int = 2, weight: Any | None = None, size_average: bool = True, reduce: bool | None = None) None [source]¶
- Overview:
Initialize the SoftFocalLoss object using the given arguments.
- Arguments:
    - gamma (int): The extent of focus on hard samples. A smaller gamma leads to more focus on easy samples, while a larger gamma leads to more focus on hard samples.
    - weight (Any): The weight for the loss of each class.
    - size_average (bool): By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False.
    - reduce (Optional[bool]): By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, a loss is returned for each batch element instead and size_average is ignored.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(inputs: Tensor, targets: LongTensor) Tensor [source]¶
- Overview:
Calculate soft focal loss.
- Arguments:
logits (
torch.Tensor
): Predicted logits.labels (
torch.LongTensor
): Ground truth.
- Returns:
loss (
torch.Tensor
): Calculated loss.
- training: bool¶
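A minimal usage sketch for SoftFocalLoss, assuming 2D logits of shape (B, N); the shapes and values are illustrative only:
    >>> import torch
    >>> from ding.torch_utils.loss import SoftFocalLoss
    >>> criterion = SoftFocalLoss(gamma=2)
    >>> inputs = torch.randn(4, 10, requires_grad=True)  # predicted logits
    >>> targets = torch.randint(0, 10, (4, ))  # ground-truth class indices
    >>> loss = criterion(inputs, targets)
    >>> loss.backward()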
build_ce_criterion¶
- ding.torch_utils.loss.build_ce_criterion(cfg: dict) Module [source]¶
- Overview:
Get a cross entropy loss instance according to given config.
- Arguments:
    - cfg (dict): Config dict. It contains:
        - type (str): Type of loss function, currently supporting ['cross_entropy', 'label_smooth_ce', 'soft_focal_loss'].
        - kwargs (dict): Arguments for the corresponding loss function.
- Returns:
    - loss (nn.Module): The loss function instance.
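A hypothetical config sketch for build_ce_criterion, following the documented fields (type and kwargs). Whether the config is accessed by key or by attribute, and the exact kwargs each loss type expects, are assumptions here; EasyDict is used only to cover both access styles:
    >>> from easydict import EasyDict
    >>> from ding.torch_utils.loss import build_ce_criterion
    >>> cfg = EasyDict(dict(type='cross_entropy', kwargs=dict()))  # assumed minimal config
    >>> criterion = build_ce_criterion(cfg)  # returns an nn.Module loss instance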
MultiLogitsLoss¶
- class ding.torch_utils.loss.MultiLogitsLoss(criterion: str | None = None, smooth_ratio: float = 0.1)[source]¶
- Overview:
Loss for supervised learning with multiple logits, including the basic label-matching and loss-computation processes.
- Interfaces:
__init__
,forward
.
- __init__(criterion: str | None = None, smooth_ratio: float = 0.1) None [source]¶
- Overview:
Initialization method, use cross_entropy as default criterion.
- Arguments:
    - criterion (str): Criterion type, supporting ['cross_entropy', 'label_smooth_ce'].
    - smooth_ratio (float): Smoothing ratio for label smoothing.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- static _get_distance_matrix(lx: ndarray, ly: ndarray, mat: ndarray, M: int) ndarray [source]¶
- Overview:
Get distance matrix.
- Arguments:
    - lx (np.ndarray): lx.
    - ly (np.ndarray): ly.
    - mat (np.ndarray): mat.
    - M (int): M.
- _get_metric_matrix(logits: Tensor, labels: LongTensor) Tensor [source]¶
- Overview:
Calculate the metric matrix.
- Arguments:
logits (
torch.Tensor
): Predicted logits.labels (
torch.LongTensor
): Ground truth.
- Returns:
metric (
torch.Tensor
): Calculated metric matrix.
- _is_full_backward_hook: bool | None¶
- _label_process(logits: Tensor, labels: LongTensor) LongTensor [source]¶
- Overview:
Process the label according to the criterion.
- Arguments:
logits (
torch.Tensor
): Predicted logits.labels (
torch.LongTensor
): Ground truth.
- Returns:
ret (
torch.LongTensor
): Processed label.
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _match(matrix: Tensor)[source]¶
- Overview:
Match the metric matrix.
- Arguments:
    - matrix (torch.Tensor): The metric matrix.
- Returns:
    - index (np.ndarray): The matched index.
- _modules: Dict[str, Module | None]¶
- _nll_loss(nlls: Tensor, labels: LongTensor) Tensor [source]¶
- Overview:
Calculate the negative log likelihood loss.
- Arguments:
    - nlls (torch.Tensor): The negative log likelihood loss values.
    - labels (torch.LongTensor): Ground truth.
- Returns:
    - ret (torch.Tensor): Calculated loss.
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(logits: Tensor, labels: LongTensor) Tensor [source]¶
- Overview:
Calculate multiple logits loss.
- Arguments:
    - logits (torch.Tensor): Predicted logits, whose shape must be 2-dim, like (B, N).
    - labels (torch.LongTensor): Ground truth.
- Returns:
    - loss (torch.Tensor): Calculated loss.
- training: bool¶
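A minimal usage sketch for MultiLogitsLoss, assuming 4 logits rows over 8 classes with one distinct ground-truth index per row (the shapes and the one-label-per-row pairing are assumptions):
    >>> import torch
    >>> from ding.torch_utils.loss import MultiLogitsLoss
    >>> criterion = MultiLogitsLoss(criterion='cross_entropy')
    >>> logits = torch.randn(4, 8)  # (B, N), 2-dim as required by forward
    >>> labels = torch.LongTensor([0, 1, 2, 3])  # one ground-truth index per logits row
    >>> loss = criterion(logits, labels)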
network.activation¶
Please refer to ding/torch_utils/network/activation
for more details.
Lambda¶
- class ding.torch_utils.network.activation.Lambda(f: Callable)[source]¶
- Overview:
A custom lambda module for constructing custom layers.
- Interfaces:
__init__
,forward
.
- __init__(f: Callable)[source]¶
- Overview:
Initialize the lambda module with a given function.
- Arguments:
    - f (Callable): A Python function.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x: Tensor) Tensor [source]¶
- Overview:
Compute the function of the input tensor.
- Arguments:
x (
torch.Tensor
): The input tensor.
- training: bool¶
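A minimal usage sketch for Lambda, wrapping an arbitrary Python function as a layer:
    >>> import torch
    >>> from ding.torch_utils.network.activation import Lambda
    >>> square = Lambda(lambda x: x ** 2)  # custom element-wise layer
    >>> out = square(torch.tensor([1., 2., 3.]))  # tensor([1., 4., 9.])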
GLU¶
- class ding.torch_utils.network.activation.GLU(input_dim: int, output_dim: int, context_dim: int, input_type: str = 'fc')[source]¶
- Overview:
Gated Linear Unit (GLU), a specific type of activation function, first proposed in [Language Modeling with Gated Convolutional Networks](https://arxiv.org/pdf/1612.08083.pdf).
- Interfaces:
__init__
,forward
.
- __init__(input_dim: int, output_dim: int, context_dim: int, input_type: str = 'fc') None [source]¶
- Overview:
Initialize the GLU module.
- Arguments:
input_dim (
int
): The dimension of the input tensor.output_dim (
int
): The dimension of the output tensor.context_dim (
int
): The dimension of the context tensor.input_type (
str
): The type of input, now supports [‘fc’, ‘conv2d’]
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x: Tensor, context: Tensor) Tensor [source]¶
- Overview:
Compute the GLU transformation of the input tensor.
- Arguments:
x (
torch.Tensor
): The input tensor.context (
torch.Tensor
): The context tensor.
- Returns:
x (
torch.Tensor
): The output tensor after GLU transformation.
- training: bool¶
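A minimal usage sketch for GLU with the 'fc' input type; the batch size and dimensions are illustrative:
    >>> import torch
    >>> from ding.torch_utils.network.activation import GLU
    >>> glu = GLU(input_dim=32, output_dim=16, context_dim=8, input_type='fc')
    >>> x = torch.randn(4, 32)  # input tensor
    >>> context = torch.randn(4, 8)  # context tensor used for gating
    >>> y = glu(x, context)  # output tensor of shape (4, 16)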
Swish¶
- class ding.torch_utils.network.activation.Swish[source]¶
- Overview:
Swish activation function, which is a smooth, non-monotonic activation function. For more details, please refer to [Searching for Activation Functions](https://arxiv.org/pdf/1710.05941.pdf).
- Interfaces:
__init__
,forward
.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x: Tensor) Tensor [source]¶
- Overview:
Compute the Swish transformation of the input tensor.
- Arguments:
x (
torch.Tensor
): The input tensor.
- Returns:
x (
torch.Tensor
): The output tensor after Swish transformation.
- training: bool¶
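A minimal usage sketch for Swish, applied element-wise to a random tensor:
    >>> import torch
    >>> from ding.torch_utils.network.activation import Swish
    >>> act = Swish()
    >>> y = act(torch.randn(4, 8))  # same shape as the input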
GELU¶
- class ding.torch_utils.network.activation.GELU[source]¶
- Overview:
Gaussian Error Linear Units (GELU) activation function, which is widely used in NLP models like GPT, BERT. For more details, please refer to the original paper: https://arxiv.org/pdf/1606.08415.pdf.
- Interfaces:
__init__
,forward
.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x: Tensor) Tensor [source]¶
- Overview:
Compute the GELU transformation of the input tensor.
- Arguments:
x (
torch.Tensor
): The input tensor.
- Returns:
x (
torch.Tensor
): The output tensor after GELU transformation.
- training: bool¶
build_activation¶
- ding.torch_utils.network.activation.build_activation(activation: str, inplace: bool | None = None) Module [source]¶
- Overview:
Build and return the activation module according to the given type.
- Arguments:
    - activation (str): The type of activation module, currently supporting ['relu', 'glu', 'prelu', 'swish', 'gelu', 'tanh', 'sigmoid', 'softplus', 'elu', 'square', 'identity'].
    - inplace (Optional[bool]): Whether to execute the operation in-place in the activation. Defaults to None.
- Returns:
    - act_func (nn.Module): The corresponding activation module.
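A minimal usage sketch for build_activation; it assumes the optional inplace flag is only meaningful for ReLU-style activations, so it is passed only in that case:
    >>> import torch
    >>> from ding.torch_utils.network.activation import build_activation
    >>> relu = build_activation('relu', inplace=True)
    >>> swish = build_activation('swish')
    >>> y = swish(torch.randn(2, 3))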
network.diffusion¶
Please refer to ding/torch_utils/network/diffusion
for more details.
extract¶
cosine_beta_schedule¶
- ding.torch_utils.network.diffusion.cosine_beta_schedule(timesteps: int, s: float = 0.008, dtype=torch.float32)[source]¶
- Overview:
Cosine schedule, as proposed in https://openreview.net/forum?id=-NEXDKk8gZ.
- Arguments:
    - timesteps (int): The number of diffusion timesteps.
    - s (float): A small offset for the cosine schedule.
    - dtype (torch.dtype): The dtype of the beta tensor.
- Return:
A tensor of betas with shape [timesteps, ], computed by the cosine schedule.
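A minimal usage sketch for cosine_beta_schedule, assuming a 1000-step diffusion process:
    >>> from ding.torch_utils.network.diffusion import cosine_beta_schedule
    >>> betas = cosine_beta_schedule(timesteps=1000)  # 1D tensor of shape [1000]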
apply_conditioning¶
DiffusionConv1d¶
- class ding.torch_utils.network.diffusion.DiffusionConv1d(in_channels: int, out_channels: int, kernel_size: int, padding: int, activation: Module | None = None, n_groups: int = 8)[source]¶
- Overview:
Conv1d with activation and normalization for diffusion models.
- Interfaces:
__init__
,forward
- __init__(in_channels: int, out_channels: int, kernel_size: int, padding: int, activation: Module | None = None, n_groups: int = 8) None [source]¶
- Overview:
Create a 1D convolution layer with activation and normalization. This Conv1d uses GroupNorm, and an extra dimension is added when computing the normalization.
- Arguments:
    - in_channels (int): Number of channels in the input tensor.
    - out_channels (int): Number of channels in the output tensor.
    - kernel_size (int): Size of the convolving kernel.
    - padding (int): Zero-padding added to both sides of the input.
    - activation (nn.Module): The optional activation function.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(inputs) Tensor [source]¶
- Overview:
Compute the 1D convolution of the inputs.
- Arguments:
    - inputs (torch.Tensor): The input tensor.
- Return:
    - out (torch.Tensor): The output tensor.
- training: bool¶
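A minimal usage sketch for DiffusionConv1d, assuming an input of shape (batch, channels, horizon) and out_channels divisible by the default n_groups=8:
    >>> import torch
    >>> import torch.nn as nn
    >>> from ding.torch_utils.network.diffusion import DiffusionConv1d
    >>> conv = DiffusionConv1d(in_channels=8, out_channels=16, kernel_size=5, padding=2, activation=nn.Mish())
    >>> x = torch.randn(4, 8, 32)  # (batch, in_channels, horizon)
    >>> y = conv(x)  # expected shape (4, 16, 32)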
SinusoidalPosEmb¶
- class ding.torch_utils.network.diffusion.SinusoidalPosEmb(dim: int)[source]¶
- Overview:
Class for computing sinusoidal position embeddings.
- Interfaces:
__init__
,forward
- __init__(dim: int) None [source]¶
- Overview:
Initialize the SinusoidalPosEmb class.
- Arguments:
    - dim (int): The dimension of the embedding.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x) Tensor [source]¶
- Overview:
Compute the sinusoidal position embedding.
- Arguments:
    - x (torch.Tensor): The input tensor.
- Return:
    - emb (torch.Tensor): The output tensor.
- training: bool¶
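A minimal usage sketch for SinusoidalPosEmb, assuming an even embedding dimension and a batch of integer timesteps:
    >>> import torch
    >>> from ding.torch_utils.network.diffusion import SinusoidalPosEmb
    >>> emb = SinusoidalPosEmb(dim=32)
    >>> t = torch.arange(4)  # diffusion timesteps
    >>> out = emb(t)  # expected shape (4, 32)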
Residual¶
- class ding.torch_utils.network.diffusion.Residual(fn)[source]¶
- Overview:
Basic Residual block
- Interfaces:
__init__
,forward
- __init__(fn)[source]¶
- Overview:
Initialize the Residual class.
- Arguments:
    - fn (nn.Module): The function (module) of the residual block.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x, *arg, **kwargs)[source]¶
- Overview:
Compute the residual block output.
- Arguments:
    - x (torch.Tensor): The input tensor.
- training: bool¶
LayerNorm¶
- class ding.torch_utils.network.diffusion.LayerNorm(dim, eps=1e-05)[source]¶
- Overview:
LayerNorm that normalizes over dim = 1, because the temporal input x has shape [batch, dim, horizon].
- Interfaces:
__init__
,forward
- __init__(dim, eps=1e-05) None [source]¶
- Overview:
Initialize the LayerNorm class.
- Arguments:
    - dim (int): The dimension of the input.
    - eps (float): The eps value of LayerNorm.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- training: bool¶
PreNorm¶
- class ding.torch_utils.network.diffusion.PreNorm(dim, fn)[source]¶
- Overview:
PreNorm that normalizes over dim = 1 before applying fn, because the temporal input x has shape [batch, dim, horizon].
- Interfaces:
__init__
,forward
- __init__(dim, fn) None [source]¶
- Overview:
Initialize the PreNorm class.
- Arguments:
    - dim (int): The dimension of the input.
    - fn (nn.Module): The wrapped module (applied after the normalization).
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- training: bool¶
LinearAttention¶
- class ding.torch_utils.network.diffusion.LinearAttention(dim, heads=4, dim_head=32)[source]¶
- Overview:
Linear Attention head
- Interfaces:
__init__
,forward
- __init__(dim, heads=4, dim_head=32) None [source]¶
- Overview:
Initialize the LinearAttention class.
- Arguments:
    - dim (int): The dimension of the input.
    - heads (int): The number of attention heads.
    - dim_head (int): The dimension of each head.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- training: bool¶
ResidualTemporalBlock¶
- class ding.torch_utils.network.diffusion.ResidualTemporalBlock(in_channels: int, out_channels: int, embed_dim: int, kernel_size: int = 5, mish: bool = True)[source]¶
- Overview:
Temporal residual block.
- Interfaces:
__init__
,forward
- __init__(in_channels: int, out_channels: int, embed_dim: int, kernel_size: int = 5, mish: bool = True) None [source]¶
- Overview:
Initialize the ResidualTemporalBlock class.
- Arguments:
    - in_channels (int): The number of input channels.
    - out_channels (int): The number of output channels.
    - embed_dim (int): The dimension of the embedding layer.
    - kernel_size (int): The kernel size of the Conv1d.
    - mish (bool): Whether to use Mish as the activation function.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x, t)[source]¶
- Overview:
Compute the residual block output.
- Arguments:
    - x (torch.Tensor): The input tensor.
    - t (torch.Tensor): The time tensor.
- training: bool¶
DiffusionUNet1d¶
- class ding.torch_utils.network.diffusion.DiffusionUNet1d(transition_dim: int, dim: int = 32, dim_mults: SequenceType = [1, 2, 4, 8], returns_condition: bool = False, condition_dropout: float = 0.1, calc_energy: bool = False, kernel_size: int = 5, attention: bool = False)[source]¶
- Overview:
Diffusion U-Net for 1D vector data.
- Interfaces:
__init__
,forward
,get_pred
- __init__(transition_dim: int, dim: int = 32, dim_mults: SequenceType = [1, 2, 4, 8], returns_condition: bool = False, condition_dropout: float = 0.1, calc_energy: bool = False, kernel_size: int = 5, attention: bool = False) None [source]¶
- Overview:
Initialize the DiffusionUNet1d class.
- Arguments:
    - transition_dim (int): The dimension of the transition, i.e. obs_dim + action_dim.
    - dim (int): The base dimension of the layers.
    - dim_mults (SequenceType): The multipliers of dim at each level.
    - returns_condition (bool): Whether to use the return as a condition.
    - condition_dropout (float): The dropout rate of the returns condition.
    - calc_energy (bool): Whether to calculate the energy.
    - kernel_size (int): The kernel size of the Conv1d.
    - attention (bool): Whether to use attention.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x, cond, time, returns=None, use_dropout: bool = True, force_dropout: bool = False)[source]¶
- Overview:
Compute the forward pass of the diffusion U-Net.
- Arguments:
    - x (torch.Tensor): The noisy trajectory.
    - cond (tuple): [(time, state), ...], where state is the initial state of the env and time = 0.
    - time (int): The timestep of the diffusion step.
    - returns (torch.Tensor): The conditioning returns of the trajectory (normalized return).
    - use_dropout (bool): Whether to apply the returns-condition mask (dropout).
    - force_dropout (bool): Whether to force-drop the returns condition.
- get_pred(x, cond, time, returns: bool | None = None, use_dropout: bool = True, force_dropout: bool = False)[source]¶
- Overview:
Compute the diffusion U-Net prediction.
- Arguments:
    - x (torch.Tensor): The noisy trajectory.
    - cond (tuple): [(time, state), ...], where state is the initial state of the env and time = 0.
    - time (int): The timestep of the diffusion step.
    - returns (torch.Tensor): The conditioning returns of the trajectory (normalized return).
    - use_dropout (bool): Whether to apply the returns-condition mask (dropout).
    - force_dropout (bool): Whether to force-drop the returns condition.
- training: bool¶
TemporalValue¶
- class ding.torch_utils.network.diffusion.TemporalValue(horizon: int, transition_dim: int, dim: int = 32, time_dim: int | None = None, out_dim: int = 1, kernel_size: int = 5, dim_mults: SequenceType = [1, 2, 4, 8])[source]¶
- Overview:
Temporal network for the value function.
- Interfaces:
__init__
,forward
- __init__(horizon: int, transition_dim: int, dim: int = 32, time_dim: int | None = None, out_dim: int = 1, kernel_size: int = 5, dim_mults: SequenceType = [1, 2, 4, 8]) None [source]¶
- Overview:
Initialize the TemporalValue class.
- Arguments:
    - horizon (int): The horizon of the trajectory.
    - transition_dim (int): The dimension of the transition, i.e. obs_dim + action_dim.
    - dim (int): The base dimension of the layers.
    - time_dim (int): The dimension of the time embedding.
    - out_dim (int): The dimension of the output.
    - kernel_size (int): The kernel size of the Conv1d.
    - dim_mults (SequenceType): The multipliers of dim at each level.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x, cond, time, *args)[source]¶
- Overview:
Compute the temporal value forward pass.
- Arguments:
    - x (torch.Tensor): The noisy trajectory.
    - cond (tuple): [(time, state), ...], where state is the initial state of the env and time = 0.
    - time (int): The timestep of the diffusion step.
- training: bool¶
network.dreamer¶
Please refer to ding/torch_utils/network/dreamer
for more details.
Conv2dSame¶
- class ding.torch_utils.network.dreamer.Conv2dSame(in_channels: int, out_channels: int, kernel_size: int | Tuple[int, int], stride: int | Tuple[int, int] = 1, padding: str | int | Tuple[int, int] = 0, dilation: int | Tuple[int, int] = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', device=None, dtype=None)[source]¶
- Overview:
Conv2dSame Network for dreamerv3.
- Interfaces:
__init__
,forward
- _reversed_padding_repeated_twice: List[int]¶
- bias: Tensor | None¶
- calc_same_pad(i, k, s, d)[source]¶
- Overview:
Calculate the same padding size.
- Arguments:
    - i (int): Input size.
    - k (int): Kernel size.
    - s (int): Stride size.
    - d (int): Dilation size.
- dilation: Tuple[int, ...]¶
- forward(x)[source]¶
- Overview:
compute the forward of Conv2dSame.
- Arguments:
x (
torch.Tensor
): Input tensor.
- groups: int¶
- in_channels: int¶
- kernel_size: Tuple[int, ...]¶
- out_channels: int¶
- output_padding: Tuple[int, ...]¶
- padding: str | Tuple[int, ...]¶
- padding_mode: str¶
- stride: Tuple[int, ...]¶
- transposed: bool¶
- weight: Tensor¶
DreamerLayerNorm¶
- class ding.torch_utils.network.dreamer.DreamerLayerNorm(ch, eps=0.001)[source]¶
- Overview:
DreamerLayerNorm Network for dreamerv3.
- Interfaces:
__init__
,forward
- __init__(ch, eps=0.001)[source]¶
- Overview:
Init the DreamerLayerNorm class.
- Arguments:
    - ch (int): The input channel.
    - eps (float): Epsilon.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x)[source]¶
- Overview:
compute the forward of DreamerLayerNorm.
- Arguments:
x (
torch.Tensor
): Input tensor.
- training: bool¶
DenseHead¶
- class ding.torch_utils.network.dreamer.DenseHead(inp_dim, shape, layer_num, units, act='SiLU', norm='LN', dist='normal', std=1.0, outscale=1.0, device='cpu')[source]¶
- Overview:
DenseHead Network for value head, reward head, and discount head of dreamerv3.
- Interfaces:
__init__
,forward
- __init__(inp_dim, shape, layer_num, units, act='SiLU', norm='LN', dist='normal', std=1.0, outscale=1.0, device='cpu')[source]¶
- Overview:
Init the DenseHead class.
- Arguments:
    - inp_dim (int): Input dimension.
    - shape (tuple): Output shape.
    - layer_num (int): Number of layers.
    - units (int): Number of units.
    - act (str): Activation function.
    - norm (str): Normalization function.
    - dist (str): Distribution function.
    - std (float): Standard deviation.
    - outscale (float): Output scale.
    - device (str): Device.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(features)[source]¶
- Overview:
compute the forward of DenseHead.
- Arguments:
features (
torch.Tensor
): Input tensor.
- training: bool¶
ActionHead¶
- class ding.torch_utils.network.dreamer.ActionHead(inp_dim, size, layers, units, act=<class 'torch.nn.modules.activation.ELU'>, norm=<class 'torch.nn.modules.normalization.LayerNorm'>, dist='trunc_normal', init_std=0.0, min_std=0.1, max_std=1.0, temp=0.1, outscale=1.0, unimix_ratio=0.01)[source]¶
- Overview:
ActionHead Network for action head of dreamerv3.
- Interfaces:
__init__
,forward
- __init__(inp_dim, size, layers, units, act=<class 'torch.nn.modules.activation.ELU'>, norm=<class 'torch.nn.modules.normalization.LayerNorm'>, dist='trunc_normal', init_std=0.0, min_std=0.1, max_std=1.0, temp=0.1, outscale=1.0, unimix_ratio=0.01)[source]¶
- Overview:
Initialize the ActionHead class.
- Arguments:
inp_dim (
int
): Input dimension.size (
int
): Output size.layers (
int
): Number of layers.units (
int
): Number of units.act (
str
): Activation function.norm (
str
): Normalization function.dist (
str
): Distribution function.init_std (
float
): Initial standard deviation.min_std (
float
): Minimum standard deviation.max_std (
float
): Maximum standard deviation.temp (
float
): Temperature.outscale (
float
): Output scale.unimix_ratio (
float
): Unimix ratio.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(features)[source]¶
- Overview:
compute the forward of ActionHead.
- Arguments:
features (
torch.Tensor
): Input tensor.
- training: bool¶
SampleDist¶
- class ding.torch_utils.network.dreamer.SampleDist(dist, samples=100)[source]¶
- Overview:
A kind of sample Dist for ActionHead of dreamerv3.
- Interfaces:
__init__
,mean
,mode
,entropy
OneHotDist¶
- class ding.torch_utils.network.dreamer.OneHotDist(logits=None, probs=None, unimix_ratio=0.0)[source]¶
- Overview:
A kind of onehot Dist for dreamerv3.
- Interfaces:
__init__
,mode
,sample
TwoHotDistSymlog¶
- class ding.torch_utils.network.dreamer.TwoHotDistSymlog(logits=None, low=-20.0, high=20.0, device='cpu')[source]¶
- Overview:
A kind of twohotsymlog Dist for dreamerv3.
- Interfaces:
__init__
,mode
,mean
,log_prob
,log_prob_target
- __init__(logits=None, low=-20.0, high=20.0, device='cpu')[source]¶
- Overview:
Initialize the TwoHotDistSymlog class.
- Arguments:
logits (
torch.Tensor
): Logits.low (
float
): Low.high (
float
): High.device (
str
): Device.
- log_prob(x)[source]¶
- Overview:
Calculate the log probability of the distribution.
- Arguments:
x (
torch.Tensor
): Input tensor.
SymlogDist¶
- class ding.torch_utils.network.dreamer.SymlogDist(mode, dist='mse', aggregation='sum', tol=1e-08, dim_to_reduce=[-1, -2, -3])[source]¶
- Overview:
A kind of Symlog Dist for dreamerv3.
- Interfaces:
__init__
,entropy
,mode
,mean
,log_prob
- __init__(mode, dist='mse', aggregation='sum', tol=1e-08, dim_to_reduce=[-1, -2, -3])[source]¶
- Overview:
Initialize the SymlogDist class.
- Arguments:
mode (
torch.Tensor
): Mode.dist (
str
): Distribution function.aggregation (
str
): Aggregation function.tol (
float
): Tolerance.dim_to_reduce (
list
): Dimension to reduce.
ContDist¶
Bernoulli¶
- class ding.torch_utils.network.dreamer.Bernoulli(dist=None)[source]¶
- Overview:
A kind of Bernoulli Dist for dreamerv3.
- Interfaces:
__init__
,entropy
,mode
,sample
,log_prob
- __init__(dist=None)[source]¶
- Overview:
Initialize the Bernoulli distribution.
- Arguments:
dist (
torch.Tensor
): Distribution.
network.gtrxl¶
Please refer to ding/torch_utils/network/gtrxl
for more details.
PositionalEmbedding¶
- class ding.torch_utils.network.gtrxl.PositionalEmbedding(embedding_dim: int)[source]¶
- Overview:
The PositionalEmbedding module implements the positional embedding used in the vanilla Transformer model.
- Interfaces:
__init__
,forward
Note
This implementation is adapted from https://github.com/kimiyoung/transformer-xl/blob/ master/pytorch/mem_transformer.py
- __init__(embedding_dim: int)[source]¶
- Overview:
Initialize the PositionalEmbedding module.
- Arguments:
embedding_dim: (
int
): The dimensionality of the embeddings.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(pos_seq: Tensor) Tensor [source]¶
- Overview:
Compute positional embedding given a sequence of positions.
- Arguments:
pos_seq (
torch.Tensor
): The positional sequence, typically a 1D tensor of integers in the form of [seq_len-1, seq_len-2, …, 1, 0],
- Returns:
pos_embedding (
torch.Tensor
): The computed positional embeddings. The shape of the tensor is (seq_len, 1, embedding_dim).
- training: bool¶
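A minimal usage sketch for PositionalEmbedding, building the decreasing position sequence described above (the embedding dimension and sequence length are illustrative):
    >>> import torch
    >>> from ding.torch_utils.network.gtrxl import PositionalEmbedding
    >>> pos_emb = PositionalEmbedding(embedding_dim=64)
    >>> seq_len = 8
    >>> pos_seq = torch.arange(seq_len - 1, -1, -1.0)  # [seq_len-1, ..., 1, 0]
    >>> out = pos_emb(pos_seq)  # expected shape (seq_len, 1, 64)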
GRUGatingUnit¶
- class ding.torch_utils.network.gtrxl.GRUGatingUnit(input_dim: int, bg: float = 2.0)[source]¶
- Overview:
The GRUGatingUnit module implements the GRU gating mechanism used in the GTrXL model.
- Interfaces:
__init__
,forward
- __init__(input_dim: int, bg: float = 2.0)[source]¶
- Overview:
Initialize the GRUGatingUnit module.
- Arguments:
input_dim (
int
): The dimensionality of the input.bg (
bg
): The gate bias. By setting bg > 0 we can explicitly initialize the gating mechanism to be close to the identity map. This can greatly improve the learning speed and stability since it initializes the agent close to a Markovian policy (ignore attention at the beginning).
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x: Tensor, y: Tensor)[source]¶
- Overview:
Compute the output value using the GRU gating mechanism.
- Arguments:
x: (
torch.Tensor
): The first input tensor.y: (
torch.Tensor
): The second input tensor. x and y should have the same shape and their last dimension should match the input_dim.
- Returns:
g: (
torch.Tensor
): The output of the GRU gating mechanism. The shape of g matches the shapes of x and y.
- training: bool¶
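A minimal usage sketch for GRUGatingUnit; x and y share the same shape and their last dimension equals input_dim (shapes are illustrative):
    >>> import torch
    >>> from ding.torch_utils.network.gtrxl import GRUGatingUnit
    >>> gate = GRUGatingUnit(input_dim=32, bg=2.0)
    >>> x = torch.randn(5, 4, 32)
    >>> y = torch.randn(5, 4, 32)
    >>> g = gate(x, y)  # same shape as x and y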
Memory¶
- class ding.torch_utils.network.gtrxl.Memory(memory_len: int = 20, batch_size: int = 64, embedding_dim: int = 256, layer_num: int = 3, memory: Tensor | None = None)[source]¶
- Overview:
A class that stores the context used to add memory to Transformer.
- Interfaces:
__init__
,init
,update
,get
,to
Note
For details, refer to Transformer-XL: https://arxiv.org/abs/1901.02860
- __init__(memory_len: int = 20, batch_size: int = 64, embedding_dim: int = 256, layer_num: int = 3, memory: Tensor | None = None) None [source]¶
- Overview:
Initialize the Memory module.
- Arguments:
memory_len (
int
): The dimension of memory, i.e., how many past observations to use as memory.batch_size (
int
): The dimension of each batch.embedding_dim (
int
): The dimension of embedding, which is the dimension of a single observation after embedding.layer_num (
int
): The number of transformer layers.memory (
Optional[torch.Tensor]
): The initial memory. Default is None.
- get()[source]¶
- Overview:
Get the current memory.
- Returns:
memory: (
Optional[torch.Tensor]
): The current memory, with shape (layer_num, memory_len, bs, embedding_dim).
- init(memory: Tensor | None = None)[source]¶
- Overview:
Initialize memory with an input list of tensors or create it automatically given its dimensions.
- Arguments:
    - memory (Optional[torch.Tensor]): The input memory tensor, with shape (layer_num, memory_len, bs, embedding_dim), where memory_len is the length of the memory, bs is the batch size and embedding_dim is the dimension of the embedding.
- to(device: str = 'cpu')[source]¶
- Overview:
Move the current memory to the specified device.
- Arguments:
device (
str
): The device to move the memory to. Default is ‘cpu’.
- update(hidden_state: List[Tensor])[source]¶
- Overview:
Update the memory given a sequence of hidden states. Example for a single layer (memory_len=3, hidden_size_len=2, bs=3):

        m = | m00 m01 m02 |    h = | h00 h01 h02 |    =>    new_m = | m20 m21 m22 |
            | m10 m11 m12 |        | h10 h11 h12 |                  | h00 h01 h02 |
            | m20 m21 m22 |                                         | h10 h11 h12 |
- Arguments:
    - hidden_state (List[torch.Tensor]): The hidden states used to update the memory. Each tensor in the list has shape (cur_seq, bs, embedding_dim), where cur_seq is the length of the sequence.
- Returns:
    - memory (Optional[torch.Tensor]): The updated memory, with shape (layer_num, memory_len, bs, embedding_dim).
AttentionXL¶
- class ding.torch_utils.network.gtrxl.AttentionXL(input_dim: int, head_dim: int, head_num: int, dropout: Module)[source]¶
- Overview:
An implementation of the Attention mechanism used in the TransformerXL model.
- Interfaces:
__init__
,forward
- __init__(input_dim: int, head_dim: int, head_num: int, dropout: Module) None [source]¶
- Overview:
Initialize the AttentionXL module.
- Arguments:
input_dim (
int
): The dimensionality of the input features.head_dim (
int
): The dimensionality of each attention head.head_num (
int
): The number of attention heads.dropout (
nn.Module
): The dropout layer to use
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _rel_shift(x: Tensor, zero_upper: bool = False) Tensor [source]¶
- Overview:
Perform a relative shift operation on the attention score matrix. Example:

        a00 a01 a02      0 a00 a01 a02      0  a00 a01      a02  0  a10      a02  0   0
        a10 a11 a12  =>  0 a10 a11 a12  =>  a02  0  a10  =>  a11 a12  0  =>  a11 a12  0
        a20 a21 a22      0 a20 a21 a22      a11 a12  0       a20 a21 a22     a20 a21 a22
                                            a20 a21 a22

    1. Append one "column" of zeros to the left.
    2. Reshape the matrix from [3 x 4] into [4 x 3].
    3. Remove the first "row".
    4. Mask out the upper triangle (optional).
Note
See the following material for a better understanding: https://github.com/kimiyoung/transformer-xl/issues/8 and https://arxiv.org/pdf/1901.02860.pdf (Appendix B).
- Arguments:
    - x (torch.Tensor): The input tensor, with shape (cur_seq, full_seq, bs, head_num).
    - zero_upper (bool): If True, the upper-right triangle of the matrix is set to zero.
- Returns:
    - x (torch.Tensor): The tensor after the relative shift operation, with shape (cur_seq, full_seq, bs, head_num).
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(inputs: Tensor, pos_embedding: Tensor, full_input: Tensor, u: Parameter, v: Parameter, mask: Tensor | None = None) Tensor [source]¶
- Overview:
Compute the forward pass for the AttentionXL module.
- Arguments:
inputs (
torch.Tensor
): The attention input with shape (cur_seq, bs, input_dim).pos_embedding (
torch.Tensor
): The positional embedding with shape (full_seq, 1, full_seq).full_input (
torch.Tensor
): The concatenated memory and input tensor with shape (full_seq, bs, input_dim).u (
torch.nn.Parameter
): The content parameter with shape (head_num, head_dim).v (
torch.nn.Parameter
): The position parameter with shape (head_num, head_dim).mask (
Optional[torch.Tensor]
): The attention mask with shape (cur_seq, full_seq, 1). If None, no masking is applied.
- Returns:
output (
torch.Tensor
): The output of the attention mechanism with shape (cur_seq, bs, input_dim).
- training: bool¶
GatedTransformerXLLayer¶
- class ding.torch_utils.network.gtrxl.GatedTransformerXLLayer(input_dim: int, head_dim: int, hidden_dim: int, head_num: int, mlp_num: int, dropout: Module, activation: Module, gru_gating: bool = True, gru_bias: float = 2.0)[source]¶
- Overview:
This class implements the attention layer of GTrXL (Gated Transformer-XL).
- Interfaces:
__init__
,forward
- __init__(input_dim: int, head_dim: int, hidden_dim: int, head_num: int, mlp_num: int, dropout: Module, activation: Module, gru_gating: bool = True, gru_bias: float = 2.0) None [source]¶
- Overview:
Initialize GatedTransformerXLLayer.
- Arguments:
input_dim (
int
): The dimension of the input tensor.head_dim (
int
): The dimension of each head in the multi-head attention.hidden_dim (
int
): The dimension of the hidden layer in the MLP.head_num (
int
): The number of heads for the multi-head attention.mlp_num (
int
): The number of MLP layers in the attention layer.dropout (
nn.Module
): The dropout module used in the MLP and attention layers.activation (
nn.Module
): The activation function to be used in the MLP layers.gru_gating (
bool
, optional): Whether to use GRU gates. If False, replace GRU gates with residual connections. Default is True.gru_bias (
float
, optional): The bias of the GRU gate. Default is 2.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(inputs: Tensor, pos_embedding: Tensor, u: Parameter, v: Parameter, memory: Tensor, mask: Tensor | None = None) Tensor [source]¶
- Overview:
Compute forward pass of GTrXL layer.
- Arguments:
inputs (
torch.Tensor
): The attention input tensor of shape (cur_seq, bs, input_dim).pos_embedding (
torch.Tensor
): The positional embedding tensor of shape (full_seq, 1, full_seq).u (
torch.nn.Parameter
): The content parameter tensor of shape (head_num, head_dim).v (
torch.nn.Parameter
): The position parameter tensor of shape (head_num, head_dim).memory (
torch.Tensor
): The memory tensor of shape (prev_seq, bs, input_dim).- mask (
Optional[torch.Tensor]
): The attention mask tensor of shape (cur_seq, full_seq, 1). Default is None.
- mask (
- Returns:
output (
torch.Tensor
): layer output of shape (cur_seq, bs, input_dim)
- training: bool¶
GTrXL¶
- class ding.torch_utils.network.gtrxl.GTrXL(input_dim: int, head_dim: int = 128, embedding_dim: int = 256, head_num: int = 2, mlp_num: int = 2, layer_num: int = 3, memory_len: int = 64, dropout_ratio: float = 0.0, activation: Module = ReLU(), gru_gating: bool = True, gru_bias: float = 2.0, use_embedding_layer: bool = True)[source]¶
- Overview:
GTrXL Transformer implementation as described in “Stabilizing Transformer for Reinforcement Learning” (https://arxiv.org/abs/1910.06764).
- Interfaces:
__init__
,forward
,reset_memory
,get_memory
- __init__(input_dim: int, head_dim: int = 128, embedding_dim: int = 256, head_num: int = 2, mlp_num: int = 2, layer_num: int = 3, memory_len: int = 64, dropout_ratio: float = 0.0, activation: Module = ReLU(), gru_gating: bool = True, gru_bias: float = 2.0, use_embedding_layer: bool = True) None [source]¶
- Overview:
Init GTrXL Model.
- Arguments:
input_dim (
int
): The dimension of the input observation.head_dim (
int
, optional): The dimension of each head. Default is 128.embedding_dim (
int
, optional): The dimension of the embedding. Default is 256.head_num (
int
, optional): The number of heads for multi-head attention. Default is 2.mlp_num (
int
, optional): The number of MLP layers in the attention layer. Default is 2.layer_num (
int
, optional): The number of transformer layers. Default is 3.memory_len (
int
, optional): The length of memory. Default is 64.dropout_ratio (
float
, optional): The dropout ratio. Default is 0.activation (
nn.Module
, optional): The activation function. Default is nn.ReLU().gru_gating (
bool
, optional): If False, replace GRU gates with residual connections. Default is True.gru_bias (
float
, optional): The GRU gate bias. Default is 2.0.use_embedding_layer (
bool
, optional): If False, don’t use input embedding layer. Default is True.
- Raises:
AssertionError: If embedding_dim is not an even number.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x: Tensor, batch_first: bool = False, return_mem: bool = True) Dict[str, Tensor] [source]¶
- Overview:
Performs a forward pass on the GTrXL.
- Arguments:
x (
torch.Tensor
): The input tensor with shape (seq_len, bs, input_size).batch_first (
bool
, optional): If the input data has shape (bs, seq_len, input_size), set this parameter to True to transpose along the first and second dimension and obtain shape (seq_len, bs, input_size). This does not affect the output memory. Default is False. - return_mem (bool
, optional): If False, return only the output tensor without dict. Default is True.
- Returns:
x (
Dict[str, torch.Tensor]
): A dictionary containing the transformer output of shape (seq_len, bs, embedding_size) and memory of shape (layer_num, seq_len, bs, embedding_size).
- get_memory()[source]¶
- Overview:
Returns the memory of GTrXL.
- Returns:
memory (
Optional[torch.Tensor]
): The output memory or None if memory has not been initialized. The shape is (layer_num, memory_len, bs, embedding_dim).
- reset_memory(batch_size: int | None = None, state: Tensor | None = None)[source]¶
- Overview:
Clear or set the memory of GTrXL.
- Arguments:
batch_size (
Optional[int]
): The batch size. Default is None.state (
Optional[torch.Tensor]
): The input memory with shape (layer_num, memory_len, bs, embedding_dim). Default is None.
- training: bool¶
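A minimal usage sketch for GTrXL, assuming a sequence of length 16, batch size 4 and observation dimension 10; the model keeps its own memory across calls:
    >>> import torch
    >>> from ding.torch_utils.network.gtrxl import GTrXL
    >>> model = GTrXL(input_dim=10, embedding_dim=256, layer_num=3, memory_len=16)
    >>> x = torch.randn(16, 4, 10)  # (seq_len, bs, input_dim)
    >>> out = model(x)  # dict with the transformer output and the updated memory
    >>> mem = model.get_memory()  # (layer_num, memory_len, bs, embedding_dim), or None if uninitialized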
network.gumbel_softmax¶
Please refer to ding/torch_utils/network/gumbel_softmax
for more details.
GumbelSoftmax¶
- class ding.torch_utils.network.gumbel_softmax.GumbelSoftmax[source]¶
- Overview:
An nn.Module that computes GumbelSoftmax.
- Interfaces:
__init__
,forward
,gumbel_softmax_sample
Note
For more information on GumbelSoftmax, refer to the paper [Categorical Reparameterization with Gumbel-Softmax](https://arxiv.org/abs/1611.01144).
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x: Tensor, temperature: float = 1.0, hard: bool = False) Tensor [source]¶
- Overview:
Forward pass for the GumbelSoftmax module.
- Arguments:
x (
torch.Tensor
): Unnormalized log-probabilities.temperature (
float
): Non-negative scalar controlling the sharpness of the distribution.hard (
bool
): If True, returns one-hot encoded labels. Default is False.
- Returns:
output (
torch.Tensor
): Sample from Gumbel-Softmax distribution.
- Shapes:
x: its shape is \((B, N)\), where B is the batch size and N is the number of classes.
y: its shape is \((B, N)\), where B is the batch size and N is the number of classes.
- gumbel_softmax_sample(x: Tensor, temperature: float, eps: float = 1e-08) Tensor [source]¶
- Overview:
Draw a sample from the Gumbel-Softmax distribution.
- Arguments:
x (
torch.Tensor
): Input tensor.temperature (
float
): Non-negative scalar controlling the sharpness of the distribution.eps (
float
): Small number to prevent division by zero, default is 1e-8.
- Returns:
output (
torch.Tensor
): Sample from Gumbel-Softmax distribution.
- training: bool¶
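A minimal usage sketch (not from the source docstring); the logits shape is an illustrative assumption:
- Example:
>>> import torch
>>> from ding.torch_utils.network.gumbel_softmax import GumbelSoftmax
>>> layer = GumbelSoftmax()
>>> logits = torch.randn(4, 6)                         # (B, N) unnormalized log-probabilities
>>> soft = layer(logits, temperature=1.0)              # differentiable sample, shape (4, 6)
>>> hard = layer(logits, temperature=1.0, hard=True)   # one-hot sample, shape (4, 6)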
network.merge¶
Please refer to ding/torch_utils/network/merge
for more details.
BilinearGeneral¶
- class ding.torch_utils.network.merge.BilinearGeneral(in1_features: int, in2_features: int, out_features: int)[source]¶
- Overview:
Bilinear implementation as in: Multiplicative Interactions and Where to Find Them, ICLR 2020, https://openreview.net/forum?id=rylnK6VtDH.
- Interfaces:
__init__
,forward
- __init__(in1_features: int, in2_features: int, out_features: int)[source]¶
- Overview:
Initialize the Bilinear layer.
- Arguments:
in1_features (
int
): The size of each first input sample.in2_features (
int
): The size of each second input sample.out_features (
int
): The size of each output sample.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x: Tensor, z: Tensor)[source]¶
- Overview:
Compute the bilinear function.
- Arguments:
x (
torch.Tensor
): The first input tensor.z (
torch.Tensor
): The second input tensor.
- training: bool¶
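A minimal usage sketch (not from the source docstring); the feature sizes and the expected output shape are illustrative assumptions:
- Example:
>>> import torch
>>> from ding.torch_utils.network.merge import BilinearGeneral
>>> layer = BilinearGeneral(in1_features=16, in2_features=32, out_features=8)
>>> x, z = torch.randn(4, 16), torch.randn(4, 32)
>>> out = layer(x, z)   # expected shape (4, 8)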
TorchBilinearCustomized¶
- class ding.torch_utils.network.merge.TorchBilinearCustomized(in1_features: int, in2_features: int, out_features: int)[source]¶
- Overview:
Customized Torch Bilinear implementation.
- Interfaces:
__init__
,forward
- __init__(in1_features: int, in2_features: int, out_features: int)[source]¶
- Overview:
Initialize the Bilinear layer.
- Arguments:
in1_features (
int
): The size of each first input sample.in2_features (
int
): The size of each second input sample.out_features (
int
): The size of each output sample.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x, z)[source]¶
- Overview:
Compute the bilinear function.
- Arguments:
x (
torch.Tensor
): The first input tensor.z (
torch.Tensor
): The second input tensor.
- training: bool¶
FiLM¶
- class ding.torch_utils.network.merge.FiLM(feature_dim: int, context_dim: int)[source]¶
- Overview:
Feature-wise Linear Modulation (FiLM) Layer. This layer applies feature-wise affine transformation based on context.
- Interfaces:
__init__
,forward
- __init__(feature_dim: int, context_dim: int)[source]¶
- Overview:
Initialize the FiLM layer.
- Arguments:
feature_dim (
int
): The dimension of the input feature vector.context_dim (
int
): The dimension of the input context vector.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(feature: Tensor, context: Tensor)[source]¶
- Overview:
Forward propagation.
- Arguments:
feature (
torch.Tensor
): The input feature, shape (batch_size, feature_dim).context (
torch.Tensor
): The input context, shape (batch_size, context_dim).
- Returns:
conditioned_feature (torch.Tensor): The output feature after FiLM, shape (batch_size, feature_dim).
- training: bool¶
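A minimal usage sketch (not from the source docstring); the dimensions below are illustrative assumptions:
- Example:
>>> import torch
>>> from ding.torch_utils.network.merge import FiLM
>>> film = FiLM(feature_dim=64, context_dim=32)
>>> feature, context = torch.randn(8, 64), torch.randn(8, 32)
>>> conditioned = film(feature, context)   # shape (8, 64), per the Returns entry above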
GatingType¶
SumMerge¶
- class ding.torch_utils.network.merge.SumMerge(*args, **kwargs)[source]¶
- Overview:
A PyTorch module that merges a list of tensors by computing their sum. All input tensors must have the same size. This module can work with any type of tensor (vector, units or visual).
- Interfaces:
__init__
,forward
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(tensors: List[Tensor]) Tensor [source]¶
- Overview:
Forward pass of the SumMerge module, which sums the input tensors.
- Arguments:
tensors (
List[Tensor]
): List of input tensors to be summed. All tensors must have the same size.
- Returns:
summed (
Tensor
): Tensor resulting from the sum of all input tensors.
- training: bool¶
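A minimal usage sketch (not from the source docstring); the tensor sizes are illustrative assumptions:
- Example:
>>> import torch
>>> from ding.torch_utils.network.merge import SumMerge
>>> merge = SumMerge()
>>> tensors = [torch.randn(4, 16) for _ in range(3)]   # all inputs must share the same size
>>> summed = merge(tensors)                            # shape (4, 16)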
VectorMerge¶
- class ding.torch_utils.network.merge.VectorMerge(input_sizes: Dict[str, int], output_size: int, gating_type: GatingType = GatingType.NONE, use_layer_norm: bool = True)[source]¶
- Overview:
Merges multiple vector streams. Streams are first transformed through layer normalization, relu, and linear layers, then summed. They don’t need to have the same size. Gating can also be used before the sum.
- Interfaces:
__init__
,encode
,_compute_gate
,forward
Note
For more details about the gating types, please refer to the GatingType enum class.
- __init__(input_sizes: Dict[str, int], output_size: int, gating_type: GatingType = GatingType.NONE, use_layer_norm: bool = True)[source]¶
- Overview:
Initialize the VectorMerge module.
- Arguments:
input_sizes (
Dict[str, int]
): A dictionary mapping input names to their sizes. The size is a single integer for 1D inputs, or None for 0D inputs. If an input size is None, we assume it’s ().output_size (
int
): The size of the output vector.gating_type (
GatingType
): The type of gating mechanism to use. Default is GatingType.NONE.use_layer_norm (
bool
): Whether to use layer normalization. Default is True.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _compute_gate(init_gate: List[Tensor]) List[Tensor] [source]¶
- Overview:
Compute the gate values based on the initial gate values.
- Arguments:
init_gate (
List[Tensor]
): The initial gate values.
- Returns:
gate (
List[Tensor]
): The computed gate values.
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- encode(inputs: Dict[str, Tensor]) Tuple[List[Tensor], List[Tensor]] [source]¶
- Overview:
Encode the input tensors using layer normalization, relu, and linear transformations.
- Arguments:
inputs (
Dict[str, Tensor]
): The input tensors.
- Returns:
gates (
List[Tensor]
): The gate tensors after transformations.outputs (
List[Tensor]
): The output tensors after transformations.
- forward(inputs: Dict[str, Tensor]) Tensor [source]¶
- Overview:
Forward pass through the VectorMerge module.
- Arguments:
inputs (
Dict[str, Tensor]
): The input tensors.
- Returns:
output (
Tensor
): The output tensor after passing through the module.
- training: bool¶
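A minimal usage sketch (not from the source docstring); the stream names, sizes, and the expected output shape are illustrative assumptions:
- Example:
>>> import torch
>>> from ding.torch_utils.network.merge import VectorMerge
>>> merge = VectorMerge(input_sizes={'scalar': 16, 'entity': 32}, output_size=64)
>>> inputs = {'scalar': torch.randn(8, 16), 'entity': torch.randn(8, 32)}
>>> out = merge(inputs)   # expected shape (8, 64)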
network.nn_module¶
Please refer to ding/torch_utils/network/nn_module
for more details.
weight_init¶
- ding.torch_utils.network.nn_module.weight_init_(weight: Tensor, init_type: str = 'xavier', activation: str | None = None) None [source]¶
- Overview:
Initialize weight according to the specified type.
- Arguments:
weight (
torch.Tensor
): The weight that needs to be initialized.init_type (
str
, optional): The type of initialization to implement, supports [“xavier”, “kaiming”, “orthogonal”].activation (
str
, optional): The activation function name. Recommended to use only with [‘relu’, ‘leaky_relu’].
sequential_pack¶
- ding.torch_utils.network.nn_module.sequential_pack(layers: List[Module]) Sequential [source]¶
- Overview:
Pack the layers in the input list into an nn.Sequential module. If there is a convolutional layer in the module, an extra attribute out_channels will be added to the module and set to the out_channels of the conv layer.
- Arguments:
layers (
List[nn.Module]
): The input list of layers.
- Returns:
seq (
nn.Sequential
): Packed sequential container.
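A minimal usage sketch (not from the source docstring); the layer sizes are illustrative assumptions:
- Example:
>>> from torch import nn
>>> from ding.torch_utils.network.nn_module import sequential_pack
>>> seq = sequential_pack([nn.Conv2d(3, 8, kernel_size=3), nn.ReLU()])
>>> seq.out_channels   # 8, recorded from the conv layer as described above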
conv1d_block¶
- ding.torch_utils.network.nn_module.conv1d_block(in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, padding: int = 0, dilation: int = 1, groups: int = 1, activation: Module | None = None, norm_type: str | None = None) Sequential [source]¶
- Overview:
Create a 1-dimensional convolution layer with activation and normalization.
- Arguments:
in_channels (
int
): Number of channels in the input tensor.out_channels (
int
): Number of channels in the output tensor.kernel_size (
int
): Size of the convolving kernel.stride (
int
, optional): Stride of the convolution. Default is 1.padding (
int
, optional): Zero-padding added to both sides of the input. Default is 0.dilation (
int
, optional): Spacing between kernel elements. Default is 1.groups (
int
, optional): Number of blocked connections from input channels to output channels. Default is 1.activation (
nn.Module
, optional): The optional activation function.norm_type (
str
, optional): Type of the normalization.
- Returns:
block (
nn.Sequential
): A sequential list containing the torch layers of the 1-dimensional convolution layer.
conv2d_block¶
- ding.torch_utils.network.nn_module.conv2d_block(in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, padding: int = 0, dilation: int = 1, groups: int = 1, pad_type: str = 'zero', activation: Module | None = None, norm_type: str | None = None, num_groups_for_gn: int = 1, bias: bool = True) Sequential [source]¶
- Overview:
Create a 2-dimensional convolution layer with activation and normalization.
- Arguments:
in_channels (
int
): Number of channels in the input tensor.out_channels (
int
): Number of channels in the output tensor.kernel_size (
int
): Size of the convolving kernel.stride (
int
, optional): Stride of the convolution. Default is 1.padding (
int
, optional): Zero-padding added to both sides of the input. Default is 0.dilation (
int
): Spacing between kernel elements.groups (
int
, optional): Number of blocked connections from input channels to output channels. Default is 1.pad_type (
str
, optional): The way to add padding, include [‘zero’, ‘reflect’, ‘replicate’]. Default is ‘zero’.activation (
nn.Module
): the optional activation function.norm_type (
str
): The type of the normalization, now support [‘BN’, ‘LN’, ‘IN’, ‘GN’, ‘SyncBN’], default set to None, which means no normalization.num_groups_for_gn (
int
): Number of groups for GroupNorm.bias (
bool
): whether to add a learnable bias to the nn.Conv2d. Default is True.
- Returns:
block (
nn.Sequential
): A sequential list containing the torch layers of the 2-dimensional convolution layer.
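A minimal usage sketch (not from the source docstring); the channel counts and input shape are illustrative assumptions:
- Example:
>>> import torch
>>> from torch import nn
>>> from ding.torch_utils.network.nn_module import conv2d_block
>>> block = conv2d_block(3, 16, kernel_size=3, stride=1, padding=1, activation=nn.ReLU(), norm_type='BN')
>>> y = block(torch.randn(4, 3, 32, 32))   # expected shape (4, 16, 32, 32)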
deconv2d_block¶
- ding.torch_utils.network.nn_module.deconv2d_block(in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, padding: int = 0, output_padding: int = 0, groups: int = 1, activation: int | None = None, norm_type: int | None = None) Sequential [source]¶
- Overview:
Create a 2-dimensional transpose convolution layer with activation and normalization.
- Arguments:
in_channels (
int
): Number of channels in the input tensor.out_channels (
int
): Number of channels in the output tensor.kernel_size (
int
): Size of the convolving kernel.stride (
int
, optional): Stride of the convolution. Default is 1.padding (
int
, optional): Zero-padding added to both sides of the input. Default is 0.output_padding (
int
, optional): Additional size added to one side of the output shape. Default is 0.groups (
int
, optional): Number of blocked connections from input channels to output channels. Default is 1.activation (
nn.Module
, optional): The optional activation function.norm_type (
str
, optional): Type of the normalization.
- Returns:
block (
nn.Sequential
): A sequential list containing the torch layers of the 2-dimensional transpose convolution layer.
Note
ConvTranspose2d (https://pytorch.org/docs/master/generated/torch.nn.ConvTranspose2d.html)
fc_block¶
- ding.torch_utils.network.nn_module.fc_block(in_channels: int, out_channels: int, activation: Module | None = None, norm_type: str | None = None, use_dropout: bool = False, dropout_probability: float = 0.5) Sequential [source]¶
- Overview:
Create a fully-connected block with activation, normalization, and dropout. Optional normalization can be done to the dim 1 (across the channels). x -> fc -> norm -> act -> dropout -> out
- Arguments:
in_channels (
int
): Number of channels in the input tensor.out_channels (
int
): Number of channels in the output tensor.activation (
nn.Module
, optional): The optional activation function.norm_type (
str
, optional): Type of the normalization.use_dropout (
bool
, optional): Whether to use dropout in the fully-connected block. Default is False.dropout_probability (
float
, optional): Probability of an element to be zeroed in the dropout. Default is 0.5.
- Returns:
block (
nn.Sequential
): A sequential list containing the torch layers of the fully-connected block.
Note
You can refer to nn.Linear (https://pytorch.org/docs/master/generated/torch.nn.Linear.html).
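A minimal usage sketch (not from the source docstring); the channel sizes and dropout setting are illustrative assumptions:
- Example:
>>> import torch
>>> from torch import nn
>>> from ding.torch_utils.network.nn_module import fc_block
>>> block = fc_block(64, 128, activation=nn.ReLU(), use_dropout=True, dropout_probability=0.1)
>>> y = block(torch.randn(4, 64))   # expected shape (4, 128)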
normed_linear¶
- ding.torch_utils.network.nn_module.normed_linear(in_features: int, out_features: int, bias: bool = True, device=None, dtype=None, scale: float = 1.0) Linear [source]¶
- Overview:
Create a nn.Linear module but with normalized fan-in init.
- Arguments:
in_features (
int
): Number of features in the input tensor.out_features (
int
): Number of features in the output tensor.bias (
bool
, optional): Whether to add a learnable bias to the nn.Linear. Default is True.device (
torch.device
, optional): The device to put the created module on. Default is None.dtype (
torch.dtype
, optional): The desired data type of created module. Default is None.scale (
float
, optional): The scale factor for initialization. Default is 1.0.
- Returns:
out (
nn.Linear
): A nn.Linear module with normalized fan-in init.
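A minimal usage sketch (not from the source docstring); the feature sizes and the small scale value are illustrative assumptions:
- Example:
>>> import torch
>>> from ding.torch_utils.network.nn_module import normed_linear
>>> layer = normed_linear(64, 10, scale=0.01)   # small scale, e.g. for an output head
>>> logits = layer(torch.randn(4, 64))          # shape (4, 10)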
normed_conv2d¶
- ding.torch_utils.network.nn_module.normed_conv2d(in_channels: int, out_channels: int, kernel_size: int | Tuple[int, int], stride: int | Tuple[int, int] = 1, padding: int | Tuple[int, int] = 0, dilation: int | Tuple[int, int] = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', device=None, dtype=None, scale: float = 1) Conv2d [source]¶
- Overview:
Create a nn.Conv2d module but with normalized fan-in init.
- Arguments:
in_channels (
int
): Number of channels in the input tensor.out_channels (
int
): Number of channels in the output tensor.kernel_size (
Union[int, Tuple[int, int]]
): Size of the convolving kernel.stride (
Union[int, Tuple[int, int]]
, optional): Stride of the convolution. Default is 1.padding (
Union[int, Tuple[int, int]]
, optional): Zero-padding added to both sides of the input. Default is 0.dilation (:Union[int, Tuple[int, int]], optional): Spacing between kernel elements. Default is 1.
groups (
int
, optional): Number of blocked connections from input channels to output channels. Default is 1.bias (
bool
, optional): Whether to add a learnable bias to the nn.Conv2d. Default is True.padding_mode (
str
, optional): The type of padding algorithm to use. Default is ‘zeros’.device (
torch.device
, optional): The device to put the created module on. Default is None.dtype (
torch.dtype
, optional): The desired data type of created module. Default is None.scale (
float
, optional): The scale factor for initialization. Default is 1.
- Returns:
out (
nn.Conv2d
): A nn.Conv2d module with normalized fan-in init.
MLP¶
- ding.torch_utils.network.nn_module.MLP(in_channels: int, hidden_channels: int, out_channels: int, layer_num: int, layer_fn: Callable | None = None, activation: Module | None = None, norm_type: str | None = None, use_dropout: bool = False, dropout_probability: float = 0.5, output_activation: bool = True, output_norm: bool = True, last_linear_layer_init_zero: bool = False)[source]¶
- Overview:
Create a multi-layer perceptron using fully-connected blocks with activation, normalization, and dropout, optional normalization can be done to the dim 1 (across the channels). x -> fc -> norm -> act -> dropout -> out
- Arguments:
in_channels (
int
): Number of channels in the input tensor.hidden_channels (
int
): Number of channels in the hidden tensor.out_channels (
int
): Number of channels in the output tensor.layer_num (
int
): Number of layers.layer_fn (
Callable
, optional): Layer function.activation (
nn.Module
, optional): The optional activation function.norm_type (
str
, optional): The type of the normalization.use_dropout (
bool
, optional): Whether to use dropout in the fully-connected block. Default is False.dropout_probability (
float
, optional): Probability of an element to be zeroed in the dropout. Default is 0.5.output_activation (
bool
, optional): Whether to use activation in the output layer. If True, we use the same activation as front layers. Default is True.output_norm (
bool
, optional): Whether to use normalization in the output layer. If True, we use the same normalization as front layers. Default is True.last_linear_layer_init_zero (
bool
, optional): Whether to use zero initializations for the last linear layer (including w and b), which can provide stable zero outputs in the beginning, usually used in the policy network in RL settings.
- Returns:
block (
nn.Sequential
): A sequential list containing the torch layers of the multi-layer perceptron.
Note
You can refer to nn.Linear (https://pytorch.org/docs/master/generated/torch.nn.Linear.html).
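A minimal usage sketch (not from the source docstring); the channel sizes and flags below are illustrative assumptions:
- Example:
>>> import torch
>>> from torch import nn
>>> from ding.torch_utils.network.nn_module import MLP
>>> net = MLP(in_channels=16, hidden_channels=64, out_channels=8, layer_num=3,
...           activation=nn.ReLU(), output_activation=False, last_linear_layer_init_zero=True)
>>> out = net(torch.randn(4, 16))   # expected shape (4, 8)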
ChannelShuffle¶
- class ding.torch_utils.network.nn_module.ChannelShuffle(group_num: int)[source]¶
- Overview:
Apply channel shuffle to the input tensor. For more details about the channel shuffle, please refer to the ‘ShuffleNet’ paper: https://arxiv.org/abs/1707.01083
- Interfaces:
__init__
,forward
- __init__(group_num: int) None [source]¶
- Overview:
Initialize the ChannelShuffle class.
- Arguments:
group_num (
int
): The number of groups to exchange.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x: Tensor) Tensor [source]¶
- Overview:
Forward pass through the ChannelShuffle module.
- Arguments:
x (
torch.Tensor
): The input tensor.
- Returns:
x (
torch.Tensor
): The shuffled input tensor.
- training: bool¶
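A minimal usage sketch (not from the source docstring); the input shape is an illustrative assumption and its channel dimension must be divisible by group_num:
- Example:
>>> import torch
>>> from ding.torch_utils.network.nn_module import ChannelShuffle
>>> shuffle = ChannelShuffle(group_num=4)
>>> y = shuffle(torch.randn(2, 16, 8, 8))   # 16 channels, divisible by group_num=4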
one_hot¶
- ding.torch_utils.network.nn_module.one_hot(val: LongTensor, num: int, num_first: bool = False) FloatTensor [source]¶
- Overview:
Convert a torch.LongTensor to one-hot encoding. This implementation can be slightly faster than
torch.nn.functional.one_hot
.- Arguments:
val (
torch.LongTensor
): Each element contains the state to be encoded; the range should be [0, num-1].num (
int
): Number of states of the one-hot encoding.num_first (
bool
, optional): If False, the one-hot encoding is added as the last dimension; otherwise, it is added as the first dimension. Default is False.
- Returns:
one_hot (
torch.FloatTensor
): The one-hot encoded tensor.
- Example:
>>> one_hot(2*torch.ones([2,2]).long(),3) tensor([[[0., 0., 1.], [0., 0., 1.]], [[0., 0., 1.], [0., 0., 1.]]]) >>> one_hot(2*torch.ones([2,2]).long(),3,num_first=True) tensor([[[0., 0.], [1., 0.]], [[0., 1.], [0., 0.]], [[1., 0.], [0., 1.]]])
NearestUpsample¶
- class ding.torch_utils.network.nn_module.NearestUpsample(scale_factor: float | List[float])[source]¶
- Overview:
This module upsamples the input to the given scale_factor using the nearest mode.
- Interfaces:
__init__
,forward
- __init__(scale_factor: float | List[float]) None [source]¶
- Overview:
Initialize the NearestUpsample class.
- Arguments:
scale_factor (
Union[float, List[float]]
): The multiplier for the spatial size.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x: Tensor) Tensor [source]¶
- Overview:
Return the upsampled input tensor.
- Arguments:
x (
torch.Tensor
): The input tensor.
- Returns:
upsample(
torch.Tensor
): The upsampled input tensor.
- training: bool¶
BilinearUpsample¶
- class ding.torch_utils.network.nn_module.BilinearUpsample(scale_factor: float | List[float])[source]¶
- Overview:
This module upsamples the input to the given scale_factor using the bilinear mode.
- Interfaces:
__init__
,forward
- __init__(scale_factor: float | List[float]) None [source]¶
- Overview:
Initialize the BilinearUpsample class.
- Arguments:
scale_factor (
Union[float, List[float]]
): The multiplier for the spatial size.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x: Tensor) Tensor [source]¶
- Overview:
Return the upsampled input tensor.
- Arguments:
x (
torch.Tensor
): The input tensor.
- Returns:
upsample(
torch.Tensor
): The upsampled input tensor.
- training: bool¶
binary_encode¶
- ding.torch_utils.network.nn_module.binary_encode(y: Tensor, max_val: Tensor) Tensor [source]¶
- Overview:
Convert elements in a tensor to its binary representation.
- Arguments:
y (
torch.Tensor
): The tensor to be converted into its binary representation.max_val (
torch.Tensor
): The maximum value of the elements in the tensor.
- Returns:
binary (
torch.Tensor
): The input tensor in its binary representation.
- Example:
>>> binary_encode(torch.tensor([3,2]),torch.tensor(8)) tensor([[0, 0, 1, 1],[0, 0, 1, 0]])
NoiseLinearLayer¶
- class ding.torch_utils.network.nn_module.NoiseLinearLayer(in_channels: int, out_channels: int, sigma0: int = 0.4)[source]¶
- Overview:
This is a linear layer with random noise.
- Interfaces:
__init__
,reset_noise
,reset_parameters
,forward
- __init__(in_channels: int, out_channels: int, sigma0: int = 0.4) None [source]¶
- Overview:
Initialize the NoiseLinearLayer class.
- Arguments:
in_channels (
int
): Number of channels in the input tensor.out_channels (
int
): Number of channels in the output tensor.sigma0 (
float
, optional): Default noise volume when initializing NoiseLinearLayer. Default is 0.4.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _scale_noise(size: int | Tuple)[source]¶
- Overview:
Scale the noise.
- Arguments:
size (
Union[int, Tuple]
): The size of the noise.
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x: Tensor)[source]¶
- Overview:
Perform the forward pass with noise.
- Arguments:
x (
torch.Tensor
): The input tensor.
- Returns:
output (
torch.Tensor
): The output tensor with noise.
- training: bool¶
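A minimal usage sketch of the interfaces listed above (not from the source docstring); the channel sizes are illustrative assumptions:
- Example:
>>> import torch
>>> from ding.torch_utils.network.nn_module import NoiseLinearLayer
>>> layer = NoiseLinearLayer(in_channels=64, out_channels=32)
>>> layer.reset_noise()                  # resample the noise (interface listed above)
>>> y = layer(torch.randn(4, 64))        # expected shape (4, 32)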
noise_block¶
- ding.torch_utils.network.nn_module.noise_block(in_channels: int, out_channels: int, activation: str | None = None, norm_type: str | None = None, use_dropout: bool = False, dropout_probability: float = 0.5, sigma0: float = 0.4)[source]¶
- Overview:
Create a fully-connected noise layer with activation, normalization, and dropout. Optional normalization can be done to the dim 1 (across the channels).
- Arguments:
in_channels (
int
): Number of channels in the input tensor.out_channels (
int
): Number of channels in the output tensor.activation (
str
, optional): The optional activation function. Default is None.norm_type (
str
, optional): Type of normalization. Default is None.use_dropout (
bool
, optional): Whether to use dropout in the fully-connected block.dropout_probability (
float
, optional): Probability of an element to be zeroed in the dropout. Default is 0.5.sigma0 (
float
, optional): The sigma0 is the default noise volume when initializing NoiseLinearLayer. Default is 0.4.
- Returns:
block (
nn.Sequential
): A sequential list containing the torch layers of the fully-connected block.
NaiveFlatten¶
- class ding.torch_utils.network.nn_module.NaiveFlatten(start_dim: int = 1, end_dim: int = -1)[source]¶
- Overview:
This module is a naive implementation of the flatten operation.
- Interfaces:
__init__
,forward
- __init__(start_dim: int = 1, end_dim: int = -1) None [source]¶
- Overview:
Initialize the NaiveFlatten class.
- Arguments:
start_dim (
int
, optional): The first dimension to flatten. Default is 1.end_dim (
int
, optional): The last dimension to flatten. Default is -1.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x: Tensor) Tensor [source]¶
- Overview:
Perform the flatten operation on the input tensor.
- Arguments:
x (
torch.Tensor
): The input tensor.
- Returns:
output (
torch.Tensor
): The flattened output tensor.
- training: bool¶
network.normalization¶
Please refer to ding/torch_utils/network/normalization
for more details.
build_normalization¶
- ding.torch_utils.network.normalization.build_normalization(norm_type: str, dim: int | None = None) Module [source]¶
- Overview:
Construct the corresponding normalization module. For beginners, refer to [this article](https://zhuanlan.zhihu.com/p/34879333) to learn more about batch normalization.
- Arguments:
norm_type (
str
): Type of the normalization. Currently supports [‘BN’, ‘LN’, ‘IN’, ‘SyncBN’].dim (
Optional[int]
): Dimension of the normalization, applicable when norm_type is in [‘BN’, ‘IN’].
- Returns:
norm_func (
nn.Module
): The corresponding batch normalization function.
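A minimal usage sketch (not from the source docstring). Treating the returned object as a module class that is then instantiated with the feature dimension is an assumption based on the return description above:
- Example:
>>> from ding.torch_utils.network.normalization import build_normalization
>>> norm_cls = build_normalization('BN', dim=2)   # 2D batch normalization
>>> norm = norm_cls(16)                           # assumption: instantiate with the channel count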
network.popart¶
Please refer to ding/torch_utils/network/popart
for more details.
PopArt¶
- class ding.torch_utils.network.popart.PopArt(input_features: int | None = None, output_features: int | None = None, beta: float = 0.5)[source]¶
- Overview:
A linear layer with popart normalization. This class implements a linear transformation followed by PopArt normalization, which is a method to automatically adapt the contribution of each task to the agent’s updates in multi-task learning, as described in the paper <https://arxiv.org/abs/1809.04474>.
- Interfaces:
__init__
,reset_parameters
,forward
,update_parameters
- __init__(input_features: int | None = None, output_features: int | None = None, beta: float = 0.5) None [source]¶
- Overview:
Initialize the class with input features, output features, and the beta parameter.
- Arguments:
input_features (
Union[int, None]
): The size of each input sample.output_features (
Union[int, None]
): The size of each output sample.beta (
float
): The parameter for moving average.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x: Tensor) Dict[str, Tensor] [source]¶
- Overview:
Implement the forward computation of the linear layer and return both the output and the normalized output of the layer.
- Arguments:
x (
torch.Tensor
): Input tensor which is to be normalized.
- Returns:
output (
Dict[str, torch.Tensor]
): A dictionary containing ‘pred’ and ‘unnormalized_pred’.
- reset_parameters()[source]¶
- Overview:
Reset the parameters including weights and bias using
kaiming_uniform_
anduniform_
initialization.
- training: bool¶
- update_parameters(value: Tensor) Dict[str, Tensor] [source]¶
- Overview:
Update the normalization parameters based on the given value and return the new mean and standard deviation after the update.
- Arguments:
value (
torch.Tensor
): The tensor to be used for updating parameters.
- Returns:
update_results (
Dict[str, torch.Tensor]
): A dictionary containing ‘new_mean’ and ‘new_std’.
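A minimal usage sketch (not from the source docstring); the feature sizes and the value tensor shape are illustrative assumptions:
- Example:
>>> import torch
>>> from ding.torch_utils.network.popart import PopArt
>>> head = PopArt(input_features=64, output_features=1)
>>> out = head(torch.randn(8, 64))
>>> out['pred'].shape, out['unnormalized_pred'].shape   # both expected to be (8, 1)
>>> stats = head.update_parameters(torch.randn(8, 1))   # returns 'new_mean' and 'new_std'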
network.res_block¶
Please refer to ding/torch_utils/network/res_block
for more details.
ResBlock¶
- class ding.torch_utils.network.res_block.ResBlock(in_channels: int, activation: Module = ReLU(), norm_type: str = 'BN', res_type: str = 'basic', bias: bool = True, out_channels: int | None = None)[source]¶
- Overview:
- Residual Block with 2D convolution layers, including 3 types:
- basic block:
input channel: C
x -> 3*3*C -> norm -> act -> 3*3*C -> norm -> act -> out
\__________________________________________/+
- bottleneck block:
x -> 1*1*(1/4*C) -> norm -> act -> 3*3*(1/4*C) -> norm -> act -> 1*1*C -> norm -> act -> out
\_____________________________________________________________________________/+
- downsample block (used in EfficientZero):
input channel: C
x -> 3*3*C -> norm -> act -> 3*3*C -> norm -> act -> out
\__________________ 3*3*C ____________________/+
Note
You can refer to Deep Residual Learning for Image Recognition for more details.
- Interfaces:
__init__
,forward
- __init__(in_channels: int, activation: Module = ReLU(), norm_type: str = 'BN', res_type: str = 'basic', bias: bool = True, out_channels: int | None = None) None [source]¶
- Overview:
Init the 2D convolution residual block.
- Arguments:
in_channels (
int
): Number of channels in the input tensor.activation (
nn.Module
): The optional activation function.norm_type (
str
): Type of the normalization, default set to ‘BN’ (Batch Normalization), supports [‘BN’, ‘LN’, ‘IN’, ‘GN’, ‘SyncBN’, None].res_type (
str
): Type of residual block, supports [‘basic’, ‘bottleneck’, ‘downsample’].bias (
bool
): Whether to add a learnable bias to the conv2d_block. default set to True.out_channels (
int
): Number of channels in the output tensor, default set to None, which means out_channels = in_channels.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x: Tensor) Tensor [source]¶
- Overview:
Return the residual block output.
- Arguments:
x (
torch.Tensor
): The input tensor.
- Returns:
x (
torch.Tensor
): The resblock output tensor.
- training: bool¶
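A minimal usage sketch (not from the source docstring); the channel count and spatial size are illustrative assumptions:
- Example:
>>> import torch
>>> from ding.torch_utils.network.res_block import ResBlock
>>> block = ResBlock(in_channels=64, norm_type='BN', res_type='basic')
>>> y = block(torch.randn(4, 64, 8, 8))   # the basic block keeps channels and spatial size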
ResFCBlock¶
- class ding.torch_utils.network.res_block.ResFCBlock(in_channels: int, activation: Module = ReLU(), norm_type: str = 'BN', dropout: float | None = None)[source]¶
- Overview:
Residual Block with 2 fully connected layers.
x -> fc1 -> norm -> act -> fc2 -> norm -> act -> out
\_____________________________________/+
- Interfaces:
__init__
,forward
- __init__(in_channels: int, activation: Module = ReLU(), norm_type: str = 'BN', dropout: float | None = None)[source]¶
- Overview:
Init the fully connected layer residual block.
- Arguments:
in_channels (
int
): The number of channels in the input tensor.activation (
nn.Module
): The optional activation function.norm_type (
str
): The type of the normalization, default set to ‘BN’.dropout (
float
): The dropout rate, default set to None.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x: Tensor) Tensor [source]¶
- Overview:
Return the output of the residual block.
- Arguments:
x (
torch.Tensor
): The input tensor.
- Returns:
x (
torch.Tensor
): The resblock output tensor.
- training: bool¶
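A minimal usage sketch (not from the source docstring); the channel count is an illustrative assumption:
- Example:
>>> import torch
>>> from ding.torch_utils.network.res_block import ResFCBlock
>>> block = ResFCBlock(in_channels=64)
>>> y = block(torch.randn(4, 64))   # shape preserved: (4, 64)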
network.resnet¶
Please refer to ding/torch_utils/network/resnet
for more details.
to_2tuple¶
get_same_padding¶
- ding.torch_utils.network.resnet.get_same_padding(x: int, k: int, s: int, d: int) int [source]¶
- Overview:
Calculate asymmetric TensorFlow-like ‘SAME’ padding for a convolution.
- Arguments:
x (
int
): The size of the input.k (
int
): The size of the kernel.s (
int
): The stride of the convolution.d (
int
): The dilation of the convolution.
- Returns:
(
int
): The size of the padding.
pad_same¶
- ding.torch_utils.network.resnet.pad_same(x, k: List[int], s: List[int], d: List[int] = (1, 1), value: float = 0)[source]¶
- Overview:
Dynamically pad input x with ‘SAME’ padding for conv with specified args.
- Arguments:
x (
Tensor
): The input tensor.k (
List[int]
): The size of the kernel.s (
List[int]
): The stride of the convolution.d (
List[int]
): The dilation of the convolution.value (
float
): Value to fill the padding.
- Returns:
(
Tensor
): The padded tensor.
avg_pool2d_same¶
- ding.torch_utils.network.resnet.avg_pool2d_same(x, kernel_size: List[int], stride: List[int], padding: List[int] = (0, 0), ceil_mode: bool = False, count_include_pad: bool = True)[source]¶
- Overview:
Apply average pooling with ‘SAME’ padding on the input tensor.
- Arguments:
x (
Tensor
): The input tensor.kernel_size (
List[int]
): The size of the kernel.stride (
List[int]
): The stride of the convolution.padding (
List[int]
): The size of the padding.ceil_mode (
bool
): When True, will use ceil instead of floor to compute the output shape.count_include_pad (
bool
): When True, will include the zero-padding in the averaging calculation.
- Returns:
(
Tensor
): The tensor after average pooling.
AvgPool2dSame¶
- class ding.torch_utils.network.resnet.AvgPool2dSame(kernel_size: int, stride: Tuple[int, int] | None = None, padding: int = 0, ceil_mode: bool = False, count_include_pad: bool = True)[source]¶
- Overview:
TensorFlow-like ‘SAME’ wrapper for 2D average pooling.
- Interfaces:
__init__
,forward
- __init__(kernel_size: int, stride: Tuple[int, int] | None = None, padding: int = 0, ceil_mode: bool = False, count_include_pad: bool = True) None [source]¶
- Overview:
Initialize the AvgPool2dSame with given arguments.
- Arguments:
kernel_size (
int
): The size of the window to take an average over.stride (
Optional[Tuple[int, int]]
): The stride of the window. If None, default to kernel_size.padding (
int
): Implicit zero padding to be added on both sides.ceil_mode (
bool
): When True, will use ceil instead of floor to compute the output shape.count_include_pad (
bool
): When True, will include the zero-padding in the averaging calculation.
- ceil_mode: bool¶
- count_include_pad: bool¶
- forward(x: Tensor) Tensor [source]¶
- Overview:
Forward pass of the AvgPool2dSame.
- Argument:
x (
torch.Tensor
): Input tensor.
- Returns:
(
torch.Tensor
): Output tensor after average pooling.
- kernel_size: int | Tuple[int, int]¶
- padding: int | Tuple[int, int]¶
- stride: int | Tuple[int, int]¶
create_classifier¶
- ding.torch_utils.network.resnet.create_classifier(num_features: int, num_classes: int, pool_type: str = 'avg', use_conv: bool = False) Tuple[Module, Module] [source]¶
- Overview:
Create a classifier with global pooling layer and fully connected layer.
- Arguments:
num_features (
int
): The number of features.num_classes (
int
): The number of classes for the final classification.pool_type (
str
): The type of pooling to use; ‘avg’ for Average Pooling.use_conv (
bool
): Whether to use convolution or not.
- Returns:
global_pool (
nn.Module
): The created global pooling layer.fc (
nn.Module
): The created fully connected layer.
ClassifierHead¶
- class ding.torch_utils.network.resnet.ClassifierHead(in_chs: int, num_classes: int, pool_type: str = 'avg', drop_rate: float = 0.0, use_conv: bool = False)[source]¶
- Overview:
Classifier head with configurable global pooling and dropout.
- Interfaces:
__init__
,forward
- __init__(in_chs: int, num_classes: int, pool_type: str = 'avg', drop_rate: float = 0.0, use_conv: bool = False) None [source]¶
- Overview:
Initialize the ClassifierHead with given arguments.
- Arguments:
in_chs (
int
): Number of input channels.num_classes (
int
): Number of classes for the final classification.pool_type (
str
): The type of pooling to use; ‘avg’ for Average Pooling.drop_rate (
float
): The dropout rate.use_conv (
bool
): Whether to use convolution or not.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x: Tensor) Tensor [source]¶
- Overview:
Forward pass of the ClassifierHead.
- Argument:
x (
torch.Tensor
): Input tensor.
- Returns:
(
torch.Tensor
): Output tensor after classification.
- training: bool¶
create_attn¶
get_padding¶
- ding.torch_utils.network.resnet.get_padding(kernel_size: int, stride: int, dilation: int = 1) int [source]¶
- Overview:
Compute the padding based on the kernel size, stride and dilation.
- Arguments:
kernel_size (
int
): The size of the kernel.stride (
int
): The stride of the convolution.dilation (
int
): The dilation factor.
- Returns:
padding (
int
): The computed padding.
BasicBlock¶
- class ding.torch_utils.network.resnet.BasicBlock(inplanes: int, planes: int, stride: int = 1, downsample: ~typing.Callable | None = None, cardinality: int = 1, base_width: int = 64, reduce_first: int = 1, dilation: int = 1, first_dilation: int | None = None, act_layer: ~typing.Callable = <class 'torch.nn.modules.activation.ReLU'>, norm_layer: ~typing.Callable = <class 'torch.nn.modules.batchnorm.BatchNorm2d'>, attn_layer: ~typing.Callable | None = None, aa_layer: ~typing.Callable | None = None, drop_block: ~typing.Callable | None = None, drop_path: ~typing.Callable | None = None)[source]¶
- Overview:
The basic building block for models like ResNet. This class extends pytorch’s Module class. It represents a standard block of layers including two convolutions, batch normalization, an optional attention mechanism, and activation functions.
- Interfaces:
__init__
,forward
,zero_init_last_bn
- Properties:
expansion (int): Specifies the expansion factor for the planes of the conv layers.
- __init__(inplanes: int, planes: int, stride: int = 1, downsample: ~typing.Callable | None = None, cardinality: int = 1, base_width: int = 64, reduce_first: int = 1, dilation: int = 1, first_dilation: int | None = None, act_layer: ~typing.Callable = <class 'torch.nn.modules.activation.ReLU'>, norm_layer: ~typing.Callable = <class 'torch.nn.modules.batchnorm.BatchNorm2d'>, attn_layer: ~typing.Callable | None = None, aa_layer: ~typing.Callable | None = None, drop_block: ~typing.Callable | None = None, drop_path: ~typing.Callable | None = None) None [source]¶
- Overview:
Initialize the BasicBlock with given parameters.
- Arguments:
inplanes (
int
): Number of input channels.planes (
int
): Number of output channels.stride (
int
): The stride of the convolutional layer.downsample (
Callable
): Function for downsampling the inputs.cardinality (
int
): Group size for grouped convolution.base_width (
int
): Base width of the convolutions.reduce_first (
int
): Reduction factor for first convolution of each block.dilation (
int
): Spacing between kernel points.first_dilation (
int
): First dilation value.act_layer (
Callable
): Function for activation layer.norm_layer (
Callable
): Function for normalization layer.attn_layer (
Callable
): Function for attention layer.aa_layer (
Callable
): Function for anti-aliasing layer.drop_block (
Callable
): Method for dropping block.drop_path (
Callable
): Method for dropping path.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- expansion = 1¶
- forward(x: Tensor) Tensor [source]¶
- Overview:
Defines the computation performed at every call.
- Arguments:
x (
torch.Tensor
): The input tensor.
- Returns:
output (
torch.Tensor
): The output tensor after passing through the BasicBlock.
- training: bool¶
Bottleneck¶
- class ding.torch_utils.network.resnet.Bottleneck(inplanes: int, planes: int, stride: int = 1, downsample: ~torch.nn.modules.module.Module | None = None, cardinality: int = 1, base_width: int = 64, reduce_first: int = 1, dilation: int = 1, first_dilation: int | None = None, act_layer: ~typing.Type[~torch.nn.modules.module.Module] = <class 'torch.nn.modules.activation.ReLU'>, norm_layer: ~typing.Type[~torch.nn.modules.module.Module] = <class 'torch.nn.modules.batchnorm.BatchNorm2d'>, attn_layer: ~typing.Type[~torch.nn.modules.module.Module] | None = None, aa_layer: ~typing.Type[~torch.nn.modules.module.Module] | None = None, drop_block: ~typing.Callable | None = None, drop_path: ~typing.Callable | None = None)[source]¶
- Overview:
The Bottleneck class is a basic block used to build ResNet networks. It is a part of the PyTorch’s implementation of ResNet. This block is designed with several layers including a convolutional layer, normalization layer, activation layer, attention layer, anti-aliasing layer, and a dropout layer.
- Interfaces:
__init__
,forward
,zero_init_last_bn
- Properties:
expansion, inplanes, planes, stride, downsample, cardinality, base_width, reduce_first, dilation, first_dilation, act_layer, norm_layer, attn_layer, aa_layer, drop_block, drop_path
- __init__(inplanes: int, planes: int, stride: int = 1, downsample: ~torch.nn.modules.module.Module | None = None, cardinality: int = 1, base_width: int = 64, reduce_first: int = 1, dilation: int = 1, first_dilation: int | None = None, act_layer: ~typing.Type[~torch.nn.modules.module.Module] = <class 'torch.nn.modules.activation.ReLU'>, norm_layer: ~typing.Type[~torch.nn.modules.module.Module] = <class 'torch.nn.modules.batchnorm.BatchNorm2d'>, attn_layer: ~typing.Type[~torch.nn.modules.module.Module] | None = None, aa_layer: ~typing.Type[~torch.nn.modules.module.Module] | None = None, drop_block: ~typing.Callable | None = None, drop_path: ~typing.Callable | None = None) None [source]¶
- Overview:
Initialize the Bottleneck class with various parameters.
- Arguments:
inplanes (
int
): The number of input planes.planes (
int
): The number of output planes.stride (
int
, optional): The stride size, defaults to 1.downsample (
nn.Module
, optional): The downsample method, defaults to None.cardinality (
int
, optional): The size of the group convolutions, defaults to 1.base_width (
int
, optional): The base width, defaults to 64.reduce_first (
int
, optional): The first reduction factor, defaults to 1.dilation (
int
, optional): The dilation factor, defaults to 1.first_dilation (
int
, optional): The first dilation factor, defaults to None.act_layer (
Type[nn.Module]
, optional): The activation layer type, defaults to nn.ReLU.norm_layer (
Type[nn.Module]
, optional): The normalization layer type, defaults to nn.BatchNorm2d.attn_layer (
Type[nn.Module]
, optional): The attention layer type, defaults to None.aa_layer (
Type[nn.Module]
, optional): The anti-aliasing layer type, defaults to None.drop_block (
Callable
): The dropout block, defaults to None.drop_path (
Callable
): The drop path, defaults to None.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- expansion = 4¶
- forward(x: Tensor) Tensor [source]¶
- Overview:
Defines the computation performed at every call.
- Arguments:
x (
Tensor
): The input tensor.
- Returns:
x (
Tensor
): The output tensor resulting from the computation.
- training: bool¶
downsample_conv¶
- ding.torch_utils.network.resnet.downsample_conv(in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, dilation: int = 1, first_dilation: int | None = None, norm_layer: Type[Module] | None = None) Sequential [source]¶
- Overview:
Create a sequential module for downsampling that includes a convolution layer and a normalization layer.
- Arguments:
in_channels (
int
): The number of input channels.out_channels (
int
): The number of output channels.kernel_size (
int
): The size of the kernel.stride (
int
, optional): The stride size, defaults to 1.dilation (
int
, optional): The dilation factor, defaults to 1.first_dilation (
int
, optional): The first dilation factor, defaults to None.norm_layer (
Type[nn.Module]
, optional): The normalization layer type, defaults to nn.BatchNorm2d.
- Returns:
nn.Sequential: A sequence of layers performing downsampling through convolution.
downsample_avg¶
- ding.torch_utils.network.resnet.downsample_avg(in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, dilation: int = 1, first_dilation: int | None = None, norm_layer: Type[Module] | None = None) Sequential [source]¶
- Overview:
Create a sequential module for downsampling that includes an average pooling layer, a convolution layer, and a normalization layer.
- Arguments:
in_channels (
int
): The number of input channels.out_channels (
int
): The number of output channels.kernel_size (
int
): The size of the kernel.stride (
int
, optional): The stride size, defaults to 1.dilation (
int
, optional): The dilation factor, defaults to 1.first_dilation (
int
, optional): The first dilation factor, defaults to None.norm_layer (
Type[nn.Module]
, optional): The normalization layer type, defaults to nn.BatchNorm2d.
- Returns:
nn.Sequential: A sequence of layers performing downsampling through average pooling.
drop_blocks¶
make_blocks¶
- ding.torch_utils.network.resnet.make_blocks(block_fn: Type[Module], channels: List[int], block_repeats: List[int], inplanes: int, reduce_first: int = 1, output_stride: int = 32, down_kernel_size: int = 1, avg_down: bool = False, drop_block_rate: float = 0.0, drop_path_rate: float = 0.0, **kwargs) Tuple[List[Tuple[str, Module]], List[Dict[str, int | str]]] [source]¶
- Overview:
Create a list of blocks for the network, with each block having a given number of repeats. Also, create a feature info list that contains information about the output of each block.
- Arguments:
block_fn (
Type[nn.Module]
): The type of block to use.channels (
List[int]
): The list of output channels for each block.block_repeats (
List[int]
): The list of number of repeats for each block.inplanes (
int
): The number of input planes.reduce_first (
int
, optional): The first reduction factor, defaults to 1.output_stride (
int
, optional): The total stride of the network, defaults to 32.down_kernel_size (
int
, optional): The size of the downsample kernel, defaults to 1.avg_down (
bool
, optional): Whether to use average pooling for downsampling, defaults to False.drop_block_rate (
float
, optional): The drop block rate, defaults to 0.drop_path_rate (
float
, optional): The drop path rate, defaults to 0.
- Returns:
Tuple[List[Tuple[str, nn.Module]], List[Dict[str, Union[int, str]]]]: A tuple that includes a list of blocks for the network and a feature info list.
ResNet¶
- class ding.torch_utils.network.resnet.ResNet(block: ~torch.nn.modules.module.Module, layers: ~typing.List[int], num_classes: int = 1000, in_chans: int = 3, cardinality: int = 1, base_width: int = 64, stem_width: int = 64, stem_type: str = '', replace_stem_pool: bool = False, output_stride: int = 32, block_reduce_first: int = 1, down_kernel_size: int = 1, avg_down: bool = False, act_layer: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.activation.ReLU'>, norm_layer: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.batchnorm.BatchNorm2d'>, aa_layer: ~torch.nn.modules.module.Module | None = None, drop_rate: float = 0.0, drop_path_rate: float = 0.0, drop_block_rate: float = 0.0, global_pool: str = 'avg', zero_init_last_bn: bool = True, block_args: dict | None = None)[source]¶
- Overview:
Implements ResNet, ResNeXt, SE-ResNeXt, and SENet models. This implementation supports various modifications based on the v1c, v1d, v1e, and v1s variants included in the MXNet Gluon ResNetV1b model. For more details about the variants and options, please refer to the ‘Bag of Tricks’ paper: https://arxiv.org/pdf/1812.01187.
- Interfaces:
__init__
,forward
,zero_init_last_bn
,get_classifier
- __init__(block: ~torch.nn.modules.module.Module, layers: ~typing.List[int], num_classes: int = 1000, in_chans: int = 3, cardinality: int = 1, base_width: int = 64, stem_width: int = 64, stem_type: str = '', replace_stem_pool: bool = False, output_stride: int = 32, block_reduce_first: int = 1, down_kernel_size: int = 1, avg_down: bool = False, act_layer: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.activation.ReLU'>, norm_layer: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.batchnorm.BatchNorm2d'>, aa_layer: ~torch.nn.modules.module.Module | None = None, drop_rate: float = 0.0, drop_path_rate: float = 0.0, drop_block_rate: float = 0.0, global_pool: str = 'avg', zero_init_last_bn: bool = True, block_args: dict | None = None) None [source]¶
- Overview:
Initialize the ResNet model with given block, layers and other configuration options.
- Arguments:
block (
nn.Module
): Class for the residual block.layers (
List[int]
): Numbers of layers in each block.num_classes (
int
, optional): Number of classification classes. Default is 1000.in_chans (
int
, optional): Number of input (color) channels. Default is 3.cardinality (
int
, optional): Number of convolution groups for 3x3 conv in Bottleneck. Default is 1.base_width (
int
, optional): Factor determining bottleneck channels. Default is 64.stem_width (
int
, optional): Number of channels in stem convolutions. Default is 64.stem_type (
str
, optional): The type of stem. Default is ‘’.replace_stem_pool (
bool
, optional): Whether to replace stem pooling. Default is False.output_stride (
int
, optional): Output stride of the network. Default is 32.block_reduce_first (
int
, optional): Reduction factor for first convolution output width of residual blocks. Default is 1.down_kernel_size (
int
, optional): Kernel size of residual block downsampling path. Default is 1.avg_down (
bool
, optional): Whether to use average pooling for projection skip connection between stages/downsample. Default is False.
act_layer (
nn.Module
, optional): Activation layer. Default is nn.ReLU.norm_layer (
nn.Module
, optional): Normalization layer. Default is nn.BatchNorm2d.aa_layer (
Optional[nn.Module]
, optional): Anti-aliasing layer. Default is None.drop_rate (
float
, optional): Dropout probability before classifier, for training. Default is 0.0.drop_path_rate (
float
, optional): Drop path rate. Default is 0.0.drop_block_rate (
float
, optional): Drop block rate. Default is 0.0.global_pool (
str
, optional): Global pooling type. Default is ‘avg’.zero_init_last_bn (
bool
, optional): Whether to initialize last batch normalization with zero. Default is True.block_args (
Optional[dict]
, optional): Additional arguments for block. Default is None.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x: Tensor) Tensor [source]¶
- Overview:
Full forward pass through the model.
- Arguments:
x (
torch.Tensor
): The input tensor.
- Returns:
x (
torch.Tensor
): The output tensor after passing through the model.
- forward_features(x: Tensor) Tensor [source]¶
- Overview:
Forward pass through the feature layers of the model.
- Arguments:
x (
torch.Tensor
): The input tensor.
- Returns:
x (
torch.Tensor
): The output tensor after passing through feature layers.
- get_classifier() Module [source]¶
- Overview:
Get the classifier module from the model.
- Returns:
classifier (
nn.Module
): The classifier module in the model.
- init_weights(zero_init_last_bn: bool = True) None [source]¶
- Overview:
Initialize the weights in the model.
- Arguments:
- zero_init_last_bn (
bool
, optional): Whether to initialize last batch normalization with zero. Default is True.
- reset_classifier(num_classes: int, global_pool: str = 'avg') None [source]¶
- Overview:
Reset the classifier with a new number of classes and pooling type.
- Arguments:
num_classes (
int
): New number of classification classes.global_pool (
str
, optional): New global pooling type. Default is ‘avg’.
- training: bool¶
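A minimal usage sketch (not part of the original docstring); it assumes a BasicBlock residual block class is exported from the same module, as in the timm implementation this code is adapted from:
>>> import torch
>>> from ding.torch_utils.network.resnet import ResNet, BasicBlock
>>> net = ResNet(BasicBlock, layers=[2, 2, 2, 2], num_classes=10)
>>> x = torch.randn(2, 3, 224, 224)   # (batch, channels, height, width)
>>> logits = net(x)
>>> assert logits.shape == torch.Size([2, 10])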
resnet18¶
network.rnn¶
Please refer to ding/torch_utils/network/rnn
for more details.
is_sequence¶
sequence_mask¶
- ding.torch_utils.network.rnn.sequence_mask(lengths: Tensor, max_len: int | None = None) BoolTensor [source]¶
- Overview:
Generates a boolean mask for a batch of sequences with differing lengths.
- Arguments:
lengths (
torch.Tensor
): A tensor with the lengths of each sequence. Shape could be (n, 1) or (n).max_len (
int
, optional): The padding size. If max_len is None, the padding size is the max length of sequences.
- Returns:
masks (
torch.BoolTensor
): A boolean mask tensor. The mask has the same device as lengths.
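A minimal usage sketch (not part of the original docstring):
>>> import torch
>>> from ding.torch_utils.network.rnn import sequence_mask
>>> lengths = torch.LongTensor([1, 3, 2])
>>> mask = sequence_mask(lengths, max_len=4)   # boolean mask of shape (3, 4)
>>> assert mask.shape == torch.Size([3, 4])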
LSTMForwardWrapper¶
- class ding.torch_utils.network.rnn.LSTMForwardWrapper[source]¶
- Overview:
Class providing methods to use before and after the LSTM forward method. Wraps the LSTM forward method.
- Interfaces:
_before_forward
,_after_forward
- _after_forward(next_state: Tuple[Tensor], list_next_state: bool = False) List[Dict] | Dict[str, Tensor] [source]¶
- Overview:
Post-processes the next_state after the LSTM forward method.
- Arguments:
next_state (
Tuple[torch.Tensor]
): Tuple containing the next state (h, c).list_next_state (
bool
, optional): Determines the format of the returned next_state. If True, returns next_state in list format. Default is False.
- Returns:
next_state(
Union[List[Dict], Dict[str, torch.Tensor]]
): The post-processed next_state.
- _before_forward(inputs: Tensor, prev_state: None | List[Dict]) Tensor [source]¶
- Overview:
Preprocesses the inputs and previous states before the LSTM forward method.
- Arguments:
inputs (
torch.Tensor
): Input vector of the LSTM cell. Shape: [seq_len, batch_size, input_size]prev_state (
Union[None, List[Dict]]
): Previous state tensor. Shape: [num_directions*num_layers, batch_size, hidden_size]. If None, prev_state will be initialized to all zeros.
- Returns:
prev_state (
torch.Tensor
): Preprocessed previous state for the LSTM batch.
LSTM¶
- class ding.torch_utils.network.rnn.LSTM(input_size: int, hidden_size: int, num_layers: int, norm_type: str | None = None, dropout: float = 0.0)[source]¶
- Overview:
Implementation of an LSTM cell with Layer Normalization (LN).
- Interfaces:
__init__
,forward
Note
For a primer on LSTM, refer to https://zhuanlan.zhihu.com/p/32085405.
- __init__(input_size: int, hidden_size: int, num_layers: int, norm_type: str | None = None, dropout: float = 0.0) None [source]¶
- Overview:
Initialize LSTM cell parameters.
- Arguments:
input_size (
int
): Size of the input vector.hidden_size (
int
): Size of the hidden state vector.num_layers (
int
): Number of LSTM layers.norm_type (
Optional[str]
): Normalization type, default is None.dropout (
float
): Dropout rate, default is 0.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(inputs: Tensor, prev_state: Tensor, list_next_state: bool = True) Tuple[Tensor, Tensor | list] [source]¶
- Overview:
Compute output and next state given previous state and input.
- Arguments:
inputs (
torch.Tensor
): Input vector of cell, size [seq_len, batch_size, input_size].prev_state (
torch.Tensor
): Previous state, size [num_directions*num_layers, batch_size, hidden_size].list_next_state (
bool
): Whether to return next_state in list format, default is True.
- Returns:
x (
torch.Tensor
): Output from LSTM.next_state (
Union[torch.Tensor, list]
): Hidden state from LSTM.
- training: bool¶
PytorchLSTM¶
- class ding.torch_utils.network.rnn.PytorchLSTM(input_size: int, hidden_size: int, num_layers: int = 1, bias: bool = True, batch_first: bool = False, dropout: float = 0.0, bidirectional: bool = False, proj_size: int = 0, device=None, dtype=None)[source]¶
- class ding.torch_utils.network.rnn.PytorchLSTM(*args, **kwargs)
- Overview:
Wrapper class for PyTorch’s nn.LSTM, formats the input and output. For more details on nn.LSTM, refer to https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html#torch.nn.LSTM
- Interfaces:
forward
- batch_first: bool¶
- bias: bool¶
- bidirectional: bool¶
- dropout: float¶
- forward(inputs: Tensor, prev_state: Tensor, list_next_state: bool = True) Tuple[Tensor, Tensor | list] [source]¶
- Overview:
Executes nn.LSTM.forward with preprocessed input.
- Arguments:
inputs (
torch.Tensor
): Input vector of cell, size [seq_len, batch_size, input_size].prev_state (
torch.Tensor
): Previous state, size [num_directions*num_layers, batch_size, hidden_size].list_next_state (
bool
): Whether to return next_state in list format, default is True.
- Returns:
output (
torch.Tensor
): Output from LSTM.next_state (
Union[torch.Tensor, list]
): Hidden state from LSTM.
- input_size: int¶
- mode: str¶
- num_layers: int¶
- proj_size: int¶
GRU¶
- class ding.torch_utils.network.rnn.GRU(input_size: int, hidden_size: int, num_layers: int)[source]¶
- Overview:
This class extends the torch.nn.GRUCell and LSTMForwardWrapper classes, and formats inputs and outputs accordingly.
- Interfaces:
__init__
,forward
- Properties:
hidden_size, num_layers
Note
For further details, refer to the official PyTorch documentation: <https://pytorch.org/docs/stable/generated/torch.nn.GRU.html#torch.nn.GRU>
- __init__(input_size: int, hidden_size: int, num_layers: int) None [source]¶
- Overview:
Initialize the GRU class with input size, hidden size, and number of layers.
- Arguments:
input_size (
int
): The size of the input vector.hidden_size (
int
): The size of the hidden state vector.num_layers (
int
): The number of GRU layers.
- bias: bool¶
- forward(inputs: Tensor, prev_state: Tensor | None = None, list_next_state: bool = True) Tuple[Tensor, Tensor | List] [source]¶
- Overview:
Wrap the nn.GRU.forward method.
- Arguments:
inputs (
torch.Tensor
): Input vector of cell, tensor of size [seq_len, batch_size, input_size].prev_state (
Optional[torch.Tensor]
): None or tensor of size [num_directions*num_layers, batch_size, hidden_size].list_next_state (
bool
): Whether to return next_state in list format (default is True).
- Returns:
output (
torch.Tensor
): Output from GRU.next_state (
torch.Tensor
orlist
): Hidden state from GRU.
- input_size: int¶
- weight_hh: Tensor¶
- weight_ih: Tensor¶
get_lstm¶
- ding.torch_utils.network.rnn.get_lstm(lstm_type: str, input_size: int, hidden_size: int, num_layers: int = 1, norm_type: str = 'LN', dropout: float = 0.0, seq_len: int | None = None, batch_size: int | None = None) LSTM | PytorchLSTM [source]¶
- Overview:
Build and return the corresponding LSTM cell based on the provided parameters.
- Arguments:
lstm_type (
str
): Version of RNN cell. Supported options are [‘normal’, ‘pytorch’, ‘hpc’, ‘gru’].input_size (
int
): Size of the input vector.hidden_size (
int
): Size of the hidden state vector.num_layers (
int
): Number of LSTM layers (default is 1).norm_type (
str
): Type of normalization (default is ‘LN’).dropout (
float
): Dropout rate (default is 0.0).seq_len (
Optional[int]
): Sequence length (default is None).batch_size (
Optional[int]
): Batch size (default is None).
- Returns:
lstm (
Union[LSTM, PytorchLSTM]
): The corresponding LSTM cell.
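A minimal usage sketch (not part of the original docstring), assuming prev_state=None initializes a zero state as described in LSTMForwardWrapper._before_forward:
>>> import torch
>>> from ding.torch_utils.network.rnn import get_lstm
>>> lstm = get_lstm('normal', input_size=32, hidden_size=64, num_layers=2)
>>> inputs = torch.randn(16, 4, 32)   # [seq_len, batch_size, input_size]
>>> output, next_state = lstm(inputs, prev_state=None)
>>> output.shape                      # expected: torch.Size([16, 4, 64])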
network.scatter_connection¶
Please refer to ding/torch_utils/network/scatter_connection
for more details.
shape_fn_scatter_connection¶
- ding.torch_utils.network.scatter_connection.shape_fn_scatter_connection(args, kwargs) List[int] [source]¶
- Overview:
Return the shape of scatter_connection for HPC.
- Arguments:
args (
Tuple
): The arguments passed to the scatter_connection function.kwargs (
Dict
): The keyword arguments passed to the scatter_connection function.
- Returns:
shape (
List[int]
): A list representing the shape of scatter_connection, in the form of [B, M, N, H, W, scatter_type].
ScatterConnection¶
- class ding.torch_utils.network.scatter_connection.ScatterConnection(scatter_type: str)[source]¶
- Overview:
Scatter features to their corresponding locations. In AlphaStar, each entity is embedded into a tensor, and these tensors are scattered into a feature map of the given map size.
- Interfaces:
__init__
,forward
,xy_forward
- __init__(scatter_type: str) None [source]¶
- Overview:
Initialize the ScatterConnection object.
- Arguments:
scatter_type (
str
): The scatter type, which decides the behavior when two entities have the same location. It can be either ‘add’ or ‘cover’. If ‘add’, the first one will be added to the second one. If ‘cover’, the first one will be covered by the second one.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x: Tensor, spatial_size: Tuple[int, int], location: Tensor) Tensor [source]¶
- Overview:
Scatter input tensor ‘x’ into a spatial feature map.
- Arguments:
x (
torch.Tensor
): The input tensor of shape (B, M, N), where B is the batch size, M is the number of entities, and N is the dimension of entity attributes.spatial_size (
Tuple[int, int]
): The size (H, W) of the spatial feature map into which ‘x’ will be scattered, where H is the height and W is the width.location (
torch.Tensor
): The tensor of locations of shape (B, M, 2). Each location should be (y, x).
- Returns:
output (
torch.Tensor
): The scattered feature map of shape (B, N, H, W).
- Note:
When locations overlap, ‘cover’ mode results in the loss of information, so ‘add’ mode can be used as a temporary substitute.
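A minimal usage sketch (not part of the original docstring), assuming the (y, x) entries in location lie inside the given spatial_size:
>>> import torch
>>> from ding.torch_utils.network.scatter_connection import ScatterConnection
>>> B, M, N, H, W = 2, 3, 8, 16, 16
>>> scatter = ScatterConnection(scatter_type='add')
>>> x = torch.randn(B, M, N)
>>> location = torch.randint(0, H, (B, M, 2))   # (y, x) coordinates
>>> out = scatter(x, (H, W), location)
>>> assert out.shape == torch.Size([2, 8, 16, 16])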
- training: bool¶
- xy_forward(x: Tensor, spatial_size: Tuple[int, int], coord_x: Tensor, coord_y) Tensor [source]¶
- Overview:
Scatter input tensor ‘x’ into a spatial feature map using separate x and y coordinates.
- Arguments:
x (
torch.Tensor
): The input tensor of shape (B, M, N), where B is the batch size, M is the number of entities, and N is the dimension of entity attributes.spatial_size (
Tuple[int, int]
): The size (H, W) of the spatial feature map into which ‘x’ will be scattered, where H is the height and W is the width.coord_x (
torch.Tensor
): The x-coordinates tensor of shape (B, M).coord_y (
torch.Tensor
): The y-coordinates tensor of shape (B, M).
- Returns:
output (
torch.Tensor
): The scattered feature map of shape (B, N, H, W).
- Note:
When locations overlap, ‘cover’ mode results in the loss of information, so ‘add’ mode can be used as a temporary substitute.
network.soft_argmax¶
Please refer to ding/torch_utils/network/soft_argmax
for more details.
SoftArgmax¶
- class ding.torch_utils.network.soft_argmax.SoftArgmax[source]¶
- Overview:
A neural network module that computes the SoftArgmax operation (essentially a 2-dimensional spatial softmax), which is often used for location regression tasks. It converts a feature map (such as a heatmap) into precise coordinate locations.
- Interfaces:
__init__
,forward
Note
For more information on SoftArgmax, you can refer to <https://en.wikipedia.org/wiki/Softmax_function> and the paper <https://arxiv.org/pdf/1504.00702.pdf>.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x: Tensor) Tensor [source]¶
- Overview:
Perform the forward pass of the SoftArgmax operation.
- Arguments:
x (
torch.Tensor
): The input tensor, typically a heatmap representing predicted locations.
- Returns:
location (
torch.Tensor
): The predicted coordinates as a result of the SoftArgmax operation.
- Shapes:
x: \((B, C, H, W)\), where B is the batch size, C is the number of channels, and H and W represent height and width respectively.
location: \((B, 2)\), where B is the batch size and 2 represents the coordinates (height, width).
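A minimal usage sketch (not part of the original docstring):
>>> import torch
>>> from ding.torch_utils.network.soft_argmax import SoftArgmax
>>> m = SoftArgmax()
>>> heatmap = torch.randn(4, 1, 16, 16)   # (B, C, H, W)
>>> location = m(heatmap)                 # (height, width) coordinates per sample
>>> assert location.shape == torch.Size([4, 2])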
- training: bool¶
network.transformer¶
Please refer to ding/torch_utils/network/transformer
for more details.
Attention¶
- class ding.torch_utils.network.transformer.Attention(input_dim: int, head_dim: int, output_dim: int, head_num: int, dropout: Module)[source]¶
- Overview:
For each entry embedding, compute attention over all entries and aggregate the results to obtain the output attention.
- Interfaces:
__init__
,split
,forward
- __init__(input_dim: int, head_dim: int, output_dim: int, head_num: int, dropout: Module) None [source]¶
- Overview:
Initialize the Attention module with the provided dimensions and dropout layer.
- Arguments:
input_dim (
int
): The dimension of the input.head_dim (
int
): The dimension of each head in the multi-head attention mechanism.output_dim (
int
): The dimension of the output.head_num (
int
): The number of heads in the multi-head attention mechanism.dropout (
nn.Module
): The dropout layer used in the attention mechanism.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x: Tensor, mask: Tensor | None = None) Tensor [source]¶
- Overview:
Compute the attention from the input tensor.
- Arguments:
x (
torch.Tensor
): The input tensor for the forward computation.- mask (
Optional[torch.Tensor]
, optional): Optional mask to exclude invalid entries. Defaults to None.
- Returns:
attention (
torch.Tensor
): The computed attention tensor.
- split(x: Tensor, T: bool = False) List[Tensor] [source]¶
- Overview:
Split the input to get multi-head queries, keys, and values.
- Arguments:
x (
torch.Tensor
): The tensor to be split, which could be a query, key, or value.T (
bool
, optional): If True, transpose the output tensors. Defaults to False.
- Returns:
x (
List[torch.Tensor]
): A list of output tensors for each head.
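A minimal usage sketch (not part of the original docstring); the output shape (B, N, output_dim) is an assumption based on the argument descriptions above:
>>> import torch
>>> import torch.nn as nn
>>> from ding.torch_utils.network.transformer import Attention
>>> attn = Attention(input_dim=64, head_dim=16, output_dim=64, head_num=4, dropout=nn.Dropout(0.1))
>>> x = torch.randn(2, 10, 64)   # (batch, num_entries, input_dim)
>>> out = attn(x)                # expected shape: (2, 10, 64)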
- training: bool¶
TransformerLayer¶
- class ding.torch_utils.network.transformer.TransformerLayer(input_dim: int, head_dim: int, hidden_dim: int, output_dim: int, head_num: int, mlp_num: int, dropout: Module, activation: Module)[source]¶
- Overview:
In a Transformer layer, attention across entries is computed first, followed by a feed-forward layer.
- Interfaces:
__init__
,forward
- __init__(input_dim: int, head_dim: int, hidden_dim: int, output_dim: int, head_num: int, mlp_num: int, dropout: Module, activation: Module) None [source]¶
- Overview:
Initialize the TransformerLayer with the provided dimensions, dropout layer, and activation function.
- Arguments:
input_dim (
int
): The dimension of the input.head_dim (
int
): The dimension of each head in the multi-head attention mechanism.hidden_dim (
int
): The dimension of the hidden layer in the MLP (Multi-Layer Perceptron).output_dim (
int
): The dimension of the output.head_num (
int
): The number of heads in the multi-head attention mechanism.mlp_num (
int
): The number of layers in the MLP.dropout (
nn.Module
): The dropout layer used in the attention mechanism.activation (
nn.Module
): The activation function used in the MLP.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(inputs: Tuple[Tensor, Tensor]) Tuple[Tensor, Tensor] [source]¶
- Overview:
Compute the forward pass through the Transformer layer.
- Arguments:
- inputs (
Tuple[torch.Tensor, torch.Tensor]
): A tuple containing the input tensor x and the mask tensor.
- Returns:
- output (
Tuple[torch.Tensor, torch.Tensor]
): A tuple containing the predicted value tensor and the mask tensor.
- training: bool¶
Transformer¶
- class ding.torch_utils.network.transformer.Transformer(input_dim: int, head_dim: int = 128, hidden_dim: int = 1024, output_dim: int = 256, head_num: int = 2, mlp_num: int = 2, layer_num: int = 3, dropout_ratio: float = 0.0, activation: Module = ReLU())[source]¶
- Overview:
Implementation of the Transformer model.
Note
For more details, refer to “Attention is All You Need”: http://arxiv.org/abs/1706.03762.
- Interfaces:
__init__
,forward
- __init__(input_dim: int, head_dim: int = 128, hidden_dim: int = 1024, output_dim: int = 256, head_num: int = 2, mlp_num: int = 2, layer_num: int = 3, dropout_ratio: float = 0.0, activation: Module = ReLU())[source]¶
- Overview:
Initialize the Transformer with the provided dimensions, dropout layer, activation function, and layer numbers.
- Arguments:
input_dim (
int
): The dimension of the input.head_dim (
int
): The dimension of each head in the multi-head attention mechanism.hidden_dim (
int
): The dimension of the hidden layer in the MLP (Multi-Layer Perceptron).output_dim (
int
): The dimension of the output.head_num (
int
): The number of heads in the multi-head attention mechanism.mlp_num (
int
): The number of layers in the MLP.layer_num (
int
): The number of Transformer layers.dropout_ratio (
float
): The dropout ratio for the dropout layer.activation (
nn.Module
): The activation function used in the MLP.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(x: Tensor, mask: Tensor | None = None) Tensor [source]¶
- Overview:
Perform the forward pass through the Transformer.
- Arguments:
x (
torch.Tensor
): The input tensor, with shape (B, N, C), where B is batch size, N is the number of entries, and C is the feature dimension.mask (
Optional[torch.Tensor]
, optional): The mask tensor (bool), used to mask out invalid entries in attention. It has shape (B, N), where B is batch size and N is number of entries. Defaults to None.
- Returns:
x (
torch.Tensor
): The output tensor from the Transformer.
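A minimal usage sketch (not part of the original docstring); the output feature dimension is expected to equal output_dim:
>>> import torch
>>> from ding.torch_utils.network.transformer import Transformer
>>> model = Transformer(input_dim=64, head_dim=32, hidden_dim=128, output_dim=64, head_num=2, mlp_num=2, layer_num=2)
>>> x = torch.randn(2, 10, 64)                  # (B, N, C)
>>> mask = torch.ones(2, 10, dtype=torch.bool)  # all entries valid
>>> out = model(x, mask)                        # expected shape: (2, 10, 64)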
- training: bool¶
ScaledDotProductAttention¶
- class ding.torch_utils.network.transformer.ScaledDotProductAttention(d_k: int, dropout: float = 0.0)[source]¶
- Overview:
Implementation of Scaled Dot Product Attention, a key component of Transformer models. This class performs the dot product of the query, key and value tensors, scales it with the square root of the dimension of the key vector (d_k) and applies dropout for regularization.
- Interfaces:
__init__
,forward
- __init__(d_k: int, dropout: float = 0.0) None [source]¶
- Overview:
Initialize the ScaledDotProductAttention module with the dimension of the key vector and the dropout rate.
- Arguments:
d_k (
int
): The dimension of the key vector. This will be used to scale the dot product of the query and key.dropout (
float
, optional): The dropout rate to be applied after the softmax operation. Defaults to 0.0.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward(q: Tensor, k: Tensor, v: Tensor, mask: Tensor | None = None) Tensor [source]¶
- Overview:
Perform the Scaled Dot Product Attention operation on the query, key and value tensors.
- Arguments:
q (
torch.Tensor
): The query tensor.k (
torch.Tensor
): The key tensor.v (
torch.Tensor
): The value tensor.- mask (
Optional[torch.Tensor]
): An optional mask tensor to be applied on the attention scores. Defaults to None.
- Returns:
output (
torch.Tensor
): The output tensor after the attention operation.
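A minimal usage sketch (not part of the original docstring); the (batch, head, length, d_k) layout of q, k and v is an assumption:
>>> import torch
>>> from ding.torch_utils.network.transformer import ScaledDotProductAttention
>>> sdpa = ScaledDotProductAttention(d_k=16)
>>> q = torch.randn(2, 4, 10, 16)
>>> k = torch.randn(2, 4, 10, 16)
>>> v = torch.randn(2, 4, 10, 16)
>>> out = sdpa(q, k, v)   # expected shape: (2, 4, 10, 16)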
- training: bool¶
backend_helper¶
Please refer to ding/torch_utils/backend_helper
for more details.
enable_tf32¶
- ding.torch_utils.backend_helper.enable_tf32() None [source]¶
- Overview:
Enable tf32 on matmul and cudnn for faster computation. This only works on Ampere GPU devices. For detailed information, please refer to: https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices.
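For reference, a hedged sketch of what enabling TF32 amounts to in plain PyTorch (the helper's actual body may differ):
>>> import torch
>>> torch.backends.cuda.matmul.allow_tf32 = True   # allow TF32 for matmul
>>> torch.backends.cudnn.allow_tf32 = True         # allow TF32 for cuDNN convolutions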
checkpoint_helper¶
Please refer to ding/torch_utils/checkpoint_helper
for more details.
build_checkpoint_helper¶
- ding.torch_utils.checkpoint_helper.build_checkpoint_helper(cfg)[source]¶
- Overview:
Use config to build checkpoint helper.
- Arguments:
cfg (
dict
): ckpt_helper config
- Returns:
(
CheckpointHelper
): checkpoint_helper created by this function
CheckpointHelper¶
- class ding.torch_utils.checkpoint_helper.CheckpointHelper[source]¶
- Overview:
Help to save or load checkpoints by the given args.
- Interfaces:
__init__
,save
,load
,_remove_prefix
,_add_prefix
,_load_matched_model_state_dict
- _add_prefix(state_dict: dict, prefix: str = 'module.') dict [source]¶
- Overview:
Add prefix in state_dict
- Arguments:
state_dict (
dict
): model’s state_dictprefix (
str
): this prefix will be added in keys
- Returns:
(
dict
): new state_dict after adding prefix
- _load_matched_model_state_dict(model: Module, ckpt_state_dict: dict) None [source]¶
- Overview:
Load the matched model state_dict, and show the mismatched keys between the model’s state_dict and the checkpoint’s state_dict
- Arguments:
model (
torch.nn.Module
): modelckpt_state_dict (
dict
): checkpoint’s state_dict
- _remove_prefix(state_dict: dict, prefix: str = 'module.') dict [source]¶
- Overview:
Remove prefix in state_dict
- Arguments:
state_dict (
dict
): model’s state_dictprefix (
str
): this prefix will be removed in keys
- Returns:
new_state_dict (
dict
): new state_dict after removing prefix
- load(load_path: str, model: Module, optimizer: Optimizer = None, last_iter: CountVar = None, last_epoch: CountVar = None, last_frame: CountVar = None, lr_schduler: Scheduler = None, dataset: Dataset = None, collector_info: Module = None, prefix_op: str = None, prefix: str = None, strict: bool = True, logger_prefix: str = '', state_dict_mask: list = [])[source]¶
- Overview:
Load checkpoint by given path
- Arguments:
load_path (
str
): checkpoint’s pathmodel (
torch.nn.Module
): model definitionoptimizer (
torch.optim.Optimizer
): optimizer objlast_iter (
CountVar
): iter num, default Nonelast_epoch (
CountVar
): epoch num, default Nonelast_frame (
CountVar
): frame num, default Nonelr_schduler (
Scheduler
): lr_schduler objdataset (
torch.utils.data.Dataset
): dataset, should be replaydatasetcollector_info (
torch.nn.Module
): attr of checkpoint, save collector infoprefix_op (
str
): should be [‘remove’, ‘add’], process on state_dictprefix (
str
): prefix to be processed on state_dictstrict (
bool
): args of model.load_state_dictlogger_prefix (
str
): prefix of loggerstate_dict_mask (
list
): A list containing state_dict keys, which shouldn’t be loaded into model(after prefix op)
Note
The checkpoint loaded from load_path is a dict, whose format is like ‘{‘state_dict’: OrderedDict(), …}’
- save(path: str, model: Module, optimizer: Optimizer | None = None, last_iter: CountVar | None = None, last_epoch: CountVar | None = None, last_frame: CountVar | None = None, dataset: Dataset | None = None, collector_info: Module | None = None, prefix_op: str | None = None, prefix: str | None = None) None [source]¶
- Overview:
Save checkpoint by given args
- Arguments:
path (
str
): the path of saving checkpointmodel (
torch.nn.Module
): model to be savedoptimizer (
torch.optim.Optimizer
): optimizer objlast_iter (
CountVar
): iter num, default Nonelast_epoch (
CountVar
): epoch num, default Nonelast_frame (
CountVar
): frame num, default Nonedataset (
torch.utils.data.Dataset
): dataset, should be replaydatasetcollector_info (
torch.nn.Module
): attr of checkpoint, save collector infoprefix_op (
str
): should be [‘remove’, ‘add’], process on state_dictprefix (
str
): prefix to be processed on state_dict
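A minimal usage sketch (not part of the original docstring), assuming CheckpointHelper needs no constructor arguments; the file name is only illustrative:
>>> import torch
>>> from ding.torch_utils.checkpoint_helper import CheckpointHelper
>>> helper = CheckpointHelper()
>>> model = torch.nn.Linear(3, 5)
>>> helper.save('./demo_ckpt.pth.tar', model)            # save the model state_dict
>>> helper.load('./demo_ckpt.pth.tar', model, strict=True)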
CountVar¶
- class ding.torch_utils.checkpoint_helper.CountVar(init_val: int)[source]¶
- Overview:
Number counter
- Interfaces:
__init__
,update
,add
- Properties:
val (
int
): the value of the counter
- __init__(init_val: int) None [source]¶
- Overview:
Init the var counter
- Arguments:
init_val (
int
): the init value of the counter
- add(add_num: int)[source]¶
- Overview:
Add the number to counter
- Arguments:
add_num (
int
): the number added to the counter
- update(val: int) None [source]¶
- Overview:
Update the var counter
- Arguments:
val (
int
): the update value of the counter
- property val: int¶
- Overview:
Get the var counter
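A minimal usage sketch (not part of the original docstring):
>>> from ding.torch_utils.checkpoint_helper import CountVar
>>> counter = CountVar(init_val=0)
>>> counter.update(10)   # set the value to 10
>>> counter.add(2)       # add 2 to the value
>>> assert counter.val == 12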
auto_checkpoint¶
- ding.torch_utils.checkpoint_helper.auto_checkpoint(func: Callable) Callable [source]¶
- Overview:
Create a wrapper for the given function; the wrapper will call the save_checkpoint method whenever an exception happens.
- Arguments:
func(
Callable
): the function to be wrapped
- Returns:
wrapper (
Callable
): the wrapped function
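A minimal, hedged sketch (not part of the original docstring); it assumes the decorated function is a method of an object exposing a save_checkpoint method, as implied by the overview above, and that save_checkpoint is invoked when the wrapped call is interrupted:
>>> from ding.torch_utils.checkpoint_helper import auto_checkpoint
>>> class Trainer:
...     def save_checkpoint(self, *args, **kwargs):
...         print('checkpoint saved')     # stand-in for real saving logic
...     @auto_checkpoint
...     def train(self):
...         raise KeyboardInterrupt       # simulate an interruption
>>> Trainer().train()  # save_checkpoint is expected to be called before exiting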
data_helper¶
Please refer to ding/torch_utils/data_helper
for more details.
to_device¶
- ding.torch_utils.data_helper.to_device(item: Any, device: str, ignore_keys: list = []) Any [source]¶
- Overview:
Transfer data to certain device.
- Arguments:
item (
Any
): The item to be transferred.device (
str
): The device wanted.ignore_keys (
list
): The keys to be ignored in transfer, default set to empty.
- Returns:
item (
Any
): The transferred item.
- Examples:
>>> setup_data_dict['module'] = nn.Linear(3, 5) >>> device = 'cuda' >>> cuda_d = to_device(setup_data_dict, device, ignore_keys=['module']) >>> assert cuda_d['module'].weight.device == torch.device('cpu')
- Examples:
>>> setup_data_dict['module'] = nn.Linear(3, 5) >>> device = 'cuda' >>> cuda_d = to_device(setup_data_dict, device) >>> assert cuda_d['module'].weight.device == torch.device('cuda:0')
to_dtype¶
- ding.torch_utils.data_helper.to_dtype(item: Any, dtype: type) Any [source]¶
- Overview:
Change data to certain dtype.
- Arguments:
item (
Any
): The item for changing the dtype.dtype (
type
): The type wanted.
- Returns:
item (
object
): The item with changed dtype.
- Examples (tensor):
>>> t = torch.randint(0, 10, (3, 5)) >>> tfloat = to_dtype(t, torch.float) >>> assert tfloat.dtype == torch.float
- Examples (list):
>>> tlist = [torch.randint(0, 10, (3, 5))] >>> tlfloat = to_dtype(tlist, torch.float) >>> assert tlfloat[0].dtype == torch.float
- Examples (dict):
>>> tdict = {'t': torch.randint(0, 10, (3, 5))} >>> tdictf = to_dtype(tdict, torch.float) >>> assert tdictf['t'].dtype == torch.float
to_tensor¶
- ding.torch_utils.data_helper.to_tensor(item: Any, dtype: dtype | None = None, ignore_keys: list = [], transform_scalar: bool = True) Any [source]¶
- Overview:
Convert
numpy.ndarray
object totorch.Tensor
.- Arguments:
item (
Any
): Thenumpy.ndarray
objects to be converted. It can be exactly anumpy.ndarray
object or a container (list, tuple or dict) that contains severalnumpy.ndarray
objects.dtype (
torch.dtype
): The type of wanted tensor. If set toNone
, its dtype will be unchanged.ignore_keys (
list
): If theitem
is a dict, values whose keys are inignore_keys
will not be converted.transform_scalar (
bool
): If set toTrue
, a scalar will be also converted to a tensor object.
- Returns:
item (
Any
): The converted tensors.
- Examples (scalar):
>>> i = 10 >>> t = to_tensor(i) >>> assert t.item() == i
- Examples (dict):
>>> d = {'i': i} >>> dt = to_tensor(d, torch.int) >>> assert dt['i'].item() == i
- Examples (named tuple):
>>> data_type = namedtuple('data_type', ['x', 'y']) >>> inputs = data_type(np.random.random(3), 4) >>> outputs = to_tensor(inputs, torch.float32) >>> assert type(outputs) == data_type >>> assert isinstance(outputs.x, torch.Tensor) >>> assert isinstance(outputs.y, torch.Tensor) >>> assert outputs.x.dtype == torch.float32 >>> assert outputs.y.dtype == torch.float32
to_ndarray¶
- ding.torch_utils.data_helper.to_ndarray(item: Any, dtype: dtype | None = None) Any [source]¶
- Overview:
Convert
torch.Tensor
tonumpy.ndarray
.- Arguments:
item (
Any
): Thetorch.Tensor
objects to be converted. It can be exactly atorch.Tensor
object or a container (list, tuple or dict) that contains severaltorch.Tensor
objects.dtype (
np.dtype
): The type of wanted array. If set toNone
, its dtype will be unchanged.
- Returns:
item (
object
): The changed arrays.
- Examples (ndarray):
>>> t = torch.randn(3, 5) >>> tarray1 = to_ndarray(t) >>> assert tarray1.shape == (3, 5) >>> assert isinstance(tarray1, np.ndarray)
- Examples (list):
>>> t = [torch.randn(5, ) for i in range(3)] >>> tarray1 = to_ndarray(t, np.float32) >>> assert isinstance(tarray1, list) >>> assert tarray1[0].shape == (5, ) >>> assert isinstance(tarray1[0], np.ndarray)
to_list¶
- ding.torch_utils.data_helper.to_list(item: Any) Any [source]¶
- Overview:
Convert
torch.Tensor
,numpy.ndarray
objects tolist
objects, and keep their dtypes unchanged.- Arguments:
item (
Any
): The item to be converted.
- Returns:
item (
Any
): The list after conversion.
- Examples:
>>> data = { 'tensor': torch.randn(4), 'list': [True, False, False], 'tuple': (4, 5, 6), 'bool': True, 'int': 10, 'float': 10., 'array': np.random.randn(4), 'str': "asdf", 'none': None, } >>> transformed_data = to_list(data)
Note
Now supports item type:
torch.Tensor
,numpy.ndarray
,dict
,list
,tuple
andNone
.
tensor_to_list¶
- ding.torch_utils.data_helper.tensor_to_list(item: Any) Any [source]¶
- Overview:
Convert
torch.Tensor
objects tolist
, and keep their dtypes unchanged.- Arguments:
item (
Any
): The item to be converted.
- Returns:
item (
Any
): The lists after conversion.
- Examples (2d-tensor):
>>> t = torch.randn(3, 5) >>> tlist1 = tensor_to_list(t) >>> assert len(tlist1) == 3 >>> assert len(tlist1[0]) == 5
- Examples (1d-tensor):
>>> t = torch.randn(3, ) >>> tlist1 = tensor_to_list(t) >>> assert len(tlist1) == 3
- Examples (list):
>>> t = [torch.randn(5, ) for i in range(3)] >>> tlist1 = tensor_to_list(t) >>> assert len(tlist1) == 3 >>> assert len(tlist1[0]) == 5
- Examples (dict):
>>> td = {'t': torch.randn(3, 5)} >>> tdlist1 = tensor_to_list(td) >>> assert len(tdlist1['t']) == 3 >>> assert len(tdlist1['t'][0]) == 5
Note
Now supports item type:
torch.Tensor
,dict
,list
,tuple
andNone
.
to_item¶
- ding.torch_utils.data_helper.to_item(data: Any, ignore_error: bool = True) Any [source]¶
- Overview:
Convert data to python native scalar (i.e. data item), and keep their dtypes unchanged.
- Arguments:
data (
Any
): The data that needs to be converted.ignore_error (
bool
): Whether to ignore the error when the data type is not supported. That is to say, only data that can be transformed into a python native scalar will be returned.
- Returns:
data (
Any
): Converted data.
- Examples:
>>> data = {'tensor': torch.randn(1), 'list': [True, False, torch.randn(1)], 'tuple': (4, 5, 6), 'bool': True, 'int': 10, 'float': 10., 'array': np.random.randn(1), 'str': "asdf", 'none': None} >>> new_data = to_item(data) >>> assert np.isscalar(new_data['tensor']) >>> assert np.isscalar(new_data['array']) >>> assert np.isscalar(new_data['list'][-1])
Note
Now supports item type:
torch.Tensor
,ttorch.Tensor
,bool
,str
,dict
,list
,tuple
andNone
.
same_shape¶
- ding.torch_utils.data_helper.same_shape(data: list) bool [source]¶
- Overview:
Judge whether all data elements in a list have the same shapes.
- Arguments:
data (
list
): The list of data.
- Returns:
same (
bool
): Whether the list of data all have the same shape.
- Examples:
>>> tlist = [torch.randn(3, 5) for i in range(5)] >>> assert same_shape(tlist) >>> tlist = [torch.randn(3, 5), torch.randn(4, 5)] >>> assert not same_shape(tlist)
LogDict¶
- class ding.torch_utils.data_helper.LogDict[source]¶
- Overview:
Derived from
dict
. Would converttorch.Tensor
tolist
for convenient logging.- Interfaces:
_transform
,__setitem__
,update
.
build_log_buffer¶
- ding.torch_utils.data_helper.build_log_buffer() LogDict [source]¶
- Overview:
Build log buffer, a subclass of dict, which can convert the input data into log format.
- Returns:
log_buffer (
LogDict
): Log buffer dict.
- Examples:
>>> log_buffer = build_log_buffer() >>> log_buffer['not_tensor'] = torch.randn(3) >>> assert isinstance(log_buffer['not_tensor'], list) >>> assert len(log_buffer['not_tensor']) == 3 >>> log_buffer.update({'not_tensor': 4, 'a': 5}) >>> assert log_buffer['not_tensor'] == 4
CudaFetcher¶
- class ding.torch_utils.data_helper.CudaFetcher(data_source: Iterable, device: str, queue_size: int = 4, sleep: float = 0.1)[source]¶
- Overview:
Fetch data from source, and transfer it to a specified device.
- Interfaces:
__init__
,__next__
,run
,close
.
- __init__(data_source: Iterable, device: str, queue_size: int = 4, sleep: float = 0.1) None [source]¶
- Overview:
Initialize the CudaFetcher object using the given arguments.
- Arguments:
data_source (
Iterable
): The iterable data source.device (
str
): The device to put data to, such as “cuda:0”.queue_size (
int
): The internal size of queue, such as 4.sleep (
float
): Sleeping time when the internal queue is full.
- _producer() None [source]¶
- Overview:
Keep fetching data from source, change the device, and put into
queue
for request.
- run() None [source]¶
- Overview:
Start
producer
thread: Keep fetching data from source, change the device, and put intoqueue
for request.- Examples:
>>> timer = EasyTimer() >>> dataloader = iter([torch.randn(3, 3) for _ in range(10)]) >>> dataloader = CudaFetcher(dataloader, device='cuda', sleep=0.1) >>> dataloader.run() >>> data = next(dataloader)
get_tensor_data¶
- ding.torch_utils.data_helper.get_tensor_data(data: Any) Any [source]¶
- Overview:
Get pure tensor data from the given data (without disturbing grad computation graph).
- Arguments:
data (
Any
): The original data. It can be exactly a tensor or a container (Sequence or dict).
- Returns:
output (
Any
): The output data.
- Examples:
>>> a = { 'tensor': torch.tensor([1, 2, 3.], requires_grad=True), 'list': [torch.tensor([1, 2, 3.], requires_grad=True) for _ in range(2)], 'none': None } >>> tensor_a = get_tensor_data(a) >>> assert not tensor_a['tensor'].requires_grad >>> for t in tensor_a['list']: >>> assert not t.requires_grad
unsqueeze¶
- ding.torch_utils.data_helper.unsqueeze(data: Any, dim: int = 0) Any [source]¶
- Overview:
Unsqueeze the tensor data.
- Arguments:
data (
Any
): The original data. It can be exactly a tensor or a container (Sequence or dict).dim (
int
): The dimension to be unsqueezed.
- Returns:
output (
Any
): The output data.
- Examples (tensor):
>>> t = torch.randn(3, 3) >>> tt = unsqueeze(t, dim=0) >>> assert tt.shape == torch.Size([1, 3, 3])
- Examples (list):
>>> t = [torch.randn(3, 3)] >>> tt = unsqueeze(t, dim=0) >>> assert tt[0].shape == torch.Size([1, 3, 3])
- Examples (dict):
>>> t = {"t": torch.randn(3, 3)} >>> tt = unsqueeze(t, dim=0) >>> assert tt["t"].shape == torch.Size([1, 3, 3])
squeeze¶
- ding.torch_utils.data_helper.squeeze(data: Any, dim: int = 0) Any [source]¶
- Overview:
Squeeze the tensor data.
- Arguments:
data (
Any
): The original data. It can be exactly a tensor or a container (Sequence or dict).dim (
int
): The dimension to be squeezed.
- Returns:
output (
Any
): The output data.
- Examples (tensor):
>>> t = torch.randn(1, 3, 3) >>> tt = squeeze(t, dim=0) >>> assert tt.shape == torch.Size([3, 3])
- Examples (list):
>>> t = [torch.randn(1, 3, 3)] >>> tt = squeeze(t, dim=0) >>> assert tt[0].shape == torch.Size([3, 3])
- Examples (dict):
>>> t = {"t": torch.randn(1, 3, 3)} >>> tt = squeeze(t, dim=0) >>> assert tt["t"].shape == torch.Size([3, 3])
get_null_data¶
- ding.torch_utils.data_helper.get_null_data(template: Any, num: int) List[Any] [source]¶
- Overview:
Get null data given an input template.
- Arguments:
template (
Any
): The template data.num (
int
): The number of null data items to generate.
- Returns:
output (
List[Any]
): The generated null data.
- Examples:
>>> temp = {'obs': [1, 2, 3], 'action': 1, 'done': False, 'reward': torch.tensor(1.)} >>> null_data = get_null_data(temp, 2) >>> assert len(null_data) ==2 >>> assert null_data[0]['null'] and null_data[0]['done']
zeros_like¶
- ding.torch_utils.data_helper.zeros_like(h: Any) Any [source]¶
- Overview:
Generate zero-tensors like the input data.
- Arguments:
h (
Any
): The original data. It can be exactly a tensor or a container (Sequence or dict).
- Returns:
output (
Any
): The output zero-tensors.
- Examples (tensor):
>>> t = torch.randn(3, 3) >>> tt = zeros_like(t) >>> assert tt.shape == torch.Size([3, 3]) >>> assert torch.sum(torch.abs(tt)) < 1e-8
- Examples (list):
>>> t = [torch.randn(3, 3)] >>> tt = zeros_like(t) >>> assert tt[0].shape == torch.Size([3, 3]) >>> assert torch.sum(torch.abs(tt[0])) < 1e-8
- Examples (dict):
>>> t = {"t": torch.randn(3, 3)} >>> tt = zeros_like(t) >>> assert tt["t"].shape == torch.Size([3, 3]) >>> assert torch.sum(torch.abs(tt["t"])) < 1e-8
dataparallel¶
Please refer to ding/torch_utils/dataparallel
for more details.
DataParallel¶
- class ding.torch_utils.dataparallel.DataParallel(module, device_ids=None, output_device=None, dim=0)[source]¶
- Overview:
A wrapper class for nn.DataParallel.
- Interfaces:
__init__
,parameters
- __init__(module, device_ids=None, output_device=None, dim=0)[source]¶
- Overview:
Initialize the DataParallel object.
- Arguments:
module (
nn.Module
): The module to be parallelized.device_ids (
list
): The list of GPU ids.output_device (
int
): The output GPU id.dim (
int
): The dimension to be parallelized.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- parameters(recurse: bool = True)[source]¶
- Overview:
Return the parameters of the module.
- Arguments:
recurse (
bool
): Whether to return the parameters of the submodules.
- Returns:
params (
generator
): The generator of the parameters.
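A minimal usage sketch (not part of the original docstring), intended for a machine with at least one visible GPU:
>>> import torch
>>> from ding.torch_utils.dataparallel import DataParallel
>>> model = torch.nn.Linear(3, 5)
>>> dp_model = DataParallel(model)
>>> params = list(dp_model.parameters())  # parameters of the wrapped module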
- training: bool¶
distribution¶
Please refer to ding/torch_utils/distribution
for more details.
Pd¶
- class ding.torch_utils.distribution.Pd[source]¶
- Overview:
Abstract class for parameterizable probability distributions and sampling functions.
- Interfaces:
neglogp
,entropy
,noise_mode
,mode
,sample
Tip
In derived classes, logits should be an attribute member stored in the class.
- entropy() Tensor [source]¶
- Overview:
Calculate the softmax entropy of logits
- Arguments:
reduction (
str
): support [None, ‘mean’], default set to ‘mean’
- Returns:
entropy (
torch.Tensor
): the calculated entropy
CategoricalPd¶
- class ding.torch_utils.distribution.CategoricalPd(logits: Tensor | None = None)[source]¶
- Overview:
Categorical probability distribution sampler
- Interfaces:
__init__
,neglogp
,entropy
,noise_mode
,mode
,sample
- __init__(logits: Tensor | None = None) None [source]¶
- Overview:
Init the Pd with logits
- Arguments:
logits (torch.Tensor): logits to sample from
- entropy(reduction: str = 'mean') Tensor [source]¶
- Overview:
Calculate the softmax entropy of logits
- Arguments:
reduction (
str
): support [None, ‘mean’], default set to mean
- Returns:
entropy (
torch.Tensor
): the calculated entropy
- mode(viz: bool = False) Tuple[Tensor, Dict[str, ndarray]] [source]¶
- Overview:
return logits argmax result
- Arguments:
- viz (
bool
): Whether to return the numpy form of logits, noise and noise_logits; Short for
visualize
. (Because tensor type cannot visualize in tb or text log)
- Returns:
result (
torch.Tensor
): the logits argmax resultviz_feature (
Dict[str, np.ndarray]
): ndarray type data for visualization.
- neglogp(x, reduction: str = 'mean') Tensor [source]¶
- Overview:
Calculate cross_entropy between input x and logits
- Arguments:
x (
torch.Tensor
): the input tensorreduction (
str
): support [None, ‘mean’], default set to mean
- Return:
cross_entropy (
torch.Tensor
): the returned cross_entropy loss
- noise_mode(viz: bool = False) Tuple[Tensor, Dict[str, ndarray]] [source]¶
- Overview:
add noise to logits
- Arguments:
viz (
bool
): Whether to return the numpy form of logits, noise and noise_logits; Short forvisualize
. (Because tensor type cannot visualize in tb or text log)
- Returns:
result (
torch.Tensor
): noised logitsviz_feature (
Dict[str, np.ndarray]
): ndarray type data for visualization.
- sample(viz: bool = False) Tuple[Tensor, Dict[str, ndarray]] [source]¶
- Overview:
Sample from the logits’ distribution by using softmax
- Arguments:
viz (
bool
): Whether to return the numpy form of logits, noise and noise_logits; Short forvisualize
. (Because tensor type cannot visualize in tb or text log)
- Returns:
result (
torch.Tensor
): the logits sampled resultviz_feature (
Dict[str, np.ndarray]
): ndarray type data for visualization.
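A minimal usage sketch (not part of the original docstring); the action indices are arbitrary and only illustrate neglogp and entropy:
>>> import torch
>>> from ding.torch_utils.distribution import CategoricalPd
>>> logits = torch.randn(4, 6)        # (batch, num_actions)
>>> pd = CategoricalPd(logits)
>>> actions = logits.argmax(dim=-1)   # any action indices of shape (4, )
>>> loss = pd.neglogp(actions, reduction='mean')
>>> ent = pd.entropy(reduction='mean')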
CategoricalPdPytorch¶
- class ding.torch_utils.distribution.CategoricalPdPytorch(probs: Tensor | None = None)[source]¶
- Overview:
Wrapped
torch.distributions.Categorical
- Interfaces:
__init__
,update_logits
,update_probs
,sample
,neglogp
,mode
,entropy
- __init__(probs: Tensor | None = None) None [source]¶
- Overview:
Initialize the CategoricalPdPytorch object.
- Arguments:
probs (
torch.Tensor
): The tensor of probabilities.
- entropy(reduction: str | None = None) Tensor [source]¶
- Overview:
Calculate the softmax entropy of logits
- Arguments:
reduction (
str
): support [None, ‘mean’], default set to mean
- Returns:
entropy (
torch.Tensor
): the calculated entropy
- mode() Tensor [source]¶
- Overview:
Return logits argmax result
- Return:
result(
torch.Tensor
): the logits argmax result
- neglogp(actions: Tensor, reduction: str = 'mean') Tensor [source]¶
- Overview:
Calculate cross_entropy between the input actions and logits
- Arguments:
actions (
torch.Tensor
): the input action tensorreduction (
str
): support [None, ‘mean’], default set to mean
- Return:
cross_entropy (
torch.Tensor
): the returned cross_entropy loss
- sample() Tensor [source]¶
- Overview:
Sample from the logits’ distribution by using softmax
- Return:
result (
torch.Tensor
): the logits sampled result
lr_scheduler¶
Please refer to ding/torch_utils/lr_scheduler
for more details.
get_lr_ratio¶
- ding.torch_utils.lr_scheduler.get_lr_ratio(epoch: int, warmup_epochs: int, learning_rate: float, lr_decay_epochs: int, min_lr: float) float [source]¶
- Overview:
Get learning rate ratio for each epoch.
- Arguments:
epoch (
int
): Current epoch.warmup_epochs (
int
): Warmup epochs.learning_rate (
float
): Learning rate.lr_decay_epochs (
int
): Learning rate decay epochs.min_lr (
float
): Minimum learning rate.
cos_lr_scheduler¶
- ding.torch_utils.lr_scheduler.cos_lr_scheduler(optimizer: Optimizer, learning_rate: float, warmup_epochs: float = 5, lr_decay_epochs: float = 100, min_lr: float = 6e-05) LambdaLR [source]¶
- Overview:
Cosine learning rate scheduler.
- Arguments:
optimizer (
torch.optim.Optimizer
): Optimizer.learning_rate (
float
): Learning rate.warmup_epochs (
float
): Warmup epochs.lr_decay_epochs (
float
): Learning rate decay epochs.min_lr (
float
): Minimum learning rate.
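A minimal usage sketch (not part of the original docstring); the scheduler is assumed to be stepped once per epoch:
>>> import torch
>>> from ding.torch_utils.lr_scheduler import cos_lr_scheduler
>>> model = torch.nn.Linear(3, 5)
>>> optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
>>> scheduler = cos_lr_scheduler(optimizer, learning_rate=1e-3, warmup_epochs=5, lr_decay_epochs=100, min_lr=6e-05)
>>> for epoch in range(3):
...     optimizer.step()    # training step(s) for this epoch
...     scheduler.step()    # update the learning rate once per epoch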
math_helper¶
Please refer to ding/torch_utils/math_helper
for more details.
cov¶
- ding.torch_utils.math_helper.cov(x: Tensor, rowvar: bool = False, bias: bool = False, ddof: int | None = None, aweights: Tensor | None = None) Tensor [source]¶
- Overview:
Estimates covariance matrix like
numpy.cov
.- Arguments:
x (
torch.Tensor
): A 1-D or 2-D tensor containing multiple variables and observations. Each row ofx
represents a variable, and each column a single observation of all those variables.rowvar (
bool
): Ifrowvar
is True, each row represents a variable, with observations in the columns; otherwise (the default), each column represents a variable, while the rows contain observations.bias (
bool
): Default normalization (False) is by dividingN - 1
, whereN
is the number of observations given (unbiased estimate). Ifbias
isTrue
, then normalization is byN
.ddof (
Optional[int]
): Ifddof
is notNone
, it implies that the argumentbias
is overridden. Note thatddof=1
will return the unbiased estimate (equals tobias=False
), andddof=0
will return the biased estimation (equals tobias=True
).aweights (
Optional[torch.Tensor]
): 1-D tensor of observation vector weights. These relative weights are typically large for observations considered “important” and smaller for observations considered less “important”. Ifddof=0
, the tensor of weights can be used to assign weights to observation vectors.
- Returns:
cov_mat (
torch.Tensor
): Covariance matrix calculated.
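A minimal usage sketch (not part of the original docstring); with the default rowvar=False, each column of x is treated as a variable:
>>> import torch
>>> from ding.torch_utils.math_helper import cov
>>> x = torch.randn(100, 3)   # 100 observations of 3 variables
>>> c = cov(x)
>>> assert c.shape == torch.Size([3, 3])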
metric¶
Please refer to ding/torch_utils/metric
for more details.
levenshtein_distance¶
- ding.torch_utils.metric.levenshtein_distance(pred: LongTensor, target: LongTensor, pred_extra: Tensor | None = None, target_extra: Tensor | None = None, extra_fn: Callable | None = None) FloatTensor [source]¶
- Overview:
Levenshtein Distance, i.e. Edit Distance.
- Arguments:
pred (
torch.LongTensor
): The first tensor to calculate the distance, shape: (N1, ) (N1 >= 0).target (
torch.LongTensor
): The second tensor to calculate the distance, shape: (N2, ) (N2 >= 0).pred_extra (
Optional[torch.Tensor]
): Extra tensor to calculate the distance, only works whenextra_fn
is notNone
.target_extra (
Optional[torch.Tensor]
): Extra tensor to calculate the distance, only works whenextra_fn
is notNone
.extra_fn (
Optional[Callable]
): The distance function forpred_extra
andtarget_extra
. If set toNone
, this distance will not be considered.
- Returns:
distance (
torch.FloatTensor
): distance(scalar), shape: (1, ).
hamming_distance¶
- ding.torch_utils.metric.hamming_distance(pred: LongTensor, target: LongTensor, weight=1.0) LongTensor [source]¶
- Overview:
Hamming Distance.
- Arguments:
pred (
torch.LongTensor
): Pred input, boolean vector(0 or 1).target (
torch.LongTensor
): Target input, boolean vector(0 or 1).weight (
torch.LongTensor
): Weight to multiply.
- Returns:
distance(
torch.LongTensor
): Distance (scalar), shape (1, ).
- Shapes:
pred & target (
torch.LongTensor
): shape \((B, N)\), where B is the batch size and N is the dimension
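A minimal usage sketch (not part of the original docstring):
>>> import torch
>>> from ding.torch_utils.metric import hamming_distance
>>> pred = torch.LongTensor([[1, 0, 1, 1]])
>>> target = torch.LongTensor([[1, 1, 0, 1]])
>>> dist = hamming_distance(pred, target)   # expected value: tensor([2])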
model_helper¶
Please refer to ding/torch_utils/model_helper
for more details.
get_num_params¶
- ding.torch_utils.model_helper.get_num_params(model: Module) int [source]¶
- Overview:
Return the number of parameters in the model.
- Arguments:
model (
torch.nn.Module
): The model object to calculate the parameter number.
- Returns:
n_params (
int
): The calculated number of parameters.
- Examples:
>>> model = torch.nn.Linear(3, 5) >>> num = get_num_params(model) >>> assert num == 15
nn_test_helper¶
Please refer to ding/torch_utils/nn_test_helper
for more details.
is_differentiable¶
- ding.torch_utils.nn_test_helper.is_differentiable(loss: Tensor, model: Module | List[Module], print_instead: bool = False) None [source]¶
- Overview:
Judge whether the model/models are differentiable: first check that each module’s grads are None, then back-propagate the loss, and finally check that each module’s grads are torch.Tensor.
- Arguments:
loss (
torch.Tensor
): loss tensor of the modelmodel (
Union[torch.nn.Module, List[torch.nn.Module]]
): model or models to be checkedprint_instead (
bool
): Whether to print module’s final grad result, instead of asserting. Default set toFalse
.
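A minimal usage sketch (not part of the original docstring); it passes silently when the gradients flow as expected:
>>> import torch
>>> from ding.torch_utils.nn_test_helper import is_differentiable
>>> model = torch.nn.Linear(3, 4)
>>> loss = model(torch.randn(2, 3)).sum()
>>> is_differentiable(loss, model)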
optimizer_helper¶
Please refer to ding/torch_utils/optimizer_helper
for more details.
calculate_grad_norm¶
calculate_grad_norm_without_bias_two_norm¶
grad_ignore_norm¶
- ding.torch_utils.optimizer_helper.grad_ignore_norm(parameters, max_norm, norm_type=2)[source]¶
- Overview:
Clip the gradient norm of an iterable of parameters.
- Arguments:
parameters (
Iterable
): an iterable of torch.Tensormax_norm (
float
): the max norm of the gradientsnorm_type (
float
): 2.0 means use norm2 to clip
grad_ignore_value¶
Adam¶
- class ding.torch_utils.optimizer_helper.Adam(params: Iterable, lr: float = 0.001, betas: Tuple[float, float] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0, amsgrad: bool = False, optim_type: str = 'adam', grad_clip_type: str | None = None, clip_value: float | None = None, clip_coef: float = 5, clip_norm_type: float = 2.0, clip_momentum_timestep: int = 100, grad_norm_type: str | None = None, grad_ignore_type: str | None = None, ignore_value: float | None = None, ignore_coef: float = 5, ignore_norm_type: float = 2.0, ignore_momentum_timestep: int = 100)[source]¶
- Overview:
Rewritten Adam optimizer to support more features.
- Interfaces:
__init__
,step
,_state_init
,get_grad
- __init__(params: Iterable, lr: float = 0.001, betas: Tuple[float, float] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0, amsgrad: bool = False, optim_type: str = 'adam', grad_clip_type: str | None = None, clip_value: float | None = None, clip_coef: float = 5, clip_norm_type: float = 2.0, clip_momentum_timestep: int = 100, grad_norm_type: str | None = None, grad_ignore_type: str | None = None, ignore_value: float | None = None, ignore_coef: float = 5, ignore_norm_type: float = 2.0, ignore_momentum_timestep: int = 100)[source]¶
- Overview:
init method of refactored Adam class
- Arguments:
params (
iterable
): an iterable of torch.Tensor objects or dicts. Specifies what Tensors should be optimizedlr (
float
): learning rate, default set to 1e-3betas (
Tuple[float, float]
): coefficients used for computing running averages of gradient and its square, default set to (0.9, 0.999))eps (
float
): term added to the denominator to improve numerical stability, default set to 1e-8weight_decay (
float
): weight decay coefficient, default set to 0amsgrad (
bool
): whether to use the AMSGrad variant of this algorithm from the paper On the Convergence of Adam and Beyond <https://arxiv.org/abs/1904.09237>optim_type (:obj:str): support [“adam”, “adamw”]
grad_clip_type (
str
): support [None, ‘clip_momentum’, ‘clip_value’, ‘clip_norm’, ‘clip_momentum_norm’]clip_value (
float
): the value to start clippingclip_coef (
float
): the clipping coefficientclip_norm_type (
float
): 2.0 means use norm2 to clipclip_momentum_timestep (
int
): after how many step should we start the momentum clippinggrad_ignore_type (
str
): support [None, ‘ignore_momentum’, ‘ignore_value’, ‘ignore_norm’, ‘ignore_momentum_norm’]ignore_value (
float
): the value to start ignoringignore_coef (
float
): the ignoring coefficientignore_norm_type (
float
): 2.0 means use norm2 to ignoreignore_momentum_timestep (
int
): after how many step should we start the momentum ignoring
- _optimizer_load_state_dict_post_hooks: OrderedDict[int, Callable[['Optimizer'], None]]¶
- _optimizer_load_state_dict_pre_hooks: OrderedDict[int, Callable[['Optimizer', StateDict], StateDict | None]]¶
- _optimizer_state_dict_post_hooks: OrderedDict[int, Callable[['Optimizer', StateDict], StateDict | None]]¶
- _optimizer_state_dict_pre_hooks: OrderedDict[int, Callable[['Optimizer'], None]]¶
- _optimizer_step_post_hooks: Dict[int, Callable[[Self, Tuple[Any, ...], Dict[str, Any]], None]]¶
- _optimizer_step_pre_hooks: Dict[int, Callable[[Self, Tuple[Any, ...], Dict[str, Any]], Tuple[Tuple[Any, ...], Dict[str, Any]] | None]]¶
- _state_init(p, amsgrad)[source]¶
- Overview:
Initialize the state of the optimizer
- Arguments:
  - p (torch.Tensor): the parameter to be optimized
  - amsgrad (bool): whether to use the AMSGrad variant of this algorithm from the paper On the Convergence of Adam and Beyond <https://arxiv.org/abs/1904.09237>
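A minimal usage sketch for the extended Adam optimizer (illustrative only: the toy nn.Linear model and chosen clip settings are hypothetical, and the exact clipping behavior of grad_clip_type='clip_norm' is assumed from the argument descriptions above):
>>> import torch
>>> import torch.nn as nn
>>> from ding.torch_utils.optimizer_helper import Adam
>>> model = nn.Linear(4, 2)
>>> optimizer = Adam(model.parameters(), lr=1e-3, grad_clip_type='clip_norm', clip_value=0.5)
>>> loss = model(torch.randn(8, 4)).mean()
>>> optimizer.zero_grad()
>>> loss.backward()
>>> optimizer.step()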
RMSprop¶
- class ding.torch_utils.optimizer_helper.RMSprop(params: Iterable, lr: float = 0.01, alpha: float = 0.99, eps: float = 1e-08, weight_decay: float = 0, momentum: float = 0, centered: bool = False, grad_clip_type: str | None = None, clip_value: float | None = None, clip_coef: float = 5, clip_norm_type: float = 2.0, clip_momentum_timestep: int = 100, grad_norm_type: str | None = None, grad_ignore_type: str | None = None, ignore_value: float | None = None, ignore_coef: float = 5, ignore_norm_type: float = 2.0, ignore_momentum_timestep: int = 100)[source]¶
- Overview:
Rewritten RMSprop optimizer that supports additional features such as gradient clipping and gradient ignoring.
- Interfaces:
__init__, step, _state_init, get_grad
- __init__(params: Iterable, lr: float = 0.01, alpha: float = 0.99, eps: float = 1e-08, weight_decay: float = 0, momentum: float = 0, centered: bool = False, grad_clip_type: str | None = None, clip_value: float | None = None, clip_coef: float = 5, clip_norm_type: float = 2.0, clip_momentum_timestep: int = 100, grad_norm_type: str | None = None, grad_ignore_type: str | None = None, ignore_value: float | None = None, ignore_coef: float = 5, ignore_norm_type: float = 2.0, ignore_momentum_timestep: int = 100)[source]¶
- Overview:
Initialize the refactored RMSprop optimizer with the given arguments.
- Arguments:
  - params (iterable): an iterable of torch.Tensor s or dict s that specifies which Tensors should be optimized
  - lr (float): learning rate, default set to 1e-2
  - alpha (float): smoothing constant, default set to 0.99
  - eps (float): term added to the denominator to improve numerical stability, default set to 1e-8
  - weight_decay (float): weight decay coefficient, default set to 0
  - momentum (float): momentum coefficient, default set to 0
  - centered (bool): if True, compute the centered RMSprop, in which the gradient is normalized by an estimation of its variance
  - grad_clip_type (str): support [None, 'clip_momentum', 'clip_value', 'clip_norm', 'clip_momentum_norm']
  - clip_value (float): the threshold value at which clipping starts
  - clip_coef (float): the clipping coefficient
  - clip_norm_type (float): 2.0 means use the 2-norm to clip
  - clip_momentum_timestep (int): after how many steps momentum clipping starts
  - grad_ignore_type (str): support [None, 'ignore_momentum', 'ignore_value', 'ignore_norm', 'ignore_momentum_norm']
  - ignore_value (float): the threshold value at which ignoring starts
  - ignore_coef (float): the ignoring coefficient
  - ignore_norm_type (float): 2.0 means use the 2-norm to ignore
  - ignore_momentum_timestep (int): after how many steps momentum ignoring starts
- _optimizer_load_state_dict_post_hooks: OrderedDict[int, Callable[['Optimizer'], None]]¶
- _optimizer_load_state_dict_pre_hooks: OrderedDict[int, Callable[['Optimizer', StateDict], StateDict | None]]¶
- _optimizer_state_dict_post_hooks: OrderedDict[int, Callable[['Optimizer', StateDict], StateDict | None]]¶
- _optimizer_state_dict_pre_hooks: OrderedDict[int, Callable[['Optimizer'], None]]¶
- _optimizer_step_post_hooks: Dict[int, Callable[[Self, Tuple[Any, ...], Dict[str, Any]], None]]¶
- _optimizer_step_pre_hooks: Dict[int, Callable[[Self, Tuple[Any, ...], Dict[str, Any]], Tuple[Tuple[Any, ...], Dict[str, Any]] | None]]¶
- _state_init(p, momentum, centered)[source]¶
- Overview:
Initialize the state of the optimizer
- Arguments:
  - p (torch.Tensor): the parameter to be optimized
  - momentum (float): the momentum coefficient
  - centered (bool): if True, compute the centered RMSprop, in which the gradient is normalized by an estimation of its variance
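A minimal usage sketch for the extended RMSprop optimizer (illustrative only: the toy model and hyperparameters are hypothetical; the centered variant follows the standard RMSprop semantics described above):
>>> import torch
>>> import torch.nn as nn
>>> from ding.torch_utils.optimizer_helper import RMSprop
>>> model = nn.Linear(4, 2)
>>> optimizer = RMSprop(model.parameters(), lr=1e-2, centered=True)
>>> loss = model(torch.randn(8, 4)).mean()
>>> optimizer.zero_grad()
>>> loss.backward()
>>> optimizer.step()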
PCGrad¶
- class ding.torch_utils.optimizer_helper.PCGrad(optimizer, reduction='mean')[source]¶
- Overview:
PCGrad optimizer for multi-task learning. See the paper Gradient Surgery for Multi-Task Learning: https://arxiv.org/pdf/2001.06782.pdf
- Interfaces:
__init__, zero_grad, step, pc_backward
- Properties:
  - optimizer (torch.optim): the optimizer to be used
- __init__(optimizer, reduction='mean')[source]¶
- Overview:
Initialize the PCGrad optimizer.
- Arguments:
  - optimizer (torch.optim): the optimizer to be used
  - reduction (str): the reduction method, support ['mean', 'sum']
- _flatten_grad(grads, shapes)[source]¶
- Overview:
Flatten the gradients of the network parameters.
- Arguments:
  - grads (list): a list of the gradients of the parameters
  - shapes (list): a list of the shapes of the parameters
- _pack_grad(objectives)[source]¶
- Overview:
Pack the gradients of the network parameters for each objective.
- Arguments:
  - objectives: a list of objectives
- Returns:
  - grad: a list of the gradients of the parameters
  - shape: a list of the shapes of the parameters
  - has_grad: a list of masks representing whether each parameter has a gradient
- _project_conflicting(grads, has_grads, shapes=None)[source]¶
- Overview:
Project conflicting gradients onto the orthogonal (normal) plane of the gradients they conflict with.
- Arguments:
  - grads (list): a list of the gradients of the parameters
  - has_grads (list): a list of masks representing whether each parameter has a gradient
  - shapes (list): a list of the shapes of the parameters
- _retrieve_grad()[source]¶
- Overview:
Get the gradients of the network parameters for a specific objective.
- Returns:
  - grad: a list of the gradients of the parameters
  - shape: a list of the shapes of the parameters
  - has_grad: a list of masks representing whether each parameter has a gradient
- _set_grad(grads)[source]¶
- Overview:
Set the modified gradients back on the network parameters.
- Arguments:
  - grads (list): a list of the gradients of the parameters
- _unflatten_grad(grads, shapes)[source]¶
- Overview:
Unflatten the gradients of the network parameters.
- Arguments:
  - grads (list): a list of the gradients of the parameters
  - shapes (list): a list of the shapes of the parameters
- property optimizer¶
- Overview:
Get the wrapped optimizer.
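A minimal multi-task usage sketch (illustrative only: the shared nn.Linear model and the two toy task losses are hypothetical; pc_backward, zero_grad, and step are the interfaces listed above):
>>> import torch
>>> import torch.nn as nn
>>> from ding.torch_utils.optimizer_helper import PCGrad
>>> model = nn.Linear(4, 2)
>>> opt = PCGrad(torch.optim.Adam(model.parameters(), lr=1e-3))
>>> out = model(torch.randn(8, 4))
>>> task_losses = [out[:, 0].mean(), out[:, 1].mean()]  # one loss per task
>>> opt.zero_grad()
>>> opt.pc_backward(task_losses)  # apply gradient surgery across the task gradients
>>> opt.step()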
configure_weight_decay¶
- ding.torch_utils.optimizer_helper.configure_weight_decay(model: Module, weight_decay: float) List [source]¶
- Overview:
Separate all parameters of the model into two buckets: those that will experience weight decay for regularization and those that won't (biases, and layer-norm or embedding weights).
- Arguments:
  - model (nn.Module): The given PyTorch model.
  - weight_decay (float): Weight decay value for the optimizer.
- Returns:
  - optim groups (List): The parameter groups to be passed to the subsequent optimizer.
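A minimal usage sketch (illustrative only: the toy model and the choice of torch.optim.AdamW are hypothetical; configure_weight_decay only builds the parameter groups):
>>> import torch
>>> import torch.nn as nn
>>> from ding.torch_utils.optimizer_helper import configure_weight_decay
>>> model = nn.Sequential(nn.Linear(4, 8), nn.LayerNorm(8), nn.Linear(8, 2))
>>> groups = configure_weight_decay(model, weight_decay=0.01)
>>> optimizer = torch.optim.AdamW(groups, lr=1e-3)  # biases and LayerNorm weights receive no weight decay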
parameter¶
Please refer to ding/torch_utils/parameter
for more details.
NonegativeParameter¶
- class ding.torch_utils.parameter.NonegativeParameter(data: Tensor | None = None, requires_grad: bool = True, delta: float = 1e-08)[source]¶
- Overview:
This module will output a non-negative parameter during the forward process.
- Interfaces:
__init__, forward, set_data.
- __init__(data: Tensor | None = None, requires_grad: bool = True, delta: float = 1e-08)[source]¶
- Overview:
Initialize the NonegativeParameter object using the given arguments.
- Arguments:
  - data (Optional[torch.Tensor]): The initial value of the generated parameter. If set to None, the default value is 0.
  - requires_grad (bool): Whether this parameter requires grad.
  - delta (float): The delta of the log function.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward() Tensor [source]¶
- Overview:
Output the non-negative parameter during the forward process.
- Returns:
  - parameter (torch.Tensor): The generated parameter.
- set_data(data: Tensor) None [source]¶
- Overview:
Set the value of the non-negative parameter.
- Arguments:
  - data (torch.Tensor): The new value of the non-negative parameter.
- training: bool¶
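A minimal usage sketch (illustrative only: the initial tensor values are hypothetical; forward takes no arguments, as documented above):
>>> import torch
>>> from ding.torch_utils.parameter import NonegativeParameter
>>> p = NonegativeParameter(torch.tensor([1.0, 2.0]))
>>> value = p()  # forward() returns the current non-negative parameter
>>> p.set_data(torch.tensor([3.0, 4.0]))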
TanhParameter¶
- class ding.torch_utils.parameter.TanhParameter(data: Tensor | None = None, requires_grad: bool = True)[source]¶
- Overview:
This module will output a tanh parameter during the forward process.
- Interfaces:
__init__, forward, set_data.
- __init__(data: Tensor | None = None, requires_grad: bool = True)[source]¶
- Overview:
Initialize the TanhParameter object using the given arguments.
- Arguments:
  - data (Optional[torch.Tensor]): The initial value of the generated parameter. If set to None, the default value is 1.
  - requires_grad (bool): Whether this parameter requires grad.
- _backward_hooks: Dict[int, Callable]¶
- _backward_pre_hooks: Dict[int, Callable]¶
- _buffers: Dict[str, Tensor | None]¶
- _forward_hooks: Dict[int, Callable]¶
- _forward_hooks_always_called: Dict[int, bool]¶
- _forward_hooks_with_kwargs: Dict[int, bool]¶
- _forward_pre_hooks: Dict[int, Callable]¶
- _forward_pre_hooks_with_kwargs: Dict[int, bool]¶
- _is_full_backward_hook: bool | None¶
- _load_state_dict_post_hooks: Dict[int, Callable]¶
- _load_state_dict_pre_hooks: Dict[int, Callable]¶
- _modules: Dict[str, Module | None]¶
- _non_persistent_buffers_set: Set[str]¶
- _parameters: Dict[str, Parameter | None]¶
- _state_dict_hooks: Dict[int, Callable]¶
- _state_dict_pre_hooks: Dict[int, Callable]¶
- forward() Tensor [source]¶
- Overview:
Output the tanh parameter during the forward process.
- Returns:
  - parameter (torch.Tensor): The generated parameter.
- set_data(data: Tensor) None [source]¶
- Overview:
Set the value of the tanh parameter.
- Arguments:
  - data (torch.Tensor): The new value of the tanh parameter.
- training: bool¶
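A minimal usage sketch (illustrative only: the initial tensor values are hypothetical; the parameter is constrained to the tanh range (-1, 1)):
>>> import torch
>>> from ding.torch_utils.parameter import TanhParameter
>>> p = TanhParameter(torch.tensor([0.5, -0.5]))
>>> value = p()  # forward() returns the current tanh-constrained parameter
>>> p.set_data(torch.tensor([0.9, -0.2]))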
reshape_helper¶
Please refer to ding/torch_utils/reshape_helper
for more details.
fold_batch¶
- ding.torch_utils.reshape_helper.fold_batch(x: Tensor, nonbatch_ndims: int = 1) Tuple[Tensor, Size] [source]¶
- Overview:
\((T, B, X) \rightarrow (T*B, X)\). Fold the first (ndim - nonbatch_ndims) dimensions of a tensor into a single batch dimension. This operation is similar to torch.flatten but provides an inverse function, unfold_batch, to restore the folded dimensions.
- Arguments:
  - x (torch.Tensor): the tensor to fold
  - nonbatch_ndims (int): the number of trailing dimensions that are not folded into the batch dimension
- Returns:
  - x (torch.Tensor): the folded tensor
  - batch_dims (torch.Size): the folded dimensions of the original tensor, which can be used to reverse the operation
- Examples:
>>> x = torch.ones(10, 20, 5, 4, 8)
>>> x, batch_dim = fold_batch(x, 2)
>>> x.shape == (1000, 4, 8)
>>> batch_dim == (10, 20, 5)
unfold_batch¶
- ding.torch_utils.reshape_helper.unfold_batch(x: Tensor, batch_dims: Size | Tuple) Tensor [source]¶
- Overview:
Unfold the batch dimension of a tensor.
- Arguments:
  - x (torch.Tensor): the tensor to unfold
  - batch_dims (torch.Size): the dimensions that were folded
- Returns:
  - x (torch.Tensor): the original, unfolded tensor
- Examples:
>>> x = torch.ones(10, 20, 5, 4, 8)
>>> x, batch_dim = fold_batch(x, 2)
>>> x.shape == (1000, 4, 8)
>>> batch_dim == (10, 20, 5)
>>> x = unfold_batch(x, batch_dim)
>>> x.shape == (10, 20, 5, 4, 8)
unsqueeze_repeat¶
- ding.torch_utils.reshape_helper.unsqueeze_repeat(x: Tensor, repeat_times: int, unsqueeze_dim: int = 0) Tensor [source]¶
- Overview:
Unsqueeze the tensor at unsqueeze_dim and then repeat it in this dimension for repeat_times times. This is useful for preprocessing the input to a model ensemble.
- Arguments:
  - x (torch.Tensor): the tensor to unsqueeze and repeat
  - repeat_times (int): the number of times the tensor is repeated
  - unsqueeze_dim (int): the dimension to unsqueeze
- Returns:
  - x (torch.Tensor): the unsqueezed and repeated tensor
- Examples:
>>> x = torch.ones(64, 6)
>>> x = unsqueeze_repeat(x, 4)
>>> x.shape == (4, 64, 6)
>>> x = torch.ones(64, 6)
>>> x = unsqueeze_repeat(x, 4, -1)
>>> x.shape == (64, 6, 4)