Shortcuts

ding.torch_utils

loss

Please refer to ding/torch_utils/loss for more details.

ContrastiveLoss

class ding.torch_utils.loss.ContrastiveLoss(x_size: int | SequenceType, y_size: int | SequenceType, heads: SequenceType = [1, 1], encode_shape: int = 64, loss_type: str = 'infoNCE', temperature: float = 1.0)[source]
Overview:

The class for contrastive learning losses. Only InfoNCE loss is supported currently. Code Reference: https://github.com/rdevon/DIM. Paper Reference: https://arxiv.org/abs/1808.06670.

Interfaces:

__init__, forward.

__init__(x_size: int | SequenceType, y_size: int | SequenceType, heads: SequenceType = [1, 1], encode_shape: int = 64, loss_type: str = 'infoNCE', temperature: float = 1.0) None[source]
Overview:

Initialize the ContrastiveLoss object using the given arguments.

Arguments:
  • x_size (Union[int, SequenceType]): Input shape for x; both the obs shape and the encoding shape are supported.

  • y_size (Union[int, SequenceType]): Input shape for y; both the obs shape and the encoding shape are supported.

  • heads (SequenceType): A list of 2 int elements, heads[0] for x and heads[1] for y. Used in the multi-head, global-local, local-local MI maximization process.

  • encode_shape (Union[int, SequenceType]): The dimension of the encoder hidden state.

  • loss_type (str): Only the InfoNCE loss is available now.

  • temperature (float): The parameter to adjust the log_softmax.

_create_encoder(obs_size: int | SequenceType, heads: int) Module[source]
Overview:

Create the encoder for the input obs.

Arguments:
  • obs_size (Union[int, SequenceType]): input shape for x, both the obs shape and the encoding shape are supported. If the obs_size is an int, it means the obs is a 1D vector. If the obs_size is a list such as [1, 16, 16], it means the obs is a 3D image with shape [1, 16, 16].

  • heads (int): The number of heads.

Returns:
  • encoder (nn.Module): The encoder module.

Examples:
>>> obs_size = 16
or
>>> obs_size = [1, 16, 16]
>>> heads = 1
>>> encoder = self._create_encoder(obs_size, heads)
forward(x: Tensor, y: Tensor) Tensor[source]
Overview:

Computes the noise contrastive estimation-based loss, a.k.a. infoNCE.

Arguments:
  • x (torch.Tensor): The input x, both raw obs and encoding are supported.

  • y (torch.Tensor): The input y, both raw obs and encoding are supported.

Returns:

loss (torch.Tensor): The calculated loss value.

Examples:
>>> x_dim = [3, 16]
>>> encode_shape = 16
>>> x = torch.randn(x_dim)
>>> y = x ** 2 + 0.01 * torch.randn(x_dim)
>>> estimator = ContrastiveLoss(x_dim[1], x_dim[1], encode_shape=encode_shape)
>>> loss = estimator.forward(x, y)
Examples:
>>> x_dim = [3, 1, 16, 16]
>>> encode_shape = 16
>>> x = torch.randn(x_dim)
>>> y = x ** 2 + 0.01 * torch.randn(x_dim)
>>> estimator = ContrastiveLoss(x_dim[1:], x_dim[1:], encode_shape=encode_shape)
>>> loss = estimator.forward(x, y)
training: bool

LabelSmoothCELoss

class ding.torch_utils.loss.LabelSmoothCELoss(ratio: float)[source]
Overview:

Label smooth cross entropy loss.

Interfaces:

__init__, forward.

__init__(ratio: float) None[source]
Overview:

Initialize the LabelSmoothCELoss object using the given arguments.

Arguments:
  • ratio (float): The ratio of label-smoothing (the value is in 0-1). If the ratio is larger, the extent of label smoothing is larger.

forward(logits: Tensor, labels: LongTensor) Tensor[source]
Overview:

Calculate label smooth cross entropy loss.

Arguments:
  • logits (torch.Tensor): Predicted logits.

  • labels (torch.LongTensor): Ground truth.

Returns:
  • loss (torch.Tensor): Calculated loss.
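Examples (a minimal usage sketch; the shapes and values are illustrative assumptions, not taken from the source):
>>> criterion = LabelSmoothCELoss(ratio=0.1)
>>> logits = torch.randn(4, 10)            # (B, N) predicted logits
>>> labels = torch.randint(0, 10, (4, ))   # (B, ) class indices
>>> loss = criterion(logits, labels)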

training: bool

SoftFocalLoss

class ding.torch_utils.loss.SoftFocalLoss(gamma: int = 2, weight: Any | None = None, size_average: bool = True, reduce: bool | None = None)[source]
Overview:

Soft focal loss.

Interfaces:

__init__, forward.

__init__(gamma: int = 2, weight: Any | None = None, size_average: bool = True, reduce: bool | None = None) None[source]
Overview:

Initialize the SoftFocalLoss object using the given arguments.

Arguments:
  • gamma (int): The extent of focus on hard samples. A smaller gamma will lead to more focus on easy samples, while a larger gamma will lead to more focus on hard samples.

  • weight (Any): The weight for loss of each class.

  • size_average (bool): By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False.

  • reduce (Optional[bool]): By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss for each batch element instead and ignores size_average.

forward(inputs: Tensor, targets: LongTensor) Tensor[source]
Overview:

Calculate soft focal loss.

Arguments:
  • inputs (torch.Tensor): Predicted logits.

  • targets (torch.LongTensor): Ground truth labels.

Returns:
  • loss (torch.Tensor): Calculated loss.
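Examples (an illustrative sketch; it assumes class-index targets with (B, N) logits, which is an assumption rather than documented behavior):
>>> criterion = SoftFocalLoss(gamma=2)
>>> inputs = torch.randn(4, 10)             # (B, N) predicted logits
>>> targets = torch.randint(0, 10, (4, ))   # (B, ) class indices
>>> loss = criterion(inputs, targets)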

training: bool

build_ce_criterion

ding.torch_utils.loss.build_ce_criterion(cfg: dict) Module[source]
Overview:

Get a cross entropy loss instance according to given config.

Arguments:
  • cfg (dict): Config dict, which contains:
    • type (str): Type of loss function, now supports [‘cross_entropy’, ‘label_smooth_ce’, ‘soft_focal_loss’].

    • kwargs (dict): Arguments for the corresponding loss function.

Returns:
  • loss (nn.Module): The corresponding loss function instance.
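Examples (an illustrative sketch; the exact config container, plain dict vs. EasyDict, and the kwargs keys depend on the implementation):
>>> cfg = dict(type='cross_entropy', kwargs=dict())
>>> criterion = build_ce_criterion(cfg)
>>> loss = criterion(torch.randn(4, 10), torch.randint(0, 10, (4, )))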

MultiLogitsLoss

class ding.torch_utils.loss.MultiLogitsLoss(criterion: str | None = None, smooth_ratio: float = 0.1)[source]
Overview:

Loss for multiple pairs of logits and labels: the predicted logits are first matched to the ground-truth labels (a minimum-cost assignment based on the chosen criterion), and the criterion (cross entropy or label-smoothed cross entropy) is then computed on the matched pairs.

Interfaces:

__init__, forward.

__init__(criterion: str | None = None, smooth_ratio: float = 0.1) None[source]
Overview:

Initialization method, use cross_entropy as default criterion.

Arguments:
  • criterion (str): Criterion type, supports [‘cross_entropy’, ‘label_smooth_ce’].

  • smooth_ratio (float): Smoothing ratio for label smoothing.

static _get_distance_matrix(lx: ndarray, ly: ndarray, mat: ndarray, M: int) ndarray[source]
Overview:

Get distance matrix.

Arguments:
  • lx (np.ndarray): The row labels used by the matching algorithm.

  • ly (np.ndarray): The column labels used by the matching algorithm.

  • mat (np.ndarray): The metric (cost) matrix.

  • M (int): The size of the matrix to be matched.

_get_metric_matrix(logits: Tensor, labels: LongTensor) Tensor[source]
Overview:

Calculate the metric matrix.

Arguments:
  • logits (torch.Tensor): Predicted logits.

  • labels (torch.LongTensor): Ground truth.

Returns:
  • metric (torch.Tensor): Calculated metric matrix.

_label_process(logits: Tensor, labels: LongTensor) LongTensor[source]
Overview:

Process the label according to the criterion.

Arguments:
  • logits (torch.Tensor): Predicted logits.

  • labels (torch.LongTensor): Ground truth.

Returns:
  • ret (torch.LongTensor): Processed label.

_match(matrix: Tensor)[source]
Overview:

Find the optimal assignment for the given metric matrix (minimum-cost matching).

Arguments:
  • matrix (torch.Tensor): Metric matrix.

Returns:
  • index (np.ndarray): Matched index.

_nll_loss(nlls: Tensor, labels: LongTensor) Tensor[source]
Overview:

Calculate the negative log likelihood loss.

Arguments:
  • nlls (torch.Tensor): Negative log likelihood loss.

  • labels (torch.LongTensor): Ground truth.

Returns:
  • ret (torch.Tensor): Calculated loss.

forward(logits: Tensor, labels: LongTensor) Tensor[source]
Overview:

Calculate multiple logits loss.

Arguments:
  • logits (torch.Tensor): Predicted logits, whose shape must be 2-dim, like (B, N).

  • labels (torch.LongTensor): Ground truth.

Returns:
  • loss (torch.Tensor): Calculated loss.
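Examples (a hypothetical sketch; it assumes (B, N) logits with one distinct label index per row, which is an assumption rather than documented behavior):
>>> criterion = MultiLogitsLoss(criterion='cross_entropy')
>>> logits = torch.randn(4, 8)
>>> labels = torch.LongTensor([0, 1, 3, 2])
>>> loss = criterion(logits, labels)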

training: bool

network.activation

Please refer to ding/torch_utils/network/activation for more details.

Lambda

class ding.torch_utils.network.activation.Lambda(f: Callable)[source]
Overview:

A custom lambda module for constructing custom layers.

Interfaces:

__init__, forward.

__init__(f: Callable)[source]
Overview:

Initialize the lambda module with a given function.

Arguments:
  • f (Callable): The Python function to apply to the input tensor in forward.

forward(x: Tensor) Tensor[source]
Overview:

Apply the wrapped function to the input tensor.

Arguments:
  • x (torch.Tensor): The input tensor.
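Examples (a minimal usage sketch):
>>> square = Lambda(lambda x: x ** 2)
>>> out = square(torch.tensor([1., 2., 3.]))   # tensor([1., 4., 9.])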

training: bool

GLU

class ding.torch_utils.network.activation.GLU(input_dim: int, output_dim: int, context_dim: int, input_type: str = 'fc')[source]
Overview:

Gated Linear Unit (GLU), a specific type of activation function, first proposed in [Language Modeling with Gated Convolutional Networks](https://arxiv.org/pdf/1612.08083.pdf).

Interfaces:

__init__, forward.

__init__(input_dim: int, output_dim: int, context_dim: int, input_type: str = 'fc') None[source]
Overview:

Initialize the GLU module.

Arguments:
  • input_dim (int): The dimension of the input tensor.

  • output_dim (int): The dimension of the output tensor.

  • context_dim (int): The dimension of the context tensor.

  • input_type (str): The type of input, now supports [‘fc’, ‘conv2d’]

forward(x: Tensor, context: Tensor) Tensor[source]
Overview:

Compute the GLU transformation of the input tensor.

Arguments:
  • x (torch.Tensor): The input tensor.

  • context (torch.Tensor): The context tensor.

Returns:
  • x (torch.Tensor): The output tensor after GLU transformation.
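Examples (an illustrative sketch for the 'fc' input type; the shapes are assumptions, not taken from the source):
>>> glu = GLU(input_dim=16, output_dim=32, context_dim=8, input_type='fc')
>>> x = torch.randn(4, 16)
>>> context = torch.randn(4, 8)
>>> out = glu(x, context)   # expected shape: (4, 32)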

training: bool

Swish

class ding.torch_utils.network.activation.Swish[source]
Overview:

Swish activation function, which is a smooth, non-monotonic activation function. For more details, please refer to [Searching for Activation Functions](https://arxiv.org/pdf/1710.05941.pdf).

Interfaces:

__init__, forward.

__init__()[source]
Overview:

Initialize the Swish module.

forward(x: Tensor) Tensor[source]
Overview:

Compute the Swish transformation of the input tensor.

Arguments:
  • x (torch.Tensor): The input tensor.

Returns:
  • x (torch.Tensor): The output tensor after Swish transformation.
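Examples (a minimal usage sketch):
>>> act = Swish()
>>> out = act(torch.randn(4, 8))   # same shape as the input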

training: bool

GELU

class ding.torch_utils.network.activation.GELU[source]
Overview:

Gaussian Error Linear Units (GELU) activation function, which is widely used in NLP models like GPT, BERT. For more details, please refer to the original paper: https://arxiv.org/pdf/1606.08415.pdf.

Interfaces:

__init__, forward.

__init__()[source]
Overview:

Initialize the GELU module.

forward(x: Tensor) Tensor[source]
Overview:

Compute the GELU transformation of the input tensor.

Arguments:
  • x (torch.Tensor): The input tensor.

Returns:
  • x (torch.Tensor): The output tensor after GELU transformation.

training: bool

build_activation

ding.torch_utils.network.activation.build_activation(activation: str, inplace: bool | None = None) Module[source]
Overview:

Build and return the activation module according to the given type.

Arguments:
  • activation (str): The type of activation module, now supports [‘relu’, ‘glu’, ‘prelu’, ‘swish’, ‘gelu’, ‘tanh’, ‘sigmoid’, ‘softplus’, ‘elu’, ‘square’, ‘identity’].

  • inplace (Optional[bool]): Whether to execute the operation in-place in the activation, defaults to None.

Returns:
  • act_func (nn.Module): The corresponding activation module.
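Examples (a minimal usage sketch):
>>> act = build_activation('relu')
>>> out = act(torch.randn(2, 3))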

network.diffusion

Please refer to ding/torch_utils/network/diffusion for more details.

extract

ding.torch_utils.network.diffusion.extract(a, t, x_shape)[source]
Overview:

Extract values from a at the indices given by t, reshaped so that they can be broadcast against a tensor of shape x_shape.

Arguments:
  • a (torch.Tensor): The input tensor to index.

  • t (torch.Tensor): The index tensor.

  • x_shape (tuple): The shape of x, used to reshape the extracted values for broadcasting.
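Examples (an illustrative sketch assuming the usual gather-and-reshape behavior of diffusion codebases; the shapes are assumptions):
>>> a = torch.rand(10)              # e.g. a precomputed schedule over 10 timesteps
>>> t = torch.tensor([0, 5, 9])     # one timestep index per batch element
>>> out = extract(a, t, (3, 4, 8))  # expected shape: (3, 1, 1), broadcastable against x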

cosine_beta_schedule

ding.torch_utils.network.diffusion.cosine_beta_schedule(timesteps: int, s: float = 0.008, dtype=torch.float32)[source]
Overview:

Cosine schedule, as proposed in https://openreview.net/forum?id=-NEXDKk8gZ.

Arguments:
  • timesteps (int): The number of diffusion timesteps.

  • s (float): A small offset to prevent beta from being too small near t = 0.

  • dtype (torch.dtype): The dtype of the returned betas.

Returns:

A tensor of betas with shape (timesteps, ), computed by the cosine schedule.
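Examples (a minimal usage sketch):
>>> betas = cosine_beta_schedule(timesteps=1000)
>>> betas.shape   # expected: torch.Size([1000])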

apply_conditioning

ding.torch_utils.network.diffusion.apply_conditioning(x, conditions, action_dim)[source]
Overview:

Apply the conditions to x: for each timestep key in conditions, overwrite the corresponding state dimensions of x (the dimensions after action_dim).

Arguments:
  • x (torch.Tensor): The input trajectory tensor.

  • conditions (dict): The condition dict; the key is the timestep and the value is the condition tensor.

  • action_dim (int): The action dimension; conditions are written into the dimensions after it.
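Examples (an illustrative sketch; it assumes x has shape (batch, horizon, transition_dim) and that conditions overwrite the state part after action_dim, which is an assumption rather than documented behavior):
>>> x = torch.randn(2, 16, 11)             # transition_dim = action_dim (3) + obs_dim (8)
>>> conditions = {0: torch.randn(2, 8)}    # fix the observation at timestep 0
>>> x = apply_conditioning(x, conditions, action_dim=3)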

DiffusionConv1d

class ding.torch_utils.network.diffusion.DiffusionConv1d(in_channels: int, out_channels: int, kernel_size: int, padding: int, activation: Module | None = None, n_groups: int = 8)[source]
Overview:

Conv1d with activation and normalization for diffusion models.

Interfaces:

__init__, forward

__init__(in_channels: int, out_channels: int, kernel_size: int, padding: int, activation: Module | None = None, n_groups: int = 8) None[source]
Overview:

Create a 1D convolution layer with activation and GroupNorm normalization. One extra dimension is temporarily added when computing the group normalization.

Arguments:
  • in_channels (int): Number of channels in the input tensor

  • out_channels (int): Number of channels in the output tensor

  • kernel_size (int): Size of the convolving kernel

  • padding (int): Zero-padding added to both sides of the input

  • activation (nn.Module): The optional activation function.

  • n_groups (int): The number of groups for GroupNorm.

forward(inputs) Tensor[source]
Overview:

Compute the 1D convolution of the input tensor.

Arguments:
  • inputs (torch.Tensor): input tensor

Return:
  • out (torch.Tensor): output tensor
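Examples (an illustrative sketch; the shapes are assumptions, with out_channels divisible by the default n_groups=8):
>>> conv = DiffusionConv1d(in_channels=8, out_channels=16, kernel_size=5, padding=2)
>>> x = torch.randn(2, 8, 32)   # (batch, channels, horizon)
>>> out = conv(x)               # expected shape: (2, 16, 32)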

training: bool

SinusoidalPosEmb

class ding.torch_utils.network.diffusion.SinusoidalPosEmb(dim: int)[source]
Overview:

Class for computing sinusoidal positional embeddings.

Interfaces:

__init__, forward

__init__(dim: int) None[source]
Overview:

Initialization of SinusoidalPosEmb class

Arguments:
  • dim (int): The dimension of the embedding.

forward(x) Tensor[source]
Overview:

Compute the sinusoidal positional embedding.

Arguments:
  • x (torch.Tensor): input tensor

Return:
  • emb (torch.Tensor): output tensor
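Examples (an illustrative sketch; the shapes are assumptions):
>>> pos_emb = SinusoidalPosEmb(dim=32)
>>> t = torch.arange(4)
>>> emb = pos_emb(t)   # expected shape: (4, 32)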

training: bool

Residual

class ding.torch_utils.network.diffusion.Residual(fn)[source]
Overview:

Basic Residual block

Interfaces:

__init__, forward

__init__(fn)[source]
Overview:

Initialization of Residual class

Arguments:
  • fn (nn.Module): The wrapped module; its output is added to the input (residual connection).

forward(x, *arg, **kwargs)[source]
Overview:

Compute the residual output: fn(x, *arg, **kwargs) + x.

Arguments:
  • x (torch.Tensor): input tensor

training: bool

LayerNorm

class ding.torch_utils.network.diffusion.LayerNorm(dim, eps=1e-05)[source]
Overview:

LayerNorm computed over dim = 1, because the temporal input x has shape [batch, dim, horizon].

Interfaces:

__init__, forward

__init__(dim, eps=1e-05) None[source]
Overview:

Initialization of LayerNorm class

Arguments:
  • dim (int): dimension of input

  • eps (float): eps of LayerNorm

forward(x)[source]
Overview:

Compute the LayerNorm of the input tensor.

Arguments:
  • x (torch.Tensor): input tensor

training: bool

PreNorm

class ding.torch_utils.network.diffusion.PreNorm(dim, fn)[source]
Overview:

PreNorm: apply the normalization over dim = 1 before fn, because the temporal input x has shape [batch, dim, horizon].

Interfaces:

__init__, forward

__init__(dim, fn) None[source]
Overview:

Initialization of PreNorm class

Arguments:
  • dim (int): dimension of input

  • fn (nn.Module): The module applied after the normalization.

forward(x)[source]
Overview:

Compute the PreNorm output of the input tensor.

Arguments:
  • x (torch.Tensor): input tensor

training: bool

LinearAttention

class ding.torch_utils.network.diffusion.LinearAttention(dim, heads=4, dim_head=32)[source]
Overview:

Linear Attention head

Interfaces:

__init__, forward

__init__(dim, heads=4, dim_head=32) None[source]
Overview:

Initialization of LinearAttention class

Arguments:
  • dim (int): dimension of input

  • heads (int): The number of attention heads.

  • dim_head (int): The dimension of each attention head.

forward(x)[source]
Overview:

Compute the linear attention of the input tensor.

Arguments:
  • x (torch.Tensor): input tensor

training: bool

ResidualTemporalBlock

class ding.torch_utils.network.diffusion.ResidualTemporalBlock(in_channels: int, out_channels: int, embed_dim: int, kernel_size: int = 5, mish: bool = True)[source]
Overview:

Residual temporal block (1D convolutional residual block with a time embedding).

Interfaces:

__init__, forward

__init__(in_channels: int, out_channels: int, embed_dim: int, kernel_size: int = 5, mish: bool = True) None[source]
Overview:

Initialization of ResidualTemporalBlock class

Arguments:
  • in_channels (int): The number of input channels.

  • out_channels (int): The number of output channels.

  • embed_dim (int): The dimension of the embedding layer.

  • kernel_size (int): The kernel size of conv1d.

  • mish (bool): Whether to use Mish as the activation function.

forward(x, t)[source]
Overview:

Compute the residual block output.

Arguments:
  • x (torch.Tensor): The input tensor.

  • t (torch.Tensor): The time embedding tensor.

training: bool

DiffusionUNet1d

class ding.torch_utils.network.diffusion.DiffusionUNet1d(transition_dim: int, dim: int = 32, dim_mults: SequenceType = [1, 2, 4, 8], returns_condition: bool = False, condition_dropout: float = 0.1, calc_energy: bool = False, kernel_size: int = 5, attention: bool = False)[source]
Overview:

Diffusion U-Net for 1D vector data.

Interfaces:

__init__, forward, get_pred

__init__(transition_dim: int, dim: int = 32, dim_mults: SequenceType = [1, 2, 4, 8], returns_condition: bool = False, condition_dropout: float = 0.1, calc_energy: bool = False, kernel_size: int = 5, attention: bool = False) None[source]
Overview:

Initialization of DiffusionUNet1d class

Arguments:
  • transition_dim (int): The dimension of the transition, i.e., obs_dim + action_dim.

  • dim (int): The base dimension of the layers.

  • dim_mults (SequenceType): The multipliers of dim at each level.

  • returns_condition (bool): Whether to use the return as a condition.

  • condition_dropout (float): The dropout rate of the returns condition.

  • calc_energy (bool): Whether to use calc_energy.

  • kernel_size (int): The kernel size of conv1d.

  • attention (bool): Whether to use attention.

forward(x, cond, time, returns=None, use_dropout: bool = True, force_dropout: bool = False)[source]
Overview:

Compute the diffusion U-Net forward pass.

Arguments:
  • x (torch.Tensor): The noisy trajectory.

  • cond (tuple): [(time, state), ...], where state is the initial state of the env and time = 0.

  • time (int): The timestep of the diffusion step.

  • returns (torch.Tensor): The condition returns of the trajectory (normalized returns).

  • use_dropout (bool): Whether to use the returns-condition dropout mask.

  • force_dropout (bool): Whether to force dropping the returns condition.

get_pred(x, cond, time, returns: bool | None = None, use_dropout: bool = True, force_dropout: bool = False)[source]
Overview:

Compute the diffusion U-Net prediction.

Arguments:
  • x (torch.Tensor): The noisy trajectory.

  • cond (tuple): [(time, state), ...], where state is the initial state of the env and time = 0.

  • time (int): The timestep of the diffusion step.

  • returns (torch.Tensor): The condition returns of the trajectory (normalized returns).

  • use_dropout (bool): Whether to use the returns-condition dropout mask.

  • force_dropout (bool): Whether to force dropping the returns condition.

training: bool

TemporalValue

class ding.torch_utils.network.diffusion.TemporalValue(horizon: int, transition_dim: int, dim: int = 32, time_dim: int | None = None, out_dim: int = 1, kernel_size: int = 5, dim_mults: SequenceType = [1, 2, 4, 8])[source]
Overview:

Temporal network for the value function.

Interfaces:

__init__, forward

__init__(horizon: int, transition_dim: int, dim: int = 32, time_dim: int | None = None, out_dim: int = 1, kernel_size: int = 5, dim_mults: SequenceType = [1, 2, 4, 8]) None[source]
Overview:

Initialization of TemporalValue class

Arguments:
  • horizon (int): The horizon of the trajectory.

  • transition_dim (int): The dimension of the transition, i.e., obs_dim + action_dim.

  • dim (int): The base dimension of the layers.

  • time_dim (int): The dimension of the time embedding.

  • out_dim (int): The output dimension.

  • kernel_size (int): The kernel size of conv1d.

  • dim_mults (SequenceType): The multipliers of dim at each level.

forward(x, cond, time, *args)[source]
Overview:

Compute the temporal value forward pass.

Arguments:
  • x (torch.Tensor): The noisy trajectory.

  • cond (tuple): [(time, state), ...], where state is the initial state of the env and time = 0.

  • time (int): The timestep of the diffusion step.

training: bool

network.dreamer

Please refer to ding/torch_utils/network/dreamer for more details.

Conv2dSame

class ding.torch_utils.network.dreamer.Conv2dSame(in_channels: int, out_channels: int, kernel_size: int | Tuple[int, int], stride: int | Tuple[int, int] = 1, padding: str | int | Tuple[int, int] = 0, dilation: int | Tuple[int, int] = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', device=None, dtype=None)[source]
Overview:

Conv2dSame Network for dreamerv3.

Interfaces:

__init__, forward

calc_same_pad(i, k, s, d)[source]
Overview:

Calculate the same padding size.

Arguments:
  • i (int): Input size.

  • k (int): Kernel size.

  • s (int): Stride size.

  • d (int): Dilation size.

forward(x)[source]
Overview:

Compute the forward pass of Conv2dSame.

Arguments:
  • x (torch.Tensor): Input tensor.
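Examples (an illustrative sketch; the shapes are assumptions):
>>> conv = Conv2dSame(in_channels=3, out_channels=8, kernel_size=3, stride=2)
>>> x = torch.randn(1, 3, 32, 32)
>>> out = conv(x)   # "same" padding keeps the spatial size at ceil(32 / 2) = 16, i.e. (1, 8, 16, 16)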


DreamerLayerNorm

class ding.torch_utils.network.dreamer.DreamerLayerNorm(ch, eps=0.001)[source]
Overview:

DreamerLayerNorm Network for dreamerv3.

Interfaces:

__init__, forward

__init__(ch, eps=0.001)[source]
Overview:

Init the DreamerLayerNorm class.

Arguments:
  • ch (int): Input channel.

  • eps (float): Epsilon.

forward(x)[source]
Overview:

Compute the forward pass of DreamerLayerNorm.

Arguments:
  • x (torch.Tensor): Input tensor.

training: bool

DenseHead

class ding.torch_utils.network.dreamer.DenseHead(inp_dim, shape, layer_num, units, act='SiLU', norm='LN', dist='normal', std=1.0, outscale=1.0, device='cpu')[source]
Overview:

DenseHead Network for value head, reward head, and discount head of dreamerv3.

Interfaces:

__init__, forward

__init__(inp_dim, shape, layer_num, units, act='SiLU', norm='LN', dist='normal', std=1.0, outscale=1.0, device='cpu')[source]
Overview:

Init the DenseHead class.

Arguments:
  • inp_dim (int): Input dimension.

  • shape (tuple): Output shape.

  • layer_num (int): Number of layers.

  • units (int): Number of units.

  • act (str): Activation function.

  • norm (str): Normalization function.

  • dist (str): Distribution function.

  • std (float): Standard deviation.

  • outscale (float): Output scale.

  • device (str): Device.

forward(features)[source]
Overview:

Compute the forward pass of DenseHead.

Arguments:
  • features (torch.Tensor): Input tensor.

training: bool

ActionHead

class ding.torch_utils.network.dreamer.ActionHead(inp_dim, size, layers, units, act=<class 'torch.nn.modules.activation.ELU'>, norm=<class 'torch.nn.modules.normalization.LayerNorm'>, dist='trunc_normal', init_std=0.0, min_std=0.1, max_std=1.0, temp=0.1, outscale=1.0, unimix_ratio=0.01)[source]
Overview:

ActionHead Network for action head of dreamerv3.

Interfaces:

__init__, forward

__init__(inp_dim, size, layers, units, act=<class 'torch.nn.modules.activation.ELU'>, norm=<class 'torch.nn.modules.normalization.LayerNorm'>, dist='trunc_normal', init_std=0.0, min_std=0.1, max_std=1.0, temp=0.1, outscale=1.0, unimix_ratio=0.01)[source]
Overview:

Initialize the ActionHead class.

Arguments:
  • inp_dim (int): Input dimension.

  • size (int): Output size.

  • layers (int): Number of layers.

  • units (int): Number of units.

  • act (str): Activation function.

  • norm (str): Normalization function.

  • dist (str): Distribution function.

  • init_std (float): Initial standard deviation.

  • min_std (float): Minimum standard deviation.

  • max_std (float): Maximum standard deviation.

  • temp (float): Temperature.

  • outscale (float): Output scale.

  • unimix_ratio (float): Unimix ratio.

forward(features)[source]
Overview:

Compute the forward pass of ActionHead.

Arguments:
  • features (torch.Tensor): Input tensor.

training: bool

SampleDist

class ding.torch_utils.network.dreamer.SampleDist(dist, samples=100)[source]
Overview:

A kind of sample Dist for ActionHead of dreamerv3.

Interfaces:

__init__, mean, mode, entropy

__init__(dist, samples=100)[source]
Overview:

Initialize the SampleDist class.

Arguments:
  • dist (torch.Tensor): Distribution.

  • samples (int): Number of samples.

entropy()[source]
Overview:

Calculate the entropy of the distribution.

mean()[source]
Overview:

Calculate the mean of the distribution.

mode()[source]
Overview:

Calculate the mode of the distribution.

OneHotDist

class ding.torch_utils.network.dreamer.OneHotDist(logits=None, probs=None, unimix_ratio=0.0)[source]
Overview:

A kind of onehot Dist for dreamerv3.

Interfaces:

__init__, mode, sample

__init__(logits=None, probs=None, unimix_ratio=0.0)[source]
Overview:

Initialize the OneHotDist class.

Arguments:
  • logits (torch.Tensor): Logits.

  • probs (torch.Tensor): Probabilities.

  • unimix_ratio (float): Unimix ratio.

mode()[source]
Overview:

Calculate the mode of the distribution.

sample(sample_shape=(), seed=None)[source]
Overview:

Sample from the distribution.

Arguments:
  • sample_shape (tuple): Sample shape.

  • seed (int): Seed.

TwoHotDistSymlog

class ding.torch_utils.network.dreamer.TwoHotDistSymlog(logits=None, low=-20.0, high=20.0, device='cpu')[source]
Overview:

A kind of twohotsymlog Dist for dreamerv3.

Interfaces:

__init__, mode, mean, log_prob, log_prob_target

__init__(logits=None, low=-20.0, high=20.0, device='cpu')[source]
Overview:

Initialize the TwoHotDistSymlog class.

Arguments:
  • logits (torch.Tensor): Logits.

  • low (float): The lower bound of the value support.

  • high (float): The upper bound of the value support.

  • device (str): Device.

log_prob(x)[source]
Overview:

Calculate the log probability of the distribution.

Arguments:
  • x (torch.Tensor): Input tensor.

log_prob_target(target)[source]
Overview:

Calculate the log probability of the target.

Arguments:
  • target (torch.Tensor): Target tensor.

mean()[source]
Overview:

Calculate the mean of the distribution.

mode()[source]
Overview:

Calculate the mode of the distribution.

SymlogDist

class ding.torch_utils.network.dreamer.SymlogDist(mode, dist='mse', aggregation='sum', tol=1e-08, dim_to_reduce=[-1, -2, -3])[source]
Overview:

A kind of Symlog Dist for dreamerv3.

Interfaces:

__init__, entropy, mode, mean, log_prob

__init__(mode, dist='mse', aggregation='sum', tol=1e-08, dim_to_reduce=[-1, -2, -3])[source]
Overview:

Initialize the SymlogDist class.

Arguments:
  • mode (torch.Tensor): Mode.

  • dist (str): Distribution function.

  • aggregation (str): Aggregation function.

  • tol (float): Tolerance.

  • dim_to_reduce (list): Dimension to reduce.

log_prob(value)[source]
Overview:

Calculate the log probability of the distribution.

Arguments:
  • value (torch.Tensor): Input tensor.

mean()[source]
Overview:

Calculate the mean of the distribution.

mode()[source]
Overview:

Calculate the mode of the distribution.

ContDist

class ding.torch_utils.network.dreamer.ContDist(dist=None)[source]
Overview:

A kind of ordinary Dist for dreamerv3.

Interfaces:

__init__, entropy, mode, sample, log_prob

__init__(dist=None)[source]
Overview:

Initialize the ContDist class.

Arguments:
  • dist (torch.Tensor): Distribution.

entropy()[source]
Overview:

Calculate the entropy of the distribution.

log_prob(x)[source]
mode()[source]
Overview:

Calculate the mode of the distribution.

sample(sample_shape=())[source]
Overview:

Sample from the distribution.

Arguments:
  • sample_shape (tuple): Sample shape.

Bernoulli

class ding.torch_utils.network.dreamer.Bernoulli(dist=None)[source]
Overview:

A kind of Bernoulli Dist for dreamerv3.

Interfaces:

__init__, entropy, mode, sample, log_prob

__init__(dist=None)[source]
Overview:

Initialize the Bernoulli distribution.

Arguments:
  • dist (torch.Tensor): Distribution.

entropy()[source]
Overview:

Calculate the entropy of the distribution.

log_prob(x)[source]
Overview:

Calculate the log probability of the distribution.

Arguments:
  • x (torch.Tensor): Input tensor.

mode()[source]
Overview:

Calculate the mode of the distribution.

sample(sample_shape=())[source]
Overview:

Sample from the distribution.

Arguments:
  • sample_shape (tuple): Sample shape.

network.gtrxl

Please refer to ding/torch_utils/network/gtrxl for more details.

PositionalEmbedding

class ding.torch_utils.network.gtrxl.PositionalEmbedding(embedding_dim: int)[source]
Overview:

The PositionalEmbedding module implements the positional embedding used in the vanilla Transformer model.

Interfaces:

__init__, forward

Note

This implementation is adapted from https://github.com/kimiyoung/transformer-xl/blob/master/pytorch/mem_transformer.py

__init__(embedding_dim: int)[source]
Overview:

Initialize the PositionalEmbedding module.

Arguments:
  • embedding_dim (int): The dimensionality of the embeddings.

forward(pos_seq: Tensor) Tensor[source]
Overview:

Compute positional embedding given a sequence of positions.

Arguments:
  • pos_seq (torch.Tensor): The positional sequence, typically a 1D tensor of integers in the form of [seq_len-1, seq_len-2, …, 1, 0],

Returns:
  • pos_embedding (torch.Tensor): The computed positional embeddings. The shape of the tensor is (seq_len, 1, embedding_dim).
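Examples (an illustrative sketch; the sequence length is an assumption):
>>> pe = PositionalEmbedding(embedding_dim=128)
>>> pos_seq = torch.arange(9, -1, -1.0)   # [seq_len-1, ..., 1, 0] with seq_len=10
>>> emb = pe(pos_seq)   # expected shape: (10, 1, 128)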

training: bool

GRUGatingUnit

class ding.torch_utils.network.gtrxl.GRUGatingUnit(input_dim: int, bg: float = 2.0)[source]
Overview:

The GRUGatingUnit module implements the GRU gating mechanism used in the GTrXL model.

Interfaces:

__init__, forward

__init__(input_dim: int, bg: float = 2.0)[source]
Overview:

Initialize the GRUGatingUnit module.

Arguments:
  • input_dim (int): The dimensionality of the input.

  • bg (float): The gate bias. By setting bg > 0 we can explicitly initialize the gating mechanism to be close to the identity map. This can greatly improve the learning speed and stability since it initializes the agent close to a Markovian policy (ignore attention at the beginning).

forward(x: Tensor, y: Tensor)[source]
Overview:

Compute the output value using the GRU gating mechanism.

Arguments:
  • x (torch.Tensor): The first input tensor.

  • y (torch.Tensor): The second input tensor. x and y should have the same shape, and their last dimension should match input_dim.

Returns:
  • g (torch.Tensor): The output of the GRU gating mechanism, with the same shape as x and y.
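Examples (a minimal usage sketch; the shapes are assumptions):
>>> gate = GRUGatingUnit(input_dim=32)
>>> x = torch.randn(4, 32)
>>> y = torch.randn(4, 32)
>>> g = gate(x, y)   # same shape as x and y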

training: bool

Memory

class ding.torch_utils.network.gtrxl.Memory(memory_len: int = 20, batch_size: int = 64, embedding_dim: int = 256, layer_num: int = 3, memory: Tensor | None = None)[source]
Overview:

A class that stores the context used to add memory to Transformer.

Interfaces:

__init__, init, update, get, to

Note

For details, refer to Transformer-XL: https://arxiv.org/abs/1901.02860

__init__(memory_len: int = 20, batch_size: int = 64, embedding_dim: int = 256, layer_num: int = 3, memory: Tensor | None = None) None[source]
Overview:

Initialize the Memory module.

Arguments:
  • memory_len (int): The dimension of memory, i.e., how many past observations to use as memory.

  • batch_size (int): The dimension of each batch.

  • embedding_dim (int): The dimension of embedding, which is the dimension of a single observation after embedding.

  • layer_num (int): The number of transformer layers.

  • memory (Optional[torch.Tensor]): The initial memory. Default is None.

get()[source]
Overview:

Get the current memory.

Returns:
  • memory: (Optional[torch.Tensor]): The current memory, with shape (layer_num, memory_len, bs, embedding_dim).

init(memory: Tensor | None = None)[source]
Overview:

Initialize memory with an input list of tensors or create it automatically given its dimensions.

Arguments:
  • memory (Optional[torch.Tensor]): The input memory tensor with shape (layer_num, memory_len, bs, embedding_dim), where memory_len is the length of memory, bs is the batch size, and embedding_dim is the dimension of the embedding.

to(device: str = 'cpu')[source]
Overview:

Move the current memory to the specified device.

Arguments:
  • device (str): The device to move the memory to. Default is ‘cpu’.

update(hidden_state: List[Tensor])[source]
Overview:

Update the memory given a sequence of hidden states. Example for a single layer (memory_len=3, hidden_size_len=2, bs=3):

        m00 m01 m02        h00 h01 h02                m20 m21 m22
    m = m10 m11 m12    h = h10 h11 h12    =>  new_m = h00 h01 h02
        m20 m21 m22                                   h10 h11 h12

Arguments:
  • hidden_state: (List[torch.Tensor]): The hidden states to update the memory. Each tensor in the list has shape (cur_seq, bs, embedding_dim), where cur_seq is the length of the sequence.

Returns:
  • memory: (Optional[torch.Tensor]): The updated memory, with shape (layer_num, memory_len, bs, embedding_dim).
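Examples (an illustrative sketch; whether the memory is auto-initialized in __init__ is an assumption, so get() may also return None before init):
>>> mem = Memory(memory_len=4, batch_size=2, embedding_dim=8, layer_num=3)
>>> m = mem.get()   # the current memory tensor (or None if it has not been initialized)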

AttentionXL

class ding.torch_utils.network.gtrxl.AttentionXL(input_dim: int, head_dim: int, head_num: int, dropout: Module)[source]
Overview:

An implementation of the Attention mechanism used in the TransformerXL model.

Interfaces:

__init__, forward

__init__(input_dim: int, head_dim: int, head_num: int, dropout: Module) None[source]
Overview:

Initialize the AttentionXL module.

Arguments:
  • input_dim (int): The dimensionality of the input features.

  • head_dim (int): The dimensionality of each attention head.

  • head_num (int): The number of attention heads.

  • dropout (nn.Module): The dropout layer to use

_rel_shift(x: Tensor, zero_upper: bool = False) Tensor[source]
Overview:

Perform a relative shift operation on the attention score matrix. Example:

    a00 a01 a02      0 a00 a01 a02      0  a00 a01      a02  0  a10      a02  0   0
    a10 a11 a12  =>  0 a10 a11 a12  =>  a02  0  a10  =>  a11 a12  0   =>  a11 a12  0
    a20 a21 a22      0 a20 a21 a22      a11 a12  0      a20 a21 a22      a20 a21 a22
                                        a20 a21 a22

  1. Append one “column” of zeros to the left

  2. Reshape the matrix from [3 x 4] into [4 x 3]

  3. Remove the first “row”

  4. Mask out the upper triangle (optional)

Note

See the following material for better understanding: https://github.com/kimiyoung/transformer-xl/issues/8 https://arxiv.org/pdf/1901.02860.pdf (Appendix B)

Arguments:
  • x (torch.Tensor): The input tensor with shape (cur_seq, full_seq, bs, head_num).

  • zero_upper (bool): If True, the upper-right triangle of the matrix is set to zero.

Returns:
  • x (torch.Tensor): The input tensor after the relative shift operation, with shape (cur_seq, full_seq, bs, head_num).

forward(inputs: Tensor, pos_embedding: Tensor, full_input: Tensor, u: Parameter, v: Parameter, mask: Tensor | None = None) Tensor[source]
Overview:

Compute the forward pass for the AttentionXL module.

Arguments:
  • inputs (torch.Tensor): The attention input with shape (cur_seq, bs, input_dim).

  • pos_embedding (torch.Tensor): The positional embedding with shape (full_seq, 1, input_dim).

  • full_input (torch.Tensor): The concatenated memory and input tensor with shape (full_seq, bs, input_dim).

  • u (torch.nn.Parameter): The content parameter with shape (head_num, head_dim).

  • v (torch.nn.Parameter): The position parameter with shape (head_num, head_dim).

  • mask (Optional[torch.Tensor]): The attention mask with shape (cur_seq, full_seq, 1). If None, no masking is applied.

Returns:
  • output (torch.Tensor): The output of the attention mechanism with shape (cur_seq, bs, input_dim).

training: bool

GatedTransformerXLLayer

class ding.torch_utils.network.gtrxl.GatedTransformerXLLayer(input_dim: int, head_dim: int, hidden_dim: int, head_num: int, mlp_num: int, dropout: Module, activation: Module, gru_gating: bool = True, gru_bias: float = 2.0)[source]
Overview:

This class implements the attention layer of GTrXL (Gated Transformer-XL).

Interfaces:

__init__, forward

__init__(input_dim: int, head_dim: int, hidden_dim: int, head_num: int, mlp_num: int, dropout: Module, activation: Module, gru_gating: bool = True, gru_bias: float = 2.0) None[source]
Overview:

Initialize GatedTransformerXLLayer.

Arguments:
  • input_dim (int): The dimension of the input tensor.

  • head_dim (int): The dimension of each head in the multi-head attention.

  • hidden_dim (int): The dimension of the hidden layer in the MLP.

  • head_num (int): The number of heads for the multi-head attention.

  • mlp_num (int): The number of MLP layers in the attention layer.

  • dropout (nn.Module): The dropout module used in the MLP and attention layers.

  • activation (nn.Module): The activation function to be used in the MLP layers.

  • gru_gating (bool, optional): Whether to use GRU gates. If False, replace GRU gates with residual connections. Default is True.

  • gru_bias (float, optional): The bias of the GRU gate. Default is 2.

forward(inputs: Tensor, pos_embedding: Tensor, u: Parameter, v: Parameter, memory: Tensor, mask: Tensor | None = None) Tensor[source]
Overview:

Compute forward pass of GTrXL layer.

Arguments:
  • inputs (torch.Tensor): The attention input tensor of shape (cur_seq, bs, input_dim).

  • pos_embedding (torch.Tensor): The positional embedding tensor of shape (full_seq, 1, input_dim).

  • u (torch.nn.Parameter): The content parameter tensor of shape (head_num, head_dim).

  • v (torch.nn.Parameter): The position parameter tensor of shape (head_num, head_dim).

  • memory (torch.Tensor): The memory tensor of shape (prev_seq, bs, input_dim).

  • mask (Optional[torch.Tensor]): The attention mask tensor of shape (cur_seq, full_seq, 1).

    Default is None.

Returns:
  • output (torch.Tensor): layer output of shape (cur_seq, bs, input_dim)

training: bool

GTrXL

class ding.torch_utils.network.gtrxl.GTrXL(input_dim: int, head_dim: int = 128, embedding_dim: int = 256, head_num: int = 2, mlp_num: int = 2, layer_num: int = 3, memory_len: int = 64, dropout_ratio: float = 0.0, activation: Module = ReLU(), gru_gating: bool = True, gru_bias: float = 2.0, use_embedding_layer: bool = True)[source]
Overview:

GTrXL Transformer implementation as described in “Stabilizing Transformer for Reinforcement Learning” (https://arxiv.org/abs/1910.06764).

Interfaces:

__init__, forward, reset_memory, get_memory

__init__(input_dim: int, head_dim: int = 128, embedding_dim: int = 256, head_num: int = 2, mlp_num: int = 2, layer_num: int = 3, memory_len: int = 64, dropout_ratio: float = 0.0, activation: Module = ReLU(), gru_gating: bool = True, gru_bias: float = 2.0, use_embedding_layer: bool = True) None[source]
Overview:

Init GTrXL Model.

Arguments:
  • input_dim (int): The dimension of the input observation.

  • head_dim (int, optional): The dimension of each head. Default is 128.

  • embedding_dim (int, optional): The dimension of the embedding. Default is 256.

  • head_num (int, optional): The number of heads for multi-head attention. Default is 2.

  • mlp_num (int, optional): The number of MLP layers in the attention layer. Default is 2.

  • layer_num (int, optional): The number of transformer layers. Default is 3.

  • memory_len (int, optional): The length of memory. Default is 64.

  • dropout_ratio (float, optional): The dropout ratio. Default is 0.

  • activation (nn.Module, optional): The activation function. Default is nn.ReLU().

  • gru_gating (bool, optional): If False, replace GRU gates with residual connections. Default is True.

  • gru_bias (float, optional): The GRU gate bias. Default is 2.0.

  • use_embedding_layer (bool, optional): If False, don’t use input embedding layer. Default is True.

Raises:
  • AssertionError: If embedding_dim is not an even number.

forward(x: Tensor, batch_first: bool = False, return_mem: bool = True) Dict[str, Tensor][source]
Overview:

Performs a forward pass on the GTrXL.

Arguments:
  • x (torch.Tensor): The input tensor with shape (seq_len, bs, input_size).

  • batch_first (bool, optional): If the input data has shape (bs, seq_len, input_size), set this parameter to True to transpose along the first and second dimension and obtain shape (seq_len, bs, input_size). This does not affect the output memory. Default is False.

  • return_mem (bool, optional): If False, return only the output tensor without the dict. Default is True.

Returns:
  • x (Dict[str, torch.Tensor]): A dictionary containing the transformer output of shape (seq_len, bs, embedding_size) and memory of shape (layer_num, seq_len, bs, embedding_size).
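Examples (an illustrative sketch; all sizes are assumptions, not taken from the source):
>>> model = GTrXL(input_dim=32, head_dim=16, embedding_dim=64, head_num=2, layer_num=2, memory_len=8)
>>> x = torch.randn(16, 4, 32)   # (seq_len, bs, input_dim)
>>> output = model(x)            # a dict with the transformer output of shape (16, 4, 64) and the updated memory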

get_memory()[source]
Overview:

Returns the memory of GTrXL.

Returns:
  • memory (Optional[torch.Tensor]): The output memory or None if memory has not been initialized. The shape is (layer_num, memory_len, bs, embedding_dim).

reset_memory(batch_size: int | None = None, state: Tensor | None = None)[source]
Overview:

Clear or set the memory of GTrXL.

Arguments:
  • batch_size (Optional[int]): The batch size. Default is None.

  • state (Optional[torch.Tensor]): The input memory with shape (layer_num, memory_len, bs, embedding_dim). Default is None.

training: bool
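
A minimal usage sketch (not part of the original docstring); the 'logit' and 'memory' output keys and the expected shapes are assumptions inferred from the descriptions above.

Examples:
>>> import torch
>>> from ding.torch_utils.network.gtrxl import GTrXL
>>> model = GTrXL(input_dim=10, embedding_dim=32, head_num=2, layer_num=2, memory_len=8)
>>> x = torch.randn(16, 4, 10)          # (seq_len, bs, input_dim)
>>> output = model(x)                   # dict with the transformer output and the memory
>>> output['logit'].shape               # assumed key, expected (16, 4, 32)
>>> output['memory'].shape              # assumed key, see get_memory for the memory layout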

network.gumbel_softmax

Please refer to ding/torch_utils/network/gumbel_softmax for more details.

GumbelSoftmax

class ding.torch_utils.network.gumbel_softmax.GumbelSoftmax[source]
Overview:

An nn.Module that computes GumbelSoftmax.

Interfaces:

__init__, forward, gumbel_softmax_sample

Note

For more information on GumbelSoftmax, refer to the paper Categorical Reparameterization with Gumbel-Softmax: https://arxiv.org/abs/1611.01144.

__init__() None[source]
Overview:

Initialize the GumbelSoftmax module.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor, temperature: float = 1.0, hard: bool = False) Tensor[source]
Overview:

Forward pass for the GumbelSoftmax module.

Arguments:
  • x (torch.Tensor): Unnormalized log-probabilities.

  • temperature (float): Non-negative scalar controlling the sharpness of the distribution.

  • hard (bool): If True, returns one-hot encoded labels. Default is False.

Returns:
  • output (torch.Tensor): Sample from Gumbel-Softmax distribution.

Shapes:
  • x: its shape is (B, N), where B is the batch size and N is the number of classes.

  • y: its shape is (B, N), where B is the batch size and N is the number of classes.

gumbel_softmax_sample(x: Tensor, temperature: float, eps: float = 1e-08) Tensor[source]
Overview:

Draw a sample from the Gumbel-Softmax distribution.

Arguments:
  • x (torch.Tensor): Input tensor.

  • temperature (float): Non-negative scalar controlling the sharpness of the distribution.

  • eps (float): Small number to prevent division by zero, default is 1e-8.

Returns:
  • output (torch.Tensor): Sample from Gumbel-Softmax distribution.

training: bool
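
A minimal usage sketch (not part of the original docstring); the expected shapes in the comments follow the shape description above.

Examples:
>>> import torch
>>> from ding.torch_utils.network.gumbel_softmax import GumbelSoftmax
>>> gs = GumbelSoftmax()
>>> logits = torch.randn(4, 6)                       # (B, N) unnormalized log-probabilities
>>> soft = gs(logits, temperature=0.8)               # relaxed sample, expected shape (4, 6)
>>> hard = gs(logits, temperature=0.8, hard=True)    # one-hot sample, expected shape (4, 6)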

network.merge

Please refer to ding/torch_utils/network/merge for more details.

BilinearGeneral

class ding.torch_utils.network.merge.BilinearGeneral(in1_features: int, in2_features: int, out_features: int)[source]
Overview:

Bilinear implementation as in: Multiplicative Interactions and Where to Find Them, ICLR 2020, https://openreview.net/forum?id=rylnK6VtDH.

Interfaces:

__init__, forward

__init__(in1_features: int, in2_features: int, out_features: int)[source]
Overview:

Initialize the Bilinear layer.

Arguments:
  • in1_features (int): The size of each first input sample.

  • in2_features (int): The size of each second input sample.

  • out_features (int): The size of each output sample.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor, z: Tensor)[source]
Overview:

Compute the bilinear function.

Arguments:
  • x (torch.Tensor): The first input tensor.

  • z (torch.Tensor): The second input tensor.

reset_parameters()[source]
Overview:

Initialize the parameters of the Bilinear layer.

training: bool
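
A minimal usage sketch (not part of the original docstring); the output shape is an expectation based on out_features.

Examples:
>>> import torch
>>> from ding.torch_utils.network.merge import BilinearGeneral
>>> layer = BilinearGeneral(in1_features=8, in2_features=6, out_features=4)
>>> x = torch.randn(10, 8)
>>> z = torch.randn(10, 6)
>>> out = layer(x, z)                                # expected shape (10, 4)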

TorchBilinearCustomized

class ding.torch_utils.network.merge.TorchBilinearCustomized(in1_features: int, in2_features: int, out_features: int)[source]
Overview:

Customized Torch Bilinear implementation.

Interfaces:

__init__, forward

__init__(in1_features: int, in2_features: int, out_features: int)[source]
Overview:

Initialize the Bilinear layer.

Arguments:
  • in1_features (int): The size of each first input sample.

  • in2_features (int): The size of each second input sample.

  • out_features (int): The size of each output sample.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x, z)[source]
Overview:

Compute the bilinear function.

Arguments:
  • x (torch.Tensor): The first input tensor.

  • z (torch.Tensor): The second input tensor.

reset_parameters()[source]
Overview:

Initialize the parameters of the Bilinear layer.

training: bool

FiLM

class ding.torch_utils.network.merge.FiLM(feature_dim: int, context_dim: int)[source]
Overview:

Feature-wise Linear Modulation (FiLM) Layer. This layer applies feature-wise affine transformation based on context.

Interfaces:

__init__, forward

__init__(feature_dim: int, context_dim: int)[source]
Overview:

Initialize the FiLM layer.

Arguments:
  • feature_dim (int): The dimension of the input feature vector.

  • context_dim (int): The dimension of the input context vector.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(feature: Tensor, context: Tensor)[source]
Overview:

Forward propagation.

Arguments:
  • feature (torch.Tensor): The input feature, shape (batch_size, feature_dim).

  • context (torch.Tensor): The input context, shape (batch_size, context_dim).

Returns:
  • conditioned_feature (torch.Tensor): The output feature after FiLM, shape (batch_size, feature_dim).

training: bool
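
A minimal usage sketch (not part of the original docstring); the shapes follow the forward description above.

Examples:
>>> import torch
>>> from ding.torch_utils.network.merge import FiLM
>>> film = FiLM(feature_dim=16, context_dim=8)
>>> feature = torch.randn(4, 16)
>>> context = torch.randn(4, 8)
>>> conditioned = film(feature, context)             # expected shape (4, 16)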

GatingType

class ding.torch_utils.network.merge.GatingType(value)[source]
Overview:

Enum class defining different types of tensor gating and aggregation in modules.

GLOBAL = 'global'
NONE = 'none'
POINTWISE = 'pointwise'

SumMerge

class ding.torch_utils.network.merge.SumMerge(*args, **kwargs)[source]
Overview:

A PyTorch module that merges a list of tensors by computing their sum. All input tensors must have the same size. This module can work with any type of tensor (vector, units or visual).

Interfaces:

__init__, forward

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(tensors: List[Tensor]) Tensor[source]
Overview:

Forward pass of the SumMerge module, which sums the input tensors.

Arguments:
  • tensors (List[Tensor]): List of input tensors to be summed. All tensors must have the same size.

Returns:
  • summed (Tensor): Tensor resulting from the sum of all input tensors.

training: bool
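
A minimal usage sketch (not part of the original docstring); all input tensors share the same size, as required above.

Examples:
>>> import torch
>>> from ding.torch_utils.network.merge import SumMerge
>>> merge = SumMerge()
>>> tensors = [torch.randn(4, 8) for _ in range(3)]  # all tensors have the same size
>>> out = merge(tensors)                             # expected shape (4, 8)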

VectorMerge

class ding.torch_utils.network.merge.VectorMerge(input_sizes: Dict[str, int], output_size: int, gating_type: GatingType = GatingType.NONE, use_layer_norm: bool = True)[source]
Overview:

Merges multiple vector streams. Streams are first transformed through layer normalization, relu, and linear layers, then summed. They don’t need to have the same size. Gating can also be used before the sum.

Interfaces:

__init__, encode, _compute_gate, forward

Note

For more details about the gating types, please refer to the GatingType enum class.

__init__(input_sizes: Dict[str, int], output_size: int, gating_type: GatingType = GatingType.NONE, use_layer_norm: bool = True)[source]
Overview:

Initialize the VectorMerge module.

Arguments:
  • input_sizes (Dict[str, int]): A dictionary mapping input names to their sizes. The size is a single integer for 1D inputs, or None for 0D inputs. If an input size is None, we assume it’s ().

  • output_size (int): The size of the output vector.

  • gating_type (GatingType): The type of gating mechanism to use. Default is GatingType.NONE.

  • use_layer_norm (bool): Whether to use layer normalization. Default is True.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_compute_gate(init_gate: List[Tensor]) List[Tensor][source]
Overview:

Compute the gate values based on the initial gate values.

Arguments:
  • init_gate (List[Tensor]): The initial gate values.

Returns:
  • gate (List[Tensor]): The computed gate values.

_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
encode(inputs: Dict[str, Tensor]) Tuple[List[Tensor], List[Tensor]][source]
Overview:

Encode the input tensors using layer normalization, relu, and linear transformations.

Arguments:
  • inputs (Dict[str, Tensor]): The input tensors.

Returns:
  • gates (List[Tensor]): The gate tensors after transformations.

  • outputs (List[Tensor]): The output tensors after transformations.

forward(inputs: Dict[str, Tensor]) Tensor[source]
Overview:

Forward pass through the VectorMerge module.

Arguments:
  • inputs (Dict[str, Tensor]): The input tensors.

Returns:
  • output (Tensor): The output tensor after passing through the module.

training: bool
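
A minimal usage sketch (not part of the original docstring); the stream names, sizes, and expected output shape are illustrative assumptions.

Examples:
>>> import torch
>>> from ding.torch_utils.network.merge import VectorMerge, GatingType
>>> merge = VectorMerge(input_sizes={'obs': 16, 'action': 4}, output_size=32, gating_type=GatingType.NONE)
>>> inputs = {'obs': torch.randn(5, 16), 'action': torch.randn(5, 4)}
>>> out = merge(inputs)                              # expected shape (5, 32)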

network.nn_module

Please refer to ding/torch_utils/network/nn_module for more details.

weight_init

ding.torch_utils.network.nn_module.weight_init_(weight: Tensor, init_type: str = 'xavier', activation: str | None = None) None[source]
Overview:

Initialize weight according to the specified type.

Arguments:
  • weight (torch.Tensor): The weight that needs to be initialized.

  • init_type (str, optional): The type of initialization to implement, supports [“xavier”, “kaiming”, “orthogonal”].

  • activation (str, optional): The activation function name. Recommended to use only with [‘relu’, ‘leaky_relu’].

sequential_pack

ding.torch_utils.network.nn_module.sequential_pack(layers: List[Module]) Sequential[source]
Overview:

Pack the layers in the input list into an nn.Sequential module. If there is a convolutional layer in the list, an extra attribute out_channels will be added to the module and set to the out_channels of the conv layer.

Arguments:
  • layers (List[nn.Module]): The input list of layers.

Returns:
  • seq (nn.Sequential): Packed sequential container.

conv1d_block

ding.torch_utils.network.nn_module.conv1d_block(in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, padding: int = 0, dilation: int = 1, groups: int = 1, activation: Module | None = None, norm_type: str | None = None) Sequential[source]
Overview:

Create a 1-dimensional convolution layer with activation and normalization.

Arguments:
  • in_channels (int): Number of channels in the input tensor.

  • out_channels (int): Number of channels in the output tensor.

  • kernel_size (int): Size of the convolving kernel.

  • stride (int, optional): Stride of the convolution. Default is 1.

  • padding (int, optional): Zero-padding added to both sides of the input. Default is 0.

  • dilation (int, optional): Spacing between kernel elements. Default is 1.

  • groups (int, optional): Number of blocked connections from input channels to output channels. Default is 1.

  • activation (nn.Module, optional): The optional activation function.

  • norm_type (str, optional): Type of the normalization.

Returns:
  • block (nn.Sequential): A sequential list containing the torch layers of the 1-dimensional convolution layer.

conv2d_block

ding.torch_utils.network.nn_module.conv2d_block(in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, padding: int = 0, dilation: int = 1, groups: int = 1, pad_type: str = 'zero', activation: Module | None = None, norm_type: str | None = None, num_groups_for_gn: int = 1, bias: bool = True) Sequential[source]
Overview:

Create a 2-dimensional convolution layer with activation and normalization.

Arguments:
  • in_channels (int): Number of channels in the input tensor.

  • out_channels (int): Number of channels in the output tensor.

  • kernel_size (int): Size of the convolving kernel.

  • stride (int, optional): Stride of the convolution. Default is 1.

  • padding (int, optional): Zero-padding added to both sides of the input. Default is 0.

  • dilation (int): Spacing between kernel elements.

  • groups (int, optional): Number of blocked connections from input channels to output channels. Default is 1.

  • pad_type (str, optional): The way to add padding, include [‘zero’, ‘reflect’, ‘replicate’]. Default is ‘zero’.

  • activation (nn.Module): The optional activation function.

  • norm_type (str): The type of the normalization, currently supports [‘BN’, ‘LN’, ‘IN’, ‘GN’, ‘SyncBN’]. Default is None, which means no normalization.

  • num_groups_for_gn (int): Number of groups for GroupNorm.

  • bias (bool): whether to add a learnable bias to the nn.Conv2d. Default is True.

Returns:
  • block (nn.Sequential): A sequential list containing the torch layers of the 2-dimensional convolution layer.
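
A minimal usage sketch (not part of the original docstring); with kernel_size=3, stride=1, and padding=1 the spatial size is expected to be preserved.

Examples:
>>> import torch
>>> import torch.nn as nn
>>> from ding.torch_utils.network.nn_module import conv2d_block
>>> block = conv2d_block(3, 16, kernel_size=3, stride=1, padding=1, activation=nn.ReLU(), norm_type='BN')
>>> x = torch.randn(4, 3, 32, 32)
>>> y = block(x)                                     # expected shape (4, 16, 32, 32)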

deconv2d_block

ding.torch_utils.network.nn_module.deconv2d_block(in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, padding: int = 0, output_padding: int = 0, groups: int = 1, activation: int | None = None, norm_type: int | None = None) Sequential[source]
Overview:

Create a 2-dimensional transpose convolution layer with activation and normalization.

Arguments:
  • in_channels (int): Number of channels in the input tensor.

  • out_channels (int): Number of channels in the output tensor.

  • kernel_size (int): Size of the convolving kernel.

  • stride (int, optional): Stride of the convolution. Default is 1.

  • padding (int, optional): Zero-padding added to both sides of the input. Default is 0.

  • output_padding (int, optional): Additional size added to one side of the output shape. Default is 0.

  • groups (int, optional): Number of blocked connections from input channels to output channels. Default is 1.

  • activation (nn.Module, optional): The optional activation function.

  • norm_type (str, optional): Type of the normalization.

Returns:
  • block (nn.Sequential): A sequential list containing the torch layers of the 2-dimensional transpose convolution layer.

fc_block

ding.torch_utils.network.nn_module.fc_block(in_channels: int, out_channels: int, activation: Module | None = None, norm_type: str | None = None, use_dropout: bool = False, dropout_probability: float = 0.5) Sequential[source]
Overview:

Create a fully-connected block with activation, normalization, and dropout. Optional normalization can be done to the dim 1 (across the channels).

    x -> fc -> norm -> act -> dropout -> out

Arguments:
  • in_channels (int): Number of channels in the input tensor.

  • out_channels (int): Number of channels in the output tensor.

  • activation (nn.Module, optional): The optional activation function.

  • norm_type (str, optional): Type of the normalization.

  • use_dropout (bool, optional): Whether to use dropout in the fully-connected block. Default is False.

  • dropout_probability (float, optional): Probability of an element to be zeroed in the dropout. Default is 0.5.

Returns:
  • block (nn.Sequential): A sequential list containing the torch layers of the fully-connected block.
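
A minimal usage sketch (not part of the original docstring); the chosen channel sizes and dropout probability are illustrative.

Examples:
>>> import torch
>>> import torch.nn as nn
>>> from ding.torch_utils.network.nn_module import fc_block
>>> block = fc_block(64, 128, activation=nn.ReLU(), norm_type='LN', use_dropout=True, dropout_probability=0.1)
>>> x = torch.randn(4, 64)
>>> y = block(x)                                     # expected shape (4, 128)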

normed_linear

ding.torch_utils.network.nn_module.normed_linear(in_features: int, out_features: int, bias: bool = True, device=None, dtype=None, scale: float = 1.0) Linear[source]
Overview:

Create a nn.Linear module but with normalized fan-in init.

Arguments:
  • in_features (int): Number of features in the input tensor.

  • out_features (int): Number of features in the output tensor.

  • bias (bool, optional): Whether to add a learnable bias to the nn.Linear. Default is True.

  • device (torch.device, optional): The device to put the created module on. Default is None.

  • dtype (torch.dtype, optional): The desired data type of created module. Default is None.

  • scale (float, optional): The scale factor for initialization. Default is 1.0.

Returns:
  • out (nn.Linear): A nn.Linear module with normalized fan-in init.

normed_conv2d

ding.torch_utils.network.nn_module.normed_conv2d(in_channels: int, out_channels: int, kernel_size: int | Tuple[int, int], stride: int | Tuple[int, int] = 1, padding: int | Tuple[int, int] = 0, dilation: int | Tuple[int, int] = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', device=None, dtype=None, scale: float = 1) Conv2d[source]
Overview:

Create a nn.Conv2d module but with normalized fan-in init.

Arguments:
  • in_channels (int): Number of channels in the input tensor.

  • out_channels (int): Number of channels in the output tensor.

  • kernel_size (Union[int, Tuple[int, int]]): Size of the convolving kernel.

  • stride (Union[int, Tuple[int, int]], optional): Stride of the convolution. Default is 1.

  • padding (Union[int, Tuple[int, int]], optional): Zero-padding added to both sides of the input. Default is 0.

  • dilation (Union[int, Tuple[int, int]], optional): Spacing between kernel elements. Default is 1.

  • groups (int, optional): Number of blocked connections from input channels to output channels. Default is 1.

  • bias (bool, optional): Whether to add a learnable bias to the nn.Conv2d. Default is True.

  • padding_mode (str, optional): The type of padding algorithm to use. Default is ‘zeros’.

  • device (torch.device, optional): The device to put the created module on. Default is None.

  • dtype (torch.dtype, optional): The desired data type of created module. Default is None.

  • scale (float, optional): The scale factor for initialization. Default is 1.

Returns:
  • out (nn.Conv2d): A nn.Conv2d module with normalized fan-in init.

MLP

ding.torch_utils.network.nn_module.MLP(in_channels: int, hidden_channels: int, out_channels: int, layer_num: int, layer_fn: Callable | None = None, activation: Module | None = None, norm_type: str | None = None, use_dropout: bool = False, dropout_probability: float = 0.5, output_activation: bool = True, output_norm: bool = True, last_linear_layer_init_zero: bool = False)[source]
Overview:

Create a multi-layer perceptron using fully-connected blocks with activation, normalization, and dropout. Optional normalization can be done to the dim 1 (across the channels).

    x -> fc -> norm -> act -> dropout -> out

Arguments:
  • in_channels (int): Number of channels in the input tensor.

  • hidden_channels (int): Number of channels in the hidden tensor.

  • out_channels (int): Number of channels in the output tensor.

  • layer_num (int): Number of layers.

  • layer_fn (Callable, optional): Layer function.

  • activation (nn.Module, optional): The optional activation function.

  • norm_type (str, optional): The type of the normalization.

  • use_dropout (bool, optional): Whether to use dropout in the fully-connected block. Default is False.

  • dropout_probability (float, optional): Probability of an element to be zeroed in the dropout. Default is 0.5.

  • output_activation (bool, optional): Whether to use activation in the output layer. If True, we use the same activation as front layers. Default is True.

  • output_norm (bool, optional): Whether to use normalization in the output layer. If True, we use the same normalization as front layers. Default is True.

  • last_linear_layer_init_zero (bool, optional): Whether to use zero initializations for the last linear layer (including w and b), which can provide stable zero outputs in the beginning, usually used in the policy network in RL settings.

Returns:
  • block (nn.Sequential): A sequential list containing the torch layers of the multi-layer perceptron.
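
A minimal usage sketch (not part of the original docstring); the channel sizes and layer count are illustrative.

Examples:
>>> import torch
>>> import torch.nn as nn
>>> from ding.torch_utils.network.nn_module import MLP
>>> mlp = MLP(in_channels=8, hidden_channels=32, out_channels=4, layer_num=3, activation=nn.ReLU(), norm_type='LN')
>>> x = torch.randn(16, 8)
>>> y = mlp(x)                                       # expected shape (16, 4)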

ChannelShuffle

class ding.torch_utils.network.nn_module.ChannelShuffle(group_num: int)[source]
Overview:

Apply channel shuffle to the input tensor. For more details about the channel shuffle, please refer to the ‘ShuffleNet’ paper: https://arxiv.org/abs/1707.01083

Interfaces:

__init__, forward

__init__(group_num: int) None[source]
Overview:

Initialize the ChannelShuffle class.

Arguments:
  • group_num (int): The number of groups to exchange.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor) Tensor[source]
Overview:

Forward pass through the ChannelShuffle module.

Arguments:
  • x (torch.Tensor): The input tensor.

Returns:
  • x (torch.Tensor): The shuffled input tensor.

training: bool
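
A minimal usage sketch (not part of the original docstring); it assumes the channel count is divisible by group_num.

Examples:
>>> import torch
>>> from ding.torch_utils.network.nn_module import ChannelShuffle
>>> shuffle = ChannelShuffle(group_num=2)
>>> x = torch.randn(4, 8, 16, 16)                    # 8 channels, divisible by group_num
>>> y = shuffle(x)                                   # same shape (4, 8, 16, 16), channels regrouped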

one_hot

ding.torch_utils.network.nn_module.one_hot(val: LongTensor, num: int, num_first: bool = False) FloatTensor[source]
Overview:

Convert a torch.LongTensor to one-hot encoding. This implementation can be slightly faster than torch.nn.functional.one_hot.

Arguments:
  • val (torch.LongTensor): Each element contains the state to be encoded, the range should be [0, num-1]

  • num (int): Number of states of the one-hot encoding

  • num_first (bool, optional): If False, the one-hot encoding is added as the last dimension; otherwise, it is added as the first dimension. Default is False.

Returns:
  • one_hot (torch.FloatTensor): The one-hot encoded tensor.

Example:
>>> one_hot(2*torch.ones([2,2]).long(),3)
tensor([[[0., 0., 1.],
         [0., 0., 1.]],
        [[0., 0., 1.],
         [0., 0., 1.]]])
>>> one_hot(2*torch.ones([2,2]).long(),3,num_first=True)
tensor([[[0., 0.], [1., 0.]],
        [[0., 1.], [0., 0.]],
        [[1., 0.], [0., 1.]]])

NearestUpsample

class ding.torch_utils.network.nn_module.NearestUpsample(scale_factor: float | List[float])[source]
Overview:

This module upsamples the input to the given scale_factor using the nearest mode.

Interfaces:

__init__, forward

__init__(scale_factor: float | List[float]) None[source]
Overview:

Initialize the NearestUpsample class.

Arguments:
  • scale_factor (Union[float, List[float]]): The multiplier for the spatial size.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor) Tensor[source]
Overview:

Return the upsampled input tensor.

Arguments:
  • x (torch.Tensor): The input tensor.

Returns:
  • upsample (torch.Tensor): The upsampled input tensor.

training: bool
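
A minimal usage sketch (not part of the original docstring); with scale_factor=2 the spatial size is expected to double.

Examples:
>>> import torch
>>> from ding.torch_utils.network.nn_module import NearestUpsample
>>> up = NearestUpsample(scale_factor=2)
>>> x = torch.randn(1, 3, 8, 8)
>>> y = up(x)                                        # expected shape (1, 3, 16, 16)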

BilinearUpsample

class ding.torch_utils.network.nn_module.BilinearUpsample(scale_factor: float | List[float])[source]
Overview:

This module upsamples the input to the given scale_factor using the bilinear mode.

Interfaces:

__init__, forward

__init__(scale_factor: float | List[float]) None[source]
Overview:

Initialize the BilinearUpsample class.

Arguments:
  • scale_factor (Union[float, List[float]]): The multiplier for the spatial size.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor) Tensor[source]
Overview:

Return the upsampled input tensor.

Arguments:
  • x (torch.Tensor): The input tensor.

Returns:
  • upsample (torch.Tensor): The upsampled input tensor.

training: bool

binary_encode

ding.torch_utils.network.nn_module.binary_encode(y: Tensor, max_val: Tensor) Tensor[source]
Overview:

Convert elements in a tensor to its binary representation.

Arguments:
  • y (torch.Tensor): The tensor to be converted into its binary representation.

  • max_val (torch.Tensor): The maximum value of the elements in the tensor.

Returns:
  • binary (torch.Tensor): The input tensor in its binary representation.

Example:
>>> binary_encode(torch.tensor([3,2]),torch.tensor(8))
tensor([[0, 0, 1, 1],[0, 0, 1, 0]])

NoiseLinearLayer

class ding.torch_utils.network.nn_module.NoiseLinearLayer(in_channels: int, out_channels: int, sigma0: int = 0.4)[source]
Overview:

This is a linear layer with random noise.

Interfaces:

__init__, reset_noise, reset_parameters, forward

__init__(in_channels: int, out_channels: int, sigma0: int = 0.4) None[source]
Overview:

Initialize the NoiseLinearLayer class.

Arguments:
  • in_channels (int): Number of channels in the input tensor.

  • out_channels (int): Number of channels in the output tensor.

  • sigma0 (int, optional): Default noise volume when initializing NoiseLinearLayer. Default is 0.4.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_scale_noise(size: int | Tuple)[source]
Overview:

Scale the noise.

Arguments:
  • size (Union[int, Tuple]): The size of the noise.

_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor)[source]
Overview:

Perform the forward pass with noise.

Arguments:
  • x (torch.Tensor): The input tensor.

Returns:
  • output (torch.Tensor): The output tensor with noise.

reset_noise()[source]
Overview:

Reset the noise settings in the layer.

reset_parameters()[source]
Overview:

Reset the parameters in the layer.

training: bool
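
A minimal usage sketch (not part of the original docstring); the expected output shape follows from out_channels.

Examples:
>>> import torch
>>> from ding.torch_utils.network.nn_module import NoiseLinearLayer
>>> layer = NoiseLinearLayer(in_channels=8, out_channels=4)
>>> x = torch.randn(16, 8)
>>> y = layer(x)                                     # expected shape (16, 4)
>>> layer.reset_noise()                              # resample the noise between forward passes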

noise_block

ding.torch_utils.network.nn_module.noise_block(in_channels: int, out_channels: int, activation: str | None = None, norm_type: str | None = None, use_dropout: bool = False, dropout_probability: float = 0.5, sigma0: float = 0.4)[source]
Overview:

Create a fully-connected noise layer with activation, normalization, and dropout. Optional normalization can be done to the dim 1 (across the channels).

Arguments:
  • in_channels (int): Number of channels in the input tensor.

  • out_channels (int): Number of channels in the output tensor.

  • activation (str, optional): The optional activation function. Default is None.

  • norm_type (str, optional): Type of normalization. Default is None.

  • use_dropout (bool, optional): Whether to use dropout in the fully-connected block.

  • dropout_probability (float, optional): Probability of an element to be zeroed in the dropout. Default is 0.5.

  • sigma0 (float, optional): The sigma0 is the default noise volume when initializing NoiseLinearLayer. Default is 0.4.

Returns:
  • block (nn.Sequential): A sequential list containing the torch layers of the fully-connected block.

NaiveFlatten

class ding.torch_utils.network.nn_module.NaiveFlatten(start_dim: int = 1, end_dim: int = -1)[source]
Overview:

This module is a naive implementation of the flatten operation.

Interfaces:

__init__, forward

__init__(start_dim: int = 1, end_dim: int = -1) None[source]
Overview:

Initialize the NaiveFlatten class.

Arguments:
  • start_dim (int, optional): The first dimension to flatten. Default is 1.

  • end_dim (int, optional): The last dimension to flatten. Default is -1.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor) Tensor[source]
Overview:

Perform the flatten operation on the input tensor.

Arguments:
  • x (torch.Tensor): The input tensor.

Returns:
  • output (torch.Tensor): The flattened output tensor.

training: bool
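
A minimal usage sketch (not part of the original docstring); flattening from start_dim=1 collapses the trailing dimensions.

Examples:
>>> import torch
>>> from ding.torch_utils.network.nn_module import NaiveFlatten
>>> flat = NaiveFlatten(start_dim=1)
>>> x = torch.randn(4, 3, 8, 8)
>>> y = flat(x)                                      # expected shape (4, 192)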

network.normalization

Please refer to ding/torch_utils/network/normalization for more details.

build_normalization

ding.torch_utils.network.normalization.build_normalization(norm_type: str, dim: int | None = None) Module[source]
Overview:

Construct the corresponding normalization module. For beginners, refer to this article to learn more about batch normalization: https://zhuanlan.zhihu.com/p/34879333.

Arguments:
  • norm_type (str): Type of the normalization. Currently supports [‘BN’, ‘LN’, ‘IN’, ‘SyncBN’].

  • dim (Optional[int]): Dimension of the normalization, applicable when norm_type is in [‘BN’, ‘IN’].

Returns:
  • norm_func (nn.Module): The corresponding batch normalization function.
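
A minimal usage sketch (not part of the original docstring), assuming the returned object is the normalization class itself, which is then instantiated with the normalized feature size.

Examples:
>>> import torch
>>> from ding.torch_utils.network.normalization import build_normalization
>>> norm_cls = build_normalization('LN')             # assumed to return the normalization class
>>> norm = norm_cls(16)                              # instantiate with the normalized feature size
>>> y = norm(torch.randn(4, 16))                     # expected shape (4, 16)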

network.popart

Please refer to ding/torch_utils/network/popart for more details.

PopArt

class ding.torch_utils.network.popart.PopArt(input_features: int | None = None, output_features: int | None = None, beta: float = 0.5)[source]
Overview:

A linear layer with PopArt normalization. This class implements a linear transformation followed by PopArt normalization, which is a method to automatically adapt the contribution of each task to the agent’s updates in multi-task learning, as described in the paper: https://arxiv.org/abs/1809.04474.

Interfaces:

__init__, reset_parameters, forward, update_parameters

__init__(input_features: int | None = None, output_features: int | None = None, beta: float = 0.5) None[source]
Overview:

Initialize the class with input features, output features, and the beta parameter.

Arguments:
  • input_features (Union[int, None]): The size of each input sample.

  • output_features (Union[int, None]): The size of each output sample.

  • beta (float): The parameter for moving average.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor) Dict[str, Tensor][source]
Overview:

Implement the forward computation of the linear layer and return both the output and the normalized output of the layer.

Arguments:
  • x (torch.Tensor): Input tensor which is to be normalized.

Returns:
  • output (Dict[str, torch.Tensor]): A dictionary containing ‘pred’ and ‘unnormalized_pred’.

reset_parameters()[source]
Overview:

Reset the parameters including weights and bias using kaiming_uniform_ and uniform_ initialization.

training: bool
update_parameters(value: Tensor) Dict[str, Tensor][source]
Overview:

Update the normalization parameters based on the given value and return the new mean and standard deviation after the update.

Arguments:
  • value (torch.Tensor): The tensor to be used for updating parameters.

Returns:
  • update_results (Dict[str, torch.Tensor]): A dictionary containing ‘new_mean’ and ‘new_std’.
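
A minimal usage sketch (not part of the original docstring); the dictionary keys follow the descriptions above, while the shape of the value tensor passed to update_parameters is an assumption.

Examples:
>>> import torch
>>> from ding.torch_utils.network.popart import PopArt
>>> head = PopArt(input_features=64, output_features=1)
>>> x = torch.randn(8, 64)
>>> out = head(x)                                       # dict with 'pred' and 'unnormalized_pred'
>>> stats = head.update_parameters(torch.randn(8, 1))   # dict with 'new_mean' and 'new_std'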

network.res_block

Please refer to ding/torch_utils/network/res_block for more details.

ResBlock

class ding.torch_utils.network.res_block.ResBlock(in_channels: int, activation: Module = ReLU(), norm_type: str = 'BN', res_type: str = 'basic', bias: bool = True, out_channels: int | None = None)[source]
Overview:
Residual Block with 2D convolution layers, including 3 types:

basic block (input channel: C):
    x -> 3*3*C -> norm -> act -> 3*3*C -> norm -> act -> out
    \________________________________________________________/ +

bottleneck block:
    x -> 1*1*(1/4*C) -> norm -> act -> 3*3*(1/4*C) -> norm -> act -> 1*1*C -> norm -> act -> out
    \___________________________________________________________________________________________/ +

downsample block (used in EfficientZero, input channel: C):
    x -> 3*3*C -> norm -> act -> 3*3*C -> norm -> act -> out
    \______________________ 3*3*C __________________________/ +

Note

You can refer to the paper Deep Residual Learning for Image Recognition (https://arxiv.org/abs/1512.03385) for more details.

Interfaces:

__init__, forward

__init__(in_channels: int, activation: Module = ReLU(), norm_type: str = 'BN', res_type: str = 'basic', bias: bool = True, out_channels: int | None = None) None[source]
Overview:

Init the 2D convolution residual block.

Arguments:
  • in_channels (int): Number of channels in the input tensor.

  • activation (nn.Module): The optional activation function.

  • norm_type (str): Type of the normalization, default set to ‘BN’(Batch Normalization), supports [‘BN’, ‘LN’, ‘IN’, ‘GN’, ‘SyncBN’, None].

  • res_type (str): Type of residual block, supports [‘basic’, ‘bottleneck’, ‘downsample’]

  • bias (bool): Whether to add a learnable bias to the conv2d_block. default set to True.

  • out_channels (int): Number of channels in the output tensor, default set to None, which means out_channels = in_channels.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor) Tensor[source]
Overview:

Return the residual block output.

Arguments:
  • x (torch.Tensor): The input tensor.

Returns:
  • x (torch.Tensor): The resblock output tensor.

training: bool
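
A minimal usage sketch (not part of the original docstring); the basic block is expected to preserve the input shape when out_channels is left at None.

Examples:
>>> import torch
>>> from ding.torch_utils.network.res_block import ResBlock
>>> block = ResBlock(in_channels=32, res_type='basic')
>>> x = torch.randn(4, 32, 16, 16)
>>> y = block(x)                                     # expected shape (4, 32, 16, 16)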

ResFCBlock

class ding.torch_utils.network.res_block.ResFCBlock(in_channels: int, activation: Module = ReLU(), norm_type: str = 'BN', dropout: float | None = None)[source]
Overview:

Residual Block with 2 fully connected layers:

    x -> fc1 -> norm -> act -> fc2 -> norm -> act -> out
    \____________________________________________________/ +

Interfaces:

__init__, forward

__init__(in_channels: int, activation: Module = ReLU(), norm_type: str = 'BN', dropout: float | None = None)[source]
Overview:

Init the fully connected layer residual block.

Arguments:
  • in_channels (int): The number of channels in the input tensor.

  • activation (nn.Module): The optional activation function.

  • norm_type (str): The type of the normalization, default set to ‘BN’.

  • dropout (float): The dropout rate, default set to None.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor) Tensor[source]
Overview:

Return the output of the residual block.

Arguments:
  • x (torch.Tensor): The input tensor.

Returns:
  • x (torch.Tensor): The resblock output tensor.

training: bool
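
A minimal usage sketch (not part of the original docstring); the fully connected residual block is expected to preserve the input shape.

Examples:
>>> import torch
>>> from ding.torch_utils.network.res_block import ResFCBlock
>>> block = ResFCBlock(in_channels=64)
>>> x = torch.randn(4, 64)
>>> y = block(x)                                     # expected shape (4, 64)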

network.resnet

Please refer to ding/torch_utils/network/resnet for more details.

to_2tuple

ding.torch_utils.network.resnet.to_2tuple(item: int) tuple[source]
Overview:

Convert a scalar to a 2-tuple or return the item if it’s not a scalar.

Arguments:
  • item (int): An item to be converted to a 2-tuple.

Returns:
  • (tuple): A 2-tuple of the item.

get_same_padding

ding.torch_utils.network.resnet.get_same_padding(x: int, k: int, s: int, d: int) int[source]
Overview:

Calculate asymmetric TensorFlow-like ‘SAME’ padding for a convolution.

Arguments:
  • x (int): The size of the input.

  • k (int): The size of the kernel.

  • s (int): The stride of the convolution.

  • d (int): The dilation of the convolution.

Returns:
  • (int): The size of the padding.
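
A minimal usage sketch (not part of the original docstring); the value of 2 is what the ‘SAME’ padding rule would give for these arguments and is an expectation, not an output copied from the source.

Examples:
>>> from ding.torch_utils.network.resnet import get_same_padding
>>> get_same_padding(x=32, k=3, s=1, d=1)            # expected 2: total 'SAME' padding for this conv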

pad_same

ding.torch_utils.network.resnet.pad_same(x, k: List[int], s: List[int], d: List[int] = (1, 1), value: float = 0)[source]
Overview:

Dynamically pad input x with ‘SAME’ padding for conv with specified args.

Arguments:
  • x (Tensor): The input tensor.

  • k (List[int]): The size of the kernel.

  • s (List[int]): The stride of the convolution.

  • d (List[int]): The dilation of the convolution.

  • value (float): Value to fill the padding.

Returns:
  • (Tensor): The padded tensor.

avg_pool2d_same

ding.torch_utils.network.resnet.avg_pool2d_same(x, kernel_size: List[int], stride: List[int], padding: List[int] = (0, 0), ceil_mode: bool = False, count_include_pad: bool = True)[source]
Overview:

Apply average pooling with ‘SAME’ padding on the input tensor.

Arguments:
  • x (Tensor): The input tensor.

  • kernel_size (List[int]): The size of the kernel.

  • stride (List[int]): The stride of the convolution.

  • padding (List[int]): The size of the padding.

  • ceil_mode (bool): When True, will use ceil instead of floor to compute the output shape.

  • count_include_pad (bool): When True, will include the zero-padding in the averaging calculation.

Returns:
  • (Tensor): The tensor after average pooling.

AvgPool2dSame

class ding.torch_utils.network.resnet.AvgPool2dSame(kernel_size: int, stride: Tuple[int, int] | None = None, padding: int = 0, ceil_mode: bool = False, count_include_pad: bool = True)[source]
Overview:

Tensorflow-like ‘SAME’ wrapper for 2D average pooling.

Interfaces:

__init__, forward

__init__(kernel_size: int, stride: Tuple[int, int] | None = None, padding: int = 0, ceil_mode: bool = False, count_include_pad: bool = True) None[source]
Overview:

Initialize the AvgPool2dSame with given arguments.

Arguments:
  • kernel_size (int): The size of the window to take an average over.

  • stride (Optional[Tuple[int, int]]): The stride of the window. If None, default to kernel_size.

  • padding (int): Implicit zero padding to be added on both sides.

  • ceil_mode (bool): When True, will use ceil instead of floor to compute the output shape.

  • count_include_pad (bool): When True, will include the zero-padding in the averaging calculation.

ceil_mode: bool
count_include_pad: bool
forward(x: Tensor) Tensor[source]
Overview:

Forward pass of the AvgPool2dSame.

Argument:
  • x (torch.Tensor): Input tensor.

Returns:
  • (torch.Tensor): Output tensor after average pooling.

kernel_size: int | Tuple[int, int]
padding: int | Tuple[int, int]
stride: int | Tuple[int, int]
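
A minimal usage sketch (not part of the original docstring); the expected output size follows the ‘SAME’ rule, i.e. output = ceil(input / stride).

Examples:
>>> import torch
>>> from ding.torch_utils.network.resnet import AvgPool2dSame
>>> pool = AvgPool2dSame(kernel_size=3, stride=2)
>>> x = torch.randn(1, 8, 7, 7)
>>> y = pool(x)                                      # expected shape (1, 8, 4, 4), since ceil(7 / 2) = 4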

create_classifier

ding.torch_utils.network.resnet.create_classifier(num_features: int, num_classes: int, pool_type: str = 'avg', use_conv: bool = False) Tuple[Module, Module][source]
Overview:

Create a classifier with global pooling layer and fully connected layer.

Arguments:
  • num_features (int): The number of features.

  • num_classes (int): The number of classes for the final classification.

  • pool_type (str): The type of pooling to use; ‘avg’ for Average Pooling.

  • use_conv (bool): Whether to use convolution or not.

Returns:
  • global_pool (nn.Module): The created global pooling layer.

  • fc (nn.Module): The created fully connected layer.

ClassifierHead

class ding.torch_utils.network.resnet.ClassifierHead(in_chs: int, num_classes: int, pool_type: str = 'avg', drop_rate: float = 0.0, use_conv: bool = False)[source]
Overview:

Classifier head with configurable global pooling and dropout.

Interfaces:

__init__, forward

__init__(in_chs: int, num_classes: int, pool_type: str = 'avg', drop_rate: float = 0.0, use_conv: bool = False) None[source]
Overview:

Initialize the ClassifierHead with given arguments.

Arguments:
  • in_chs (int): Number of input channels.

  • num_classes (int): Number of classes for the final classification.

  • pool_type (str): The type of pooling to use; ‘avg’ for Average Pooling.

  • drop_rate (float): The dropout rate.

  • use_conv (bool): Whether to use convolution or not.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor) Tensor[source]
Overview:

Forward pass of the ClassifierHead.

Argument:
  • x (torch.Tensor): Input tensor.

Returns:
  • (torch.Tensor): Output tensor after classification.

training: bool

create_attn

ding.torch_utils.network.resnet.create_attn(layer: Module, plane: int) None[source]
Overview:

Create an attention mechanism.

Arguments:
  • layer (nn.Module): The layer where the attention is to be applied.

  • plane (int): The plane on which the attention is to be applied.

Returns:
  • None

get_padding

ding.torch_utils.network.resnet.get_padding(kernel_size: int, stride: int, dilation: int = 1) int[source]
Overview:

Compute the padding based on the kernel size, stride and dilation.

Arguments:
  • kernel_size (int): The size of the kernel.

  • stride (int): The stride of the convolution.

  • dilation (int): The dilation factor.

Returns:
  • padding (int): The computed padding.

BasicBlock

class ding.torch_utils.network.resnet.BasicBlock(inplanes: int, planes: int, stride: int = 1, downsample: ~typing.Callable | None = None, cardinality: int = 1, base_width: int = 64, reduce_first: int = 1, dilation: int = 1, first_dilation: int | None = None, act_layer: ~typing.Callable = <class 'torch.nn.modules.activation.ReLU'>, norm_layer: ~typing.Callable = <class 'torch.nn.modules.batchnorm.BatchNorm2d'>, attn_layer: ~typing.Callable | None = None, aa_layer: ~typing.Callable | None = None, drop_block: ~typing.Callable | None = None, drop_path: ~typing.Callable | None = None)[source]
Overview:

The basic building block for models like ResNet. This class extends PyTorch’s Module class. It represents a standard block of layers including two convolutions, batch normalization, an optional attention mechanism, and activation functions.

Interfaces:

__init__, forward, zero_init_last_bn

Properties:
  • expansion (int): Specifies the expansion factor for the planes of the conv layers.

__init__(inplanes: int, planes: int, stride: int = 1, downsample: ~typing.Callable | None = None, cardinality: int = 1, base_width: int = 64, reduce_first: int = 1, dilation: int = 1, first_dilation: int | None = None, act_layer: ~typing.Callable = <class 'torch.nn.modules.activation.ReLU'>, norm_layer: ~typing.Callable = <class 'torch.nn.modules.batchnorm.BatchNorm2d'>, attn_layer: ~typing.Callable | None = None, aa_layer: ~typing.Callable | None = None, drop_block: ~typing.Callable | None = None, drop_path: ~typing.Callable | None = None) None[source]
Overview:

Initialize the BasicBlock with given parameters.

Arguments:
  • inplanes (int): Number of input channels.

  • planes (int): Number of output channels.

  • stride (int): The stride of the convolutional layer.

  • downsample (Callable): Function for downsampling the inputs.

  • cardinality (int): Group size for grouped convolution.

  • base_width (int): Base width of the convolutions.

  • reduce_first (int): Reduction factor for first convolution of each block.

  • dilation (int): Spacing between kernel points.

  • first_dilation (int): First dilation value.

  • act_layer (Callable): Function for activation layer.

  • norm_layer (Callable): Function for normalization layer.

  • attn_layer (Callable): Function for attention layer.

  • aa_layer (Callable): Function for anti-aliasing layer.

  • drop_block (Callable): Method for dropping block.

  • drop_path (Callable): Method for dropping path.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
expansion = 1
forward(x: Tensor) Tensor[source]
Overview:

Defines the computation performed at every call.

Arguments:
  • x (torch.Tensor): The input tensor.

Returns:
  • output (torch.Tensor): The output tensor after passing through the BasicBlock.

training: bool
zero_init_last_bn() None[source]
Overview:

Initialize the batch normalization layer with zeros.

Bottleneck

class ding.torch_utils.network.resnet.Bottleneck(inplanes: int, planes: int, stride: int = 1, downsample: ~torch.nn.modules.module.Module | None = None, cardinality: int = 1, base_width: int = 64, reduce_first: int = 1, dilation: int = 1, first_dilation: int | None = None, act_layer: ~typing.Type[~torch.nn.modules.module.Module] = <class 'torch.nn.modules.activation.ReLU'>, norm_layer: ~typing.Type[~torch.nn.modules.module.Module] = <class 'torch.nn.modules.batchnorm.BatchNorm2d'>, attn_layer: ~typing.Type[~torch.nn.modules.module.Module] | None = None, aa_layer: ~typing.Type[~torch.nn.modules.module.Module] | None = None, drop_block: ~typing.Callable | None = None, drop_path: ~typing.Callable | None = None)[source]
Overview:

The Bottleneck class is a basic block used to build ResNet networks. It is part of PyTorch’s ResNet implementation. This block is composed of several layers, including a convolutional layer, a normalization layer, an activation layer, and optional attention, anti-aliasing, and dropout layers.

Interfaces:

__init__, forward, zero_init_last_bn

Properties:

expansion, inplanes, planes, stride, downsample, cardinality, base_width, reduce_first, dilation, first_dilation, act_layer, norm_layer, attn_layer, aa_layer, drop_block, drop_path

__init__(inplanes: int, planes: int, stride: int = 1, downsample: ~torch.nn.modules.module.Module | None = None, cardinality: int = 1, base_width: int = 64, reduce_first: int = 1, dilation: int = 1, first_dilation: int | None = None, act_layer: ~typing.Type[~torch.nn.modules.module.Module] = <class 'torch.nn.modules.activation.ReLU'>, norm_layer: ~typing.Type[~torch.nn.modules.module.Module] = <class 'torch.nn.modules.batchnorm.BatchNorm2d'>, attn_layer: ~typing.Type[~torch.nn.modules.module.Module] | None = None, aa_layer: ~typing.Type[~torch.nn.modules.module.Module] | None = None, drop_block: ~typing.Callable | None = None, drop_path: ~typing.Callable | None = None) None[source]
Overview:

Initialize the Bottleneck class with various parameters.

Arguments:
  • inplanes (int): The number of input planes.

  • planes (int): The number of output planes.

  • stride (int, optional): The stride size, defaults to 1.

  • downsample (nn.Module, optional): The downsample method, defaults to None.

  • cardinality (int, optional): The size of the group convolutions, defaults to 1.

  • base_width (int, optional): The base width, defaults to 64.

  • reduce_first (int, optional): The first reduction factor, defaults to 1.

  • dilation (int, optional): The dilation factor, defaults to 1.

  • first_dilation (int, optional): The first dilation factor, defaults to None.

  • act_layer (Type[nn.Module], optional): The activation layer type, defaults to nn.ReLU.

  • norm_layer (Type[nn.Module], optional): The normalization layer type, defaults to nn.BatchNorm2d.

  • attn_layer (Type[nn.Module], optional): The attention layer type, defaults to None.

  • aa_layer (Type[nn.Module], optional): The anti-aliasing layer type, defaults to None.

  • drop_block (Callable): The dropout block, defaults to None.

  • drop_path (Callable): The drop path, defaults to None.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
expansion = 4
forward(x: Tensor) Tensor[source]
Overview:

Defines the computation performed at every call.

Arguments:
  • x (Tensor): The input tensor.

Returns:
  • x (Tensor): The output tensor resulting from the computation.

training: bool
zero_init_last_bn() None[source]
Overview:

Initialize the last batch normalization layer with zero.

downsample_conv

ding.torch_utils.network.resnet.downsample_conv(in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, dilation: int = 1, first_dilation: int | None = None, norm_layer: Type[Module] | None = None) Sequential[source]
Overview:

Create a sequential module for downsampling that includes a convolution layer and a normalization layer.

Arguments:
  • in_channels (int): The number of input channels.

  • out_channels (int): The number of output channels.

  • kernel_size (int): The size of the kernel.

  • stride (int, optional): The stride size, defaults to 1.

  • dilation (int, optional): The dilation factor, defaults to 1.

  • first_dilation (int, optional): The first dilation factor, defaults to None.

  • norm_layer (Type[nn.Module], optional): The normalization layer type, defaults to nn.BatchNorm2d.

Returns:
  • nn.Sequential: A sequence of layers performing downsampling through convolution.

downsample_avg

ding.torch_utils.network.resnet.downsample_avg(in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, dilation: int = 1, first_dilation: int | None = None, norm_layer: Type[Module] | None = None) Sequential[source]
Overview:

Create a sequential module for downsampling that includes an average pooling layer, a convolution layer, and a normalization layer.

Arguments:
  • in_channels (int): The number of input channels.

  • out_channels (int): The number of output channels.

  • kernel_size (int): The size of the kernel.

  • stride (int, optional): The stride size, defaults to 1.

  • dilation (int, optional): The dilation factor, defaults to 1.

  • first_dilation (int, optional): The first dilation factor, defaults to None.

  • norm_layer (Type[nn.Module], optional): The normalization layer type, defaults to nn.BatchNorm2d.

Returns:
  • nn.Sequential: A sequence of layers performing downsampling through average pooling.

drop_blocks

ding.torch_utils.network.resnet.drop_blocks(drop_block_rate: float = 0.0) List[None][source]
Overview:

Generate a list of None values based on the drop block rate.

Arguments:
  • drop_block_rate (float, optional): The drop block rate, defaults to 0.

Returns:
  • List[None]: A list of None values.

make_blocks

ding.torch_utils.network.resnet.make_blocks(block_fn: Type[Module], channels: List[int], block_repeats: List[int], inplanes: int, reduce_first: int = 1, output_stride: int = 32, down_kernel_size: int = 1, avg_down: bool = False, drop_block_rate: float = 0.0, drop_path_rate: float = 0.0, **kwargs) Tuple[List[Tuple[str, Module]], List[Dict[str, int | str]]][source]
Overview:

Create a list of blocks for the network, with each block having a given number of repeats. Also, create a feature info list that contains information about the output of each block.

Arguments:
  • block_fn (Type[nn.Module]): The type of block to use.

  • channels (List[int]): The list of output channels for each block.

  • block_repeats (List[int]): The list of number of repeats for each block.

  • inplanes (int): The number of input planes.

  • reduce_first (int, optional): The first reduction factor, defaults to 1.

  • output_stride (int, optional): The total stride of the network, defaults to 32.

  • down_kernel_size (int, optional): The size of the downsample kernel, defaults to 1.

  • avg_down (bool, optional): Whether to use average pooling for downsampling, defaults to False.

  • drop_block_rate (float, optional): The drop block rate, defaults to 0.

  • drop_path_rate (float, optional): The drop path rate, defaults to 0.

Returns:
  • Tuple[List[Tuple[str, nn.Module]], List[Dict[str, Union[int, str]]]]: A tuple that includes a list of blocks for the network and a feature info list.

ResNet

class ding.torch_utils.network.resnet.ResNet(block: ~torch.nn.modules.module.Module, layers: ~typing.List[int], num_classes: int = 1000, in_chans: int = 3, cardinality: int = 1, base_width: int = 64, stem_width: int = 64, stem_type: str = '', replace_stem_pool: bool = False, output_stride: int = 32, block_reduce_first: int = 1, down_kernel_size: int = 1, avg_down: bool = False, act_layer: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.activation.ReLU'>, norm_layer: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.batchnorm.BatchNorm2d'>, aa_layer: ~torch.nn.modules.module.Module | None = None, drop_rate: float = 0.0, drop_path_rate: float = 0.0, drop_block_rate: float = 0.0, global_pool: str = 'avg', zero_init_last_bn: bool = True, block_args: dict | None = None)[source]
Overview:

Implements ResNet, ResNeXt, SE-ResNeXt, and SENet models. This implementation supports various modifications based on the v1c, v1d, v1e, and v1s variants included in the MXNet Gluon ResNetV1b model. For more details about the variants and options, please refer to the ‘Bag of Tricks’ paper: https://arxiv.org/pdf/1812.01187.

Interfaces:

__init__, forward, zero_init_last_bn, get_classifier

__init__(block: ~torch.nn.modules.module.Module, layers: ~typing.List[int], num_classes: int = 1000, in_chans: int = 3, cardinality: int = 1, base_width: int = 64, stem_width: int = 64, stem_type: str = '', replace_stem_pool: bool = False, output_stride: int = 32, block_reduce_first: int = 1, down_kernel_size: int = 1, avg_down: bool = False, act_layer: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.activation.ReLU'>, norm_layer: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.batchnorm.BatchNorm2d'>, aa_layer: ~torch.nn.modules.module.Module | None = None, drop_rate: float = 0.0, drop_path_rate: float = 0.0, drop_block_rate: float = 0.0, global_pool: str = 'avg', zero_init_last_bn: bool = True, block_args: dict | None = None) None[source]
Overview:

Initialize the ResNet model with given block, layers and other configuration options.

Arguments:
  • block (nn.Module): Class for the residual block.

  • layers (List[int]): Numbers of layers in each block.

  • num_classes (int, optional): Number of classification classes. Default is 1000.

  • in_chans (int, optional): Number of input (color) channels. Default is 3.

  • cardinality (int, optional): Number of convolution groups for 3x3 conv in Bottleneck. Default is 1.

  • base_width (int, optional): Factor determining bottleneck channels. Default is 64.

  • stem_width (int, optional): Number of channels in stem convolutions. Default is 64.

  • stem_type (str, optional): The type of stem. Default is ‘’.

  • replace_stem_pool (bool, optional): Whether to replace stem pooling. Default is False.

  • output_stride (int, optional): Output stride of the network. Default is 32.

  • block_reduce_first (int, optional): Reduction factor for first convolution output width of residual blocks. Default is 1.

  • down_kernel_size (int, optional): Kernel size of residual block downsampling path. Default is 1.

  • avg_down (bool, optional): Whether to use average pooling for the projection skip connection between stages/downsample. Default is False.

  • act_layer (nn.Module, optional): Activation layer. Default is nn.ReLU.

  • norm_layer (nn.Module, optional): Normalization layer. Default is nn.BatchNorm2d.

  • aa_layer (Optional[nn.Module], optional): Anti-aliasing layer. Default is None.

  • drop_rate (float, optional): Dropout probability before classifier, for training. Default is 0.0.

  • drop_path_rate (float, optional): Drop path rate. Default is 0.0.

  • drop_block_rate (float, optional): Drop block rate. Default is 0.0.

  • global_pool (str, optional): Global pooling type. Default is ‘avg’.

  • zero_init_last_bn (bool, optional): Whether to initialize last batch normalization with zero. Default is True.

  • block_args (Optional[dict], optional): Additional arguments for block. Default is None.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor) Tensor[source]
Overview:

Full forward pass through the model.

Arguments:
  • x (torch.Tensor): The input tensor.

Returns:
  • x (torch.Tensor): The output tensor after passing through the model.

forward_features(x: Tensor) Tensor[source]
Overview:

Forward pass through the feature layers of the model.

Arguments:
  • x (torch.Tensor): The input tensor.

Returns:
  • x (torch.Tensor): The output tensor after passing through feature layers.

get_classifier() Module[source]
Overview:

Get the classifier module from the model.

Returns:
  • classifier (nn.Module): The classifier module in the model.

init_weights(zero_init_last_bn: bool = True) None[source]
Overview:

Initialize the weights in the model.

Arguments:
  • zero_init_last_bn (bool, optional): Whether to initialize the last batch normalization layer with zero. Default is True.

reset_classifier(num_classes: int, global_pool: str = 'avg') None[source]
Overview:

Reset the classifier with a new number of classes and pooling type.

Arguments:
  • num_classes (int): New number of classification classes.

  • global_pool (str, optional): New global pooling type. Default is ‘avg’.

training: bool

resnet18

ding.torch_utils.network.resnet.resnet18() Module[source]
Overview:

Creates a ResNet18 model.

Returns:
  • model (nn.Module): ResNet18 model.
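Examples (an illustrative sketch; the batch size and 224x224 input resolution are assumptions, and the output size follows the default num_classes=1000):
>>> import torch
>>> from ding.torch_utils.network.resnet import resnet18
>>> model = resnet18()
>>> logits = model(torch.randn(4, 3, 224, 224))
>>> assert logits.shape == (4, 1000)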

network.rnn

Please refer to ding/torch_utils/network/rnn for more details.

is_sequence

ding.torch_utils.network.rnn.is_sequence(data)[source]
Overview:

Determines if the input data is of type list or tuple.

Arguments:
  • data: The input data to be checked.

Returns:
  • boolean: True if the input is a list or a tuple, False otherwise.

sequence_mask

ding.torch_utils.network.rnn.sequence_mask(lengths: Tensor, max_len: int | None = None) BoolTensor[source]
Overview:

Generates a boolean mask for a batch of sequences with differing lengths.

Arguments:
  • lengths (torch.Tensor): A tensor with the lengths of each sequence. Shape could be (n, 1) or (n).

  • max_len (int, optional): The padding size. If max_len is None, the padding size is the max length of sequences.

Returns:
  • masks (torch.BoolTensor): A boolean mask tensor. The mask has the same device as lengths.
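Examples (an illustrative sketch; the lengths and max_len values are assumptions):
>>> import torch
>>> from ding.torch_utils.network.rnn import sequence_mask
>>> lengths = torch.LongTensor([1, 3, 2])
>>> mask = sequence_mask(lengths, max_len=4)
>>> assert mask.shape == (3, 4) and mask.dtype == torch.bool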

LSTMForwardWrapper

class ding.torch_utils.network.rnn.LSTMForwardWrapper[source]
Overview:

Class providing methods to use before and after the LSTM forward method. Wraps the LSTM forward method.

Interfaces:

_before_forward, _after_forward

_after_forward(next_state: Tuple[Tensor], list_next_state: bool = False) List[Dict] | Dict[str, Tensor][source]
Overview:

Post-processes the next_state after the LSTM forward method.

Arguments:
  • next_state (Tuple[torch.Tensor]): Tuple containing the next state (h, c).

  • list_next_state (bool, optional): Determines the format of the returned next_state. If True, returns next_state in list format. Default is False.

Returns:
  • next_state(Union[List[Dict], Dict[str, torch.Tensor]]): The post-processed next_state.

_before_forward(inputs: Tensor, prev_state: None | List[Dict]) Tensor[source]
Overview:

Preprocesses the inputs and previous states before the LSTM forward method.

Arguments:
  • inputs (torch.Tensor): Input vector of the LSTM cell. Shape: [seq_len, batch_size, input_size]

  • prev_state (Union[None, List[Dict]]): Previous state tensor. Shape: [num_directions*num_layers, batch_size, hidden_size]. If None, prev_state will be initialized to all zeros.

Returns:
  • prev_state (torch.Tensor): Preprocessed previous state for the LSTM batch.

LSTM

class ding.torch_utils.network.rnn.LSTM(input_size: int, hidden_size: int, num_layers: int, norm_type: str | None = None, dropout: float = 0.0)[source]
Overview:

Implementation of an LSTM cell with Layer Normalization (LN).

Interfaces:

__init__, forward

Note

For a primer on LSTM, refer to https://zhuanlan.zhihu.com/p/32085405.

__init__(input_size: int, hidden_size: int, num_layers: int, norm_type: str | None = None, dropout: float = 0.0) None[source]
Overview:

Initialize LSTM cell parameters.

Arguments:
  • input_size (int): Size of the input vector.

  • hidden_size (int): Size of the hidden state vector.

  • num_layers (int): Number of LSTM layers.

  • norm_type (Optional[str]): Normalization type, default is None.

  • dropout (float): Dropout rate, default is 0.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_init()[source]
Overview:

Initialize the parameters of the LSTM cell.

_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(inputs: Tensor, prev_state: Tensor, list_next_state: bool = True) Tuple[Tensor, Tensor | list][source]
Overview:

Compute output and next state given previous state and input.

Arguments:
  • inputs (torch.Tensor): Input vector of cell, size [seq_len, batch_size, input_size].

  • prev_state (torch.Tensor): Previous state, size [num_directions*num_layers, batch_size, hidden_size].

  • list_next_state (bool): Whether to return next_state in list format, default is True.

Returns:
  • x (torch.Tensor): Output from LSTM.

  • next_state (Union[torch.Tensor, list]): Hidden state from LSTM.

training: bool

PytorchLSTM

class ding.torch_utils.network.rnn.PytorchLSTM(input_size: int, hidden_size: int, num_layers: int = 1, bias: bool = True, batch_first: bool = False, dropout: float = 0.0, bidirectional: bool = False, proj_size: int = 0, device=None, dtype=None)[source]
Overview:

Wrapper class for PyTorch’s nn.LSTM, formats the input and output. For more details on nn.LSTM, refer to https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html#torch.nn.LSTM

Interfaces:

forward

batch_first: bool
bias: bool
bidirectional: bool
dropout: float
forward(inputs: Tensor, prev_state: Tensor, list_next_state: bool = True) Tuple[Tensor, Tensor | list][source]
Overview:

Executes nn.LSTM.forward with preprocessed input.

Arguments:
  • inputs (torch.Tensor): Input vector of cell, size [seq_len, batch_size, input_size].

  • prev_state (torch.Tensor): Previous state, size [num_directions*num_layers, batch_size, hidden_size].

  • list_next_state (bool): Whether to return next_state in list format, default is True.

Returns:
  • output (torch.Tensor): Output from LSTM.

  • next_state (Union[torch.Tensor, list]): Hidden state from LSTM.

hidden_size: int
input_size: int
mode: str
num_layers: int
proj_size: int

GRU

class ding.torch_utils.network.rnn.GRU(input_size: int, hidden_size: int, num_layers: int)[source]
Overview:

This class extends the torch.nn.GRUCell and LSTMForwardWrapper classes, and formats inputs and outputs accordingly.

Interfaces:

__init__, forward

Properties:

hidden_size, num_layers

Note

For further details, refer to the official PyTorch documentation: <https://pytorch.org/docs/stable/generated/torch.nn.GRU.html#torch.nn.GRU>

__init__(input_size: int, hidden_size: int, num_layers: int) None[source]
Overview:

Initialize the GRU class with input size, hidden size, and number of layers.

Arguments:
  • input_size (int): The size of the input vector.

  • hidden_size (int): The size of the hidden state vector.

  • num_layers (int): The number of GRU layers.

bias: bool
forward(inputs: Tensor, prev_state: Tensor | None = None, list_next_state: bool = True) Tuple[Tensor, Tensor | List][source]
Overview:

Wrap the nn.GRU.forward method.

Arguments:
  • inputs (torch.Tensor): Input vector of cell, tensor of size [seq_len, batch_size, input_size].

  • prev_state (Optional[torch.Tensor]): None or tensor of size [num_directions*num_layers, batch_size, hidden_size].

  • list_next_state (bool): Whether to return next_state in list format (default is True).

Returns:
  • output (torch.Tensor): Output from GRU.

  • next_state (torch.Tensor or list): Hidden state from GRU.

hidden_size: int
input_size: int
weight_hh: Tensor
weight_ih: Tensor

get_lstm

ding.torch_utils.network.rnn.get_lstm(lstm_type: str, input_size: int, hidden_size: int, num_layers: int = 1, norm_type: str = 'LN', dropout: float = 0.0, seq_len: int | None = None, batch_size: int | None = None) LSTM | PytorchLSTM[source]
Overview:

Build and return the corresponding LSTM cell based on the provided parameters.

Arguments:
  • lstm_type (str): Version of RNN cell. Supported options are [‘normal’, ‘pytorch’, ‘hpc’, ‘gru’].

  • input_size (int): Size of the input vector.

  • hidden_size (int): Size of the hidden state vector.

  • num_layers (int): Number of LSTM layers (default is 1).

  • norm_type (str): Type of normalization (default is ‘LN’).

  • dropout (float): Dropout rate (default is 0.0).

  • seq_len (Optional[int]): Sequence length (default is None).

  • batch_size (Optional[int]): Batch size (default is None).

Returns:
  • lstm (Union[LSTM, PytorchLSTM]): The corresponding LSTM cell.
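Examples (an illustrative sketch using the 'normal' LSTM; sizes and sequence/batch dimensions are assumptions):
>>> import torch
>>> from ding.torch_utils.network.rnn import get_lstm
>>> lstm = get_lstm('normal', input_size=32, hidden_size=64)
>>> inputs = torch.randn(8, 4, 32)          # (seq_len, batch_size, input_size)
>>> output, next_state = lstm(inputs, None)  # None lets the previous state be zero-initialized
>>> assert output.shape == (8, 4, 64)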

network.scatter_connection

Please refer to ding/torch_utils/network/scatter_connection for more details.

shape_fn_scatter_connection

ding.torch_utils.network.scatter_connection.shape_fn_scatter_connection(args, kwargs) List[int][source]
Overview:

Return the shape of scatter_connection for HPC.

Arguments:
  • args (Tuple): The arguments passed to the scatter_connection function.

  • kwargs (Dict): The keyword arguments passed to the scatter_connection function.

Returns:
  • shape (List[int]): A list representing the shape of scatter_connection, in the form of [B, M, N, H, W, scatter_type].

ScatterConnection

class ding.torch_utils.network.scatter_connection.ScatterConnection(scatter_type: str)[source]
Overview:

Scatter feature to its corresponding location. In AlphaStar, each entity is embedded into a tensor, and these tensors are scattered into a feature map with map size.

Interfaces:

__init__, forward, xy_forward

__init__(scatter_type: str) None[source]
Overview:

Initialize the ScatterConnection object.

Arguments:
  • scatter_type (str): The scatter type, which decides the behavior when two entities share the same location. It can be either ‘add’ or ‘cover’. If ‘add’, the values of overlapping entities are summed. If ‘cover’, the earlier entity is overwritten by the later one.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor, spatial_size: Tuple[int, int], location: Tensor) Tensor[source]
Overview:

Scatter input tensor ‘x’ into a spatial feature map.

Arguments:
  • x (torch.Tensor): The input tensor of shape (B, M, N), where B is the batch size, M is the number of entities, and N is the dimension of entity attributes.

  • spatial_size (Tuple[int, int]): The size (H, W) of the spatial feature map into which ‘x’ will be scattered, where H is the height and W is the width.

  • location (torch.Tensor): The tensor of locations of shape (B, M, 2). Each location should be (y, x).

Returns:
  • output (torch.Tensor): The scattered feature map of shape (B, N, H, W).

Note:

When locations overlap, ‘cover’ mode results in a loss of information; ‘add’ mode can be used as a temporary substitute.
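
Examples (an illustrative sketch; the batch, entity and feature sizes, the 4x4 spatial size and the random locations are all assumptions):
>>> import torch
>>> from ding.torch_utils.network.scatter_connection import ScatterConnection
>>> scatter = ScatterConnection(scatter_type='add')
>>> x = torch.randn(2, 5, 8)                     # (B, M, N)
>>> location = torch.randint(0, 4, (2, 5, 2))    # (B, M, 2), each entry is (y, x)
>>> out = scatter(x, (4, 4), location)
>>> assert out.shape == (2, 8, 4, 4)             # (B, N, H, W)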

training: bool
xy_forward(x: Tensor, spatial_size: Tuple[int, int], coord_x: Tensor, coord_y) Tensor[source]
Overview:

Scatter input tensor ‘x’ into a spatial feature map using separate x and y coordinates.

Arguments:
  • x (torch.Tensor): The input tensor of shape (B, M, N), where B is the batch size, M is the number of entities, and N is the dimension of entity attributes.

  • spatial_size (Tuple[int, int]): The size (H, W) of the spatial feature map into which ‘x’ will be scattered, where H is the height and W is the width.

  • coord_x (torch.Tensor): The x-coordinates tensor of shape (B, M).

  • coord_y (torch.Tensor): The y-coordinates tensor of shape (B, M).

Returns:
  • output (torch.Tensor): The scattered feature map of shape (B, N, H, W).

Note:

When locations overlap, ‘cover’ mode results in a loss of information; ‘add’ mode can be used as a temporary substitute.

network.soft_argmax

Please refer to ding/torch_utils/network/soft_argmax for more details.

SoftArgmax

class ding.torch_utils.network.soft_argmax.SoftArgmax[source]
Overview:

A neural network module that computes the SoftArgmax operation (essentially a 2-dimensional spatial softmax), which is often used for location regression tasks. It converts a feature map (such as a heatmap) into precise coordinate locations.

Interfaces:

__init__, forward

Note

For more information on SoftArgmax, you can refer to <https://en.wikipedia.org/wiki/Softmax_function> and the paper <https://arxiv.org/pdf/1504.00702.pdf>.

__init__()[source]
Overview:

Initialize the SoftArgmax module.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor) Tensor[source]
Overview:

Perform the forward pass of the SoftArgmax operation.

Arguments:
  • x (torch.Tensor): The input tensor, typically a heatmap representing predicted locations.

Returns:
  • location (torch.Tensor): The predicted coordinates as a result of the SoftArgmax operation.

Shapes:
  • x: \((B, C, H, W)\), where B is the batch size, C is the number of channels, and H and W represent height and width respectively.

  • location: \((B, 2)\), where B is the batch size and 2 represents the coordinates (height, width).
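
Examples (an illustrative sketch; the single-channel 16x16 heatmap is an assumption):
>>> import torch
>>> from ding.torch_utils.network.soft_argmax import SoftArgmax
>>> soft_argmax = SoftArgmax()
>>> heatmap = torch.randn(4, 1, 16, 16)
>>> location = soft_argmax(heatmap)
>>> assert location.shape == (4, 2)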

training: bool

network.transformer

Please refer to ding/torch_utils/network/transformer for more details.

Attention

class ding.torch_utils.network.transformer.Attention(input_dim: int, head_dim: int, output_dim: int, head_num: int, dropout: Module)[source]
Overview:

For each entry embedding, compute attention over all entries and aggregate the results to obtain the output attention.

Interfaces:

__init__, split, forward

__init__(input_dim: int, head_dim: int, output_dim: int, head_num: int, dropout: Module) None[source]
Overview:

Initialize the Attention module with the provided dimensions and dropout layer.

Arguments:
  • input_dim (int): The dimension of the input.

  • head_dim (int): The dimension of each head in the multi-head attention mechanism.

  • output_dim (int): The dimension of the output.

  • head_num (int): The number of heads in the multi-head attention mechanism.

  • dropout (nn.Module): The dropout layer used in the attention mechanism.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor, mask: Tensor | None = None) Tensor[source]
Overview:

Compute the attention from the input tensor.

Arguments:
  • x (torch.Tensor): The input tensor for the forward computation.

  • mask (Optional[torch.Tensor], optional): Optional mask to exclude invalid entries. Defaults to None.

Returns:
  • attention (torch.Tensor): The computed attention tensor.
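
Examples (an illustrative sketch; the dimensions, head number and dropout rate are assumptions):
>>> import torch
>>> import torch.nn as nn
>>> from ding.torch_utils.network.transformer import Attention
>>> attention = Attention(input_dim=32, head_dim=16, output_dim=32, head_num=2, dropout=nn.Dropout(0.1))
>>> x = torch.randn(4, 10, 32)          # (B, N, input_dim)
>>> out = attention(x)
>>> assert out.shape == (4, 10, 32)     # (B, N, output_dim)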

split(x: Tensor, T: bool = False) List[Tensor][source]
Overview:

Split the input to get multi-head queries, keys, and values.

Arguments:
  • x (torch.Tensor): The tensor to be split, which could be a query, key, or value.

  • T (bool, optional): If True, transpose the output tensors. Defaults to False.

Returns:
  • x (List[torch.Tensor]): A list of output tensors for each head.

training: bool

TransformerLayer

class ding.torch_utils.network.transformer.TransformerLayer(input_dim: int, head_dim: int, hidden_dim: int, output_dim: int, head_num: int, mlp_num: int, dropout: Module, activation: Module)[source]
Overview:

In a Transformer layer, attention over the entries is computed first, followed by a feedforward (MLP) layer.

Interfaces:

__init__, forward

__init__(input_dim: int, head_dim: int, hidden_dim: int, output_dim: int, head_num: int, mlp_num: int, dropout: Module, activation: Module) None[source]
Overview:

Initialize the TransformerLayer with the provided dimensions, dropout layer, and activation function.

Arguments:
  • input_dim (int): The dimension of the input.

  • head_dim (int): The dimension of each head in the multi-head attention mechanism.

  • hidden_dim (int): The dimension of the hidden layer in the MLP (Multi-Layer Perceptron).

  • output_dim (int): The dimension of the output.

  • head_num (int): The number of heads in the multi-head attention mechanism.

  • mlp_num (int): The number of layers in the MLP.

  • dropout (nn.Module): The dropout layer used in the attention mechanism.

  • activation (nn.Module): The activation function used in the MLP.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(inputs: Tuple[Tensor, Tensor]) Tuple[Tensor, Tensor][source]
Overview:

Compute the forward pass through the Transformer layer.

Arguments:
  • inputs (Tuple[torch.Tensor, torch.Tensor]): A tuple containing the input tensor x and the mask tensor.

Returns:
  • output (Tuple[torch.Tensor, torch.Tensor]): A tuple containing the predicted value tensor and the mask tensor.

training: bool

Transformer

class ding.torch_utils.network.transformer.Transformer(input_dim: int, head_dim: int = 128, hidden_dim: int = 1024, output_dim: int = 256, head_num: int = 2, mlp_num: int = 2, layer_num: int = 3, dropout_ratio: float = 0.0, activation: Module = ReLU())[source]
Overview:

Implementation of the Transformer model.

Note

For more details, refer to “Attention is All You Need”: http://arxiv.org/abs/1706.03762.

Interfaces:

__init__, forward

__init__(input_dim: int, head_dim: int = 128, hidden_dim: int = 1024, output_dim: int = 256, head_num: int = 2, mlp_num: int = 2, layer_num: int = 3, dropout_ratio: float = 0.0, activation: Module = ReLU())[source]
Overview:

Initialize the Transformer with the provided dimensions, dropout layer, activation function, and layer numbers.

Arguments:
  • input_dim (int): The dimension of the input.

  • head_dim (int): The dimension of each head in the multi-head attention mechanism.

  • hidden_dim (int): The dimension of the hidden layer in the MLP (Multi-Layer Perceptron).

  • output_dim (int): The dimension of the output.

  • head_num (int): The number of heads in the multi-head attention mechanism.

  • mlp_num (int): The number of layers in the MLP.

  • layer_num (int): The number of Transformer layers.

  • dropout_ratio (float): The dropout ratio for the dropout layer.

  • activation (nn.Module): The activation function used in the MLP.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor, mask: Tensor | None = None) Tensor[source]
Overview:

Perform the forward pass through the Transformer.

Arguments:
  • x (torch.Tensor): The input tensor, with shape (B, N, C), where B is batch size, N is the number of entries, and C is the feature dimension.

  • mask (Optional[torch.Tensor], optional): The mask tensor (bool), used to mask out invalid entries in attention. It has shape (B, N), where B is batch size and N is number of entries. Defaults to None.

Returns:
  • x (torch.Tensor): The output tensor from the Transformer.
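
Examples (an illustrative sketch; the model dimensions, entry number and all-True mask are assumptions):
>>> import torch
>>> from ding.torch_utils.network.transformer import Transformer
>>> model = Transformer(input_dim=32, head_dim=16, hidden_dim=64, output_dim=32, head_num=2, mlp_num=2, layer_num=2)
>>> x = torch.randn(4, 10, 32)           # (B, N, C)
>>> mask = torch.ones(4, 10).bool()      # (B, N), True for valid entries
>>> out = model(x, mask)
>>> assert out.shape == (4, 10, 32)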

training: bool

ScaledDotProductAttention

class ding.torch_utils.network.transformer.ScaledDotProductAttention(d_k: int, dropout: float = 0.0)[source]
Overview:

Implementation of Scaled Dot Product Attention, a key component of Transformer models. This class performs the dot product of the query, key and value tensors, scales it with the square root of the dimension of the key vector (d_k) and applies dropout for regularization.

Interfaces:

__init__, forward

__init__(d_k: int, dropout: float = 0.0) None[source]
Overview:

Initialize the ScaledDotProductAttention module with the dimension of the key vector and the dropout rate.

Arguments:
  • d_k (int): The dimension of the key vector. This will be used to scale the dot product of the query and key.

  • dropout (float, optional): The dropout rate to be applied after the softmax operation. Defaults to 0.0.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(q: Tensor, k: Tensor, v: Tensor, mask: Tensor | None = None) Tensor[source]
Overview:

Perform the Scaled Dot Product Attention operation on the query, key and value tensors.

Arguments:
  • q (torch.Tensor): The query tensor.

  • k (torch.Tensor): The key tensor.

  • v (torch.Tensor): The value tensor.

  • mask (Optional[torch.Tensor]): An optional mask tensor to be applied on the attention scores. Defaults to None.

Returns:
  • output (torch.Tensor): The output tensor after the attention operation.
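
Examples (an illustrative sketch; the (B, head_num, N, d_k) layout of q, k and v is an assumption chosen for demonstration):
>>> import torch
>>> from ding.torch_utils.network.transformer import ScaledDotProductAttention
>>> attn = ScaledDotProductAttention(d_k=16)
>>> q = torch.randn(4, 8, 10, 16)
>>> k = torch.randn(4, 8, 10, 16)
>>> v = torch.randn(4, 8, 10, 16)
>>> out = attn(q, k, v)
>>> assert out.shape == (4, 8, 10, 16)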

training: bool

backend_helper

Please refer to ding/torch_utils/backend_helper for more details.

enable_tf32

ding.torch_utils.backend_helper.enable_tf32() None[source]
Overview:

Enable tf32 on matmul and cudnn for faster computation. This only works on Ampere GPU devices. For detailed information, please refer to: https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices.
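
Examples (a minimal usage sketch):
>>> from ding.torch_utils.backend_helper import enable_tf32
>>> enable_tf32()   # subsequent float32 matmul / cuDNN ops may use TF32 on Ampere or newer GPUs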

checkpoint_helper

Please refer to ding/torch_utils/checkpoint_helper for more details.

build_checkpoint_helper

ding.torch_utils.checkpoint_helper.build_checkpoint_helper(cfg)[source]
Overview:

Use config to build checkpoint helper.

Arguments:
  • cfg (dict): ckpt_helper config

Returns:
  • checkpoint_helper (CheckpointHelper): The built checkpoint helper object.

CheckpointHelper

class ding.torch_utils.checkpoint_helper.CheckpointHelper[source]
Overview:

Help to save or load a checkpoint by the given args.

Interfaces:

__init__, save, load, _remove_prefix, _add_prefix, _load_matched_model_state_dict

__init__()[source]
_add_prefix(state_dict: dict, prefix: str = 'module.') dict[source]
Overview:

Add a prefix to the keys of state_dict

Arguments:
  • state_dict (dict): model’s state_dict

  • prefix (str): The prefix that will be added to the keys

Returns:
  • (dict): new state_dict after adding prefix

_load_matched_model_state_dict(model: Module, ckpt_state_dict: dict) None[source]
Overview:

Load the matched part of the model state_dict, and report the mismatched keys between the model’s state_dict and the checkpoint’s state_dict

Arguments:
  • model (torch.nn.Module): model

  • ckpt_state_dict (dict): checkpoint’s state_dict

_remove_prefix(state_dict: dict, prefix: str = 'module.') dict[source]
Overview:

Remove a prefix from the keys of state_dict

Arguments:
  • state_dict (dict): model’s state_dict

  • prefix (str): The prefix that will be removed from the keys

Returns:
  • new_state_dict (dict): new state_dict after removing prefix

load(load_path: str, model: Module, optimizer: Optimizer = None, last_iter: CountVar = None, last_epoch: CountVar = None, last_frame: CountVar = None, lr_schduler: Scheduler = None, dataset: Dataset = None, collector_info: Module = None, prefix_op: str = None, prefix: str = None, strict: bool = True, logger_prefix: str = '', state_dict_mask: list = [])[source]
Overview:

Load a checkpoint from the given path

Arguments:
  • load_path (str): checkpoint’s path

  • model (torch.nn.Module): model definition

  • optimizer (torch.optim.Optimizer): optimizer obj

  • last_iter (CountVar): iter num, default None

  • last_epoch (CountVar): epoch num, default None

  • last_frame (CountVar): frame num, default None

  • lr_schduler (Scheduler): The lr scheduler object

  • dataset (torch.utils.data.Dataset): The dataset; it should be a replay dataset

  • collector_info (torch.nn.Module): An attribute of the checkpoint used to save collector info

  • prefix_op (str): The prefix operation applied to state_dict; should be one of [‘remove’, ‘add’]

  • prefix (str): The prefix to be processed on state_dict

  • strict (bool): The strict argument passed to model.load_state_dict

  • logger_prefix (str): The prefix of the logger

  • state_dict_mask (list): A list of state_dict keys which shouldn’t be loaded into the model (after the prefix op)

Note

The checkpoint loaded from load_path is a dict with a format like {'state_dict': OrderedDict(), ...}

save(path: str, model: Module, optimizer: Optimizer | None = None, last_iter: CountVar | None = None, last_epoch: CountVar | None = None, last_frame: CountVar | None = None, dataset: Dataset | None = None, collector_info: Module | None = None, prefix_op: str | None = None, prefix: str | None = None) None[source]
Overview:

Save a checkpoint with the given args

Arguments:
  • path (str): the path of saving checkpoint

  • model (torch.nn.Module): model to be saved

  • optimizer (torch.optim.Optimizer): optimizer obj

  • last_iter (CountVar): iter num, default None

  • last_epoch (CountVar): epoch num, default None

  • last_frame (CountVar): frame num, default None

  • dataset (torch.utils.data.Dataset): The dataset; it should be a replay dataset

  • collector_info (torch.nn.Module): An attribute of the checkpoint used to save collector info

  • prefix_op (str): The prefix operation applied to state_dict; should be one of [‘remove’, ‘add’]

  • prefix (str): The prefix to be processed on state_dict
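
Examples (a minimal save/load round-trip sketch; the file path and the Linear model are assumptions for demonstration):
>>> import torch.nn as nn
>>> from ding.torch_utils.checkpoint_helper import CheckpointHelper
>>> ckpt_helper = CheckpointHelper()
>>> model = nn.Linear(3, 5)
>>> ckpt_helper.save('./model.pth.tar', model)
>>> ckpt_helper.load('./model.pth.tar', model, strict=True)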

CountVar

class ding.torch_utils.checkpoint_helper.CountVar(init_val: int)[source]
Overview:

Number counter

Interfaces:

__init__, update, add

Properties:
  • val (int): the value of the counter

__init__(init_val: int) None[source]
Overview:

Init the var counter

Arguments:
  • init_val (int): the init value of the counter

add(add_num: int)[source]
Overview:

Add the number to counter

Arguments:
  • add_num (int): the number added to the counter

update(val: int) None[source]
Overview:

Update the var counter

Arguments:
  • val (int): the update value of the counter

property val: int
Overview:

Get the current value of the counter
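
Examples (a minimal usage sketch; the values are arbitrary):
>>> from ding.torch_utils.checkpoint_helper import CountVar
>>> counter = CountVar(init_val=0)
>>> counter.add(3)       # val becomes 3
>>> counter.update(10)   # val is set to 10
>>> assert counter.val == 10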

auto_checkpoint

ding.torch_utils.checkpoint_helper.auto_checkpoint(func: Callable) Callable[source]
Overview:

Create a wrapper for the given function; the wrapper calls the save_checkpoint method whenever an exception occurs.

Arguments:
  • func(Callable): the function to be wrapped

Returns:
  • wrapper (Callable): the wrapped function

data_helper

Please refer to ding/torch_utils/data_helper for more details.

to_device

ding.torch_utils.data_helper.to_device(item: Any, device: str, ignore_keys: list = []) Any[source]
Overview:

Transfer data to certain device.

Arguments:
  • item (Any): The item to be transferred.

  • device (str): The device wanted.

  • ignore_keys (list): The keys to be ignored in transfer, default set to empty.

Returns:
  • item (Any): The transferred item.

Examples:
>>> setup_data_dict['module'] = nn.Linear(3, 5)
>>> device = 'cuda'
>>> cuda_d = to_device(setup_data_dict, device, ignore_keys=['module'])
>>> assert cuda_d['module'].weight.device == torch.device('cpu')
Examples:
>>> setup_data_dict['module'] = nn.Linear(3, 5)
>>> device = 'cuda'
>>> cuda_d = to_device(setup_data_dict, device)
>>> assert cuda_d['module'].weight.device == torch.device('cuda:0')

to_dtype

ding.torch_utils.data_helper.to_dtype(item: Any, dtype: type) Any[source]
Overview:

Change data to certain dtype.

Arguments:
  • item (Any): The item for changing the dtype.

  • dtype (type): The type wanted.

Returns:
  • item (object): The item with changed dtype.

Examples (tensor):
>>> t = torch.randint(0, 10, (3, 5))
>>> tfloat = to_dtype(t, torch.float)
>>> assert tfloat.dtype == torch.float
Examples (list):
>>> tlist = [torch.randint(0, 10, (3, 5))]
>>> tlfloat = to_dtype(tlist, torch.float)
>>> assert tlfloat[0].dtype == torch.float
Examples (dict):
>>> tdict = {'t': torch.randint(0, 10, (3, 5))}
>>> tdictf = to_dtype(tdict, torch.float)
>>> assert tdictf['t'].dtype == torch.float

to_tensor

ding.torch_utils.data_helper.to_tensor(item: Any, dtype: dtype | None = None, ignore_keys: list = [], transform_scalar: bool = True) Any[source]
Overview:

Convert numpy.ndarray object to torch.Tensor.

Arguments:
  • item (Any): The numpy.ndarray objects to be converted. It can be exactly a numpy.ndarray object or a container (list, tuple or dict) that contains several numpy.ndarray objects.

  • dtype (torch.dtype): The type of wanted tensor. If set to None, its dtype will be unchanged.

  • ignore_keys (list): If the item is a dict, values whose keys are in ignore_keys will not be converted.

  • transform_scalar (bool): If set to True, a scalar will be also converted to a tensor object.

Returns:
  • item (Any): The converted tensors.

Examples (scalar):
>>> i = 10
>>> t = to_tensor(i)
>>> assert t.item() == i
Examples (dict):
>>> d = {'i': i}
>>> dt = to_tensor(d, torch.int)
>>> assert dt['i'].item() == i
Examples (named tuple):
>>> data_type = namedtuple('data_type', ['x', 'y'])
>>> inputs = data_type(np.random.random(3), 4)
>>> outputs = to_tensor(inputs, torch.float32)
>>> assert type(outputs) == data_type
>>> assert isinstance(outputs.x, torch.Tensor)
>>> assert isinstance(outputs.y, torch.Tensor)
>>> assert outputs.x.dtype == torch.float32
>>> assert outputs.y.dtype == torch.float32

to_ndarray

ding.torch_utils.data_helper.to_ndarray(item: Any, dtype: dtype | None = None) Any[source]
Overview:

Convert torch.Tensor to numpy.ndarray.

Arguments:
  • item (Any): The torch.Tensor objects to be converted. It can be exactly a torch.Tensor object or a container (list, tuple or dict) that contains several torch.Tensor objects.

  • dtype (np.dtype): The type of wanted array. If set to None, its dtype will be unchanged.

Returns:
  • item (object): The changed arrays.

Examples (ndarray):
>>> t = torch.randn(3, 5)
>>> tarray1 = to_ndarray(t)
>>> assert tarray1.shape == (3, 5)
>>> assert isinstance(tarray1, np.ndarray)
Examples (list):
>>> t = [torch.randn(5, ) for i in range(3)]
>>> tarray1 = to_ndarray(t, np.float32)
>>> assert isinstance(tarray1, list)
>>> assert tarray1[0].shape == (5, )
>>> assert isinstance(tarray1[0], np.ndarray)

to_list

ding.torch_utils.data_helper.to_list(item: Any) Any[source]
Overview:

Convert torch.Tensor, numpy.ndarray objects to list objects, and keep their dtypes unchanged.

Arguments:
  • item (Any): The item to be converted.

Returns:
  • item (Any): The list after conversion.

Examples:
>>> data = {
...     'tensor': torch.randn(4),
...     'list': [True, False, False],
...     'tuple': (4, 5, 6),
...     'bool': True,
...     'int': 10,
...     'float': 10.,
...     'array': np.random.randn(4),
...     'str': "asdf",
...     'none': None,
... }
>>> transformed_data = to_list(data)

Note

Now supports item type: torch.Tensor, numpy.ndarray, dict, list, tuple and None.

tensor_to_list

ding.torch_utils.data_helper.tensor_to_list(item: Any) Any[source]
Overview:

Convert torch.Tensor objects to list, and keep their dtypes unchanged.

Arguments:
  • item (Any): The item to be converted.

Returns:
  • item (Any): The lists after conversion.

Examples (2d-tensor):
>>> t = torch.randn(3, 5)
>>> tlist1 = tensor_to_list(t)
>>> assert len(tlist1) == 3
>>> assert len(tlist1[0]) == 5
Examples (1d-tensor):
>>> t = torch.randn(3, )
>>> tlist1 = tensor_to_list(t)
>>> assert len(tlist1) == 3
Examples (list)
>>> t = [torch.randn(5, ) for i in range(3)]
>>> tlist1 = tensor_to_list(t)
>>> assert len(tlist1) == 3
>>> assert len(tlist1[0]) == 5
Examples (dict):
>>> td = {'t': torch.randn(3, 5)}
>>> tdlist1 = tensor_to_list(td)
>>> assert len(tdlist1['t']) == 3
>>> assert len(tdlist1['t'][0]) == 5

Note

Now supports item type: torch.Tensor, dict, list, tuple and None.

to_item

ding.torch_utils.data_helper.to_item(data: Any, ignore_error: bool = True) Any[source]
Overview:

Convert data to python native scalar (i.e. data item), and keep their dtypes unchanged.

Arguments:
  • data (Any): The data that needs to be converted.

  • ignore_error (bool): Whether to ignore the error when the data type is not supported. That is to say, only the data can be transformed into a python native scalar will be returned.

Returns:
  • data (Any): Converted data.

Examples:

>>> data = {
...     'tensor': torch.randn(1),
...     'list': [True, False, torch.randn(1)],
...     'tuple': (4, 5, 6),
...     'bool': True,
...     'int': 10,
...     'float': 10.,
...     'array': np.random.randn(1),
...     'str': "asdf",
...     'none': None,
... }
>>> new_data = to_item(data)
>>> assert np.isscalar(new_data['tensor'])
>>> assert np.isscalar(new_data['array'])
>>> assert np.isscalar(new_data['list'][-1])

Note

Now supports item type: torch.Tensor, ttorch.Tensor, bool, str, dict, list, tuple and None.

same_shape

ding.torch_utils.data_helper.same_shape(data: list) bool[source]
Overview:

Judge whether all data elements in a list have the same shapes.

Arguments:
  • data (list): The list of data.

Returns:
  • same (bool): Whether the list of data all have the same shape.

Examples:
>>> tlist = [torch.randn(3, 5) for i in range(5)]
>>> assert same_shape(tlist)
>>> tlist = [torch.randn(3, 5), torch.randn(4, 5)]
>>> assert not same_shape(tlist)

LogDict

class ding.torch_utils.data_helper.LogDict[source]
Overview:

Derived from dict. Converts torch.Tensor values to lists for convenient logging.

Interfaces:

_transform, __setitem__, update.

_transform(data: Any) None[source]
Overview:

Convert tensor objects to lists for better logging.

Arguments:
  • data (Any): The input data to be converted.

update(data: dict) None[source]
Overview:

Override the update function of built-in dict.

Arguments:
  • data (dict): The dict for updating current object.

build_log_buffer

ding.torch_utils.data_helper.build_log_buffer() LogDict[source]
Overview:

Build log buffer, a subclass of dict, which can convert the input data into log format.

Returns:
  • log_buffer (LogDict): Log buffer dict.

Examples:
>>> log_buffer = build_log_buffer()
>>> log_buffer['not_tensor'] = torch.randn(3)
>>> assert isinstance(log_buffer['not_tensor'], list)
>>> assert len(log_buffer['not_tensor']) == 3
>>> log_buffer.update({'not_tensor': 4, 'a': 5})
>>> assert log_buffer['not_tensor'] == 4

CudaFetcher

class ding.torch_utils.data_helper.CudaFetcher(data_source: Iterable, device: str, queue_size: int = 4, sleep: float = 0.1)[source]
Overview:

Fetch data from source, and transfer it to a specified device.

Interfaces:

__init__, __next__, run, close.

__init__(data_source: Iterable, device: str, queue_size: int = 4, sleep: float = 0.1) None[source]
Overview:

Initialize the CudaFetcher object using the given arguments.

Arguments:
  • data_source (Iterable): The iterable data source.

  • device (str): The device to put data to, such as “cuda:0”.

  • queue_size (int): The internal size of queue, such as 4.

  • sleep (float): Sleeping time when the internal queue is full.

_producer() None[source]
Overview:

Keep fetching data from source, change the device, and put into queue for request.

close() None[source]
Overview:

Stop the producer thread by setting end_flag to True.

run() None[source]
Overview:

Start producer thread: Keep fetching data from source, change the device, and put into queue for request.

Examples:
>>> timer = EasyTimer()
>>> dataloader = iter([torch.randn(3, 3) for _ in range(10)])
>>> dataloader = CudaFetcher(dataloader, device='cuda', sleep=0.1)
>>> dataloader.run()
>>> data = next(dataloader)

get_tensor_data

ding.torch_utils.data_helper.get_tensor_data(data: Any) Any[source]
Overview:

Get pure tensor data from the given data (without disturbing grad computation graph).

Arguments:
  • data (Any): The original data. It can be exactly a tensor or a container (Sequence or dict).

Returns:
  • output (Any): The output data.

Examples:
>>> a = {
...     'tensor': torch.tensor([1, 2, 3.], requires_grad=True),
...     'list': [torch.tensor([1, 2, 3.], requires_grad=True) for _ in range(2)],
...     'none': None
... }
>>> tensor_a = get_tensor_data(a)
>>> assert not tensor_a['tensor'].requires_grad
>>> for t in tensor_a['list']:
>>>     assert not t.requires_grad

unsqueeze

ding.torch_utils.data_helper.unsqueeze(data: Any, dim: int = 0) Any[source]
Overview:

Unsqueeze the tensor data.

Arguments:
  • data (Any): The original data. It can be exactly a tensor or a container (Sequence or dict).

  • dim (int): The dimension to be unsqueezed.

Returns:
  • output (Any): The output data.

Examples (tensor):
>>> t = torch.randn(3, 3)
>>> tt = unsqueeze(t, dim=0)
>>> assert tt.shape == torch.Size([1, 3, 3])
Examples (list):
>>> t = [torch.randn(3, 3)]
>>> tt = unsqueeze(t, dim=0)
>>> assert tt[0].shape == torch.Size([1, 3, 3])
Examples (dict):
>>> t = {"t": torch.randn(3, 3)}
>>> tt = unsqueeze(t, dim=0)
>>> assert tt["t"].shape == torch.Size([1, 3, 3])

squeeze

ding.torch_utils.data_helper.squeeze(data: Any, dim: int = 0) Any[source]
Overview:

Squeeze the tensor data.

Arguments:
  • data (Any): The original data. It can be exactly a tensor or a container (Sequence or dict).

  • dim (int): The dimension to be Squeezed.

Returns:
  • output (Any): The output data.

Examples (tensor):
>>> t = torch.randn(1, 3, 3)
>>> tt = squeeze(t, dim=0)
>>> assert tt.shape == torch.Size([3, 3])
Examples (list):
>>> t = [torch.randn(1, 3, 3)]
>>> tt = squeeze(t, dim=0)
>>> assert tt[0].shape == torch.Size([3, 3])
Examples (dict):
>>> t = {"t": torch.randn(1, 3, 3)}
>>> tt = squeeze(t, dim=0)
>>> assert tt["t"].shape == torch.Size([3, 3])

get_null_data

ding.torch_utils.data_helper.get_null_data(template: Any, num: int) List[Any][source]
Overview:

Get null data given an input template.

Arguments:
  • template (Any): The template data.

  • num (int): The number of null data items to generate.

Returns:
  • output (List[Any]): The generated null data.

Examples:
>>> temp = {'obs': [1, 2, 3], 'action': 1, 'done': False, 'reward': torch.tensor(1.)}
>>> null_data = get_null_data(temp, 2)
>>> assert len(null_data) == 2
>>> assert null_data[0]['null'] and null_data[0]['done']

zeros_like

ding.torch_utils.data_helper.zeros_like(h: Any) Any[source]
Overview:

Generate zero-tensors like the input data.

Arguments:
  • h (Any): The original data. It can be exactly a tensor or a container (Sequence or dict).

Returns:
  • output (Any): The output zero-tensors.

Examples (tensor):
>>> t = torch.randn(3, 3)
>>> tt = zeros_like(t)
>>> assert tt.shape == torch.Size([3, 3])
>>> assert torch.sum(torch.abs(tt)) < 1e-8
Examples (list):
>>> t = [torch.randn(3, 3)]
>>> tt = zeros_like(t)
>>> assert tt[0].shape == torch.Size([3, 3])
>>> assert torch.sum(torch.abs(tt[0])) < 1e-8
Examples (dict):
>>> t = {"t": torch.randn(3, 3)}
>>> tt = zeros_like(t)
>>> assert tt["t"].shape == torch.Size([3, 3])
>>> assert torch.sum(torch.abs(tt["t"])) < 1e-8

dataparallel

Please refer to ding/torch_utils/dataparallel for more details.

DataParallel

class ding.torch_utils.dataparallel.DataParallel(module, device_ids=None, output_device=None, dim=0)[source]
Overview:

A wrapper class for nn.DataParallel.

Interfaces:

__init__, parameters

__init__(module, device_ids=None, output_device=None, dim=0)[source]
Overview:

Initialize the DataParallel object.

Arguments:
  • module (nn.Module): The module to be parallelized.

  • device_ids (list): The list of GPU ids.

  • output_device (int): The output GPU id.

  • dim (int): The dimension to be parallelized.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
parameters(recurse: bool = True)[source]
Overview:

Return the parameters of the module.

Arguments:
  • recurse (bool): Whether to return the parameters of the submodules.

Returns:
  • params (generator): The generator of the parameters.

training: bool

distribution

Please refer to ding/torch_utils/distribution for more details.

Pd

class ding.torch_utils.distribution.Pd[source]
Overview:

Abstract class for parameterizable probability distributions and sampling functions.

Interfaces:

neglogp, entropy, noise_mode, mode, sample

Tip

In derived classes, logits should be stored as an attribute of the class.

entropy() Tensor[source]
Overview:

Calculate the softmax entropy of logits

Arguments:
  • reduction (str): support [None, ‘mean’], default set to ‘mean’

Returns:
  • entropy (torch.Tensor): the calculated entropy

mode()[source]
Overview:

Return the argmax result of logits. This method is designed for deterministic action selection.

neglogp(x: Tensor) Tensor[source]
Overview:

Calculate cross_entropy between input x and logits

Arguments:
  • x (torch.Tensor): the input tensor

Return:
  • cross_entropy (torch.Tensor): the returned cross_entropy loss

noise_mode()[source]
Overview:

Add noise to logits. This method is designed for introducing randomness.

sample()[source]
Overview:

Sample from the softmax distribution over logits. This method is designed for multinomial sampling.

CategoricalPd

class ding.torch_utils.distribution.CategoricalPd(logits: Tensor | None = None)[source]
Overview:

Categorical probability distribution sampler

Interfaces:

__init__, neglogp, entropy, noise_mode, mode, sample

__init__(logits: Tensor | None = None) None[source]
Overview:

Init the Pd with logits

Arguments:
  • logits (torch.Tensor): The logits to sample from

entropy(reduction: str = 'mean') Tensor[source]
Overview:

Calculate the softmax entropy of logits

Arguments:
  • reduction (str): support [None, ‘mean’], default set to mean

Returns:
  • entropy (torch.Tensor): the calculated entropy

mode(viz: bool = False) Tuple[Tensor, Dict[str, ndarray]][source]
Overview:

Return the argmax result of logits

Arguments:
  • viz (bool): Whether to additionally return the numpy form of logits, noise and noise_logits for visualization; short for visualize (tensors cannot be visualized directly in TensorBoard or text logs).

Returns:
  • result (torch.Tensor): the logits argmax result

  • viz_feature (Dict[str, np.ndarray]): ndarray type data for visualization.

neglogp(x, reduction: str = 'mean') Tensor[source]
Overview:

Calculate cross_entropy between input x and logits

Arguments:
  • x (torch.Tensor): the input tensor

  • reduction (str): support [None, ‘mean’], default set to mean

Return:
  • cross_entropy (torch.Tensor): the returned cross_entropy loss

noise_mode(viz: bool = False) Tuple[Tensor, Dict[str, ndarray]][source]
Overview:

Add noise to logits

Arguments:
  • viz (bool): Whether to additionally return the numpy form of logits, noise and noise_logits for visualization; short for visualize (tensors cannot be visualized directly in TensorBoard or text logs).

Returns:
  • result (torch.Tensor): noised logits

  • viz_feature (Dict[str, np.ndarray]): ndarray type data for visualization.

sample(viz: bool = False) Tuple[Tensor, Dict[str, ndarray]][source]
Overview:

Sample from the softmax distribution over logits

Arguments:
  • viz (bool): Whether to additionally return the numpy form of logits, noise and noise_logits for visualization; short for visualize (tensors cannot be visualized directly in TensorBoard or text logs).

Returns:
  • result (torch.Tensor): the logits sampled result

  • viz_feature (Dict[str, np.ndarray]): ndarray type data for visualization.

update_logits(logits: Tensor) None[source]
Overview:

Update logits

Arguments:
  • logits (torch.Tensor): logits to update

CategoricalPdPytorch

class ding.torch_utils.distribution.CategoricalPdPytorch(probs: Tensor | None = None)[source]
Overview:

A wrapper of torch.distributions.Categorical

Interfaces:

__init__, update_logits, update_probs, sample, neglogp, mode, entropy

__init__(probs: Tensor | None = None) None[source]
Overview:

Initialize the CategoricalPdPytorch object.

Arguments:
  • probs (torch.Tensor): The tensor of probabilities.

entropy(reduction: str | None = None) Tensor[source]
Overview:

Calculate the softmax entropy of logits

Arguments:
  • reduction (str): support [None, ‘mean’], default set to mean

Returns:
  • entropy (torch.Tensor): the calculated entropy

mode() Tensor[source]
Overview:

Return logits argmax result

Return:
  • result(torch.Tensor): the logits argmax result

neglogp(actions: Tensor, reduction: str = 'mean') Tensor[source]
Overview:

Calculate the cross entropy between the input actions and the underlying distribution

Arguments:
  • actions (torch.Tensor): the input action tensor

  • reduction (str): support [None, ‘mean’], default set to mean

Return:
  • cross_entropy (torch.Tensor): the returned cross_entropy loss

sample() Tensor[source]
Overview:

Sample from the softmax distribution over logits

Return:
  • result (torch.Tensor): the logits sampled result

update_logits(logits: Tensor) None[source]
Overview:

Update logits

Arguments:
  • logits (torch.Tensor): logits to update

update_probs(probs: Tensor) None[source]
Overview:

Update probs

Arguments:
  • probs (torch.Tensor): probs to update
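
Examples (an illustrative sketch; the batch size and number of actions are assumptions):
>>> import torch
>>> from ding.torch_utils.distribution import CategoricalPdPytorch
>>> probs = torch.softmax(torch.randn(4, 6), dim=-1)   # batch of 4, 6 discrete actions
>>> pd = CategoricalPdPytorch(probs)
>>> action = pd.sample()
>>> assert action.shape == (4, )
>>> per_sample_entropy = pd.entropy(reduction=None)
>>> mean_neglogp = pd.neglogp(action)   # mean cross entropy over the batch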

lr_scheduler

Please refer to ding/torch_utils/lr_scheduler for more details.

get_lr_ratio

ding.torch_utils.lr_scheduler.get_lr_ratio(epoch: int, warmup_epochs: int, learning_rate: float, lr_decay_epochs: int, min_lr: float) float[source]
Overview:

Get learning rate ratio for each epoch.

Arguments:
  • epoch (int): Current epoch.

  • warmup_epochs (int): Warmup epochs.

  • learning_rate (float): Learning rate.

  • lr_decay_epochs (int): Learning rate decay epochs.

  • min_lr (float): Minimum learning rate.

cos_lr_scheduler

ding.torch_utils.lr_scheduler.cos_lr_scheduler(optimizer: Optimizer, learning_rate: float, warmup_epochs: float = 5, lr_decay_epochs: float = 100, min_lr: float = 6e-05) LambdaLR[source]
Overview:

Cosine learning rate scheduler.

Arguments:
  • optimizer (torch.optim.Optimizer): Optimizer.

  • learning_rate (float): Learning rate.

  • warmup_epochs (float): Warmup epochs.

  • lr_decay_epochs (float): Learning rate decay epochs.

  • min_lr (float): Minimum learning rate.
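
Examples (an illustrative sketch; the model, optimizer and learning rate are assumptions):
>>> import torch
>>> import torch.nn as nn
>>> from ding.torch_utils.lr_scheduler import cos_lr_scheduler
>>> model = nn.Linear(3, 5)
>>> optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
>>> scheduler = cos_lr_scheduler(optimizer, learning_rate=1e-3)
>>> scheduler.step()   # typically called once per epoch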

math_helper

Please refer to ding/torch_utils/math_helper for more details.

cov

ding.torch_utils.math_helper.cov(x: Tensor, rowvar: bool = False, bias: bool = False, ddof: int | None = None, aweights: Tensor | None = None) Tensor[source]
Overview:

Estimates covariance matrix like numpy.cov.

Arguments:
  • x (torch.Tensor): A 1-D or 2-D tensor containing multiple variables and observations. Whether rows or columns correspond to variables is controlled by rowvar.

  • rowvar (bool): If rowvar is True, each row represents a variable, with observations in the columns. Otherwise (the default), each column represents a variable, while the rows contain observations.

  • bias (bool): Default normalization (False) is by dividing N - 1, where N is the number of observations given (unbiased estimate). If bias is True, then normalization is by N.

  • ddof (Optional[int]): If ddof is not None, it implies that the argument bias is overridden. Note that ddof=1 will return the unbiased estimate (equals to bias=False), and ddof=0 will return the biased estimation (equals to bias=True).

  • aweights (Optional[torch.Tensor]): 1-D tensor of observation vector weights. These relative weights are typically large for observations considered “important” and smaller for observations considered less “important”. If ddof=0, the tensor of weights can be used to assign weights to observation vectors.

Returns:
  • cov_mat (torch.Tensor): Covariance matrix calculated.
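
Examples (an illustrative sketch; 100 observations of 3 variables, using the default rowvar=False):
>>> import torch
>>> from ding.torch_utils.math_helper import cov
>>> x = torch.randn(100, 3)
>>> cov_mat = cov(x)
>>> assert cov_mat.shape == (3, 3)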

metric

Please refer to ding/torch_utils/metric for more details.

levenshtein_distance

ding.torch_utils.metric.levenshtein_distance(pred: LongTensor, target: LongTensor, pred_extra: Tensor | None = None, target_extra: Tensor | None = None, extra_fn: Callable | None = None) FloatTensor[source]
Overview:

Levenshtein Distance, i.e. Edit Distance.

Arguments:
  • pred (torch.LongTensor): The first tensor to calculate the distance, shape: (N1, ) (N1 >= 0).

  • target (torch.LongTensor): The second tensor to calculate the distance, shape: (N2, ) (N2 >= 0).

  • pred_extra (Optional[torch.Tensor]): Extra tensor to calculate the distance, only works when extra_fn is not None.

  • target_extra (Optional[torch.Tensor]): Extra tensor to calculate the distance, only works when extra_fn is not None.

  • extra_fn (Optional[Callable]): The distance function for pred_extra and target_extra. If set to None, this distance will not be considered.

Returns:
  • distance (torch.FloatTensor): The calculated distance (a scalar), shape: (1, ).
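
A minimal usage sketch, added for illustration (not from the original docstring):

Examples:
>>> pred = torch.LongTensor([1, 2, 3])
>>> target = torch.LongTensor([1, 3, 3, 4])
>>> distance = levenshtein_distance(pred, target)
>>> distance.shape == (1,)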

hamming_distance

ding.torch_utils.metric.hamming_distance(pred: LongTensor, target: LongTensor, weight=1.0) LongTensor[source]
Overview:

Hamming Distance.

Arguments:
  • pred (torch.LongTensor): Pred input, boolean vector(0 or 1).

  • target (torch.LongTensor): Target input, boolean vector(0 or 1).

  • weight (torch.LongTensor): Weight to multiply.

Returns:
  • distance (torch.LongTensor): The calculated distance (a scalar), shape: (1, ).

Shapes:
  • pred & target (torch.LongTensor): shape \((B, N)\), where B is the batch size and N is the dimension.
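
A minimal usage sketch, added for illustration (not from the original docstring); the two inputs below differ in two positions, so a per-sample distance of 2 is expected.

Examples:
>>> pred = torch.LongTensor([[1, 0, 1, 1]])
>>> target = torch.LongTensor([[1, 1, 0, 1]])
>>> distance = hamming_distance(pred, target)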

model_helper

Please refer to ding/torch_utils/model_helper for more details.

get_num_params

ding.torch_utils.model_helper.get_num_params(model: Module) int[source]
Overview:

Return the number of parameters in the model.

Arguments:
  • model (torch.nn.Module): The model object to calculate the parameter number.

Returns:
  • n_params (int): The calculated number of parameters.

Examples:
>>> model = torch.nn.Linear(3, 5)
>>> num = get_num_params(model)
>>> assert num == 15

nn_test_helper

Please refer to ding/torch_utils/nn_test_helper for more details.

is_differentiable

ding.torch_utils.nn_test_helper.is_differentiable(loss: Tensor, model: Module | List[Module], print_instead: bool = False) None[source]
Overview:

Check whether the model (or models) is differentiable: first verify that every module’s gradients are None, then back-propagate the loss, and finally verify that every module’s gradients are torch.Tensor.

Arguments:
  • loss (torch.Tensor): loss tensor of the model

  • model (Union[torch.nn.Module, List[torch.nn.Module]]): model or models to be checked

  • print_instead (bool): Whether to print module’s final grad result, instead of asserting. Default set to False.
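
A minimal usage sketch, added for illustration (not from the original docstring); it builds a fresh model, computes a scalar loss, and checks that gradients flow back to every parameter.

Examples:
>>> model = torch.nn.Linear(3, 5)
>>> loss = model(torch.randn(4, 3)).sum()
>>> is_differentiable(loss, model)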

optimizer_helper

Please refer to ding/torch_utils/optimizer_helper for more details.

calculate_grad_norm

ding.torch_utils.optimizer_helper.calculate_grad_norm(model: Module, norm_type=2) float[source]
Overview:

Calculate the gradient norm of the parameters whose gradients are not None in the model.

Arguments:
  • model (torch.nn.Module): The model whose gradient norm is calculated.

  • norm_type (int or inf): The type of norm to use, default set to 2.
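
A minimal usage sketch, added for illustration (not from the original docstring); backward() must be called first so that the gradients exist.

Examples:
>>> model = torch.nn.Linear(3, 5)
>>> model(torch.randn(4, 3)).sum().backward()
>>> grad_norm = calculate_grad_norm(model)  # global norm of all parameter gradients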

calculate_grad_norm_without_bias_two_norm

ding.torch_utils.optimizer_helper.calculate_grad_norm_without_bias_two_norm(model: Module) float[source]
Overview:

Calculate the 2-norm of the gradients of the non-bias parameters whose gradients are not None in the model.

Arguments:
  • model (torch.nn.Module): The model whose gradient norm is calculated.

grad_ignore_norm

ding.torch_utils.optimizer_helper.grad_ignore_norm(parameters, max_norm, norm_type=2)[source]
Overview:

Ignore (zero out) the gradients of an iterable of parameters when their total norm exceeds max_norm.

Arguments:
  • parameters (Iterable): an iterable of torch.Tensor

  • max_norm (float): the max norm of the gradients

  • norm_type (float): 2.0 means use norm2 to clip
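
A minimal usage sketch, added for illustration (not from the original docstring); like torch.nn.utils.clip_grad_norm_, it is assumed to be called between backward() and the optimizer step.

Examples:
>>> model = torch.nn.Linear(3, 5)
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
>>> model(torch.randn(4, 3)).sum().backward()
>>> grad_ignore_norm(model.parameters(), max_norm=5.0)
>>> optimizer.step()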

grad_ignore_value

ding.torch_utils.optimizer_helper.grad_ignore_value(parameters, clip_value)[source]
Overview:

Ignore (zero out) the gradients of an iterable of parameters when a gradient value exceeds clip_value.

Arguments:
  • parameters (Iterable): an iterable of torch.Tensor

  • clip_value (float): the value to start clipping

Adam

class ding.torch_utils.optimizer_helper.Adam(params: Iterable, lr: float = 0.001, betas: Tuple[float, float] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0, amsgrad: bool = False, optim_type: str = 'adam', grad_clip_type: str | None = None, clip_value: float | None = None, clip_coef: float = 5, clip_norm_type: float = 2.0, clip_momentum_timestep: int = 100, grad_norm_type: str | None = None, grad_ignore_type: str | None = None, ignore_value: float | None = None, ignore_coef: float = 5, ignore_norm_type: float = 2.0, ignore_momentum_timestep: int = 100)[source]
Overview:

Rewritten Adam optimizer supporting more features.

Interfaces:

__init__, step, _state_init, get_grad

__init__(params: Iterable, lr: float = 0.001, betas: Tuple[float, float] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0, amsgrad: bool = False, optim_type: str = 'adam', grad_clip_type: str | None = None, clip_value: float | None = None, clip_coef: float = 5, clip_norm_type: float = 2.0, clip_momentum_timestep: int = 100, grad_norm_type: str | None = None, grad_ignore_type: str | None = None, ignore_value: float | None = None, ignore_coef: float = 5, ignore_norm_type: float = 2.0, ignore_momentum_timestep: int = 100)[source]
Overview:

Init method of the refactored Adam class.

Arguments:
  • params (Iterable): an iterable of torch.Tensor s or dict s, specifying which tensors should be optimized

  • lr (float): learning rate, default set to 1e-3

  • betas (Tuple[float, float]): coefficients used for computing running averages of gradient and its square, default set to (0.9, 0.999))

  • eps (float): term added to the denominator to improve numerical stability, default set to 1e-8

  • weight_decay (float): weight decay coefficient, default set to 0

  • amsgrad (bool): whether to use the AMSGrad variant of this algorithm from the paper On the Convergence of Adam and Beyond <https://arxiv.org/abs/1904.09237>

  • optim_type (str): support [“adam”, “adamw”]

  • grad_clip_type (str): support [None, ‘clip_momentum’, ‘clip_value’, ‘clip_norm’, ‘clip_momentum_norm’]

  • clip_value (float): the value to start clipping

  • clip_coef (float): the clipping coefficient

  • clip_norm_type (float): 2.0 means use norm2 to clip

  • clip_momentum_timestep (int): after how many steps the momentum clipping should start

  • grad_ignore_type (str): support [None, ‘ignore_momentum’, ‘ignore_value’, ‘ignore_norm’, ‘ignore_momentum_norm’]

  • ignore_value (float): the value to start ignoring

  • ignore_coef (float): the ignoring coefficient

  • ignore_norm_type (float): 2.0 means use norm2 to ignore

  • ignore_momentum_timestep (int): after how many steps the momentum ignoring should start
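
A minimal usage sketch, added for illustration (not from the original docstring); it assumes the optimizer is a drop-in replacement for torch.optim.Adam and that clip_value acts as the threshold for the chosen grad_clip_type.

Examples:
>>> model = torch.nn.Linear(3, 5)
>>> optimizer = Adam(model.parameters(), lr=1e-3, grad_clip_type='clip_norm', clip_value=0.5)
>>> loss = model(torch.randn(4, 3)).sum()
>>> optimizer.zero_grad()
>>> loss.backward()
>>> optimizer.step()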

_optimizer_load_state_dict_post_hooks: OrderedDict[int, Callable[['Optimizer'], None]]
_optimizer_load_state_dict_pre_hooks: OrderedDict[int, Callable[['Optimizer', StateDict], StateDict | None]]
_optimizer_state_dict_post_hooks: OrderedDict[int, Callable[['Optimizer', StateDict], StateDict | None]]
_optimizer_state_dict_pre_hooks: OrderedDict[int, Callable[['Optimizer'], None]]
_optimizer_step_post_hooks: Dict[int, Callable[[Self, Tuple[Any, ...], Dict[str, Any]], None]]
_optimizer_step_pre_hooks: Dict[int, Callable[[Self, Tuple[Any, ...], Dict[str, Any]], Tuple[Tuple[Any, ...], Dict[str, Any]] | None]]
_state_init(p, amsgrad)[source]
Overview:

Initialize the state of the optimizer

Arguments:
  • p (torch.Tensor): the parameter to be optimized

  • amsgrad (bool): whether to use the AMSGrad variant of this algorithm from the paper On the Convergence of Adam and Beyond <https://arxiv.org/abs/1904.09237>

get_grad() float[source]
step(closure: Callable | None = None)[source]
Overview:

Performs a single optimization step

Arguments:
  • closure (callable): A closure that reevaluates the model and returns the loss, default set to None

RMSprop

class ding.torch_utils.optimizer_helper.RMSprop(params: Iterable, lr: float = 0.01, alpha: float = 0.99, eps: float = 1e-08, weight_decay: float = 0, momentum: float = 0, centered: bool = False, grad_clip_type: str | None = None, clip_value: float | None = None, clip_coef: float = 5, clip_norm_type: float = 2.0, clip_momentum_timestep: int = 100, grad_norm_type: str | None = None, grad_ignore_type: str | None = None, ignore_value: float | None = None, ignore_coef: float = 5, ignore_norm_type: float = 2.0, ignore_momentum_timestep: int = 100)[source]
Overview:

Rewritten RMSprop optimizer supporting more features.

Interfaces:

__init__, step, _state_init, get_grad

__init__(params: Iterable, lr: float = 0.01, alpha: float = 0.99, eps: float = 1e-08, weight_decay: float = 0, momentum: float = 0, centered: bool = False, grad_clip_type: str | None = None, clip_value: float | None = None, clip_coef: float = 5, clip_norm_type: float = 2.0, clip_momentum_timestep: int = 100, grad_norm_type: str | None = None, grad_ignore_type: str | None = None, ignore_value: float | None = None, ignore_coef: float = 5, ignore_norm_type: float = 2.0, ignore_momentum_timestep: int = 100)[source]
Overview:

Init method of the refactored RMSprop class.

Arguments:
  • params (Iterable): an iterable of torch.Tensor s or dict s, specifying which tensors should be optimized

  • lr (float): learning rate, default set to 1e-2

  • alpha (float): smoothing constant, default set to 0.99

  • eps (float): term added to the denominator to improve numerical stability, default set to 1e-8

  • weight_decay (float): weight decay coefficient, default set to 0

  • centered (bool): if True, compute the centered RMSprop, where the gradient is normalized by an estimation of its variance

  • grad_clip_type (str): support [None, ‘clip_momentum’, ‘clip_value’, ‘clip_norm’, ‘clip_momentum_norm’]

  • clip_value (float): the value to start clipping

  • clip_coef (float): the clipping coefficient

  • clip_norm_type (float): 2.0 means use norm2 to clip

  • clip_momentum_timestep (int): after how many steps the momentum clipping should start

  • grad_ignore_type (str): support [None, ‘ignore_momentum’, ‘ignore_value’, ‘ignore_norm’, ‘ignore_momentum_norm’]

  • ignore_value (float): the value to start ignoring

  • ignore_coef (float): the ignoring coefficient

  • ignore_norm_type (float): 2.0 means use norm2 to ignore

  • ignore_momentum_timestep (int): after how many steps the momentum ignoring should start

_optimizer_load_state_dict_post_hooks: OrderedDict[int, Callable[['Optimizer'], None]]
_optimizer_load_state_dict_pre_hooks: OrderedDict[int, Callable[['Optimizer', StateDict], StateDict | None]]
_optimizer_state_dict_post_hooks: OrderedDict[int, Callable[['Optimizer', StateDict], StateDict | None]]
_optimizer_state_dict_pre_hooks: OrderedDict[int, Callable[['Optimizer'], None]]
_optimizer_step_post_hooks: Dict[int, Callable[[Self, Tuple[Any, ...], Dict[str, Any]], None]]
_optimizer_step_pre_hooks: Dict[int, Callable[[Self, Tuple[Any, ...], Dict[str, Any]], Tuple[Tuple[Any, ...], Dict[str, Any]] | None]]
_state_init(p, momentum, centered)[source]
Overview:

Initialize the state of the optimizer

Arguments:
  • p (torch.Tensor): the parameter to be optimized

  • momentum (float): the momentum coefficient

  • centered (bool): if True, compute the centered RMSprop, the gradient is normalized by an estimation of its variance

get_grad() float[source]
Overview:

Calculate the gradient norm of the parameters whose gradients are not None in the model.

step(closure: Callable | None = None)[source]
Overview:

Performs a single optimization step

Arguments:
  • closure (callable): A closure that reevaluates the model and returns the loss, default set to None

PCGrad

class ding.torch_utils.optimizer_helper.PCGrad(optimizer, reduction='mean')[source]
Overview:

PCGrad optimizer to support multi-task learning. You can find the paper at the following link: https://arxiv.org/pdf/2001.06782.pdf

Interfaces:

__init__, zero_grad, step, pc_backward

Properties:
  • optimizer (torch.optim): the optimizer to be used

__init__(optimizer, reduction='mean')[source]
Overview:

Initialization of PCGrad optimizer

Arguments:
  • optimizer (torch.optim): the optimizer to be used

  • reduction (str): the reduction method, support [‘mean’, ‘sum’]

_flatten_grad(grads, shapes)[source]
Overview:

flatten the gradient of the parameters of the network

Arguments:
  • grads (list): a list of the gradient of the parameters

  • shapes (list): a list of the shape of the parameters

_pack_grad(objectives)[source]
Overview:

pack the gradient of the parameters of the network for each objective

Arguments:
  • objectives: a list of objectives

Returns:
  • grad: a list of the gradient of the parameters

  • shape: a list of the shape of the parameters

  • has_grad: a list of masks indicating whether each parameter has a gradient

_project_conflicting(grads, has_grads, shapes=None)[source]
Overview:

project the conflicting gradient to the orthogonal space

Arguments:
  • grads (list): a list of the gradient of the parameters

  • has_grads (list): a list of masks indicating whether each parameter has a gradient

  • shapes (list): a list of the shape of the parameters

_retrieve_grad()[source]
Overview:

get the gradient of the parameters of the network with specific objective

Returns:
  • grad: a list of the gradient of the parameters

  • shape: a list of the shape of the parameters

  • has_grad: a list of masks indicating whether each parameter has a gradient

_set_grad(grads)[source]
Overview:

set the modified gradients to the network

Arguments:
  • grads (list): a list of the gradient of the parameters

_unflatten_grad(grads, shapes)[source]
Overview:

unflatten the gradient of the parameters of the network

Arguments:
  • grads (list): a list of the gradient of the parameters

  • shapes (list): a list of the shape of the parameters

property optimizer
Overview:

get the optimizer

pc_backward(objectives)[source]
Overview:

calculate the gradient of the parameters

Arguments:
  • objectives: a list of objectives
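
A minimal usage sketch, added for illustration (not from the original docstring); pc_backward replaces the usual loss.backward() call, with one loss per task sharing the same trunk.

Examples:
>>> model = torch.nn.Linear(3, 2)
>>> optimizer = PCGrad(torch.optim.Adam(model.parameters()))
>>> out = model(torch.randn(4, 3))
>>> losses = [out[:, 0].mean(), out[:, 1].mean()]  # one objective per task
>>> optimizer.pc_backward(losses)
>>> optimizer.step()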

step()[source]
Overview:

update the parameters with the gradient

zero_grad()[source]
Overview:

clear the gradient of the parameters

configure_weight_decay

ding.torch_utils.optimizer_helper.configure_weight_decay(model: Module, weight_decay: float) List[source]
Overview:

Separate all parameters of the model into two buckets: those that will experience weight decay for regularization and those that won’t (biases, and layer-norm or embedding weights).

Arguments:
  • model (nn.Module): The given PyTorch model.

  • weight_decay (float): Weight decay value for optimizer.

Returns:
  • optim groups (List): The parameter groups to be set in the latter optimizer.
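
A minimal usage sketch, added for illustration (not from the original docstring); the returned parameter groups are passed directly to the optimizer constructor.

Examples:
>>> model = torch.nn.Sequential(torch.nn.Linear(3, 5), torch.nn.LayerNorm(5))
>>> groups = configure_weight_decay(model, weight_decay=0.01)
>>> optimizer = torch.optim.AdamW(groups, lr=3e-4)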

parameter

Please refer to ding/torch_utils/parameter for more details.

NonegativeParameter

class ding.torch_utils.parameter.NonegativeParameter(data: Tensor | None = None, requires_grad: bool = True, delta: float = 1e-08)[source]
Overview:

This module will output a non-negative parameter during the forward process.

Interfaces:

__init__, forward, set_data.

__init__(data: Tensor | None = None, requires_grad: bool = True, delta: float = 1e-08)[source]
Overview:

Initialize the NonegativeParameter object using the given arguments.

Arguments:
  • data (Optional[torch.Tensor]): The initial value of generated parameter. If set to None, the default value is 0.

  • requires_grad (bool): Whether this parameter requires grad.

  • delta (float): The small delta used in the log function for numerical stability.
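
A minimal usage sketch, added for illustration (not from the original docstring); calling the module (i.e. its forward) is assumed to yield a tensor whose entries are all non-negative.

Examples:
>>> p = NonegativeParameter(torch.tensor([1.0, 2.0]))
>>> out = p()
>>> assert (out >= 0).all()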

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward() Tensor[source]
Overview:

Output the non-negative parameter during the forward process.

Returns:

parameter (torch.Tensor): The generated parameter.

set_data(data: Tensor) None[source]
Overview:

Set the value of the non-negative parameter.

Arguments:

data (torch.Tensor): The new value of the non-negative parameter.

training: bool

TanhParameter

class ding.torch_utils.parameter.TanhParameter(data: Tensor | None = None, requires_grad: bool = True)[source]
Overview:

This module will output a tanh parameter during the forward process.

Interfaces:

__init__, forward, set_data.

__init__(data: Tensor | None = None, requires_grad: bool = True)[source]
Overview:

Initialize the TanhParameter object using the given arguments.

Arguments:
  • data (Optional[torch.Tensor]): The initial value of generated parameter. If set to None, the default value is 1.

  • requires_grad (bool): Whether this parameter requires grad.
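
A minimal usage sketch, added for illustration (not from the original docstring); the forward output is assumed to lie in the tanh range (-1, 1).

Examples:
>>> p = TanhParameter(torch.tensor([0.5]))
>>> out = p()  # value constrained to (-1, 1)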

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward() Tensor[source]
Overview:

Output the tanh parameter during the forward process.

Returns:

parameter (torch.Tensor): The generated parameter.

set_data(data: Tensor) None[source]
Overview:

Set the value of the tanh parameter.

Arguments:

data (torch.Tensor): The new value of the tanh parameter.

training: bool

reshape_helper

Please refer to ding/torch_utils/reshape_helper for more details.

fold_batch

ding.torch_utils.reshape_helper.fold_batch(x: Tensor, nonbatch_ndims: int = 1) Tuple[Tensor, Size][source]
Overview:

\((T, B, X) \rightarrow (T*B, X)\). Fold the first (ndim - nonbatch_ndims) dimensions of a tensor into the batch dimension. This operation is similar to torch.flatten but provides an inverse function, unfold_batch, to restore the folded dimensions.

Arguments:
  • x (torch.Tensor): the tensor to fold

  • nonbatch_ndims (int): the number of trailing dimensions that are not folded into the batch dimension

Returns:
  • x (torch.Tensor): the folded tensor

  • batch_dims: the folded dimensions of the original tensor, which can be used to reverse the operation

Examples:
>>> x = torch.ones(10, 20, 5, 4, 8)
>>> x, batch_dim = fold_batch(x, 2)
>>> x.shape == (1000, 4, 8)
>>> batch_dim == (10, 20, 5)

unfold_batch

ding.torch_utils.reshape_helper.unfold_batch(x: Tensor, batch_dims: Size | Tuple) Tensor[source]
Overview:

Unfold the batch dimension of a tensor.

Arguments:
  • x (torch.Tensor): the tensor to unfold

  • batch_dims (torch.Size): the dimensions that are folded

Returns:
  • x (torch.Tensor): the original unfolded tensor

Examples:
>>> x = torch.ones(10, 20, 5, 4, 8)
>>> x, batch_dim = fold_batch(x, 2)
>>> x.shape == (1000, 4, 8)
>>> batch_dim == (10, 20, 5)
>>> x = unfold_batch(x, batch_dim)
>>> x.shape == (10, 20, 5, 4, 8)

unsqueeze_repeat

ding.torch_utils.reshape_helper.unsqueeze_repeat(x: Tensor, repeat_times: int, unsqueeze_dim: int = 0) Tensor[source]
Overview:

Unsqueeze the tensor on unsqueeze_dim and then repeat it in this dimension repeat_times times. This is useful for preprocessing the input to a model ensemble.

Arguments:
  • x (torch.Tensor): the tensor to unsqueeze and repeat

  • repeat_times (int): the number of times the tensor is repeated

  • unsqueeze_dim (int): the unsqueezed dimension

Returns:
  • x (torch.Tensor): the unsqueezed and repeated tensor

Examples:
>>> x = torch.ones(64, 6)
>>> x = unsqueeze_repeat(x, 4)
>>> x.shape == (4, 64, 6)
>>> x = torch.ones(64, 6)
>>> x = unsqueeze_repeat(x, 4, -1)
>>> x.shape == (64, 6, 4)