Shortcuts

ding.torch_utils

loss

Please refer to ding/torch_utils/loss for more details.

ContrastiveLoss

class ding.torch_utils.loss.ContrastiveLoss(x_size: int | SequenceType, y_size: int | SequenceType, heads: SequenceType = [1, 1], encode_shape: int = 64, loss_type: str = 'infoNCE', temperature: float = 1.0)[source]
Overview:

The class for contrastive learning losses. Only InfoNCE loss is supported currently. Code Reference: https://github.com/rdevon/DIM. Paper Reference: https://arxiv.org/abs/1808.06670.

Interfaces:

__init__, forward.

__init__(x_size: int | SequenceType, y_size: int | SequenceType, heads: SequenceType = [1, 1], encode_shape: int = 64, loss_type: str = 'infoNCE', temperature: float = 1.0) None[source]
Overview:

Initialize the ContrastiveLoss object using the given arguments.

Arguments:
  • x_size (Union[int, SequenceType]): Input shape for x; both the obs shape and the encoding shape are supported.

  • y_size (Union[int, SequenceType]): Input shape for y; both the obs shape and the encoding shape are supported.

  • heads (SequenceType): A list of 2 int elements, heads[0] for x and heads[1] for y. Used in the multi-head, global-local, local-local MI maximization process.

  • encode_shape (Union[int, SequenceType]): The dimension of the encoder hidden state.

  • loss_type (str): Only the InfoNCE loss is available now.

  • temperature (float): The parameter to adjust the log_softmax.

_create_encoder(obs_size: int | SequenceType, heads: int) Module[source]
Overview:

Create the encoder for the input obs.

Arguments:
  • obs_size (Union[int, SequenceType]): input shape for x, both the obs shape and the encoding shape are supported. If the obs_size is an int, it means the obs is a 1D vector. If the obs_size is a list such as [1, 16, 16], it means the obs is a 3D image with shape [1, 16, 16].

  • heads (int): The number of heads.

Returns:
  • encoder (nn.Module): The encoder module.

Examples:
>>> obs_size = 16
or
>>> obs_size = [1, 16, 16]
>>> heads = 1
>>> encoder = self._create_encoder(obs_size, heads)
forward(x: Tensor, y: Tensor) Tensor[source]
Overview:

Computes the noise contrastive estimation-based loss, a.k.a. infoNCE.

Arguments:
  • x (torch.Tensor): The input x, both raw obs and encoding are supported.

  • y (torch.Tensor): The input y, both raw obs and encoding are supported.

Returns:

loss (torch.Tensor): The calculated loss value.

Examples:
>>> x_dim = [3, 16]
>>> encode_shape = 16
>>> x = torch.randn(x_dim)
>>> y = x ** 2 + 0.01 * torch.randn(x_dim)
>>> estimator = ContrastiveLoss(x_dim[1], x_dim[1], encode_shape=encode_shape)
>>> loss = estimator.forward(x, y)
Examples:
>>> x_dim = [3, 1, 16, 16]
>>> encode_shape = 16
>>> x = torch.randn(x_dim)
>>> y = x ** 2 + 0.01 * torch.randn(x_dim)
>>> estimator = ContrastiveLoss(x_dim[1:], x_dim[1:], encode_shape=encode_shape)
>>> loss = estimator.forward(x, y)
training: bool

LabelSmoothCELoss

class ding.torch_utils.loss.LabelSmoothCELoss(ratio: float)[source]
Overview:

Label smooth cross entropy loss.

Interfaces:

__init__, forward.

__init__(ratio: float) None[source]
Overview:

Initialize the LabelSmoothCELoss object using the given arguments.

Arguments:
  • ratio (float): The ratio of label-smoothing (the value is in 0-1). If the ratio is larger, the extent of label smoothing is larger.

forward(logits: Tensor, labels: LongTensor) Tensor[source]
Overview:

Calculate label smooth cross entropy loss.

Arguments:
  • logits (torch.Tensor): Predicted logits.

  • labels (torch.LongTensor): Ground truth.

Returns:
  • loss (torch.Tensor): Calculated loss.
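Examples (a minimal usage sketch; the shapes and values are illustrative assumptions, not taken from the source):
>>> criterion = LabelSmoothCELoss(ratio=0.1)
>>> logits = torch.randn(4, 10)            # (B, N) predicted logits
>>> labels = torch.randint(0, 10, (4, ))   # (B, ) class indices
>>> loss = criterion(logits, labels)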

training: bool

SoftFocalLoss

class ding.torch_utils.loss.SoftFocalLoss(gamma: int = 2, weight: Any | None = None, size_average: bool = True, reduce: bool | None = None)[source]
Overview:

Soft focal loss.

Interfaces:

__init__, forward.

__init__(gamma: int = 2, weight: Any | None = None, size_average: bool = True, reduce: bool | None = None) None[source]
Overview:

Initialize the SoftFocalLoss object using the given arguments.

Arguments:
  • gamma (int): The extent of focus on hard samples. A smaller gamma will lead to more focus on easy samples, while a larger gamma will lead to more focus on hard samples.

  • weight (Any): The weight for loss of each class.

  • size_average (bool): By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False.

  • reduce (Optional[bool]): By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss for each batch element instead and ignores size_average.

forward(inputs: Tensor, targets: LongTensor) Tensor[source]
Overview:

Calculate soft focal loss.

Arguments:
  • inputs (torch.Tensor): Predicted logits.

  • targets (torch.LongTensor): Ground truth labels.

Returns:
  • loss (torch.Tensor): Calculated loss.
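Examples (an illustrative sketch; it assumes class-index targets with (B, N) logits, which is an assumption rather than documented behavior):
>>> criterion = SoftFocalLoss(gamma=2)
>>> inputs = torch.randn(4, 10)             # (B, N) predicted logits
>>> targets = torch.randint(0, 10, (4, ))   # (B, ) class indices
>>> loss = criterion(inputs, targets)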

training: bool

build_ce_criterion

ding.torch_utils.loss.build_ce_criterion(cfg: dict) Module[source]
Overview:

Get a cross entropy loss instance according to given config.

Arguments:
  • cfg (dict): Config dict, which contains:
    • type (str): Type of loss function, now supports [‘cross_entropy’, ‘label_smooth_ce’, ‘soft_focal_loss’].

    • kwargs (dict): Arguments for the corresponding loss function.

Returns:
  • loss (nn.Module): The corresponding loss function instance.
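Examples (an illustrative sketch; the exact config container, plain dict vs. EasyDict, and the kwargs keys depend on the implementation):
>>> cfg = dict(type='cross_entropy', kwargs=dict())
>>> criterion = build_ce_criterion(cfg)
>>> loss = criterion(torch.randn(4, 10), torch.randint(0, 10, (4, )))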

MultiLogitsLoss

class ding.torch_utils.loss.MultiLogitsLoss(criterion: str | None = None, smooth_ratio: float = 0.1)[source]
Overview:

Loss for multiple pairs of logits and labels: the predicted logits are first matched to the ground-truth labels (a minimum-cost assignment based on the chosen criterion), and the criterion (cross entropy or label-smoothed cross entropy) is then computed on the matched pairs.

Interfaces:

__init__, forward.

__init__(criterion: str | None = None, smooth_ratio: float = 0.1) None[source]
Overview:

Initialization method, use cross_entropy as default criterion.

Arguments:
  • criterion (str): Criterion type, supports [‘cross_entropy’, ‘label_smooth_ce’].

  • smooth_ratio (float): Smoothing ratio for label smoothing.

static _get_distance_matrix(lx: ndarray, ly: ndarray, mat: ndarray, M: int) ndarray[source]
Overview:

Get distance matrix.

Arguments:
  • lx (np.ndarray): The row labels used by the matching algorithm.

  • ly (np.ndarray): The column labels used by the matching algorithm.

  • mat (np.ndarray): The metric (cost) matrix.

  • M (int): The size of the matrix to be matched.

_get_metric_matrix(logits: Tensor, labels: LongTensor) Tensor[source]
Overview:

Calculate the metric matrix.

Arguments:
  • logits (torch.Tensor): Predicted logits.

  • labels (torch.LongTensor): Ground truth.

Returns:
  • metric (torch.Tensor): Calculated metric matrix.

_label_process(logits: Tensor, labels: LongTensor) LongTensor[source]
Overview:

Process the label according to the criterion.

Arguments:
  • logits (torch.Tensor): Predicted logits.

  • labels (torch.LongTensor): Ground truth.

Returns:
  • ret (torch.LongTensor): Processed label.

_match(matrix: Tensor)[source]
Overview:

Find the optimal assignment for the given metric matrix (minimum-cost matching).

Arguments:
  • matrix (torch.Tensor): Metric matrix.

Returns:
  • index (np.ndarray): Matched index.

_nll_loss(nlls: Tensor, labels: LongTensor) Tensor[source]
Overview:

Calculate the negative log likelihood loss.

Arguments:
  • nlls (torch.Tensor): Negative log likelihood loss.

  • labels (torch.LongTensor): Ground truth.

Returns:
  • ret (torch.Tensor): Calculated loss.

forward(logits: Tensor, labels: LongTensor) Tensor[source]
Overview:

Calculate multiple logits loss.

Arguments:
  • logits (torch.Tensor): Predicted logits, whose shape must be 2-dim, like (B, N).

  • labels (torch.LongTensor): Ground truth.

Returns:
  • loss (torch.Tensor): Calculated loss.
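Examples (a hypothetical sketch; it assumes (B, N) logits with one distinct label index per row, which is an assumption rather than documented behavior):
>>> criterion = MultiLogitsLoss(criterion='cross_entropy')
>>> logits = torch.randn(4, 8)
>>> labels = torch.LongTensor([0, 1, 3, 2])
>>> loss = criterion(logits, labels)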

training: bool

network.activation

Please refer to ding/torch_utils/network/activation for more details.

Lambda

class ding.torch_utils.network.activation.Lambda(f: Callable)[source]
Overview:

A custom lambda module for constructing custom layers.

Interfaces:

__init__, forward.

__init__(f: Callable)[source]
Overview:

Initialize the lambda module with a given function.

Arguments:
  • f (Callable): The Python function to apply to the input tensor in forward.

forward(x: Tensor) Tensor[source]
Overview:

Apply the wrapped function to the input tensor.

Arguments:
  • x (torch.Tensor): The input tensor.
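Examples (a minimal usage sketch):
>>> square = Lambda(lambda x: x ** 2)
>>> out = square(torch.tensor([1., 2., 3.]))   # tensor([1., 4., 9.])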

training: bool

GLU

class ding.torch_utils.network.activation.GLU(input_dim: int, output_dim: int, context_dim: int, input_type: str = 'fc')[source]
Overview:

Gated Linear Unit (GLU), a specific type of activation function, first proposed in [Language Modeling with Gated Convolutional Networks](https://arxiv.org/pdf/1612.08083.pdf).

Interfaces:

__init__, forward.

__init__(input_dim: int, output_dim: int, context_dim: int, input_type: str = 'fc') None[source]
Overview:

Initialize the GLU module.

Arguments:
  • input_dim (int): The dimension of the input tensor.

  • output_dim (int): The dimension of the output tensor.

  • context_dim (int): The dimension of the context tensor.

  • input_type (str): The type of input, now supports [‘fc’, ‘conv2d’]

forward(x: Tensor, context: Tensor) Tensor[source]
Overview:

Compute the GLU transformation of the input tensor.

Arguments:
  • x (torch.Tensor): The input tensor.

  • context (torch.Tensor): The context tensor.

Returns:
  • x (torch.Tensor): The output tensor after GLU transformation.
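Examples (an illustrative sketch for the 'fc' input type; the shapes are assumptions, not taken from the source):
>>> glu = GLU(input_dim=16, output_dim=32, context_dim=8, input_type='fc')
>>> x = torch.randn(4, 16)
>>> context = torch.randn(4, 8)
>>> out = glu(x, context)   # expected shape: (4, 32)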

training: bool

Swish

class ding.torch_utils.network.activation.Swish[source]
Overview:

Swish activation function, which is a smooth, non-monotonic activation function. For more details, please refer to [Searching for Activation Functions](https://arxiv.org/pdf/1710.05941.pdf).

Interfaces:

__init__, forward.

__init__()[source]
Overview:

Initialize the Swish module.

forward(x: Tensor) Tensor[source]
Overview:

Compute the Swish transformation of the input tensor.

Arguments:
  • x (torch.Tensor): The input tensor.

Returns:
  • x (torch.Tensor): The output tensor after Swish transformation.
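Examples (a minimal usage sketch):
>>> act = Swish()
>>> out = act(torch.randn(4, 8))   # same shape as the input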

training: bool

GELU

class ding.torch_utils.network.activation.GELU[source]
Overview:

Gaussian Error Linear Units (GELU) activation function, which is widely used in NLP models like GPT, BERT. For more details, please refer to the original paper: https://arxiv.org/pdf/1606.08415.pdf.

Interfaces:

__init__, forward.

__init__()[source]
Overview:

Initialize the GELU module.

forward(x: Tensor) Tensor[source]
Overview:

Compute the GELU transformation of the input tensor.

Arguments:
  • x (torch.Tensor): The input tensor.

Returns:
  • x (torch.Tensor): The output tensor after GELU transformation.

training: bool

build_activation

ding.torch_utils.network.activation.build_activation(activation: str, inplace: bool | None = None) Module[source]
Overview:

Build and return the activation module according to the given type.

Arguments:
  • activation (str): The type of activation module, now supports [‘relu’, ‘glu’, ‘prelu’, ‘swish’, ‘gelu’, ‘tanh’, ‘sigmoid’, ‘softplus’, ‘elu’, ‘square’, ‘identity’].

  • inplace (Optional[bool]): Whether to execute the operation in-place in the activation, defaults to None.

Returns:
  • act_func (nn.Module): The corresponding activation module.
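Examples (a minimal usage sketch):
>>> act = build_activation('relu')
>>> out = act(torch.randn(2, 3))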

network.diffusion

Please refer to ding/torch_utils/network/diffusion for more details.

extract

ding.torch_utils.network.diffusion.extract(a, t, x_shape)[source]
Overview:

Extract values from a at the indices given by t, reshaped so that they can be broadcast against a tensor of shape x_shape.

Arguments:
  • a (torch.Tensor): The input tensor to index.

  • t (torch.Tensor): The index tensor.

  • x_shape (tuple): The shape of x, used to reshape the extracted values for broadcasting.
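Examples (an illustrative sketch assuming the usual gather-and-reshape behavior of diffusion codebases; the shapes are assumptions):
>>> a = torch.rand(10)              # e.g. a precomputed schedule over 10 timesteps
>>> t = torch.tensor([0, 5, 9])     # one timestep index per batch element
>>> out = extract(a, t, (3, 4, 8))  # expected shape: (3, 1, 1), broadcastable against x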

cosine_beta_schedule

ding.torch_utils.network.diffusion.cosine_beta_schedule(timesteps: int, s: float = 0.008, dtype=torch.float32)[source]
Overview:

Cosine schedule, as proposed in https://openreview.net/forum?id=-NEXDKk8gZ.

Arguments:
  • timesteps (int): The number of diffusion timesteps.

  • s (float): A small offset to prevent beta from being too small near t = 0.

  • dtype (torch.dtype): The dtype of the returned betas.

Returns:

A tensor of betas with shape (timesteps, ), computed by the cosine schedule.
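Examples (a minimal usage sketch):
>>> betas = cosine_beta_schedule(timesteps=1000)
>>> betas.shape   # expected: torch.Size([1000])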

apply_conditioning

ding.torch_utils.network.diffusion.apply_conditioning(x, conditions, action_dim)[source]
Overview:

Apply the conditions to x: for each timestep key in conditions, overwrite the corresponding state dimensions of x (the dimensions after action_dim).

Arguments:
  • x (torch.Tensor): The input trajectory tensor.

  • conditions (dict): The condition dict; the key is the timestep and the value is the condition tensor.

  • action_dim (int): The action dimension; conditions are written into the dimensions after it.
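Examples (an illustrative sketch; it assumes x has shape (batch, horizon, transition_dim) and that conditions overwrite the state part after action_dim, which is an assumption rather than documented behavior):
>>> x = torch.randn(2, 16, 11)             # transition_dim = action_dim (3) + obs_dim (8)
>>> conditions = {0: torch.randn(2, 8)}    # fix the observation at timestep 0
>>> x = apply_conditioning(x, conditions, action_dim=3)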

DiffusionConv1d

class ding.torch_utils.network.diffusion.DiffusionConv1d(in_channels: int, out_channels: int, kernel_size: int, padding: int, activation: Module | None = None, n_groups: int = 8)[source]
Overview:

Conv1d with activation and normalization for diffusion models.

Interfaces:

__init__, forward

__init__(in_channels: int, out_channels: int, kernel_size: int, padding: int, activation: Module | None = None, n_groups: int = 8) None[source]
Overview:

Create a 1D convolution layer with activation and GroupNorm normalization. One extra dimension is temporarily added when computing the group normalization.

Arguments:
  • in_channels (int): Number of channels in the input tensor

  • out_channels (int): Number of channels in the output tensor

  • kernel_size (int): Size of the convolving kernel

  • padding (int): Zero-padding added to both sides of the input

  • activation (nn.Module): The optional activation function.

  • n_groups (int): The number of groups for GroupNorm.

forward(inputs) Tensor[source]
Overview:

Compute the 1D convolution of the input tensor.

Arguments:
  • inputs (torch.Tensor): input tensor

Return:
  • out (torch.Tensor): output tensor
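Examples (an illustrative sketch; the shapes are assumptions, with out_channels divisible by the default n_groups=8):
>>> conv = DiffusionConv1d(in_channels=8, out_channels=16, kernel_size=5, padding=2)
>>> x = torch.randn(2, 8, 32)   # (batch, channels, horizon)
>>> out = conv(x)               # expected shape: (2, 16, 32)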

training: bool

SinusoidalPosEmb

class ding.torch_utils.network.diffusion.SinusoidalPosEmb(dim: int)[source]
Overview:

Class for computing sinusoidal positional embeddings.

Interfaces:

__init__, forward

__init__(dim: int) None[source]
Overview:

Initialization of SinusoidalPosEmb class

Arguments:
  • dim (int): The dimension of the embedding.

forward(x) Tensor[source]
Overview:

Compute the sinusoidal positional embedding.

Arguments:
  • x (torch.Tensor): input tensor

Return:
  • emb (torch.Tensor): output tensor
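Examples (an illustrative sketch; the shapes are assumptions):
>>> pos_emb = SinusoidalPosEmb(dim=32)
>>> t = torch.arange(4)
>>> emb = pos_emb(t)   # expected shape: (4, 32)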

training: bool

Residual

class ding.torch_utils.network.diffusion.Residual(fn)[source]
Overview:

Basic Residual block

Interfaces:

__init__, forward

__init__(fn)[source]
Overview:

Initialization of Residual class

Arguments:
  • fn (nn.Module): The wrapped module; its output is added to the input (residual connection).

forward(x, *arg, **kwargs)[source]
Overview:

Compute the residual output: fn(x, *arg, **kwargs) + x.

Arguments:
  • x (torch.Tensor): input tensor

training: bool

LayerNorm

class ding.torch_utils.network.diffusion.LayerNorm(dim, eps=1e-05)[source]
Overview:

LayerNorm computed over dim = 1, because the temporal input x has shape [batch, dim, horizon].

Interfaces:

__init__, forward

__init__(dim, eps=1e-05) None[source]
Overview:

Initialization of LayerNorm class

Arguments:
  • dim (int): dimension of input

  • eps (float): eps of LayerNorm

forward(x)[source]
Overview:

Compute the LayerNorm of the input tensor.

Arguments:
  • x (torch.Tensor): input tensor

training: bool

PreNorm

class ding.torch_utils.network.diffusion.PreNorm(dim, fn)[source]
Overview:

PreNorm: apply the normalization over dim = 1 before fn, because the temporal input x has shape [batch, dim, horizon].

Interfaces:

__init__, forward

__init__(dim, fn) None[source]
Overview:

Initialization of PreNorm class

Arguments:
  • dim (int): dimension of input

  • fn (nn.Module): The module applied after the normalization.

forward(x)[source]
Overview:

Compute the PreNorm output of the input tensor.

Arguments:
  • x (torch.Tensor): input tensor

training: bool

LinearAttention

class ding.torch_utils.network.diffusion.LinearAttention(dim, heads=4, dim_head=32)[source]
Overview:

Linear Attention head

Interfaces:

__init__, forward

__init__(dim, heads=4, dim_head=32) None[source]
Overview:

Initialization of LinearAttention class

Arguments:
  • dim (int): dimension of input

  • heads (int): The number of attention heads.

  • dim_head (int): The dimension of each attention head.

forward(x)[source]
Overview:

Compute the linear attention of the input tensor.

Arguments:
  • x (torch.Tensor): input tensor

training: bool

ResidualTemporalBlock

class ding.torch_utils.network.diffusion.ResidualTemporalBlock(in_channels: int, out_channels: int, embed_dim: int, kernel_size: int = 5, mish: bool = True)[source]
Overview:

Residual temporal block (1D convolutional residual block with a time embedding).

Interfaces:

__init__, forward

__init__(in_channels: int, out_channels: int, embed_dim: int, kernel_size: int = 5, mish: bool = True) None[source]
Overview:

Initialization of ResidualTemporalBlock class

Arguments:
  • in_channels (int): The number of input channels.

  • out_channels (int): The number of output channels.

  • embed_dim (int): The dimension of the embedding layer.

  • kernel_size (int): The kernel size of conv1d.

  • mish (bool): Whether to use Mish as the activation function.

forward(x, t)[source]
Overview:

Compute the residual block output.

Arguments:
  • x (torch.Tensor): The input tensor.

  • t (torch.Tensor): The time embedding tensor.

training: bool

DiffusionUNet1d

class ding.torch_utils.network.diffusion.DiffusionUNet1d(transition_dim: int, dim: int = 32, dim_mults: SequenceType = [1, 2, 4, 8], returns_condition: bool = False, condition_dropout: float = 0.1, calc_energy: bool = False, kernel_size: int = 5, attention: bool = False)[source]
Overview:

Diffusion U-Net for 1D vector data.

Interfaces:

__init__, forward, get_pred

__init__(transition_dim: int, dim: int = 32, dim_mults: SequenceType = [1, 2, 4, 8], returns_condition: bool = False, condition_dropout: float = 0.1, calc_energy: bool = False, kernel_size: int = 5, attention: bool = False) None[source]
Overview:

Initialization of DiffusionUNet1d class

Arguments:
  • transition_dim (int): The dimension of the transition, i.e., obs_dim + action_dim.

  • dim (int): The base dimension of the layers.

  • dim_mults (SequenceType): The multipliers of dim at each level.

  • returns_condition (bool): Whether to use the return as a condition.

  • condition_dropout (float): The dropout rate of the returns condition.

  • calc_energy (bool): Whether to use calc_energy.

  • kernel_size (int): The kernel size of conv1d.

  • attention (bool): Whether to use attention.

forward(x, cond, time, returns=None, use_dropout: bool = True, force_dropout: bool = False)[source]
Overview:

Compute the diffusion U-Net forward pass.

Arguments:
  • x (torch.Tensor): The noisy trajectory.

  • cond (tuple): [(time, state), ...], where state is the initial state of the env and time = 0.

  • time (int): The timestep of the diffusion step.

  • returns (torch.Tensor): The condition returns of the trajectory (normalized returns).

  • use_dropout (bool): Whether to use the returns-condition dropout mask.

  • force_dropout (bool): Whether to force dropping the returns condition.

get_pred(x, cond, time, returns: bool | None = None, use_dropout: bool = True, force_dropout: bool = False)[source]
Overview:

Compute the diffusion U-Net prediction.

Arguments:
  • x (torch.Tensor): The noisy trajectory.

  • cond (tuple): [(time, state), ...], where state is the initial state of the env and time = 0.

  • time (int): The timestep of the diffusion step.

  • returns (torch.Tensor): The condition returns of the trajectory (normalized returns).

  • use_dropout (bool): Whether to use the returns-condition dropout mask.

  • force_dropout (bool): Whether to force dropping the returns condition.

training: bool

TemporalValue

class ding.torch_utils.network.diffusion.TemporalValue(horizon: int, transition_dim: int, dim: int = 32, time_dim: int | None = None, out_dim: int = 1, kernel_size: int = 5, dim_mults: SequenceType = [1, 2, 4, 8])[source]
Overview:

Temporal network for the value function.

Interfaces:

__init__, forward

__init__(horizon: int, transition_dim: int, dim: int = 32, time_dim: int | None = None, out_dim: int = 1, kernel_size: int = 5, dim_mults: SequenceType = [1, 2, 4, 8]) None[source]
Overview:

Initialization of TemporalValue class

Arguments:
  • horizon (int): The horizon of the trajectory.

  • transition_dim (int): The dimension of the transition, i.e., obs_dim + action_dim.

  • dim (int): The base dimension of the layers.

  • time_dim (int): The dimension of the time embedding.

  • out_dim (int): The output dimension.

  • kernel_size (int): The kernel size of conv1d.

  • dim_mults (SequenceType): The multipliers of dim at each level.

forward(x, cond, time, *args)[source]
Overview:

Compute the temporal value forward pass.

Arguments:
  • x (torch.Tensor): The noisy trajectory.

  • cond (tuple): [(time, state), ...], where state is the initial state of the env and time = 0.

  • time (int): The timestep of the diffusion step.

training: bool

network.dreamer

Please refer to ding/torch_utils/network/dreamer for more details.

Conv2dSame

class ding.torch_utils.network.dreamer.Conv2dSame(in_channels: int, out_channels: int, kernel_size: int | Tuple[int, int], stride: int | Tuple[int, int] = 1, padding: str | int | Tuple[int, int] = 0, dilation: int | Tuple[int, int] = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', device=None, dtype=None)[source]
Overview:

Conv2dSame Network for dreamerv3.

Interfaces:

__init__, forward

calc_same_pad(i, k, s, d)[source]
Overview:

Calculate the same padding size.

Arguments:
  • i (int): Input size.

  • k (int): Kernel size.

  • s (int): Stride size.

  • d (int): Dilation size.

forward(x)[source]
Overview:

Compute the forward pass of Conv2dSame.

Arguments:
  • x (torch.Tensor): Input tensor.
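Examples (an illustrative sketch; the shapes are assumptions):
>>> conv = Conv2dSame(in_channels=3, out_channels=8, kernel_size=3, stride=2)
>>> x = torch.randn(1, 3, 32, 32)
>>> out = conv(x)   # "same" padding keeps the spatial size at ceil(32 / 2) = 16, i.e. (1, 8, 16, 16)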


DreamerLayerNorm

class ding.torch_utils.network.dreamer.DreamerLayerNorm(ch, eps=0.001)[source]
Overview:

DreamerLayerNorm Network for dreamerv3.

Interfaces:

__init__, forward

__init__(ch, eps=0.001)[source]
Overview:

Init the DreamerLayerNorm class.

Arguments:
  • ch (int): Input channel.

  • eps (float): Epsilon.

forward(x)[source]
Overview:

Compute the forward pass of DreamerLayerNorm.

Arguments:
  • x (torch.Tensor): Input tensor.

training: bool

DenseHead

class ding.torch_utils.network.dreamer.DenseHead(inp_dim, shape, layer_num, units, act='SiLU', norm='LN', dist='normal', std=1.0, outscale=1.0, device='cpu')[source]
Overview:

DenseHead Network for value head, reward head, and discount head of dreamerv3.

Interfaces:

__init__, forward

__init__(inp_dim, shape, layer_num, units, act='SiLU', norm='LN', dist='normal', std=1.0, outscale=1.0, device='cpu')[source]
Overview:

Init the DenseHead class.

Arguments:
  • inp_dim (int): Input dimension.

  • shape (tuple): Output shape.

  • layer_num (int): Number of layers.

  • units (int): Number of units.

  • act (str): Activation function.

  • norm (str): Normalization function.

  • dist (str): Distribution function.

  • std (float): Standard deviation.

  • outscale (float): Output scale.

  • device (str): Device.

forward(features)[source]
Overview:

Compute the forward pass of DenseHead.

Arguments:
  • features (torch.Tensor): Input tensor.

training: bool

ActionHead

class ding.torch_utils.network.dreamer.ActionHead(inp_dim, size, layers, units, act=<class 'torch.nn.modules.activation.ELU'>, norm=<class 'torch.nn.modules.normalization.LayerNorm'>, dist='trunc_normal', init_std=0.0, min_std=0.1, max_std=1.0, temp=0.1, outscale=1.0, unimix_ratio=0.01)[source]
Overview:

ActionHead Network for action head of dreamerv3.

Interfaces:

__init__, forward

__init__(inp_dim, size, layers, units, act=<class 'torch.nn.modules.activation.ELU'>, norm=<class 'torch.nn.modules.normalization.LayerNorm'>, dist='trunc_normal', init_std=0.0, min_std=0.1, max_std=1.0, temp=0.1, outscale=1.0, unimix_ratio=0.01)[source]
Overview:

Initialize the ActionHead class.

Arguments:
  • inp_dim (int): Input dimension.

  • size (int): Output size.

  • layers (int): Number of layers.

  • units (int): Number of units.

  • act (str): Activation function.

  • norm (str): Normalization function.

  • dist (str): Distribution function.

  • init_std (float): Initial standard deviation.

  • min_std (float): Minimum standard deviation.

  • max_std (float): Maximum standard deviation.

  • temp (float): Temperature.

  • outscale (float): Output scale.

  • unimix_ratio (float): Unimix ratio.

forward(features)[source]
Overview:

Compute the forward pass of ActionHead.

Arguments:
  • features (torch.Tensor): Input tensor.

training: bool

SampleDist

class ding.torch_utils.network.dreamer.SampleDist(dist, samples=100)[source]
Overview:

A kind of sample Dist for ActionHead of dreamerv3.

Interfaces:

__init__, mean, mode, entropy

__init__(dist, samples=100)[source]
Overview:

Initialize the SampleDist class.

Arguments:
  • dist (torch.Tensor): Distribution.

  • samples (int): Number of samples.

entropy()[source]
Overview:

Calculate the entropy of the distribution.

mean()[source]
Overview:

Calculate the mean of the distribution.

mode()[source]
Overview:

Calculate the mode of the distribution.

OneHotDist

class ding.torch_utils.network.dreamer.OneHotDist(logits=None, probs=None, unimix_ratio=0.0)[source]
Overview:

A kind of onehot Dist for dreamerv3.

Interfaces:

__init__, mode, sample

__init__(logits=None, probs=None, unimix_ratio=0.0)[source]
Overview:

Initialize the OneHotDist class.

Arguments:
  • logits (torch.Tensor): Logits.

  • probs (torch.Tensor): Probabilities.

  • unimix_ratio (float): Unimix ratio.

mode()[source]
Overview:

Calculate the mode of the distribution.

sample(sample_shape=(), seed=None)[source]
Overview:

Sample from the distribution.

Arguments:
  • sample_shape (tuple): Sample shape.

  • seed (int): Seed.

TwoHotDistSymlog

class ding.torch_utils.network.dreamer.TwoHotDistSymlog(logits=None, low=-20.0, high=20.0, device='cpu')[source]
Overview:

A kind of twohotsymlog Dist for dreamerv3.

Interfaces:

__init__, mode, mean, log_prob, log_prob_target

__init__(logits=None, low=-20.0, high=20.0, device='cpu')[source]
Overview:

Initialize the TwoHotDistSymlog class.

Arguments:
  • logits (torch.Tensor): Logits.

  • low (float): The lower bound of the value support.

  • high (float): The upper bound of the value support.

  • device (str): Device.

log_prob(x)[source]
Overview:

Calculate the log probability of the distribution.

Arguments:
  • x (torch.Tensor): Input tensor.

log_prob_target(target)[source]
Overview:

Calculate the log probability of the target.

Arguments:
  • target (torch.Tensor): Target tensor.

mean()[source]
Overview:

Calculate the mean of the distribution.

mode()[source]
Overview:

Calculate the mode of the distribution.

SymlogDist

class ding.torch_utils.network.dreamer.SymlogDist(mode, dist='mse', aggregation='sum', tol=1e-08, dim_to_reduce=[-1, -2, -3])[source]
Overview:

A kind of Symlog Dist for dreamerv3.

Interfaces:

__init__, entropy, mode, mean, log_prob

__init__(mode, dist='mse', aggregation='sum', tol=1e-08, dim_to_reduce=[-1, -2, -3])[source]
Overview:

Initialize the SymlogDist class.

Arguments:
  • mode (torch.Tensor): Mode.

  • dist (str): Distribution function.

  • aggregation (str): Aggregation function.

  • tol (float): Tolerance.

  • dim_to_reduce (list): Dimension to reduce.

log_prob(value)[source]
Overview:

Calculate the log probability of the distribution.

Arguments:
  • value (torch.Tensor): Input tensor.

mean()[source]
Overview:

Calculate the mean of the distribution.

mode()[source]
Overview:

Calculate the mode of the distribution.

ContDist

class ding.torch_utils.network.dreamer.ContDist(dist=None)[source]
Overview:

A kind of ordinary Dist for dreamerv3.

Interfaces:

__init__, entropy, mode, sample, log_prob

__init__(dist=None)[source]
Overview:

Initialize the ContDist class.

Arguments:
  • dist (torch.Tensor): Distribution.

entropy()[source]
Overview:

Calculate the entropy of the distribution.

log_prob(x)[source]
mode()[source]
Overview:

Calculate the mode of the distribution.

sample(sample_shape=())[source]
Overview:

Sample from the distribution.

Arguments:
  • sample_shape (tuple): Sample shape.

Bernoulli

class ding.torch_utils.network.dreamer.Bernoulli(dist=None)[source]
Overview:

A kind of Bernoulli Dist for dreamerv3.

Interfaces:

__init__, entropy, mode, sample, log_prob

__init__(dist=None)[source]
Overview:

Initialize the Bernoulli distribution.

Arguments:
  • dist (torch.Tensor): Distribution.

entropy()[source]
Overview:

Calculate the entropy of the distribution.

log_prob(x)[source]
Overview:

Calculate the log probability of the distribution.

Arguments:
  • x (torch.Tensor): Input tensor.

mode()[source]
Overview:

Calculate the mode of the distribution.

sample(sample_shape=())[source]
Overview:

Sample from the distribution.

Arguments:
  • sample_shape (tuple): Sample shape.

network.gtrxl

Please refer to ding/torch_utils/network/gtrxl for more details.

PositionalEmbedding

class ding.torch_utils.network.gtrxl.PositionalEmbedding(embedding_dim: int)[source]
Overview:

The PositionalEmbedding module implements the positional embedding used in the vanilla Transformer model.

Interfaces:

__init__, forward

Note

This implementation is adapted from https://github.com/kimiyoung/transformer-xl/blob/master/pytorch/mem_transformer.py

__init__(embedding_dim: int)[source]
Overview:

Initialize the PositionalEmbedding module.

Arguments:
  • embedding_dim (int): The dimensionality of the embeddings.

forward(pos_seq: Tensor) Tensor[source]
Overview:

Compute positional embedding given a sequence of positions.

Arguments:
  • pos_seq (torch.Tensor): The positional sequence, typically a 1D tensor of integers in the form of [seq_len-1, seq_len-2, …, 1, 0],

Returns:
  • pos_embedding (torch.Tensor): The computed positional embeddings. The shape of the tensor is (seq_len, 1, embedding_dim).
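Examples (an illustrative sketch; the sequence length is an assumption):
>>> pe = PositionalEmbedding(embedding_dim=128)
>>> pos_seq = torch.arange(9, -1, -1.0)   # [seq_len-1, ..., 1, 0] with seq_len=10
>>> emb = pe(pos_seq)   # expected shape: (10, 1, 128)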

training: bool

GRUGatingUnit

class ding.torch_utils.network.gtrxl.GRUGatingUnit(input_dim: int, bg: float = 2.0)[source]
Overview:

The GRUGatingUnit module implements the GRU gating mechanism used in the GTrXL model.

Interfaces:

__init__, forward

__init__(input_dim: int, bg: float = 2.0)[source]
Overview:

Initialize the GRUGatingUnit module.

Arguments:
  • input_dim (int): The dimensionality of the input.

  • bg (float): The gate bias. By setting bg > 0 we can explicitly initialize the gating mechanism to be close to the identity map. This can greatly improve the learning speed and stability since it initializes the agent close to a Markovian policy (ignore attention at the beginning).

forward(x: Tensor, y: Tensor)[source]
Overview:

Compute the output value using the GRU gating mechanism.

Arguments:
  • x (torch.Tensor): The first input tensor.

  • y (torch.Tensor): The second input tensor. x and y should have the same shape, and their last dimension should match input_dim.

Returns:
  • g (torch.Tensor): The output of the GRU gating mechanism, with the same shape as x and y.
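Examples (a minimal usage sketch; the shapes are assumptions):
>>> gate = GRUGatingUnit(input_dim=32)
>>> x = torch.randn(4, 32)
>>> y = torch.randn(4, 32)
>>> g = gate(x, y)   # same shape as x and y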

training: bool

Memory

class ding.torch_utils.network.gtrxl.Memory(memory_len: int = 20, batch_size: int = 64, embedding_dim: int = 256, layer_num: int = 3, memory: Tensor | None = None)[source]
Overview:

A class that stores the context used to add memory to Transformer.

Interfaces:

__init__, init, update, get, to

Note

For details, refer to Transformer-XL: https://arxiv.org/abs/1901.02860

__init__(memory_len: int = 20, batch_size: int = 64, embedding_dim: int = 256, layer_num: int = 3, memory: Tensor | None = None) None[source]
Overview:

Initialize the Memory module.

Arguments:
  • memory_len (int): The dimension of memory, i.e., how many past observations to use as memory.

  • batch_size (int): The dimension of each batch.

  • embedding_dim (int): The dimension of embedding, which is the dimension of a single observation after embedding.

  • layer_num (int): The number of transformer layers.

  • memory (Optional[torch.Tensor]): The initial memory. Default is None.

get()[source]
Overview:

Get the current memory.

Returns:
  • memory: (Optional[torch.Tensor]): The current memory, with shape (layer_num, memory_len, bs, embedding_dim).

init(memory: Tensor | None = None)[source]
Overview:

Initialize memory with an input list of tensors or create it automatically given its dimensions.

Arguments:
  • memory (Optional[torch.Tensor]): The input memory tensor with shape (layer_num, memory_len, bs, embedding_dim), where memory_len is the length of memory, bs is the batch size, and embedding_dim is the dimension of the embedding.

to(device: str = 'cpu')[source]
Overview:

Move the current memory to the specified device.

Arguments:
  • device (str): The device to move the memory to. Default is ‘cpu’.

update(hidden_state: List[Tensor])[source]
Overview:

Update the memory given a sequence of hidden states. Example for a single layer (memory_len=3, hidden_size_len=2, bs=3):

        m00 m01 m02        h00 h01 h02                m20 m21 m22
    m = m10 m11 m12    h = h10 h11 h12    =>  new_m = h00 h01 h02
        m20 m21 m22                                   h10 h11 h12

Arguments:
  • hidden_state: (List[torch.Tensor]): The hidden states to update the memory. Each tensor in the list has shape (cur_seq, bs, embedding_dim), where cur_seq is the length of the sequence.

Returns:
  • memory: (Optional[torch.Tensor]): The updated memory, with shape (layer_num, memory_len, bs, embedding_dim).
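Examples (an illustrative sketch; whether the memory is auto-initialized in __init__ is an assumption, so get() may also return None before init):
>>> mem = Memory(memory_len=4, batch_size=2, embedding_dim=8, layer_num=3)
>>> m = mem.get()   # the current memory tensor (or None if it has not been initialized)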

AttentionXL

class ding.torch_utils.network.gtrxl.AttentionXL(input_dim: int, head_dim: int, head_num: int, dropout: Module)[source]
Overview:

An implementation of the Attention mechanism used in the TransformerXL model.

Interfaces:

__init__, forward

__init__(input_dim: int, head_dim: int, head_num: int, dropout: Module) None[source]
Overview:

Initialize the AttentionXL module.

Arguments:
  • input_dim (int): The dimensionality of the input features.

  • head_dim (int): The dimensionality of each attention head.

  • head_num (int): The number of attention heads.

  • dropout (nn.Module): The dropout layer to use

_rel_shift(x: Tensor, zero_upper: bool = False) Tensor[source]
Overview:

Perform a relative shift operation on the attention score matrix. Example:

    a00 a01 a02      0 a00 a01 a02      0  a00 a01      a02  0  a10      a02  0   0
    a10 a11 a12  =>  0 a10 a11 a12  =>  a02  0  a10  =>  a11 a12  0   =>  a11 a12  0
    a20 a21 a22      0 a20 a21 a22      a11 a12  0      a20 a21 a22      a20 a21 a22
                                        a20 a21 a22

  1. Append one “column” of zeros to the left

  2. Reshape the matrix from [3 x 4] into [4 x 3]

  3. Remove the first “row”

  4. Mask out the upper triangle (optional)

Note

See the following material for better understanding: https://github.com/kimiyoung/transformer-xl/issues/8 https://arxiv.org/pdf/1901.02860.pdf (Appendix B)

Arguments:
  • x (torch.Tensor): The input tensor with shape (cur_seq, full_seq, bs, head_num).

  • zero_upper (bool): If True, the upper-right triangle of the matrix is set to zero.

Returns:
  • x (torch.Tensor): The input tensor after the relative shift operation, with shape (cur_seq, full_seq, bs, head_num).

forward(inputs: Tensor, pos_embedding: Tensor, full_input: Tensor, u: Parameter, v: Parameter, mask: Tensor | None = None) Tensor[source]
Overview:

Compute the forward pass for the AttentionXL module.

Arguments:
  • inputs (torch.Tensor): The attention input with shape (cur_seq, bs, input_dim).

  • pos_embedding (torch.Tensor): The positional embedding with shape (full_seq, 1, input_dim).

  • full_input (torch.Tensor): The concatenated memory and input tensor with shape (full_seq, bs, input_dim).

  • u (torch.nn.Parameter): The content parameter with shape (head_num, head_dim).

  • v (torch.nn.Parameter): The position parameter with shape (head_num, head_dim).

  • mask (Optional[torch.Tensor]): The attention mask with shape (cur_seq, full_seq, 1). If None, no masking is applied.

Returns:
  • output (torch.Tensor): The output of the attention mechanism with shape (cur_seq, bs, input_dim).

training: bool

GatedTransformerXLLayer

class ding.torch_utils.network.gtrxl.GatedTransformerXLLayer(input_dim: int, head_dim: int, hidden_dim: int, head_num: int, mlp_num: int, dropout: Module, activation: Module, gru_gating: bool = True, gru_bias: float = 2.0)[source]
Overview:

This class implements the attention layer of GTrXL (Gated Transformer-XL).

Interfaces:

__init__, forward

__init__(input_dim: int, head_dim: int, hidden_dim: int, head_num: int, mlp_num: int, dropout: Module, activation: Module, gru_gating: bool = True, gru_bias: float = 2.0) None[source]
Overview:

Initialize GatedTransformerXLLayer.

Arguments:
  • input_dim (int): The dimension of the input tensor.

  • head_dim (int): The dimension of each head in the multi-head attention.

  • hidden_dim (int): The dimension of the hidden layer in the MLP.

  • head_num (int): The number of heads for the multi-head attention.

  • mlp_num (int): The number of MLP layers in the attention layer.

  • dropout (nn.Module): The dropout module used in the MLP and attention layers.

  • activation (nn.Module): The activation function to be used in the MLP layers.

  • gru_gating (bool, optional): Whether to use GRU gates. If False, replace GRU gates with residual connections. Default is True.

  • gru_bias (float, optional): The bias of the GRU gate. Default is 2.

forward(inputs: Tensor, pos_embedding: Tensor, u: Parameter, v: Parameter, memory: Tensor, mask: Tensor | None = None) Tensor[source]
Overview:

Compute forward pass of GTrXL layer.

Arguments:
  • inputs (torch.Tensor): The attention input tensor of shape (cur_seq, bs, input_dim).

  • pos_embedding (torch.Tensor): The positional embedding tensor of shape (full_seq, 1, input_dim).

  • u (torch.nn.Parameter): The content parameter tensor of shape (head_num, head_dim).

  • v (torch.nn.Parameter): The position parameter tensor of shape (head_num, head_dim).

  • memory (torch.Tensor): The memory tensor of shape (prev_seq, bs, input_dim).

  • mask (Optional[torch.Tensor]): The attention mask tensor of shape (cur_seq, full_seq, 1).

    Default is None.

Returns:
  • output (torch.Tensor): layer output of shape (cur_seq, bs, input_dim)

training: bool

GTrXL

class ding.torch_utils.network.gtrxl.GTrXL(input_dim: int, head_dim: int = 128, embedding_dim: int = 256, head_num: int = 2, mlp_num: int = 2, layer_num: int = 3, memory_len: int = 64, dropout_ratio: float = 0.0, activation: Module = ReLU(), gru_gating: bool = True, gru_bias: float = 2.0, use_embedding_layer: bool = True)[source]
Overview:

GTrXL Transformer implementation as described in “Stabilizing Transformer for Reinforcement Learning” (https://arxiv.org/abs/1910.06764).

Interfaces:

__init__, forward, reset_memory, get_memory

__init__(input_dim: int, head_dim: int = 128, embedding_dim: int = 256, head_num: int = 2, mlp_num: int = 2, layer_num: int = 3, memory_len: int = 64, dropout_ratio: float = 0.0, activation: Module = ReLU(), gru_gating: bool = True, gru_bias: float = 2.0, use_embedding_layer: bool = True) None[source]
Overview:

Init GTrXL Model.

Arguments:
  • input_dim (int): The dimension of the input observation.

  • head_dim (int, optional): The dimension of each head. Default is 128.

  • embedding_dim (int, optional): The dimension of the embedding. Default is 256.

  • head_num (int, optional): The number of heads for multi-head attention. Default is 2.

  • mlp_num (int, optional): The number of MLP layers in the attention layer. Default is 2.

  • layer_num (int, optional): The number of transformer layers. Default is 3.

  • memory_len (int, optional): The length of memory. Default is 64.

  • dropout_ratio (float, optional): The dropout ratio. Default is 0.

  • activation (nn.Module, optional): The activation function. Default is nn.ReLU().

  • gru_gating (bool, optional): If False, replace GRU gates with residual connections. Default is True.

  • gru_bias (float, optional): The GRU gate bias. Default is 2.0.

  • use_embedding_layer (bool, optional): If False, don’t use input embedding layer. Default is True.

Raises:
  • AssertionError: If embedding_dim is not an even number.

forward(x: Tensor, batch_first: bool = False, return_mem: bool = True) Dict[str, Tensor][source]
Overview:

Performs a forward pass on the GTrXL.

Arguments:
  • x (torch.Tensor): The input tensor with shape (seq_len, bs, input_size).

  • batch_first (bool, optional): If the input data has shape (bs, seq_len, input_size), set this parameter to True to transpose along the first and second dimension and obtain shape (seq_len, bs, input_size). This does not affect the output memory. Default is False.

  • return_mem (bool, optional): If False, return only the output tensor without the dict. Default is True.

Returns:
  • x (Dict[str, torch.Tensor]): A dictionary containing the transformer output of shape (seq_len, bs, embedding_size) and memory of shape (layer_num, seq_len, bs, embedding_size).
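Examples (an illustrative sketch; all sizes are assumptions, not taken from the source):
>>> model = GTrXL(input_dim=32, head_dim=16, embedding_dim=64, head_num=2, layer_num=2, memory_len=8)
>>> x = torch.randn(16, 4, 32)   # (seq_len, bs, input_dim)
>>> output = model(x)            # a dict with the transformer output of shape (16, 4, 64) and the updated memory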

get_memory()[source]
Overview:

Returns the memory of GTrXL.

Returns:
  • memory (Optional[torch.Tensor]): The output memory or None if memory has not been initialized. The shape is (layer_num, memory_len, bs, embedding_dim).

reset_memory(batch_size: int | None = None, state: Tensor | None = None)[source]
Overview:

Clear or set the memory of GTrXL.

Arguments:
  • batch_size (Optional[int]): The batch size. Default is None.

  • state (Optional[torch.Tensor]): The input memory with shape (layer_num, memory_len, bs, embedding_dim). Default is None.

training: bool
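
A minimal usage sketch (not part of the original docstring); the 'logit' and 'memory' output keys and the expected shapes are assumptions inferred from the descriptions above.

Examples:
>>> import torch
>>> from ding.torch_utils.network.gtrxl import GTrXL
>>> model = GTrXL(input_dim=10, embedding_dim=32, head_num=2, layer_num=2, memory_len=8)
>>> x = torch.randn(16, 4, 10)          # (seq_len, bs, input_dim)
>>> output = model(x)                   # dict with the transformer output and the memory
>>> output['logit'].shape               # assumed key, expected (16, 4, 32)
>>> output['memory'].shape              # assumed key, see get_memory for the memory layout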

network.gumbel_softmax

Please refer to ding/torch_utils/network/gumbel_softmax for more details.

GumbelSoftmax

class ding.torch_utils.network.gumbel_softmax.GumbelSoftmax[source]
Overview:

An nn.Module that computes GumbelSoftmax.

Interfaces:

__init__, forward, gumbel_softmax_sample

Note

For more information on GumbelSoftmax, refer to the paper Categorical Reparameterization with Gumbel-Softmax: https://arxiv.org/abs/1611.01144.

__init__() None[source]
Overview:

Initialize the GumbelSoftmax module.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor, temperature: float = 1.0, hard: bool = False) Tensor[source]
Overview:

Forward pass for the GumbelSoftmax module.

Arguments:
  • x (torch.Tensor): Unnormalized log-probabilities.

  • temperature (float): Non-negative scalar controlling the sharpness of the distribution.

  • hard (bool): If True, returns one-hot encoded labels. Default is False.

Returns:
  • output (torch.Tensor): Sample from Gumbel-Softmax distribution.

Shapes:
  • x: its shape is (B, N), where B is the batch size and N is the number of classes.

  • y: its shape is (B, N), where B is the batch size and N is the number of classes.

gumbel_softmax_sample(x: Tensor, temperature: float, eps: float = 1e-08) Tensor[source]
Overview:

Draw a sample from the Gumbel-Softmax distribution.

Arguments:
  • x (torch.Tensor): Input tensor.

  • temperature (float): Non-negative scalar controlling the sharpness of the distribution.

  • eps (float): Small number to prevent division by zero, default is 1e-8.

Returns:
  • output (torch.Tensor): Sample from Gumbel-Softmax distribution.

training: bool
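
A minimal usage sketch (not part of the original docstring); the expected shapes in the comments follow the shape description above.

Examples:
>>> import torch
>>> from ding.torch_utils.network.gumbel_softmax import GumbelSoftmax
>>> gs = GumbelSoftmax()
>>> logits = torch.randn(4, 6)                       # (B, N) unnormalized log-probabilities
>>> soft = gs(logits, temperature=0.8)               # relaxed sample, expected shape (4, 6)
>>> hard = gs(logits, temperature=0.8, hard=True)    # one-hot sample, expected shape (4, 6)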

network.merge

Please refer to ding/torch_utils/network/merge for more details.

BilinearGeneral

class ding.torch_utils.network.merge.BilinearGeneral(in1_features: int, in2_features: int, out_features: int)[source]
Overview:

Bilinear implementation as in: Multiplicative Interactions and Where to Find Them, ICLR 2020, https://openreview.net/forum?id=rylnK6VtDH.

Interfaces:

__init__, forward

__init__(in1_features: int, in2_features: int, out_features: int)[source]
Overview:

Initialize the Bilinear layer.

Arguments:
  • in1_features (int): The size of each first input sample.

  • in2_features (int): The size of each second input sample.

  • out_features (int): The size of each output sample.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor, z: Tensor)[source]
Overview:

Compute the bilinear function.

Arguments:
  • x (torch.Tensor): The first input tensor.

  • z (torch.Tensor): The second input tensor.

reset_parameters()[source]
Overview:

Initialize the parameters of the Bilinear layer.

training: bool
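
A minimal usage sketch (not part of the original docstring); the output shape is an expectation based on out_features.

Examples:
>>> import torch
>>> from ding.torch_utils.network.merge import BilinearGeneral
>>> layer = BilinearGeneral(in1_features=8, in2_features=6, out_features=4)
>>> x = torch.randn(10, 8)
>>> z = torch.randn(10, 6)
>>> out = layer(x, z)                                # expected shape (10, 4)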

TorchBilinearCustomized

class ding.torch_utils.network.merge.TorchBilinearCustomized(in1_features: int, in2_features: int, out_features: int)[source]
Overview:

Customized Torch Bilinear implementation.

Interfaces:

__init__, forward

__init__(in1_features: int, in2_features: int, out_features: int)[source]
Overview:

Initialize the Bilinear layer.

Arguments:
  • in1_features (int): The size of each first input sample.

  • in2_features (int): The size of each second input sample.

  • out_features (int): The size of each output sample.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x, z)[source]
Overview:

Compute the bilinear function.

Arguments:
  • x (torch.Tensor): The first input tensor.

  • z (torch.Tensor): The second input tensor.

reset_parameters()[source]
Overview:

Initialize the parameters of the Bilinear layer.

training: bool

FiLM

class ding.torch_utils.network.merge.FiLM(feature_dim: int, context_dim: int)[source]
Overview:

Feature-wise Linear Modulation (FiLM) Layer. This layer applies feature-wise affine transformation based on context.

Interfaces:

__init__, forward

__init__(feature_dim: int, context_dim: int)[source]
Overview:

Initialize the FiLM layer.

Arguments:
  • feature_dim (int): The dimension of the input feature vector.

  • context_dim (int): The dimension of the input context vector.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(feature: Tensor, context: Tensor)[source]
Overview:

Forward propagation.

Arguments:
  • feature (torch.Tensor): The input feature, shape (batch_size, feature_dim).

  • context (torch.Tensor): The input context, shape (batch_size, context_dim).

Returns:
  • conditioned_feature (torch.Tensor): The output feature after FiLM, shape (batch_size, feature_dim).

training: bool
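
A minimal usage sketch (not part of the original docstring); the shapes follow the forward description above.

Examples:
>>> import torch
>>> from ding.torch_utils.network.merge import FiLM
>>> film = FiLM(feature_dim=16, context_dim=8)
>>> feature = torch.randn(4, 16)
>>> context = torch.randn(4, 8)
>>> conditioned = film(feature, context)             # expected shape (4, 16)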

GatingType

class ding.torch_utils.network.merge.GatingType(value)[source]
Overview:

Enum class defining different types of tensor gating and aggregation in modules.

GLOBAL = 'global'
NONE = 'none'
POINTWISE = 'pointwise'

SumMerge

class ding.torch_utils.network.merge.SumMerge(*args, **kwargs)[source]
Overview:

A PyTorch module that merges a list of tensors by computing their sum. All input tensors must have the same size. This module can work with any type of tensor (vector, units or visual).

Interfaces:

__init__, forward

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(tensors: List[Tensor]) Tensor[source]
Overview:

Forward pass of the SumMerge module, which sums the input tensors.

Arguments:
  • tensors (List[Tensor]): List of input tensors to be summed. All tensors must have the same size.

Returns:
  • summed (Tensor): Tensor resulting from the sum of all input tensors.

training: bool
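
A minimal usage sketch (not part of the original docstring); all input tensors share the same size, as required above.

Examples:
>>> import torch
>>> from ding.torch_utils.network.merge import SumMerge
>>> merge = SumMerge()
>>> tensors = [torch.randn(4, 8) for _ in range(3)]  # all tensors have the same size
>>> out = merge(tensors)                             # expected shape (4, 8)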

VectorMerge

class ding.torch_utils.network.merge.VectorMerge(input_sizes: Dict[str, int], output_size: int, gating_type: GatingType = GatingType.NONE, use_layer_norm: bool = True)[source]
Overview:

Merges multiple vector streams. Streams are first transformed through layer normalization, relu, and linear layers, then summed. They don’t need to have the same size. Gating can also be used before the sum.

Interfaces:

__init__, encode, _compute_gate, forward

Note

For more details about the gating types, please refer to the GatingType enum class.

__init__(input_sizes: Dict[str, int], output_size: int, gating_type: GatingType = GatingType.NONE, use_layer_norm: bool = True)[source]
Overview:

Initialize the VectorMerge module.

Arguments:
  • input_sizes (Dict[str, int]): A dictionary mapping input names to their sizes. The size is a single integer for 1D inputs, or None for 0D inputs. If an input size is None, we assume it’s ().

  • output_size (int): The size of the output vector.

  • gating_type (GatingType): The type of gating mechanism to use. Default is GatingType.NONE.

  • use_layer_norm (bool): Whether to use layer normalization. Default is True.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_compute_gate(init_gate: List[Tensor]) List[Tensor][source]
Overview:

Compute the gate values based on the initial gate values.

Arguments:
  • init_gate (List[Tensor]): The initial gate values.

Returns:
  • gate (List[Tensor]): The computed gate values.

_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
encode(inputs: Dict[str, Tensor]) Tuple[List[Tensor], List[Tensor]][source]
Overview:

Encode the input tensors using layer normalization, relu, and linear transformations.

Arguments:
  • inputs (Dict[str, Tensor]): The input tensors.

Returns:
  • gates (List[Tensor]): The gate tensors after transformations.

  • outputs (List[Tensor]): The output tensors after transformations.

forward(inputs: Dict[str, Tensor]) Tensor[source]
Overview:

Forward pass through the VectorMerge module.

Arguments:
  • inputs (Dict[str, Tensor]): The input tensors.

Returns:
  • output (Tensor): The output tensor after passing through the module.

training: bool
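
A minimal usage sketch (not part of the original docstring); the stream names, sizes, and expected output shape are illustrative assumptions.

Examples:
>>> import torch
>>> from ding.torch_utils.network.merge import VectorMerge, GatingType
>>> merge = VectorMerge(input_sizes={'obs': 16, 'action': 4}, output_size=32, gating_type=GatingType.NONE)
>>> inputs = {'obs': torch.randn(5, 16), 'action': torch.randn(5, 4)}
>>> out = merge(inputs)                              # expected shape (5, 32)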

network.nn_module

Please refer to ding/torch_utils/network/nn_module for more details.

weight_init

ding.torch_utils.network.nn_module.weight_init_(weight: Tensor, init_type: str = 'xavier', activation: str | None = None) None[source]
Overview:

Initialize weight according to the specified type.

Arguments:
  • weight (torch.Tensor): The weight that needs to be initialized.

  • init_type (str, optional): The type of initialization to implement, supports [“xavier”, “kaiming”, “orthogonal”].

  • activation (str, optional): The activation function name. Recommended to use only with [‘relu’, ‘leaky_relu’].

sequential_pack

ding.torch_utils.network.nn_module.sequential_pack(layers: List[Module]) Sequential[source]
Overview:

Pack the layers in the input list into an nn.Sequential module. If there is a convolutional layer in the list, an extra attribute out_channels will be added to the module and set to the out_channels of the conv layer.

Arguments:
  • layers (List[nn.Module]): The input list of layers.

Returns:
  • seq (nn.Sequential): Packed sequential container.

conv1d_block

ding.torch_utils.network.nn_module.conv1d_block(in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, padding: int = 0, dilation: int = 1, groups: int = 1, activation: Module | None = None, norm_type: str | None = None) Sequential[source]
Overview:

Create a 1-dimensional convolution layer with activation and normalization.

Arguments:
  • in_channels (int): Number of channels in the input tensor.

  • out_channels (int): Number of channels in the output tensor.

  • kernel_size (int): Size of the convolving kernel.

  • stride (int, optional): Stride of the convolution. Default is 1.

  • padding (int, optional): Zero-padding added to both sides of the input. Default is 0.

  • dilation (int, optional): Spacing between kernel elements. Default is 1.

  • groups (int, optional): Number of blocked connections from input channels to output channels. Default is 1.

  • activation (nn.Module, optional): The optional activation function.

  • norm_type (str, optional): Type of the normalization.

Returns:
  • block (nn.Sequential): A sequential list containing the torch layers of the 1-dimensional convolution layer.

conv2d_block

ding.torch_utils.network.nn_module.conv2d_block(in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, padding: int = 0, dilation: int = 1, groups: int = 1, pad_type: str = 'zero', activation: Module | None = None, norm_type: str | None = None, num_groups_for_gn: int = 1, bias: bool = True) Sequential[source]
Overview:

Create a 2-dimensional convolution layer with activation and normalization.

Arguments:
  • in_channels (int): Number of channels in the input tensor.

  • out_channels (int): Number of channels in the output tensor.

  • kernel_size (int): Size of the convolving kernel.

  • stride (int, optional): Stride of the convolution. Default is 1.

  • padding (int, optional): Zero-padding added to both sides of the input. Default is 0.

  • dilation (int): Spacing between kernel elements.

  • groups (int, optional): Number of blocked connections from input channels to output channels. Default is 1.

  • pad_type (str, optional): The way to add padding, include [‘zero’, ‘reflect’, ‘replicate’]. Default is ‘zero’.

  • activation (nn.Module): The optional activation function.

  • norm_type (str): The type of the normalization, currently supports [‘BN’, ‘LN’, ‘IN’, ‘GN’, ‘SyncBN’]. Default is None, which means no normalization.

  • num_groups_for_gn (int): Number of groups for GroupNorm.

  • bias (bool): whether to add a learnable bias to the nn.Conv2d. Default is True.

Returns:
  • block (nn.Sequential): A sequential list containing the torch layers of the 2-dimensional convolution layer.
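
A minimal usage sketch (not part of the original docstring); with kernel_size=3, stride=1, and padding=1 the spatial size is expected to be preserved.

Examples:
>>> import torch
>>> import torch.nn as nn
>>> from ding.torch_utils.network.nn_module import conv2d_block
>>> block = conv2d_block(3, 16, kernel_size=3, stride=1, padding=1, activation=nn.ReLU(), norm_type='BN')
>>> x = torch.randn(4, 3, 32, 32)
>>> y = block(x)                                     # expected shape (4, 16, 32, 32)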

deconv2d_block

ding.torch_utils.network.nn_module.deconv2d_block(in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, padding: int = 0, output_padding: int = 0, groups: int = 1, activation: int | None = None, norm_type: int | None = None) Sequential[source]
Overview:

Create a 2-dimensional transpose convolution layer with activation and normalization.

Arguments:
  • in_channels (int): Number of channels in the input tensor.

  • out_channels (int): Number of channels in the output tensor.

  • kernel_size (int): Size of the convolving kernel.

  • stride (int, optional): Stride of the convolution. Default is 1.

  • padding (int, optional): Zero-padding added to both sides of the input. Default is 0.

  • output_padding (int, optional): Additional size added to one side of the output shape. Default is 0.

  • groups (int, optional): Number of blocked connections from input channels to output channels. Default is 1.

  • activation (nn.Module, optional): The optional activation function.

  • norm_type (str, optional): Type of the normalization.

Returns:
  • block (nn.Sequential): A sequential list containing the torch layers of the 2-dimensional transpose convolution layer.

fc_block

ding.torch_utils.network.nn_module.fc_block(in_channels: int, out_channels: int, activation: Module | None = None, norm_type: str | None = None, use_dropout: bool = False, dropout_probability: float = 0.5) Sequential[source]
Overview:

Create a fully-connected block with activation, normalization, and dropout. Optional normalization can be done to the dim 1 (across the channels).

    x -> fc -> norm -> act -> dropout -> out

Arguments:
  • in_channels (int): Number of channels in the input tensor.

  • out_channels (int): Number of channels in the output tensor.

  • activation (nn.Module, optional): The optional activation function.

  • norm_type (str, optional): Type of the normalization.

  • use_dropout (bool, optional): Whether to use dropout in the fully-connected block. Default is False.

  • dropout_probability (float, optional): Probability of an element to be zeroed in the dropout. Default is 0.5.

Returns:
  • block (nn.Sequential): A sequential list containing the torch layers of the fully-connected block.
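
A minimal usage sketch (not part of the original docstring); the chosen channel sizes and dropout probability are illustrative.

Examples:
>>> import torch
>>> import torch.nn as nn
>>> from ding.torch_utils.network.nn_module import fc_block
>>> block = fc_block(64, 128, activation=nn.ReLU(), norm_type='LN', use_dropout=True, dropout_probability=0.1)
>>> x = torch.randn(4, 64)
>>> y = block(x)                                     # expected shape (4, 128)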

normed_linear

ding.torch_utils.network.nn_module.normed_linear(in_features: int, out_features: int, bias: bool = True, device=None, dtype=None, scale: float = 1.0) Linear[source]
Overview:

Create a nn.Linear module but with normalized fan-in init.

Arguments:
  • in_features (int): Number of features in the input tensor.

  • out_features (int): Number of features in the output tensor.

  • bias (bool, optional): Whether to add a learnable bias to the nn.Linear. Default is True.

  • device (torch.device, optional): The device to put the created module on. Default is None.

  • dtype (torch.dtype, optional): The desired data type of created module. Default is None.

  • scale (float, optional): The scale factor for initialization. Default is 1.0.

Returns:
  • out (nn.Linear): A nn.Linear module with normalized fan-in init.

normed_conv2d

ding.torch_utils.network.nn_module.normed_conv2d(in_channels: int, out_channels: int, kernel_size: int | Tuple[int, int], stride: int | Tuple[int, int] = 1, padding: int | Tuple[int, int] = 0, dilation: int | Tuple[int, int] = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', device=None, dtype=None, scale: float = 1) Conv2d[source]
Overview:

Create a nn.Conv2d module but with normalized fan-in init.

Arguments:
  • in_channels (int): Number of channels in the input tensor.

  • out_channels (int): Number of channels in the output tensor.

  • kernel_size (Union[int, Tuple[int, int]]): Size of the convolving kernel.

  • stride (Union[int, Tuple[int, int]], optional): Stride of the convolution. Default is 1.

  • padding (Union[int, Tuple[int, int]], optional): Zero-padding added to both sides of the input. Default is 0.

  • dilation (Union[int, Tuple[int, int]], optional): Spacing between kernel elements. Default is 1.

  • groups (int, optional): Number of blocked connections from input channels to output channels. Default is 1.

  • bias (bool, optional): Whether to add a learnable bias to the nn.Conv2d. Default is True.

  • padding_mode (str, optional): The type of padding algorithm to use. Default is ‘zeros’.

  • device (torch.device, optional): The device to put the created module on. Default is None.

  • dtype (torch.dtype, optional): The desired data type of created module. Default is None.

  • scale (float, optional): The scale factor for initialization. Default is 1.

Returns:
  • out (nn.Conv2d): A nn.Conv2d module with normalized fan-in init.

MLP

ding.torch_utils.network.nn_module.MLP(in_channels: int, hidden_channels: int, out_channels: int, layer_num: int, layer_fn: Callable | None = None, activation: Module | None = None, norm_type: str | None = None, use_dropout: bool = False, dropout_probability: float = 0.5, output_activation: bool = True, output_norm: bool = True, last_linear_layer_init_zero: bool = False)[source]
Overview:

Create a multi-layer perceptron using fully-connected blocks with activation, normalization, and dropout. Optional normalization can be done to the dim 1 (across the channels).

    x -> fc -> norm -> act -> dropout -> out

Arguments:
  • in_channels (int): Number of channels in the input tensor.

  • hidden_channels (int): Number of channels in the hidden tensor.

  • out_channels (int): Number of channels in the output tensor.

  • layer_num (int): Number of layers.

  • layer_fn (Callable, optional): Layer function.

  • activation (nn.Module, optional): The optional activation function.

  • norm_type (str, optional): The type of the normalization.

  • use_dropout (bool, optional): Whether to use dropout in the fully-connected block. Default is False.

  • dropout_probability (float, optional): Probability of an element to be zeroed in the dropout. Default is 0.5.

  • output_activation (bool, optional): Whether to use activation in the output layer. If True, we use the same activation as front layers. Default is True.

  • output_norm (bool, optional): Whether to use normalization in the output layer. If True, we use the same normalization as front layers. Default is True.

  • last_linear_layer_init_zero (bool, optional): Whether to use zero initializations for the last linear layer (including w and b), which can provide stable zero outputs in the beginning, usually used in the policy network in RL settings.

Returns:
  • block (nn.Sequential): A sequential list containing the torch layers of the multi-layer perceptron.
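
A minimal usage sketch (not part of the original docstring); the channel sizes and layer count are illustrative.

Examples:
>>> import torch
>>> import torch.nn as nn
>>> from ding.torch_utils.network.nn_module import MLP
>>> mlp = MLP(in_channels=8, hidden_channels=32, out_channels=4, layer_num=3, activation=nn.ReLU(), norm_type='LN')
>>> x = torch.randn(16, 8)
>>> y = mlp(x)                                       # expected shape (16, 4)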

ChannelShuffle

class ding.torch_utils.network.nn_module.ChannelShuffle(group_num: int)[source]
Overview:

Apply channel shuffle to the input tensor. For more details about the channel shuffle, please refer to the ‘ShuffleNet’ paper: https://arxiv.org/abs/1707.01083

Interfaces:

__init__, forward

__init__(group_num: int) None[source]
Overview:

Initialize the ChannelShuffle class.

Arguments:
  • group_num (int): The number of groups to exchange.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor) Tensor[source]
Overview:

Forward pass through the ChannelShuffle module.

Arguments:
  • x (torch.Tensor): The input tensor.

Returns:
  • x (torch.Tensor): The shuffled input tensor.

training: bool
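
A minimal usage sketch (not part of the original docstring); it assumes the channel count is divisible by group_num.

Examples:
>>> import torch
>>> from ding.torch_utils.network.nn_module import ChannelShuffle
>>> shuffle = ChannelShuffle(group_num=2)
>>> x = torch.randn(4, 8, 16, 16)                    # 8 channels, divisible by group_num
>>> y = shuffle(x)                                   # same shape (4, 8, 16, 16), channels regrouped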

one_hot

ding.torch_utils.network.nn_module.one_hot(val: LongTensor, num: int, num_first: bool = False) FloatTensor[source]
Overview:

Convert a torch.LongTensor to one-hot encoding. This implementation can be slightly faster than torch.nn.functional.one_hot.

Arguments:
  • val (torch.LongTensor): Each element contains the state to be encoded, the range should be [0, num-1]

  • num (int): Number of states of the one-hot encoding

  • num_first (bool, optional): If False, the one-hot encoding is added as the last dimension; otherwise, it is added as the first dimension. Default is False.

Returns:
  • one_hot (torch.FloatTensor): The one-hot encoded tensor.

Example:
>>> one_hot(2*torch.ones([2,2]).long(),3)
tensor([[[0., 0., 1.],
         [0., 0., 1.]],
        [[0., 0., 1.],
         [0., 0., 1.]]])
>>> one_hot(2*torch.ones([2,2]).long(),3,num_first=True)
tensor([[[0., 0.], [1., 0.]],
        [[0., 1.], [0., 0.]],
        [[1., 0.], [0., 1.]]])

NearestUpsample

class ding.torch_utils.network.nn_module.NearestUpsample(scale_factor: float | List[float])[source]
Overview:

This module upsamples the input to the given scale_factor using the nearest mode.

Interfaces:

__init__, forward

__init__(scale_factor: float | List[float]) None[source]
Overview:

Initialize the NearestUpsample class.

Arguments:
  • scale_factor (Union[float, List[float]]): The multiplier for the spatial size.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor) Tensor[source]
Overview:

Return the upsampled input tensor.

Arguments:
  • x (torch.Tensor): The input tensor.

Returns:
  • upsample (torch.Tensor): The upsampled input tensor.

training: bool
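
A minimal usage sketch (not part of the original docstring); with scale_factor=2 the spatial size is expected to double.

Examples:
>>> import torch
>>> from ding.torch_utils.network.nn_module import NearestUpsample
>>> up = NearestUpsample(scale_factor=2)
>>> x = torch.randn(1, 3, 8, 8)
>>> y = up(x)                                        # expected shape (1, 3, 16, 16)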

BilinearUpsample

class ding.torch_utils.network.nn_module.BilinearUpsample(scale_factor: float | List[float])[source]
Overview:

This module upsamples the input to the given scale_factor using the bilinear mode.

Interfaces:

__init__, forward

__init__(scale_factor: float | List[float]) None[source]
Overview:

Initialize the BilinearUpsample class.

Arguments:
  • scale_factor (Union[float, List[float]]): The multiplier for the spatial size.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor) Tensor[source]
Overview:

Return the upsampled input tensor.

Arguments:
  • x (torch.Tensor): The input tensor.

Returns:
  • upsample (torch.Tensor): The upsampled input tensor.

training: bool

binary_encode

ding.torch_utils.network.nn_module.binary_encode(y: Tensor, max_val: Tensor) Tensor[source]
Overview:

Convert elements in a tensor to its binary representation.

Arguments:
  • y (torch.Tensor): The tensor to be converted into its binary representation.

  • max_val (torch.Tensor): The maximum value of the elements in the tensor.

Returns:
  • binary (torch.Tensor): The input tensor in its binary representation.

Example:
>>> binary_encode(torch.tensor([3,2]),torch.tensor(8))
tensor([[0, 0, 1, 1],[0, 0, 1, 0]])

NoiseLinearLayer

class ding.torch_utils.network.nn_module.NoiseLinearLayer(in_channels: int, out_channels: int, sigma0: int = 0.4)[source]
Overview:

This is a linear layer with random noise.

Interfaces:

__init__, reset_noise, reset_parameters, forward

__init__(in_channels: int, out_channels: int, sigma0: int = 0.4) None[source]
Overview:

Initialize the NoiseLinearLayer class.

Arguments:
  • in_channels (int): Number of channels in the input tensor.

  • out_channels (int): Number of channels in the output tensor.

  • sigma0 (int, optional): Default noise volume when initializing NoiseLinearLayer. Default is 0.4.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_scale_noise(size: int | Tuple)[source]
Overview:

Scale the noise.

Arguments:
  • size (Union[int, Tuple]): The size of the noise.

_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor)[source]
Overview:

Perform the forward pass with noise.

Arguments:
  • x (torch.Tensor): The input tensor.

Returns:
  • output (torch.Tensor): The output tensor with noise.

reset_noise()[source]
Overview:

Reset the noise settings in the layer.

reset_parameters()[source]
Overview:

Reset the parameters in the layer.

training: bool
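
A minimal usage sketch (not part of the original docstring); the expected output shape follows from out_channels.

Examples:
>>> import torch
>>> from ding.torch_utils.network.nn_module import NoiseLinearLayer
>>> layer = NoiseLinearLayer(in_channels=8, out_channels=4)
>>> x = torch.randn(16, 8)
>>> y = layer(x)                                     # expected shape (16, 4)
>>> layer.reset_noise()                              # resample the noise between forward passes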

noise_block

ding.torch_utils.network.nn_module.noise_block(in_channels: int, out_channels: int, activation: str | None = None, norm_type: str | None = None, use_dropout: bool = False, dropout_probability: float = 0.5, sigma0: float = 0.4)[source]
Overview:

Create a fully-connected noise layer with activation, normalization, and dropout. Optional normalization can be done to the dim 1 (across the channels).

Arguments:
  • in_channels (int): Number of channels in the input tensor.

  • out_channels (int): Number of channels in the output tensor.

  • activation (str, optional): The optional activation function. Default is None.

  • norm_type (str, optional): Type of normalization. Default is None.

  • use_dropout (bool, optional): Whether to use dropout in the fully-connected block.

  • dropout_probability (float, optional): Probability of an element to be zeroed in the dropout. Default is 0.5.

  • sigma0 (float, optional): The sigma0 is the default noise volume when initializing NoiseLinearLayer. Default is 0.4.

Returns:
  • block (nn.Sequential): A sequential list containing the torch layers of the fully-connected block.

NaiveFlatten

class ding.torch_utils.network.nn_module.NaiveFlatten(start_dim: int = 1, end_dim: int = -1)[source]
Overview:

This module is a naive implementation of the flatten operation.

Interfaces:

__init__, forward

__init__(start_dim: int = 1, end_dim: int = -1) None[source]
Overview:

Initialize the NaiveFlatten class.

Arguments:
  • start_dim (int, optional): The first dimension to flatten. Default is 1.

  • end_dim (int, optional): The last dimension to flatten. Default is -1.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor) Tensor[source]
Overview:

Perform the flatten operation on the input tensor.

Arguments:
  • x (torch.Tensor): The input tensor.

Returns:
  • output (torch.Tensor): The flattened output tensor.

training: bool
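
A minimal usage sketch (not part of the original docstring); flattening from start_dim=1 collapses the trailing dimensions.

Examples:
>>> import torch
>>> from ding.torch_utils.network.nn_module import NaiveFlatten
>>> flat = NaiveFlatten(start_dim=1)
>>> x = torch.randn(4, 3, 8, 8)
>>> y = flat(x)                                      # expected shape (4, 192)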

network.normalization

Please refer to ding/torch_utils/network/normalization for more details.

build_normalization

ding.torch_utils.network.normalization.build_normalization(norm_type: str, dim: int | None = None) Module[source]
Overview:

Construct the corresponding normalization module. For beginners, refer to this article to learn more about batch normalization: https://zhuanlan.zhihu.com/p/34879333.

Arguments:
  • norm_type (str): Type of the normalization. Currently supports [‘BN’, ‘LN’, ‘IN’, ‘SyncBN’].

  • dim (Optional[int]): Dimension of the normalization, applicable when norm_type is in [‘BN’, ‘IN’].

Returns:
  • norm_func (nn.Module): The corresponding batch normalization function.
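
A minimal usage sketch (not part of the original docstring), assuming the returned object is the normalization class itself, which is then instantiated with the normalized feature size.

Examples:
>>> import torch
>>> from ding.torch_utils.network.normalization import build_normalization
>>> norm_cls = build_normalization('LN')             # assumed to return the normalization class
>>> norm = norm_cls(16)                              # instantiate with the normalized feature size
>>> y = norm(torch.randn(4, 16))                     # expected shape (4, 16)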

network.popart

Please refer to ding/torch_utils/network/popart for more details.

PopArt

class ding.torch_utils.network.popart.PopArt(input_features: int | None = None, output_features: int | None = None, beta: float = 0.5)[source]
Overview:

A linear layer with PopArt normalization. This class implements a linear transformation followed by PopArt normalization, which is a method to automatically adapt the contribution of each task to the agent’s updates in multi-task learning, as described in the paper: https://arxiv.org/abs/1809.04474.

Interfaces:

__init__, reset_parameters, forward, update_parameters

__init__(input_features: int | None = None, output_features: int | None = None, beta: float = 0.5) None[source]
Overview:

Initialize the class with input features, output features, and the beta parameter.

Arguments:
  • input_features (Union[int, None]): The size of each input sample.

  • output_features (Union[int, None]): The size of each output sample.

  • beta (float): The parameter for moving average.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor) Dict[str, Tensor][source]
Overview:

Implement the forward computation of the linear layer and return both the output and the normalized output of the layer.

Arguments:
  • x (torch.Tensor): Input tensor which is to be normalized.

Returns:
  • output (Dict[str, torch.Tensor]): A dictionary containing ‘pred’ and ‘unnormalized_pred’.

reset_parameters()[source]
Overview:

Reset the parameters including weights and bias using kaiming_uniform_ and uniform_ initialization.

training: bool
update_parameters(value: Tensor) Dict[str, Tensor][source]
Overview:

Update the normalization parameters based on the given value and return the new mean and standard deviation after the update.

Arguments:
  • value (torch.Tensor): The tensor to be used for updating parameters.

Returns:
  • update_results (Dict[str, torch.Tensor]): A dictionary containing ‘new_mean’ and ‘new_std’.
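
A minimal usage sketch (not part of the original docstring); the dictionary keys follow the descriptions above, while the shape of the value tensor passed to update_parameters is an assumption.

Examples:
>>> import torch
>>> from ding.torch_utils.network.popart import PopArt
>>> head = PopArt(input_features=64, output_features=1)
>>> x = torch.randn(8, 64)
>>> out = head(x)                                       # dict with 'pred' and 'unnormalized_pred'
>>> stats = head.update_parameters(torch.randn(8, 1))   # dict with 'new_mean' and 'new_std'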

network.res_block

Please refer to ding/torch_utils/network/res_block for more details.

ResBlock

class ding.torch_utils.network.res_block.ResBlock(in_channels: int, activation: Module = ReLU(), norm_type: str = 'BN', res_type: str = 'basic', bias: bool = True, out_channels: int | None = None)[source]
Overview:
Residual Block with 2D convolution layers, including 3 types:

basic block (input channel: C):
    x -> 3*3*C -> norm -> act -> 3*3*C -> norm -> act -> out
    \________________________________________________________/ +

bottleneck block:
    x -> 1*1*(1/4*C) -> norm -> act -> 3*3*(1/4*C) -> norm -> act -> 1*1*C -> norm -> act -> out
    \___________________________________________________________________________________________/ +

downsample block (used in EfficientZero, input channel: C):
    x -> 3*3*C -> norm -> act -> 3*3*C -> norm -> act -> out
    \______________________ 3*3*C __________________________/ +

Note

You can refer to the paper Deep Residual Learning for Image Recognition (https://arxiv.org/abs/1512.03385) for more details.

Interfaces:

__init__, forward

__init__(in_channels: int, activation: Module = ReLU(), norm_type: str = 'BN', res_type: str = 'basic', bias: bool = True, out_channels: int | None = None) None[source]
Overview:

Init the 2D convolution residual block.

Arguments:
  • in_channels (int): Number of channels in the input tensor.

  • activation (nn.Module): The optional activation function.

  • norm_type (str): Type of the normalization, default set to ‘BN’(Batch Normalization), supports [‘BN’, ‘LN’, ‘IN’, ‘GN’, ‘SyncBN’, None].

  • res_type (str): Type of residual block, supports [‘basic’, ‘bottleneck’, ‘downsample’]

  • bias (bool): Whether to add a learnable bias to the conv2d_block. default set to True.

  • out_channels (int): Number of channels in the output tensor, default set to None, which means out_channels = in_channels.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor) Tensor[source]
Overview:

Return the residual block output.

Arguments:
  • x (torch.Tensor): The input tensor.

Returns:
  • x (torch.Tensor): The resblock output tensor.

training: bool
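
A minimal usage sketch (not part of the original docstring); the basic block is expected to preserve the input shape when out_channels is left at None.

Examples:
>>> import torch
>>> from ding.torch_utils.network.res_block import ResBlock
>>> block = ResBlock(in_channels=32, res_type='basic')
>>> x = torch.randn(4, 32, 16, 16)
>>> y = block(x)                                     # expected shape (4, 32, 16, 16)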

ResFCBlock

class ding.torch_utils.network.res_block.ResFCBlock(in_channels: int, activation: Module = ReLU(), norm_type: str = 'BN', dropout: float | None = None)[source]
Overview:

Residual Block with 2 fully connected layers:

    x -> fc1 -> norm -> act -> fc2 -> norm -> act -> out
    \____________________________________________________/ +

Interfaces:

__init__, forward

__init__(in_channels: int, activation: Module = ReLU(), norm_type: str = 'BN', dropout: float | None = None)[source]
Overview:

Init the fully connected layer residual block.

Arguments:
  • in_channels (int): The number of channels in the input tensor.

  • activation (nn.Module): The optional activation function.

  • norm_type (str): The type of the normalization, default set to ‘BN’.

  • dropout (float): The dropout rate, default set to None.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor) Tensor[source]
Overview:

Return the output of the residual block.

Arguments:
  • x (torch.Tensor): The input tensor.

Returns:
  • x (torch.Tensor): The resblock output tensor.

training: bool
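
A minimal usage sketch (not part of the original docstring); the fully connected residual block is expected to preserve the input shape.

Examples:
>>> import torch
>>> from ding.torch_utils.network.res_block import ResFCBlock
>>> block = ResFCBlock(in_channels=64)
>>> x = torch.randn(4, 64)
>>> y = block(x)                                     # expected shape (4, 64)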

network.resnet

Please refer to ding/torch_utils/network/resnet for more details.

to_2tuple

ding.torch_utils.network.resnet.to_2tuple(item: int) tuple[source]
Overview:

Convert a scalar to a 2-tuple or return the item if it’s not a scalar.

Arguments:
  • item (int): An item to be converted to a 2-tuple.

Returns:
  • (tuple): A 2-tuple of the item.

get_same_padding

ding.torch_utils.network.resnet.get_same_padding(x: int, k: int, s: int, d: int) int[source]
Overview:

Calculate asymmetric TensorFlow-like ‘SAME’ padding for a convolution.

Arguments:
  • x (int): The size of the input.

  • k (int): The size of the kernel.

  • s (int): The stride of the convolution.

  • d (int): The dilation of the convolution.

Returns:
  • (int): The size of the padding.
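
A minimal usage sketch (not part of the original docstring); the value of 2 is what the ‘SAME’ padding rule would give for these arguments and is an expectation, not an output copied from the source.

Examples:
>>> from ding.torch_utils.network.resnet import get_same_padding
>>> get_same_padding(x=32, k=3, s=1, d=1)            # expected 2: total 'SAME' padding for this conv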

pad_same

ding.torch_utils.network.resnet.pad_same(x, k: List[int], s: List[int], d: List[int] = (1, 1), value: float = 0)[source]
Overview:

Dynamically pad input x with ‘SAME’ padding for conv with specified args.

Arguments:
  • x (Tensor): The input tensor.

  • k (List[int]): The size of the kernel.

  • s (List[int]): The stride of the convolution.

  • d (List[int]): The dilation of the convolution.

  • value (float): Value to fill the padding.

Returns:
  • (Tensor): The padded tensor.

avg_pool2d_same

ding.torch_utils.network.resnet.avg_pool2d_same(x, kernel_size: List[int], stride: List[int], padding: List[int] = (0, 0), ceil_mode: bool = False, count_include_pad: bool = True)[source]
Overview:

Apply average pooling with ‘SAME’ padding on the input tensor.

Arguments:
  • x (Tensor): The input tensor.

  • kernel_size (List[int]): The size of the kernel.

  • stride (List[int]): The stride of the convolution.

  • padding (List[int]): The size of the padding.

  • ceil_mode (bool): When True, will use ceil instead of floor to compute the output shape.

  • count_include_pad (bool): When True, will include the zero-padding in the averaging calculation.

Returns:
  • (Tensor): The tensor after average pooling.

AvgPool2dSame

class ding.torch_utils.network.resnet.AvgPool2dSame(kernel_size: int, stride: Tuple[int, int] | None = None, padding: int = 0, ceil_mode: bool = False, count_include_pad: bool = True)[source]
Overview:

Tensorflow-like ‘SAME’ wrapper for 2D average pooling.

Interfaces:

__init__, forward

__init__(kernel_size: int, stride: Tuple[int, int] | None = None, padding: int = 0, ceil_mode: bool = False, count_include_pad: bool = True) None[source]
Overview:

Initialize the AvgPool2dSame with given arguments.

Arguments:
  • kernel_size (int): The size of the window to take an average over.

  • stride (Optional[Tuple[int, int]]): The stride of the window. If None, default to kernel_size.

  • padding (int): Implicit zero padding to be added on both sides.

  • ceil_mode (bool): When True, will use ceil instead of floor to compute the output shape.

  • count_include_pad (bool): When True, will include the zero-padding in the averaging calculation.

ceil_mode: bool
count_include_pad: bool
forward(x: Tensor) Tensor[source]
Overview:

Forward pass of the AvgPool2dSame.

Argument:
  • x (torch.Tensor): Input tensor.

Returns:
  • (torch.Tensor): Output tensor after average pooling.

kernel_size: int | Tuple[int, int]
padding: int | Tuple[int, int]
stride: int | Tuple[int, int]
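
A minimal usage sketch (not part of the original docstring); the expected output size follows the ‘SAME’ rule, i.e. output = ceil(input / stride).

Examples:
>>> import torch
>>> from ding.torch_utils.network.resnet import AvgPool2dSame
>>> pool = AvgPool2dSame(kernel_size=3, stride=2)
>>> x = torch.randn(1, 8, 7, 7)
>>> y = pool(x)                                      # expected shape (1, 8, 4, 4), since ceil(7 / 2) = 4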

create_classifier

ding.torch_utils.network.resnet.create_classifier(num_features: int, num_classes: int, pool_type: str = 'avg', use_conv: bool = False) Tuple[Module, Module][source]
Overview:

Create a classifier with global pooling layer and fully connected layer.

Arguments:
  • num_features (int): The number of features.

  • num_classes (int): The number of classes for the final classification.

  • pool_type (str): The type of pooling to use; ‘avg’ for Average Pooling.

  • use_conv (bool): Whether to use convolution or not.

Returns:
  • global_pool (nn.Module): The created global pooling layer.

  • fc (nn.Module): The created fully connected layer.

ClassifierHead

class ding.torch_utils.network.resnet.ClassifierHead(in_chs: int, num_classes: int, pool_type: str = 'avg', drop_rate: float = 0.0, use_conv: bool = False)[source]
Overview:

Classifier head with configurable global pooling and dropout.

Interfaces:

__init__, forward

__init__(in_chs: int, num_classes: int, pool_type: str = 'avg', drop_rate: float = 0.0, use_conv: bool = False) None[source]
Overview:

Initialize the ClassifierHead with given arguments.

Arguments:
  • in_chs (int): Number of input channels.

  • num_classes (int): Number of classes for the final classification.

  • pool_type (str): The type of pooling to use; ‘avg’ for Average Pooling.

  • drop_rate (float): The dropout rate.

  • use_conv (bool): Whether to use convolution or not.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor) Tensor[source]
Overview:

Forward pass of the ClassifierHead.

Argument:
  • x (torch.Tensor): Input tensor.

Returns:
  • (torch.Tensor): Output tensor after classification.

training: bool

create_attn

ding.torch_utils.network.resnet.create_attn(layer: Module, plane: int) None[source]
Overview:

Create an attention mechanism.

Arguments:
  • layer (nn.Module): The layer where the attention is to be applied.

  • plane (int): The plane on which the attention is to be applied.

Returns:
  • None

get_padding

ding.torch_utils.network.resnet.get_padding(kernel_size: int, stride: int, dilation: int = 1) int[source]
Overview:

Compute the padding based on the kernel size, stride and dilation.

Arguments:
  • kernel_size (int): The size of the kernel.

  • stride (int): The stride of the convolution.

  • dilation (int): The dilation factor.

Returns:
  • padding (int): The computed padding.

BasicBlock

class ding.torch_utils.network.resnet.BasicBlock(inplanes: int, planes: int, stride: int = 1, downsample: ~typing.Callable | None = None, cardinality: int = 1, base_width: int = 64, reduce_first: int = 1, dilation: int = 1, first_dilation: int | None = None, act_layer: ~typing.Callable = <class 'torch.nn.modules.activation.ReLU'>, norm_layer: ~typing.Callable = <class 'torch.nn.modules.batchnorm.BatchNorm2d'>, attn_layer: ~typing.Callable | None = None, aa_layer: ~typing.Callable | None = None, drop_block: ~typing.Callable | None = None, drop_path: ~typing.Callable | None = None)[source]
Overview:

The basic building block for models like ResNet. This class extends PyTorch’s Module class. It represents a standard block of layers including two convolutions, batch normalization, an optional attention mechanism, and activation functions.

Interfaces:

__init__, forward, zero_init_last_bn

Properties:
  • expansion (int): Specifies the expansion factor for the planes of the conv layers.

__init__(inplanes: int, planes: int, stride: int = 1, downsample: ~typing.Callable | None = None, cardinality: int = 1, base_width: int = 64, reduce_first: int = 1, dilation: int = 1, first_dilation: int | None = None, act_layer: ~typing.Callable = <class 'torch.nn.modules.activation.ReLU'>, norm_layer: ~typing.Callable = <class 'torch.nn.modules.batchnorm.BatchNorm2d'>, attn_layer: ~typing.Callable | None = None, aa_layer: ~typing.Callable | None = None, drop_block: ~typing.Callable | None = None, drop_path: ~typing.Callable | None = None) None[source]
Overview:

Initialize the BasicBlock with given parameters.

Arguments:
  • inplanes (int): Number of input channels.

  • planes (int): Number of output channels.

  • stride (int): The stride of the convolutional layer.

  • downsample (Callable): Function for downsampling the inputs.

  • cardinality (int): Group size for grouped convolution.

  • base_width (int): Base width of the convolutions.

  • reduce_first (int): Reduction factor for first convolution of each block.

  • dilation (int): Spacing between kernel points.

  • first_dilation (int): First dilation value.

  • act_layer (Callable): Function for activation layer.

  • norm_layer (Callable): Function for normalization layer.

  • attn_layer (Callable): Function for attention layer.

  • aa_layer (Callable): Function for anti-aliasing layer.

  • drop_block (Callable): Method for dropping block.

  • drop_path (Callable): Method for dropping path.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
expansion = 1
forward(x: Tensor) Tensor[source]
Overview:

Defines the computation performed at every call.

Arguments:
  • x (torch.Tensor): The input tensor.

Returns:
  • output (torch.Tensor): The output tensor after passing through the BasicBlock.

training: bool
zero_init_last_bn() None[source]
Overview:

Initialize the batch normalization layer with zeros.

Bottleneck

class ding.torch_utils.network.resnet.Bottleneck(inplanes: int, planes: int, stride: int = 1, downsample: ~torch.nn.modules.module.Module | None = None, cardinality: int = 1, base_width: int = 64, reduce_first: int = 1, dilation: int = 1, first_dilation: int | None = None, act_layer: ~typing.Type[~torch.nn.modules.module.Module] = <class 'torch.nn.modules.activation.ReLU'>, norm_layer: ~typing.Type[~torch.nn.modules.module.Module] = <class 'torch.nn.modules.batchnorm.BatchNorm2d'>, attn_layer: ~typing.Type[~torch.nn.modules.module.Module] | None = None, aa_layer: ~typing.Type[~torch.nn.modules.module.Module] | None = None, drop_block: ~typing.Callable | None = None, drop_path: ~typing.Callable | None = None)[source]
Overview:

The Bottleneck class is a basic block used to build ResNet networks. It is part of PyTorch’s ResNet implementation. This block is composed of several layers, including a convolutional layer, a normalization layer, an activation layer, and optional attention, anti-aliasing, and dropout layers.

Interfaces:

__init__, forward, zero_init_last_bn

Properties:

expansion, inplanes, planes, stride, downsample, cardinality, base_width, reduce_first, dilation, first_dilation, act_layer, norm_layer, attn_layer, aa_layer, drop_block, drop_path

__init__(inplanes: int, planes: int, stride: int = 1, downsample: ~torch.nn.modules.module.Module | None = None, cardinality: int = 1, base_width: int = 64, reduce_first: int = 1, dilation: int = 1, first_dilation: int | None = None, act_layer: ~typing.Type[~torch.nn.modules.module.Module] = <class 'torch.nn.modules.activation.ReLU'>, norm_layer: ~typing.Type[~torch.nn.modules.module.Module] = <class 'torch.nn.modules.batchnorm.BatchNorm2d'>, attn_layer: ~typing.Type[~torch.nn.modules.module.Module] | None = None, aa_layer: ~typing.Type[~torch.nn.modules.module.Module] | None = None, drop_block: ~typing.Callable | None = None, drop_path: ~typing.Callable | None = None) None[source]
Overview:

Initialize the Bottleneck class with various parameters.

Arguments:
  • inplanes (int): The number of input planes.

  • planes (int): The number of output planes.

  • stride (int, optional): The stride size, defaults to 1.

  • downsample (nn.Module, optional): The downsample method, defaults to None.

  • cardinality (int, optional): The size of the group convolutions, defaults to 1.

  • base_width (int, optional): The base width, defaults to 64.

  • reduce_first (int, optional): The first reduction factor, defaults to 1.

  • dilation (int, optional): The dilation factor, defaults to 1.

  • first_dilation (int, optional): The first dilation factor, defaults to None.

  • act_layer (Type[nn.Module], optional): The activation layer type, defaults to nn.ReLU.

  • norm_layer (Type[nn.Module], optional): The normalization layer type, defaults to nn.BatchNorm2d.

  • attn_layer (Type[nn.Module], optional): The attention layer type, defaults to None.

  • aa_layer (Type[nn.Module], optional): The anti-aliasing layer type, defaults to None.

  • drop_block (Callable): The dropout block, defaults to None.

  • drop_path (Callable): The drop path, defaults to None.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
expansion = 4
forward(x: Tensor) Tensor[source]
Overview:

Defines the computation performed at every call.

Arguments:
  • x (Tensor): The input tensor.

Returns:
  • x (Tensor): The output tensor resulting from the computation.

training: bool
zero_init_last_bn() None[source]
Overview:

Initialize the last batch normalization layer with zero.

downsample_conv

ding.torch_utils.network.resnet.downsample_conv(in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, dilation: int = 1, first_dilation: int | None = None, norm_layer: Type[Module] | None = None) Sequential[source]
Overview:

Create a sequential module for downsampling that includes a convolution layer and a normalization layer.

Arguments:
  • in_channels (int): The number of input channels.

  • out_channels (int): The number of output channels.

  • kernel_size (int): The size of the kernel.

  • stride (int, optional): The stride size, defaults to 1.

  • dilation (int, optional): The dilation factor, defaults to 1.

  • first_dilation (int, optional): The first dilation factor, defaults to None.

  • norm_layer (Type[nn.Module], optional): The normalization layer type, defaults to nn.BatchNorm2d.

Returns:
  • nn.Sequential: A sequence of layers performing downsampling through convolution.

downsample_avg

ding.torch_utils.network.resnet.downsample_avg(in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, dilation: int = 1, first_dilation: int | None = None, norm_layer: Type[Module] | None = None) Sequential[source]
Overview:

Create a sequential module for downsampling that includes an average pooling layer, a convolution layer, and a normalization layer.

Arguments:
  • in_channels (int): The number of input channels.

  • out_channels (int): The number of output channels.

  • kernel_size (int): The size of the kernel.

  • stride (int, optional): The stride size, defaults to 1.

  • dilation (int, optional): The dilation factor, defaults to 1.

  • first_dilation (int, optional): The first dilation factor, defaults to None.

  • norm_layer (Type[nn.Module], optional): The normalization layer type, defaults to nn.BatchNorm2d.

Returns:
  • nn.Sequential: A sequence of layers performing downsampling through average pooling.

drop_blocks

ding.torch_utils.network.resnet.drop_blocks(drop_block_rate: float = 0.0) List[None][source]
Overview:

Generate a list of None values based on the drop block rate.

Arguments:
  • drop_block_rate (float, optional): The drop block rate, defaults to 0.

Returns:
  • List[None]: A list of None values.

make_blocks

ding.torch_utils.network.resnet.make_blocks(block_fn: Type[Module], channels: List[int], block_repeats: List[int], inplanes: int, reduce_first: int = 1, output_stride: int = 32, down_kernel_size: int = 1, avg_down: bool = False, drop_block_rate: float = 0.0, drop_path_rate: float = 0.0, **kwargs) Tuple[List[Tuple[str, Module]], List[Dict[str, int | str]]][source]
Overview:

Create a list of blocks for the network, with each block having a given number of repeats. Also, create a feature info list that contains information about the output of each block.

Arguments:
  • block_fn (Type[nn.Module]): The type of block to use.

  • channels (List[int]): The list of output channels for each block.

  • block_repeats (List[int]): The list of number of repeats for each block.

  • inplanes (int): The number of input planes.

  • reduce_first (int, optional): The first reduction factor, defaults to 1.

  • output_stride (int, optional): The total stride of the network, defaults to 32.

  • down_kernel_size (int, optional): The size of the downsample kernel, defaults to 1.

  • avg_down (bool, optional): Whether to use average pooling for downsampling, defaults to False.

  • drop_block_rate (float, optional): The drop block rate, defaults to 0.

  • drop_path_rate (float, optional): The drop path rate, defaults to 0.

Returns:
  • Tuple[List[Tuple[str, nn.Module]], List[Dict[str, Union[int, str]]]]: A tuple that includes a list of blocks for the network and a feature info list.

ResNet

class ding.torch_utils.network.resnet.ResNet(block: ~torch.nn.modules.module.Module, layers: ~typing.List[int], num_classes: int = 1000, in_chans: int = 3, cardinality: int = 1, base_width: int = 64, stem_width: int = 64, stem_type: str = '', replace_stem_pool: bool = False, output_stride: int = 32, block_reduce_first: int = 1, down_kernel_size: int = 1, avg_down: bool = False, act_layer: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.activation.ReLU'>, norm_layer: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.batchnorm.BatchNorm2d'>, aa_layer: ~torch.nn.modules.module.Module | None = None, drop_rate: float = 0.0, drop_path_rate: float = 0.0, drop_block_rate: float = 0.0, global_pool: str = 'avg', zero_init_last_bn: bool = True, block_args: dict | None = None)[source]
Overview:

Implements ResNet, ResNeXt, SE-ResNeXt, and SENet models. This implementation supports various modifications based on the v1c, v1d, v1e, and v1s variants included in the MXNet Gluon ResNetV1b model. For more details about the variants and options, please refer to the ‘Bag of Tricks’ paper: https://arxiv.org/pdf/1812.01187.

Interfaces:

__init__, forward, zero_init_last_bn, get_classifier

__init__(block: ~torch.nn.modules.module.Module, layers: ~typing.List[int], num_classes: int = 1000, in_chans: int = 3, cardinality: int = 1, base_width: int = 64, stem_width: int = 64, stem_type: str = '', replace_stem_pool: bool = False, output_stride: int = 32, block_reduce_first: int = 1, down_kernel_size: int = 1, avg_down: bool = False, act_layer: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.activation.ReLU'>, norm_layer: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.batchnorm.BatchNorm2d'>, aa_layer: ~torch.nn.modules.module.Module | None = None, drop_rate: float = 0.0, drop_path_rate: float = 0.0, drop_block_rate: float = 0.0, global_pool: str = 'avg', zero_init_last_bn: bool = True, block_args: dict | None = None) None[source]
Overview:

Initialize the ResNet model with given block, layers and other configuration options.

Arguments:
  • block (nn.Module): Class for the residual block.

  • layers (List[int]): Numbers of layers in each block.

  • num_classes (int, optional): Number of classification classes. Default is 1000.

  • in_chans (int, optional): Number of input (color) channels. Default is 3.

  • cardinality (int, optional): Number of convolution groups for 3x3 conv in Bottleneck. Default is 1.

  • base_width (int, optional): Factor determining bottleneck channels. Default is 64.

  • stem_width (int, optional): Number of channels in stem convolutions. Default is 64.

  • stem_type (str, optional): The type of stem. Default is ‘’.

  • replace_stem_pool (bool, optional): Whether to replace stem pooling. Default is False.

  • output_stride (int, optional): Output stride of the network. Default is 32.

  • block_reduce_first (int, optional): Reduction factor for first convolution output width of residual blocks. Default is 1.

  • down_kernel_size (int, optional): Kernel size of residual block downsampling path. Default is 1.

  • avg_down (bool, optional): Whether to use average pooling for the projection skip connection between stages/downsample. Default is False.

  • act_layer (nn.Module, optional): Activation layer. Default is nn.ReLU.

  • norm_layer (nn.Module, optional): Normalization layer. Default is nn.BatchNorm2d.

  • aa_layer (Optional[nn.Module], optional): Anti-aliasing layer. Default is None.

  • drop_rate (float, optional): Dropout probability before classifier, for training. Default is 0.0.

  • drop_path_rate (float, optional): Drop path rate. Default is 0.0.

  • drop_block_rate (float, optional): Drop block rate. Default is 0.0.

  • global_pool (str, optional): Global pooling type. Default is ‘avg’.

  • zero_init_last_bn (bool, optional): Whether to initialize last batch normalization with zero. Default is True.

  • block_args (Optional[dict], optional): Additional arguments for block. Default is None.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor) Tensor[source]
Overview:

Full forward pass through the model.

Arguments:
  • x (torch.Tensor): The input tensor.

Returns:
  • x (torch.Tensor): The output tensor after passing through the model.

forward_features(x: Tensor) Tensor[source]
Overview:

Forward pass through the feature layers of the model.

Arguments:
  • x (torch.Tensor): The input tensor.

Returns:
  • x (torch.Tensor): The output tensor after passing through feature layers.

get_classifier() Module[source]
Overview:

Get the classifier module from the model.

Returns:
  • classifier (nn.Module): The classifier module in the model.

init_weights(zero_init_last_bn: bool = True) None[source]
Overview:

Initialize the weights in the model.

Arguments:
  • zero_init_last_bn (bool, optional): Whether to initialize the last batch normalization layer with zero. Default is True.

reset_classifier(num_classes: int, global_pool: str = 'avg') None[source]
Overview:

Reset the classifier with a new number of classes and pooling type.

Arguments:
  • num_classes (int): New number of classification classes.

  • global_pool (str, optional): New global pooling type. Default is ‘avg’.

training: bool

resnet18

ding.torch_utils.network.resnet.resnet18() Module[source]
Overview:

Creates a ResNet18 model.

Returns:
  • model (nn.Module): ResNet18 model.
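Examples (an illustrative sketch; the batch size and 224x224 input resolution are assumptions, and the output size follows the default num_classes=1000):
>>> import torch
>>> from ding.torch_utils.network.resnet import resnet18
>>> model = resnet18()
>>> logits = model(torch.randn(4, 3, 224, 224))
>>> assert logits.shape == (4, 1000)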

network.rnn

Please refer to ding/torch_utils/network/rnn for more details.

is_sequence

ding.torch_utils.network.rnn.is_sequence(data)[source]
Overview:

Determines if the input data is of type list or tuple.

Arguments:
  • data: The input data to be checked.

Returns:
  • boolean: True if the input is a list or a tuple, False otherwise.

sequence_mask

ding.torch_utils.network.rnn.sequence_mask(lengths: Tensor, max_len: int | None = None) BoolTensor[source]
Overview:

Generates a boolean mask for a batch of sequences with differing lengths.

Arguments:
  • lengths (torch.Tensor): A tensor with the lengths of each sequence. Shape could be (n, 1) or (n).

  • max_len (int, optional): The padding size. If max_len is None, the padding size is the max length of sequences.

Returns:
  • masks (torch.BoolTensor): A boolean mask tensor. The mask has the same device as lengths.
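Examples (an illustrative sketch; the lengths and max_len values are assumptions):
>>> import torch
>>> from ding.torch_utils.network.rnn import sequence_mask
>>> lengths = torch.LongTensor([1, 3, 2])
>>> mask = sequence_mask(lengths, max_len=4)
>>> assert mask.shape == (3, 4) and mask.dtype == torch.bool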

LSTMForwardWrapper

class ding.torch_utils.network.rnn.LSTMForwardWrapper[source]
Overview:

Class providing methods to use before and after the LSTM forward method. Wraps the LSTM forward method.

Interfaces:

_before_forward, _after_forward

_after_forward(next_state: Tuple[Tensor], list_next_state: bool = False) List[Dict] | Dict[str, Tensor][source]
Overview:

Post-processes the next_state after the LSTM forward method.

Arguments:
  • next_state (Tuple[torch.Tensor]): Tuple containing the next state (h, c).

  • list_next_state (bool, optional): Determines the format of the returned next_state. If True, returns next_state in list format. Default is False.

Returns:
  • next_state(Union[List[Dict], Dict[str, torch.Tensor]]): The post-processed next_state.

_before_forward(inputs: Tensor, prev_state: None | List[Dict]) Tensor[source]
Overview:

Preprocesses the inputs and previous states before the LSTM forward method.

Arguments:
  • inputs (torch.Tensor): Input vector of the LSTM cell. Shape: [seq_len, batch_size, input_size]

  • prev_state (Union[None, List[Dict]]): Previous state tensor. Shape: [num_directions*num_layers, batch_size, hidden_size]. If None, prev_state will be initialized to all zeros.

Returns:
  • prev_state (torch.Tensor): Preprocessed previous state for the LSTM batch.

LSTM

class ding.torch_utils.network.rnn.LSTM(input_size: int, hidden_size: int, num_layers: int, norm_type: str | None = None, dropout: float = 0.0)[source]
Overview:

Implementation of an LSTM cell with Layer Normalization (LN).

Interfaces:

__init__, forward

Note

For a primer on LSTM, refer to https://zhuanlan.zhihu.com/p/32085405.

__init__(input_size: int, hidden_size: int, num_layers: int, norm_type: str | None = None, dropout: float = 0.0) None[source]
Overview:

Initialize LSTM cell parameters.

Arguments:
  • input_size (int): Size of the input vector.

  • hidden_size (int): Size of the hidden state vector.

  • num_layers (int): Number of LSTM layers.

  • norm_type (Optional[str]): Normalization type, default is None.

  • dropout (float): Dropout rate, default is 0.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_init()[source]
Overview:

Initialize the parameters of the LSTM cell.

_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(inputs: Tensor, prev_state: Tensor, list_next_state: bool = True) Tuple[Tensor, Tensor | list][source]
Overview:

Compute output and next state given previous state and input.

Arguments:
  • inputs (torch.Tensor): Input vector of cell, size [seq_len, batch_size, input_size].

  • prev_state (torch.Tensor): Previous state, size [num_directions*num_layers, batch_size, hidden_size].

  • list_next_state (bool): Whether to return next_state in list format, default is True.

Returns:
  • x (torch.Tensor): Output from LSTM.

  • next_state (Union[torch.Tensor, list]): Hidden state from LSTM.

training: bool

PytorchLSTM

class ding.torch_utils.network.rnn.PytorchLSTM(input_size: int, hidden_size: int, num_layers: int = 1, bias: bool = True, batch_first: bool = False, dropout: float = 0.0, bidirectional: bool = False, proj_size: int = 0, device=None, dtype=None)[source]
Overview:

Wrapper class for PyTorch’s nn.LSTM, formats the input and output. For more details on nn.LSTM, refer to https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html#torch.nn.LSTM

Interfaces:

forward

batch_first: bool
bias: bool
bidirectional: bool
dropout: float
forward(inputs: Tensor, prev_state: Tensor, list_next_state: bool = True) Tuple[Tensor, Tensor | list][source]
Overview:

Executes nn.LSTM.forward with preprocessed input.

Arguments:
  • inputs (torch.Tensor): Input vector of cell, size [seq_len, batch_size, input_size].

  • prev_state (torch.Tensor): Previous state, size [num_directions*num_layers, batch_size, hidden_size].

  • list_next_state (bool): Whether to return next_state in list format, default is True.

Returns:
  • output (torch.Tensor): Output from LSTM.

  • next_state (Union[torch.Tensor, list]): Hidden state from LSTM.

hidden_size: int
input_size: int
mode: str
num_layers: int
proj_size: int

GRU

class ding.torch_utils.network.rnn.GRU(input_size: int, hidden_size: int, num_layers: int)[source]
Overview:

This class extends the torch.nn.GRUCell and LSTMForwardWrapper classes, and formats inputs and outputs accordingly.

Interfaces:

__init__, forward

Properties:

hidden_size, num_layers

Note

For further details, refer to the official PyTorch documentation: <https://pytorch.org/docs/stable/generated/torch.nn.GRU.html#torch.nn.GRU>

__init__(input_size: int, hidden_size: int, num_layers: int) None[source]
Overview:

Initialize the GRU class with input size, hidden size, and number of layers.

Arguments:
  • input_size (int): The size of the input vector.

  • hidden_size (int): The size of the hidden state vector.

  • num_layers (int): The number of GRU layers.

bias: bool
forward(inputs: Tensor, prev_state: Tensor | None = None, list_next_state: bool = True) Tuple[Tensor, Tensor | List][source]
Overview:

Wrap the nn.GRU.forward method.

Arguments:
  • inputs (torch.Tensor): Input vector of cell, tensor of size [seq_len, batch_size, input_size].

  • prev_state (Optional[torch.Tensor]): None or tensor of size [num_directions*num_layers, batch_size, hidden_size].

  • list_next_state (bool): Whether to return next_state in list format (default is True).

Returns:
  • output (torch.Tensor): Output from GRU.

  • next_state (torch.Tensor or list): Hidden state from GRU.

hidden_size: int
input_size: int
weight_hh: Tensor
weight_ih: Tensor

get_lstm

ding.torch_utils.network.rnn.get_lstm(lstm_type: str, input_size: int, hidden_size: int, num_layers: int = 1, norm_type: str = 'LN', dropout: float = 0.0, seq_len: int | None = None, batch_size: int | None = None) LSTM | PytorchLSTM[source]
Overview:

Build and return the corresponding LSTM cell based on the provided parameters.

Arguments:
  • lstm_type (str): Version of RNN cell. Supported options are [‘normal’, ‘pytorch’, ‘hpc’, ‘gru’].

  • input_size (int): Size of the input vector.

  • hidden_size (int): Size of the hidden state vector.

  • num_layers (int): Number of LSTM layers (default is 1).

  • norm_type (str): Type of normalization (default is ‘LN’).

  • dropout (float): Dropout rate (default is 0.0).

  • seq_len (Optional[int]): Sequence length (default is None).

  • batch_size (Optional[int]): Batch size (default is None).

Returns:
  • lstm (Union[LSTM, PytorchLSTM]): The corresponding LSTM cell.
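Examples (an illustrative sketch using the 'normal' LSTM; sizes and sequence/batch dimensions are assumptions):
>>> import torch
>>> from ding.torch_utils.network.rnn import get_lstm
>>> lstm = get_lstm('normal', input_size=32, hidden_size=64)
>>> inputs = torch.randn(8, 4, 32)          # (seq_len, batch_size, input_size)
>>> output, next_state = lstm(inputs, None)  # None lets the previous state be zero-initialized
>>> assert output.shape == (8, 4, 64)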

network.scatter_connection

Please refer to ding/torch_utils/network/scatter_connection for more details.

shape_fn_scatter_connection

ding.torch_utils.network.scatter_connection.shape_fn_scatter_connection(args, kwargs) List[int][source]
Overview:

Return the shape of scatter_connection for HPC.

Arguments:
  • args (Tuple): The arguments passed to the scatter_connection function.

  • kwargs (Dict): The keyword arguments passed to the scatter_connection function.

Returns:
  • shape (List[int]): A list representing the shape of scatter_connection, in the form of [B, M, N, H, W, scatter_type].

ScatterConnection

class ding.torch_utils.network.scatter_connection.ScatterConnection(scatter_type: str)[source]
Overview:

Scatter feature to its corresponding location. In AlphaStar, each entity is embedded into a tensor, and these tensors are scattered into a feature map with map size.

Interfaces:

__init__, forward, xy_forward

__init__(scatter_type: str) None[source]
Overview:

Initialize the ScatterConnection object.

Arguments:
  • scatter_type (str): The scatter type, which decides the behavior when two entities share the same location. It can be either ‘add’ or ‘cover’. If ‘add’, the values of overlapping entities are summed. If ‘cover’, the earlier entity is overwritten by the later one.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor, spatial_size: Tuple[int, int], location: Tensor) Tensor[source]
Overview:

Scatter input tensor ‘x’ into a spatial feature map.

Arguments:
  • x (torch.Tensor): The input tensor of shape (B, M, N), where B is the batch size, M is the number of entities, and N is the dimension of entity attributes.

  • spatial_size (Tuple[int, int]): The size (H, W) of the spatial feature map into which ‘x’ will be scattered, where H is the height and W is the width.

  • location (torch.Tensor): The tensor of locations of shape (B, M, 2). Each location should be (y, x).

Returns:
  • output (torch.Tensor): The scattered feature map of shape (B, N, H, W).

Note:

When locations overlap, ‘cover’ mode results in a loss of information; ‘add’ mode can be used as a temporary substitute.
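
Examples (an illustrative sketch; the batch, entity and feature sizes, the 4x4 spatial size and the random locations are all assumptions):
>>> import torch
>>> from ding.torch_utils.network.scatter_connection import ScatterConnection
>>> scatter = ScatterConnection(scatter_type='add')
>>> x = torch.randn(2, 5, 8)                     # (B, M, N)
>>> location = torch.randint(0, 4, (2, 5, 2))    # (B, M, 2), each entry is (y, x)
>>> out = scatter(x, (4, 4), location)
>>> assert out.shape == (2, 8, 4, 4)             # (B, N, H, W)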

training: bool
xy_forward(x: Tensor, spatial_size: Tuple[int, int], coord_x: Tensor, coord_y) Tensor[source]
Overview:

Scatter input tensor ‘x’ into a spatial feature map using separate x and y coordinates.

Arguments:
  • x (torch.Tensor): The input tensor of shape (B, M, N), where B is the batch size, M is the number of entities, and N is the dimension of entity attributes.

  • spatial_size (Tuple[int, int]): The size (H, W) of the spatial feature map into which ‘x’ will be scattered, where H is the height and W is the width.

  • coord_x (torch.Tensor): The x-coordinates tensor of shape (B, M).

  • coord_y (torch.Tensor): The y-coordinates tensor of shape (B, M).

Returns:
  • output (torch.Tensor): The scattered feature map of shape (B, N, H, W).

Note:

When locations overlap, ‘cover’ mode results in a loss of information; ‘add’ mode can be used as a temporary substitute.

network.soft_argmax

Please refer to ding/torch_utils/network/soft_argmax for more details.

SoftArgmax

class ding.torch_utils.network.soft_argmax.SoftArgmax[source]
Overview:

A neural network module that computes the SoftArgmax operation (essentially a 2-dimensional spatial softmax), which is often used for location regression tasks. It converts a feature map (such as a heatmap) into precise coordinate locations.

Interfaces:

__init__, forward

Note

For more information on SoftArgmax, you can refer to <https://en.wikipedia.org/wiki/Softmax_function> and the paper <https://arxiv.org/pdf/1504.00702.pdf>.

__init__()[source]
Overview:

Initialize the SoftArgmax module.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor) Tensor[source]
Overview:

Perform the forward pass of the SoftArgmax operation.

Arguments:
  • x (torch.Tensor): The input tensor, typically a heatmap representing predicted locations.

Returns:
  • location (torch.Tensor): The predicted coordinates as a result of the SoftArgmax operation.

Shapes:
  • x: \((B, C, H, W)\), where B is the batch size, C is the number of channels, and H and W represent height and width respectively.

  • location: \((B, 2)\), where B is the batch size and 2 represents the coordinates (height, width).
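
Examples (an illustrative sketch; the single-channel 16x16 heatmap is an assumption):
>>> import torch
>>> from ding.torch_utils.network.soft_argmax import SoftArgmax
>>> soft_argmax = SoftArgmax()
>>> heatmap = torch.randn(4, 1, 16, 16)
>>> location = soft_argmax(heatmap)
>>> assert location.shape == (4, 2)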

training: bool

network.transformer

Please refer to ding/torch_utils/network/transformer for more details.

Attention

class ding.torch_utils.network.transformer.Attention(input_dim: int, head_dim: int, output_dim: int, head_num: int, dropout: Module)[source]
Overview:

For each entry embedding, compute attention over all entries and aggregate the results to obtain the output attention.

Interfaces:

__init__, split, forward

__init__(input_dim: int, head_dim: int, output_dim: int, head_num: int, dropout: Module) None[source]
Overview:

Initialize the Attention module with the provided dimensions and dropout layer.

Arguments:
  • input_dim (int): The dimension of the input.

  • head_dim (int): The dimension of each head in the multi-head attention mechanism.

  • output_dim (int): The dimension of the output.

  • head_num (int): The number of heads in the multi-head attention mechanism.

  • dropout (nn.Module): The dropout layer used in the attention mechanism.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor, mask: Tensor | None = None) Tensor[source]
Overview:

Compute the attention from the input tensor.

Arguments:
  • x (torch.Tensor): The input tensor for the forward computation.

  • mask (Optional[torch.Tensor], optional): Optional mask to exclude invalid entries. Defaults to None.

Returns:
  • attention (torch.Tensor): The computed attention tensor.
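
Examples (an illustrative sketch; the dimensions, head number and dropout rate are assumptions):
>>> import torch
>>> import torch.nn as nn
>>> from ding.torch_utils.network.transformer import Attention
>>> attention = Attention(input_dim=32, head_dim=16, output_dim=32, head_num=2, dropout=nn.Dropout(0.1))
>>> x = torch.randn(4, 10, 32)          # (B, N, input_dim)
>>> out = attention(x)
>>> assert out.shape == (4, 10, 32)     # (B, N, output_dim)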

split(x: Tensor, T: bool = False) List[Tensor][source]
Overview:

Split the input to get multi-head queries, keys, and values.

Arguments:
  • x (torch.Tensor): The tensor to be split, which could be a query, key, or value.

  • T (bool, optional): If True, transpose the output tensors. Defaults to False.

Returns:
  • x (List[torch.Tensor]): A list of output tensors for each head.

training: bool

TransformerLayer

class ding.torch_utils.network.transformer.TransformerLayer(input_dim: int, head_dim: int, hidden_dim: int, output_dim: int, head_num: int, mlp_num: int, dropout: Module, activation: Module)[source]
Overview:

In a Transformer layer, attention over the entries is computed first, followed by a feedforward (MLP) layer.

Interfaces:

__init__, forward

__init__(input_dim: int, head_dim: int, hidden_dim: int, output_dim: int, head_num: int, mlp_num: int, dropout: Module, activation: Module) None[source]
Overview:

Initialize the TransformerLayer with the provided dimensions, dropout layer, and activation function.

Arguments:
  • input_dim (int): The dimension of the input.

  • head_dim (int): The dimension of each head in the multi-head attention mechanism.

  • hidden_dim (int): The dimension of the hidden layer in the MLP (Multi-Layer Perceptron).

  • output_dim (int): The dimension of the output.

  • head_num (int): The number of heads in the multi-head attention mechanism.

  • mlp_num (int): The number of layers in the MLP.

  • dropout (nn.Module): The dropout layer used in the attention mechanism.

  • activation (nn.Module): The activation function used in the MLP.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(inputs: Tuple[Tensor, Tensor]) Tuple[Tensor, Tensor][source]
Overview:

Compute the forward pass through the Transformer layer.

Arguments:
  • inputs (Tuple[torch.Tensor, torch.Tensor]): A tuple containing the input tensor x and the mask tensor.

Returns:
  • output (Tuple[torch.Tensor, torch.Tensor]): A tuple containing the predicted value tensor and the mask tensor.

training: bool

Transformer

class ding.torch_utils.network.transformer.Transformer(input_dim: int, head_dim: int = 128, hidden_dim: int = 1024, output_dim: int = 256, head_num: int = 2, mlp_num: int = 2, layer_num: int = 3, dropout_ratio: float = 0.0, activation: Module = ReLU())[source]
Overview:

Implementation of the Transformer model.

Note

For more details, refer to “Attention is All You Need”: http://arxiv.org/abs/1706.03762.

Interfaces:

__init__, forward

__init__(input_dim: int, head_dim: int = 128, hidden_dim: int = 1024, output_dim: int = 256, head_num: int = 2, mlp_num: int = 2, layer_num: int = 3, dropout_ratio: float = 0.0, activation: Module = ReLU())[source]
Overview:

Initialize the Transformer with the provided dimensions, dropout layer, activation function, and layer numbers.

Arguments:
  • input_dim (int): The dimension of the input.

  • head_dim (int): The dimension of each head in the multi-head attention mechanism.

  • hidden_dim (int): The dimension of the hidden layer in the MLP (Multi-Layer Perceptron).

  • output_dim (int): The dimension of the output.

  • head_num (int): The number of heads in the multi-head attention mechanism.

  • mlp_num (int): The number of layers in the MLP.

  • layer_num (int): The number of Transformer layers.

  • dropout_ratio (float): The dropout ratio for the dropout layer.

  • activation (nn.Module): The activation function used in the MLP.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(x: Tensor, mask: Tensor | None = None) Tensor[source]
Overview:

Perform the forward pass through the Transformer.

Arguments:
  • x (torch.Tensor): The input tensor, with shape (B, N, C), where B is batch size, N is the number of entries, and C is the feature dimension.

  • mask (Optional[torch.Tensor], optional): The mask tensor (bool), used to mask out invalid entries in attention. It has shape (B, N), where B is batch size and N is number of entries. Defaults to None.

Returns:
  • x (torch.Tensor): The output tensor from the Transformer.
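
Examples (an illustrative sketch; the model dimensions, entry number and all-True mask are assumptions):
>>> import torch
>>> from ding.torch_utils.network.transformer import Transformer
>>> model = Transformer(input_dim=32, head_dim=16, hidden_dim=64, output_dim=32, head_num=2, mlp_num=2, layer_num=2)
>>> x = torch.randn(4, 10, 32)           # (B, N, C)
>>> mask = torch.ones(4, 10).bool()      # (B, N), True for valid entries
>>> out = model(x, mask)
>>> assert out.shape == (4, 10, 32)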

training: bool

ScaledDotProductAttention

class ding.torch_utils.network.transformer.ScaledDotProductAttention(d_k: int, dropout: float = 0.0)[source]
Overview:

Implementation of Scaled Dot Product Attention, a key component of Transformer models. This class performs the dot product of the query, key and value tensors, scales it with the square root of the dimension of the key vector (d_k) and applies dropout for regularization.

Interfaces:

__init__, forward

__init__(d_k: int, dropout: float = 0.0) None[source]
Overview:

Initialize the ScaledDotProductAttention module with the dimension of the key vector and the dropout rate.

Arguments:
  • d_k (int): The dimension of the key vector. This will be used to scale the dot product of the query and key.

  • dropout (float, optional): The dropout rate to be applied after the softmax operation. Defaults to 0.0.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward(q: Tensor, k: Tensor, v: Tensor, mask: Tensor | None = None) Tensor[source]
Overview:

Perform the Scaled Dot Product Attention operation on the query, key and value tensors.

Arguments:
  • q (torch.Tensor): The query tensor.

  • k (torch.Tensor): The key tensor.

  • v (torch.Tensor): The value tensor.

  • mask (Optional[torch.Tensor]): An optional mask tensor to be applied on the attention scores. Defaults to None.

Returns:
  • output (torch.Tensor): The output tensor after the attention operation.
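
Examples (an illustrative sketch; the (B, head_num, N, d_k) layout of q, k and v is an assumption chosen for demonstration):
>>> import torch
>>> from ding.torch_utils.network.transformer import ScaledDotProductAttention
>>> attn = ScaledDotProductAttention(d_k=16)
>>> q = torch.randn(4, 8, 10, 16)
>>> k = torch.randn(4, 8, 10, 16)
>>> v = torch.randn(4, 8, 10, 16)
>>> out = attn(q, k, v)
>>> assert out.shape == (4, 8, 10, 16)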

training: bool

backend_helper

Please refer to ding/torch_utils/backend_helper for more details.

enable_tf32

ding.torch_utils.backend_helper.enable_tf32() None[source]
Overview:

Enable tf32 on matmul and cudnn for faster computation. This only works on Ampere GPU devices. For detailed information, please refer to: https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices.
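
Examples (a minimal usage sketch):
>>> from ding.torch_utils.backend_helper import enable_tf32
>>> enable_tf32()   # subsequent float32 matmul / cuDNN ops may use TF32 on Ampere or newer GPUs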

checkpoint_helper

Please refer to ding/torch_utils/checkpoint_helper for more details.

build_checkpoint_helper

ding.torch_utils.checkpoint_helper.build_checkpoint_helper(cfg)[source]
Overview:

Use config to build checkpoint helper.

Arguments:
  • cfg (dict): ckpt_helper config

Returns:
  • checkpoint_helper (CheckpointHelper): The built checkpoint helper object.

CheckpointHelper

class ding.torch_utils.checkpoint_helper.CheckpointHelper[source]
Overview:

Help to save or load a checkpoint by the given args.

Interfaces:

__init__, save, load, _remove_prefix, _add_prefix, _load_matched_model_state_dict

__init__()[source]
_add_prefix(state_dict: dict, prefix: str = 'module.') dict[source]
Overview:

Add a prefix to the keys of state_dict

Arguments:
  • state_dict (dict): model’s state_dict

  • prefix (str): The prefix that will be added to the keys

Returns:
  • (dict): new state_dict after adding prefix

_load_matched_model_state_dict(model: Module, ckpt_state_dict: dict) None[source]
Overview:

Load the matched part of the model state_dict, and report the mismatched keys between the model’s state_dict and the checkpoint’s state_dict

Arguments:
  • model (torch.nn.Module): model

  • ckpt_state_dict (dict): checkpoint’s state_dict

_remove_prefix(state_dict: dict, prefix: str = 'module.') dict[source]
Overview:

Remove a prefix from the keys of state_dict

Arguments:
  • state_dict (dict): model’s state_dict

  • prefix (str): The prefix that will be removed from the keys

Returns:
  • new_state_dict (dict): new state_dict after removing prefix

load(load_path: str, model: Module, optimizer: Optimizer = None, last_iter: CountVar = None, last_epoch: CountVar = None, last_frame: CountVar = None, lr_schduler: Scheduler = None, dataset: Dataset = None, collector_info: Module = None, prefix_op: str = None, prefix: str = None, strict: bool = True, logger_prefix: str = '', state_dict_mask: list = [])[source]
Overview:

Load a checkpoint from the given path

Arguments:
  • load_path (str): checkpoint’s path

  • model (torch.nn.Module): model definition

  • optimizer (torch.optim.Optimizer): optimizer obj

  • last_iter (CountVar): iter num, default None

  • last_epoch (CountVar): epoch num, default None

  • last_frame (CountVar): frame num, default None

  • lr_schduler (Scheduler): The lr scheduler object

  • dataset (torch.utils.data.Dataset): The dataset; it should be a replay dataset

  • collector_info (torch.nn.Module): An attribute of the checkpoint used to save collector info

  • prefix_op (str): The prefix operation applied to state_dict; should be one of [‘remove’, ‘add’]

  • prefix (str): The prefix to be processed on state_dict

  • strict (bool): The strict argument passed to model.load_state_dict

  • logger_prefix (str): The prefix of the logger

  • state_dict_mask (list): A list of state_dict keys which shouldn’t be loaded into the model (after the prefix op)

Note

The checkpoint loaded from load_path is a dict with a format like {'state_dict': OrderedDict(), ...}

save(path: str, model: Module, optimizer: Optimizer | None = None, last_iter: CountVar | None = None, last_epoch: CountVar | None = None, last_frame: CountVar | None = None, dataset: Dataset | None = None, collector_info: Module | None = None, prefix_op: str | None = None, prefix: str | None = None) None[source]
Overview:

Save a checkpoint with the given args

Arguments:
  • path (str): the path of saving checkpoint

  • model (torch.nn.Module): model to be saved

  • optimizer (torch.optim.Optimizer): optimizer obj

  • last_iter (CountVar): iter num, default None

  • last_epoch (CountVar): epoch num, default None

  • last_frame (CountVar): frame num, default None

  • dataset (torch.utils.data.Dataset): The dataset; it should be a replay dataset

  • collector_info (torch.nn.Module): An attribute of the checkpoint used to save collector info

  • prefix_op (str): The prefix operation applied to state_dict; should be one of [‘remove’, ‘add’]

  • prefix (str): The prefix to be processed on state_dict
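
Examples (a minimal save/load round-trip sketch; the file path and the Linear model are assumptions for demonstration):
>>> import torch.nn as nn
>>> from ding.torch_utils.checkpoint_helper import CheckpointHelper
>>> ckpt_helper = CheckpointHelper()
>>> model = nn.Linear(3, 5)
>>> ckpt_helper.save('./model.pth.tar', model)
>>> ckpt_helper.load('./model.pth.tar', model, strict=True)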

CountVar

class ding.torch_utils.checkpoint_helper.CountVar(init_val: int)[source]
Overview:

Number counter

Interfaces:

__init__, update, add

Properties:
  • val (int): the value of the counter

__init__(init_val: int) None[source]
Overview:

Init the var counter

Arguments:
  • init_val (int): the init value of the counter

add(add_num: int)[source]
Overview:

Add the number to counter

Arguments:
  • add_num (int): the number added to the counter

update(val: int) None[source]
Overview:

Update the var counter

Arguments:
  • val (int): the update value of the counter

property val: int
Overview:

Get the current value of the counter
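
Examples (a minimal usage sketch; the values are arbitrary):
>>> from ding.torch_utils.checkpoint_helper import CountVar
>>> counter = CountVar(init_val=0)
>>> counter.add(3)       # val becomes 3
>>> counter.update(10)   # val is set to 10
>>> assert counter.val == 10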

auto_checkpoint

ding.torch_utils.checkpoint_helper.auto_checkpoint(func: Callable) Callable[source]
Overview:

Create a wrapper for the given function; the wrapper calls the save_checkpoint method whenever an exception occurs.

Arguments:
  • func(Callable): the function to be wrapped

Returns:
  • wrapper (Callable): the wrapped function

data_helper

Please refer to ding/torch_utils/data_helper for more details.

to_device

ding.torch_utils.data_helper.to_device(item: Any, device: str, ignore_keys: list = []) Any[source]
Overview:

Transfer data to certain device.

Arguments:
  • item (Any): The item to be transferred.

  • device (str): The device wanted.

  • ignore_keys (list): The keys to be ignored in transfer, default set to empty.

Returns:
  • item (Any): The transferred item.

Examples:
>>> setup_data_dict['module'] = nn.Linear(3, 5)
>>> device = 'cuda'
>>> cuda_d = to_device(setup_data_dict, device, ignore_keys=['module'])
>>> assert cuda_d['module'].weight.device == torch.device('cpu')
Examples:
>>> setup_data_dict['module'] = nn.Linear(3, 5)
>>> device = 'cuda'
>>> cuda_d = to_device(setup_data_dict, device)
>>> assert cuda_d['module'].weight.device == torch.device('cuda:0')

to_dtype

ding.torch_utils.data_helper.to_dtype(item: Any, dtype: type) Any[source]
Overview:

Change data to certain dtype.

Arguments:
  • item (Any): The item for changing the dtype.

  • dtype (type): The type wanted.

Returns:
  • item (object): The item with changed dtype.

Examples (tensor):
>>> t = torch.randint(0, 10, (3, 5))
>>> tfloat = to_dtype(t, torch.float)
>>> assert tfloat.dtype == torch.float
Examples (list):
>>> tlist = [torch.randint(0, 10, (3, 5))]
>>> tlfloat = to_dtype(tlist, torch.float)
>>> assert tlfloat[0].dtype == torch.float
Examples (dict):
>>> tdict = {'t': torch.randint(0, 10, (3, 5))}
>>> tdictf = to_dtype(tdict, torch.float)
>>> assert tdictf['t'].dtype == torch.float

to_tensor

ding.torch_utils.data_helper.to_tensor(item: Any, dtype: dtype | None = None, ignore_keys: list = [], transform_scalar: bool = True) Any[source]
Overview:

Convert numpy.ndarray object to torch.Tensor.

Arguments:
  • item (Any): The numpy.ndarray objects to be converted. It can be exactly a numpy.ndarray object or a container (list, tuple or dict) that contains several numpy.ndarray objects.

  • dtype (torch.dtype): The type of wanted tensor. If set to None, its dtype will be unchanged.

  • ignore_keys (list): If the item is a dict, values whose keys are in ignore_keys will not be converted.

  • transform_scalar (bool): If set to True, a scalar will be also converted to a tensor object.

Returns:
  • item (Any): The converted tensors.

Examples (scalar):
>>> i = 10
>>> t = to_tensor(i)
>>> assert t.item() == i
Examples (dict):
>>> d = {'i': i}
>>> dt = to_tensor(d, torch.int)
>>> assert dt['i'].item() == i
Examples (named tuple):
>>> data_type = namedtuple('data_type', ['x', 'y'])
>>> inputs = data_type(np.random.random(3), 4)
>>> outputs = to_tensor(inputs, torch.float32)
>>> assert type(outputs) == data_type
>>> assert isinstance(outputs.x, torch.Tensor)
>>> assert isinstance(outputs.y, torch.Tensor)
>>> assert outputs.x.dtype == torch.float32
>>> assert outputs.y.dtype == torch.float32

to_ndarray

ding.torch_utils.data_helper.to_ndarray(item: Any, dtype: dtype | None = None) Any[source]
Overview:

Convert torch.Tensor to numpy.ndarray.

Arguments:
  • item (Any): The torch.Tensor objects to be converted. It can be exactly a torch.Tensor object or a container (list, tuple or dict) that contains several torch.Tensor objects.

  • dtype (np.dtype): The type of wanted array. If set to None, its dtype will be unchanged.

Returns:
  • item (object): The changed arrays.

Examples (ndarray):
>>> t = torch.randn(3, 5)
>>> tarray1 = to_ndarray(t)
>>> assert tarray1.shape == (3, 5)
>>> assert isinstance(tarray1, np.ndarray)
Examples (list):
>>> t = [torch.randn(5, ) for i in range(3)]
>>> tarray1 = to_ndarray(t, np.float32)
>>> assert isinstance(tarray1, list)
>>> assert tarray1[0].shape == (5, )
>>> assert isinstance(tarray1[0], np.ndarray)

to_list

ding.torch_utils.data_helper.to_list(item: Any) Any[source]
Overview:

Convert torch.Tensor, numpy.ndarray objects to list objects, and keep their dtypes unchanged.

Arguments:
  • item (Any): The item to be converted.

Returns:
  • item (Any): The list after conversion.

Examples:
>>> data = {
...     'tensor': torch.randn(4),
...     'list': [True, False, False],
...     'tuple': (4, 5, 6),
...     'bool': True,
...     'int': 10,
...     'float': 10.,
...     'array': np.random.randn(4),
...     'str': "asdf",
...     'none': None,
... }
>>> transformed_data = to_list(data)

Note

Now supports item type: torch.Tensor, numpy.ndarray, dict, list, tuple and None.

tensor_to_list

ding.torch_utils.data_helper.tensor_to_list(item: Any) Any[source]
Overview:

Convert torch.Tensor objects to list, and keep their dtypes unchanged.

Arguments:
  • item (Any): The item to be converted.

Returns:
  • item (Any): The lists after conversion.

Examples (2d-tensor):
>>> t = torch.randn(3, 5)
>>> tlist1 = tensor_to_list(t)
>>> assert len(tlist1) == 3
>>> assert len(tlist1[0]) == 5
Examples (1d-tensor):
>>> t = torch.randn(3, )
>>> tlist1 = tensor_to_list(t)
>>> assert len(tlist1) == 3
Examples (list)
>>> t = [torch.randn(5, ) for i in range(3)]
>>> tlist1 = tensor_to_list(t)
>>> assert len(tlist1) == 3
>>> assert len(tlist1[0]) == 5
Examples (dict):
>>> td = {'t': torch.randn(3, 5)}
>>> tdlist1 = tensor_to_list(td)
>>> assert len(tdlist1['t']) == 3
>>> assert len(tdlist1['t'][0]) == 5

Note

Now supports item type: torch.Tensor, dict, list, tuple and None.

to_item

ding.torch_utils.data_helper.to_item(data: Any, ignore_error: bool = True) Any[source]
Overview:

Convert data to python native scalar (i.e. data item), and keep their dtypes unchanged.

Arguments:
  • data (Any): The data that needs to be converted.

  • ignore_error (bool): Whether to ignore the error when the data type is not supported. That is to say, only the data can be transformed into a python native scalar will be returned.

Returns:
  • data (Any): Converted data.

Examples:

>>> data = {
...     'tensor': torch.randn(1),
...     'list': [True, False, torch.randn(1)],
...     'tuple': (4, 5, 6),
...     'bool': True,
...     'int': 10,
...     'float': 10.,
...     'array': np.random.randn(1),
...     'str': "asdf",
...     'none': None,
... }
>>> new_data = to_item(data)
>>> assert np.isscalar(new_data['tensor'])
>>> assert np.isscalar(new_data['array'])
>>> assert np.isscalar(new_data['list'][-1])

Note

Now supports item type: torch.Tensor, ttorch.Tensor, bool, str, dict, list, tuple and None.

same_shape

ding.torch_utils.data_helper.same_shape(data: list) bool[source]
Overview:

Judge whether all data elements in a list have the same shapes.

Arguments:
  • data (list): The list of data.

Returns:
  • same (bool): Whether the list of data all have the same shape.

Examples:
>>> tlist = [torch.randn(3, 5) for i in range(5)]
>>> assert same_shape(tlist)
>>> tlist = [torch.randn(3, 5), torch.randn(4, 5)]
>>> assert not same_shape(tlist)

LogDict

class ding.torch_utils.data_helper.LogDict[source]
Overview:

Derived from dict. Converts torch.Tensor values to lists for convenient logging.

Interfaces:

_transform, __setitem__, update.

_transform(data: Any) None[source]
Overview:

Convert tensor objects to lists for better logging.

Arguments:
  • data (Any): The input data to be converted.

update(data: dict) None[source]
Overview:

Override the update function of built-in dict.

Arguments:
  • data (dict): The dict for updating current object.

build_log_buffer

ding.torch_utils.data_helper.build_log_buffer() LogDict[source]
Overview:

Build log buffer, a subclass of dict, which can convert the input data into log format.

Returns:
  • log_buffer (LogDict): Log buffer dict.

Examples:
>>> log_buffer = build_log_buffer()
>>> log_buffer['not_tensor'] = torch.randn(3)
>>> assert isinstance(log_buffer['not_tensor'], list)
>>> assert len(log_buffer['not_tensor']) == 3
>>> log_buffer.update({'not_tensor': 4, 'a': 5})
>>> assert log_buffer['not_tensor'] == 4

CudaFetcher

class ding.torch_utils.data_helper.CudaFetcher(data_source: Iterable, device: str, queue_size: int = 4, sleep: float = 0.1)[source]
Overview:

Fetch data from source, and transfer it to a specified device.

Interfaces:

__init__, __next__, run, close.

__init__(data_source: Iterable, device: str, queue_size: int = 4, sleep: float = 0.1) None[source]
Overview:

Initialize the CudaFetcher object using the given arguments.

Arguments:
  • data_source (Iterable): The iterable data source.

  • device (str): The device to put data to, such as “cuda:0”.

  • queue_size (int): The internal size of queue, such as 4.

  • sleep (float): Sleeping time when the internal queue is full.

_producer() None[source]
Overview:

Keep fetching data from source, change the device, and put into queue for request.

close() None[source]
Overview:

Stop the producer thread by setting end_flag to True.

run() None[source]
Overview:

Start producer thread: Keep fetching data from source, change the device, and put into queue for request.

Examples:
>>> timer = EasyTimer()
>>> dataloader = iter([torch.randn(3, 3) for _ in range(10)])
>>> dataloader = CudaFetcher(dataloader, device='cuda', sleep=0.1)
>>> dataloader.run()
>>> data = next(dataloader)

get_tensor_data

ding.torch_utils.data_helper.get_tensor_data(data: Any) Any[source]
Overview:

Get pure tensor data from the given data (without disturbing grad computation graph).

Arguments:
  • data (Any): The original data. It can be exactly a tensor or a container (Sequence or dict).

Returns:
  • output (Any): The output data.

Examples:
>>> a = {
...     'tensor': torch.tensor([1, 2, 3.], requires_grad=True),
...     'list': [torch.tensor([1, 2, 3.], requires_grad=True) for _ in range(2)],
...     'none': None
... }
>>> tensor_a = get_tensor_data(a)
>>> assert not tensor_a['tensor'].requires_grad
>>> for t in tensor_a['list']:
>>>     assert not t.requires_grad

unsqueeze

ding.torch_utils.data_helper.unsqueeze(data: Any, dim: int = 0) Any[source]
Overview:

Unsqueeze the tensor data.

Arguments:
  • data (Any): The original data. It can be exactly a tensor or a container (Sequence or dict).

  • dim (int): The dimension to be unsqueezed.

Returns:
  • output (Any): The output data.

Examples (tensor):
>>> t = torch.randn(3, 3)
>>> tt = unsqueeze(t, dim=0)
>>> assert tt.shape == torch.Size([1, 3, 3])
Examples (list):
>>> t = [torch.randn(3, 3)]
>>> tt = unsqueeze(t, dim=0)
>>> assert tt[0].shape == torch.Size([1, 3, 3])
Examples (dict):
>>> t = {"t": torch.randn(3, 3)}
>>> tt = unsqueeze(t, dim=0)
>>> assert tt["t"].shape == torch.Size([1, 3, 3])

squeeze

ding.torch_utils.data_helper.squeeze(data: Any, dim: int = 0) Any[source]
Overview:

Squeeze the tensor data.

Arguments:
  • data (Any): The original data. It can be exactly a tensor or a container (Sequence or dict).

  • dim (int): The dimension to be Squeezed.

Returns:
  • output (Any): The output data.

Examples (tensor):
>>> t = torch.randn(1, 3, 3)
>>> tt = squeeze(t, dim=0)
>>> assert tt.shape == torch.Size([3, 3])
Examples (list):
>>> t = [torch.randn(1, 3, 3)]
>>> tt = squeeze(t, dim=0)
>>> assert tt[0].shape == torch.Size([3, 3])
Examples (dict):
>>> t = {"t": torch.randn(1, 3, 3)}
>>> tt = squeeze(t, dim=0)
>>> assert tt["t"].shape == torch.Size([3, 3])

get_null_data

ding.torch_utils.data_helper.get_null_data(template: Any, num: int) List[Any][source]
Overview:

Get null data given an input template.

Arguments:
  • template (Any): The template data.

  • num (int): The number of null data items to generate.

Returns:
  • output (List[Any]): The generated null data.

Examples:
>>> temp = {'obs': [1, 2, 3], 'action': 1, 'done': False, 'reward': torch.tensor(1.)}
>>> null_data = get_null_data(temp, 2)
>>> assert len(null_data) == 2
>>> assert null_data[0]['null'] and null_data[0]['done']

zeros_like

ding.torch_utils.data_helper.zeros_like(h: Any) Any[source]
Overview:

Generate zero-tensors like the input data.

Arguments:
  • h (Any): The original data. It can be exactly a tensor or a container (Sequence or dict).

Returns:
  • output (Any): The output zero-tensors.

Examples (tensor):
>>> t = torch.randn(3, 3)
>>> tt = zeros_like(t)
>>> assert tt.shape == torch.Size([3, 3])
>>> assert torch.sum(torch.abs(tt)) < 1e-8
Examples (list):
>>> t = [torch.randn(3, 3)]
>>> tt = zeros_like(t)
>>> assert tt[0].shape == torch.Size([3, 3])
>>> assert torch.sum(torch.abs(tt[0])) < 1e-8
Examples (dict):
>>> t = {"t": torch.randn(3, 3)}
>>> tt = zeros_like(t)
>>> assert tt["t"].shape == torch.Size([3, 3])
>>> assert torch.sum(torch.abs(tt["t"])) < 1e-8

dataparallel

Please refer to ding/torch_utils/dataparallel for more details.

DataParallel

class ding.torch_utils.dataparallel.DataParallel(module, device_ids=None, output_device=None, dim=0)[source]
Overview:

A wrapper class for nn.DataParallel.

Interfaces:

__init__, parameters

__init__(module, device_ids=None, output_device=None, dim=0)[source]
Overview:

Initialize the DataParallel object.

Arguments:
  • module (nn.Module): The module to be parallelized.

  • device_ids (list): The list of GPU ids.

  • output_device (int): The output GPU id.

  • dim (int): The dimension to be parallelized.

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
parameters(recurse: bool = True)[source]
Overview:

Return the parameters of the module.

Arguments:
  • recurse (bool): Whether to return the parameters of the submodules.

Returns:
  • params (generator): The generator of the parameters.

training: bool

distribution

Please refer to ding/torch_utils/distribution for more details.

Pd

class ding.torch_utils.distribution.Pd[source]
Overview:

Abstract class for parameterizable probability distributions and sampling functions.

Interfaces:

neglogp, entropy, noise_mode, mode, sample

Tip

In derived classes, logits should be stored as an attribute of the class.

entropy() Tensor[source]
Overview:

Calculate the softmax entropy of logits

Arguments:
  • reduction (str): support [None, ‘mean’], default set to ‘mean’

Returns:
  • entropy (torch.Tensor): the calculated entropy

mode()[source]
Overview:

Return the argmax result of logits. This method is designed for deterministic action selection.

neglogp(x: Tensor) Tensor[source]
Overview:

Calculate cross_entropy between input x and logits

Arguments:
  • x (torch.Tensor): the input tensor

Return:
  • cross_entropy (torch.Tensor): the returned cross_entropy loss

noise_mode()[source]
Overview:

Add noise to logits. This method is designed for introducing randomness.

sample()[source]
Overview:

Sample from the softmax distribution over logits. This method is designed for multinomial sampling.

CategoricalPd

class ding.torch_utils.distribution.CategoricalPd(logits: Tensor | None = None)[source]
Overview:

Categorical probability distribution sampler

Interfaces:

__init__, neglogp, entropy, noise_mode, mode, sample

__init__(logits: Tensor | None = None) None[source]
Overview:

Init the Pd with logits

Arguments:
  • logits (torch.Tensor): The logits to sample from

entropy(reduction: str = 'mean') Tensor[source]
Overview:

Calculate the softmax entropy of logits

Arguments:
  • reduction (str): support [None, ‘mean’], default set to mean

Returns:
  • entropy (torch.Tensor): the calculated entropy

mode(viz: bool = False) Tuple[Tensor, Dict[str, ndarray]][source]
Overview:

Return the argmax result of logits

Arguments:
  • viz (bool): Whether to additionally return the numpy form of logits, noise and noise_logits for visualization; short for visualize (tensors cannot be visualized directly in TensorBoard or text logs).

Returns:
  • result (torch.Tensor): the logits argmax result

  • viz_feature (Dict[str, np.ndarray]): ndarray type data for visualization.

neglogp(x, reduction: str = 'mean') Tensor[source]
Overview:

Calculate cross_entropy between input x and logits

Arguments:
  • x (torch.Tensor): the input tensor

  • reduction (str): support [None, ‘mean’], default set to mean

Return:
  • cross_entropy (torch.Tensor): the returned cross_entropy loss

noise_mode(viz: bool = False) Tuple[Tensor, Dict[str, ndarray]][source]
Overview:

Add noise to logits

Arguments:
  • viz (bool): Whether to additionally return the numpy form of logits, noise and noise_logits for visualization; short for visualize (tensors cannot be visualized directly in TensorBoard or text logs).

Returns:
  • result (torch.Tensor): noised logits

  • viz_feature (Dict[str, np.ndarray]): ndarray type data for visualization.

sample(viz: bool = False) Tuple[Tensor, Dict[str, ndarray]][source]
Overview:

Sample from the softmax distribution over logits

Arguments:
  • viz (bool): Whether to additionally return the numpy form of logits, noise and noise_logits for visualization; short for visualize (tensors cannot be visualized directly in TensorBoard or text logs).

Returns:
  • result (torch.Tensor): the logits sampled result

  • viz_feature (Dict[str, np.ndarray]): ndarray type data for visualization.

update_logits(logits: Tensor) None[source]
Overview:

Update logits

Arguments:
  • logits (torch.Tensor): logits to update

CategoricalPdPytorch

class ding.torch_utils.distribution.CategoricalPdPytorch(probs: Tensor | None = None)[source]
Overview:

A wrapper of torch.distributions.Categorical

Interfaces:

__init__, update_logits, update_probs, sample, neglogp, mode, entropy

__init__(probs: Tensor | None = None) None[source]
Overview:

Initialize the CategoricalPdPytorch object.

Arguments:
  • probs (torch.Tensor): The tensor of probabilities.

entropy(reduction: str | None = None) Tensor[source]
Overview:

Calculate the softmax entropy of logits

Arguments:
  • reduction (str): support [None, ‘mean’], default set to mean

Returns:
  • entropy (torch.Tensor): the calculated entropy

mode() Tensor[source]
Overview:

Return logits argmax result

Return:
  • result(torch.Tensor): the logits argmax result

neglogp(actions: Tensor, reduction: str = 'mean') Tensor[source]
Overview:

Calculate the cross entropy between the input actions and the underlying distribution

Arguments:
  • actions (torch.Tensor): the input action tensor

  • reduction (str): support [None, ‘mean’], default set to mean

Return:
  • cross_entropy (torch.Tensor): the returned cross_entropy loss

sample() Tensor[source]
Overview:

Sample from the softmax distribution over logits

Return:
  • result (torch.Tensor): the logits sampled result

update_logits(logits: Tensor) None[source]
Overview:

Update logits

Arguments:
  • logits (torch.Tensor): logits to update

update_probs(probs: Tensor) None[source]
Overview:

Update probs

Arguments:
  • probs (torch.Tensor): probs to update
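
Examples (an illustrative sketch; the batch size and number of actions are assumptions):
>>> import torch
>>> from ding.torch_utils.distribution import CategoricalPdPytorch
>>> probs = torch.softmax(torch.randn(4, 6), dim=-1)   # batch of 4, 6 discrete actions
>>> pd = CategoricalPdPytorch(probs)
>>> action = pd.sample()
>>> assert action.shape == (4, )
>>> per_sample_entropy = pd.entropy(reduction=None)
>>> mean_neglogp = pd.neglogp(action)   # mean cross entropy over the batch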

lr_scheduler

Please refer to ding/torch_utils/lr_scheduler for more details.

get_lr_ratio

ding.torch_utils.lr_scheduler.get_lr_ratio(epoch: int, warmup_epochs: int, learning_rate: float, lr_decay_epochs: int, min_lr: float) float[source]
Overview:

Get learning rate ratio for each epoch.

Arguments:
  • epoch (int): Current epoch.

  • warmup_epochs (int): Warmup epochs.

  • learning_rate (float): Learning rate.

  • lr_decay_epochs (int): Learning rate decay epochs.

  • min_lr (float): Minimum learning rate.

cos_lr_scheduler

ding.torch_utils.lr_scheduler.cos_lr_scheduler(optimizer: Optimizer, learning_rate: float, warmup_epochs: float = 5, lr_decay_epochs: float = 100, min_lr: float = 6e-05) LambdaLR[source]
Overview:

Cosine learning rate scheduler.

Arguments:
  • optimizer (torch.optim.Optimizer): Optimizer.

  • learning_rate (float): Learning rate.

  • warmup_epochs (float): Warmup epochs.

  • lr_decay_epochs (float): Learning rate decay epochs.

  • min_lr (float): Minimum learning rate.
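
Examples (an illustrative sketch; the model, optimizer and learning rate are assumptions):
>>> import torch
>>> import torch.nn as nn
>>> from ding.torch_utils.lr_scheduler import cos_lr_scheduler
>>> model = nn.Linear(3, 5)
>>> optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
>>> scheduler = cos_lr_scheduler(optimizer, learning_rate=1e-3)
>>> scheduler.step()   # typically called once per epoch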

math_helper

Please refer to ding/torch_utils/math_helper for more details.

cov

ding.torch_utils.math_helper.cov(x: Tensor, rowvar: bool = False, bias: bool = False, ddof: int | None = None, aweights: Tensor | None = None) Tensor[source]
Overview:

Estimates covariance matrix like numpy.cov.

Arguments:
  • x (torch.Tensor): A 1-D or 2-D tensor containing multiple variables and observations. Whether rows or columns correspond to variables is controlled by rowvar.

  • rowvar (bool): If rowvar is True, each row represents a variable, with observations in the columns. Otherwise (the default), each column represents a variable, while the rows contain observations.

  • bias (bool): Default normalization (False) is by dividing N - 1, where N is the number of observations given (unbiased estimate). If bias is True, then normalization is by N.

  • ddof (Optional[int]): If ddof is not None, it implies that the argument bias is overridden. Note that ddof=1 will return the unbiased estimate (equals to bias=False), and ddof=0 will return the biased estimation (equals to bias=True).

  • aweights (Optional[torch.Tensor]): 1-D tensor of observation vector weights. These relative weights are typically large for observations considered “important” and smaller for observations considered less “important”. If ddof=0, the tensor of weights can be used to assign weights to observation vectors.

Returns:
  • cov_mat (torch.Tensor): Covariance matrix calculated.
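
Examples (an illustrative sketch; 100 observations of 3 variables, using the default rowvar=False):
>>> import torch
>>> from ding.torch_utils.math_helper import cov
>>> x = torch.randn(100, 3)
>>> cov_mat = cov(x)
>>> assert cov_mat.shape == (3, 3)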

metric

Please refer to ding/torch_utils/metric for more details.

levenshtein_distance

ding.torch_utils.metric.levenshtein_distance(pred: LongTensor, target: LongTensor, pred_extra: Tensor | None = None, target_extra: Tensor | None = None, extra_fn: Callable | None = None) FloatTensor[source]
Overview:

Levenshtein Distance, i.e. Edit Distance.

Arguments:
  • pred (torch.LongTensor): The first tensor to calculate the distance, shape: (N1, ) (N1 >= 0).

  • target (torch.LongTensor): The second tensor to calculate the distance, shape: (N2, ) (N2 >= 0).

  • pred_extra (Optional[torch.Tensor]): Extra tensor to calculate the distance, only works when extra_fn is not None.

  • target_extra (Optional[torch.Tensor]): Extra tensor to calculate the distance, only works when extra_fn is not None.

  • extra_fn (Optional[Callable]): The distance function for pred_extra and target_extra. If set to None, this distance will not be considered.

Returns:
  • distance (torch.FloatTensor): The calculated distance (a scalar), shape: (1, ).
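
A minimal usage sketch, added for illustration (not from the original docstring):

Examples:
>>> pred = torch.LongTensor([1, 2, 3])
>>> target = torch.LongTensor([1, 3, 3, 4])
>>> distance = levenshtein_distance(pred, target)
>>> distance.shape == (1,)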

hamming_distance

ding.torch_utils.metric.hamming_distance(pred: LongTensor, target: LongTensor, weight=1.0) LongTensor[source]
Overview:

Hamming Distance.

Arguments:
  • pred (torch.LongTensor): Pred input, boolean vector(0 or 1).

  • target (torch.LongTensor): Target input, boolean vector(0 or 1).

  • weight (torch.LongTensor): Weight to multiply.

Returns:
  • distance (torch.LongTensor): The calculated distance (a scalar), shape: (1, ).

Shapes:
  • pred & target (torch.LongTensor): shape \((B, N)\), where B is the batch size and N is the dimension.
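
A minimal usage sketch, added for illustration (not from the original docstring); the two inputs below differ in two positions, so a per-sample distance of 2 is expected.

Examples:
>>> pred = torch.LongTensor([[1, 0, 1, 1]])
>>> target = torch.LongTensor([[1, 1, 0, 1]])
>>> distance = hamming_distance(pred, target)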

model_helper

Please refer to ding/torch_utils/model_helper for more details.

get_num_params

ding.torch_utils.model_helper.get_num_params(model: Module) int[source]
Overview:

Return the number of parameters in the model.

Arguments:
  • model (torch.nn.Module): The model object to calculate the parameter number.

Returns:
  • n_params (int): The calculated number of parameters.

Examples:
>>> model = torch.nn.Linear(3, 5)
>>> num = get_num_params(model)
>>> assert num == 15

nn_test_helper

Please refer to ding/torch_utils/nn_test_helper for more details.

is_differentiable

ding.torch_utils.nn_test_helper.is_differentiable(loss: Tensor, model: Module | List[Module], print_instead: bool = False) None[source]
Overview:

Check whether the model (or models) is differentiable: first verify that every module’s gradients are None, then back-propagate the loss, and finally verify that every module’s gradients are torch.Tensor.

Arguments:
  • loss (torch.Tensor): loss tensor of the model

  • model (Union[torch.nn.Module, List[torch.nn.Module]]): model or models to be checked

  • print_instead (bool): Whether to print module’s final grad result, instead of asserting. Default set to False.
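
A minimal usage sketch, added for illustration (not from the original docstring); it builds a fresh model, computes a scalar loss, and checks that gradients flow back to every parameter.

Examples:
>>> model = torch.nn.Linear(3, 5)
>>> loss = model(torch.randn(4, 3)).sum()
>>> is_differentiable(loss, model)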

optimizer_helper

Please refer to ding/torch_utils/optimizer_helper for more details.

calculate_grad_norm

ding.torch_utils.optimizer_helper.calculate_grad_norm(model: Module, norm_type=2) float[source]
Overview:

Calculate the gradient norm of the parameters whose gradients are not None in the model.

Arguments:
  • model (torch.nn.Module): The model whose gradient norm is calculated.

  • norm_type (int or inf): The type of norm to use, default set to 2.
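
A minimal usage sketch, added for illustration (not from the original docstring); backward() must be called first so that the gradients exist.

Examples:
>>> model = torch.nn.Linear(3, 5)
>>> model(torch.randn(4, 3)).sum().backward()
>>> grad_norm = calculate_grad_norm(model)  # global norm of all parameter gradients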

calculate_grad_norm_without_bias_two_norm

ding.torch_utils.optimizer_helper.calculate_grad_norm_without_bias_two_norm(model: Module) float[source]
Overview:

Calculate the 2-norm of the gradients of the non-bias parameters whose gradients are not None in the model.

Arguments:
  • model (torch.nn.Module): The model whose gradient norm is calculated.

grad_ignore_norm

ding.torch_utils.optimizer_helper.grad_ignore_norm(parameters, max_norm, norm_type=2)[source]
Overview:

Ignore (zero out) the gradients of an iterable of parameters when their total norm exceeds max_norm.

Arguments:
  • parameters (Iterable): an iterable of torch.Tensor

  • max_norm (float): the max norm of the gradients

  • norm_type (float): 2.0 means use norm2 to clip
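
A minimal usage sketch, added for illustration (not from the original docstring); like torch.nn.utils.clip_grad_norm_, it is assumed to be called between backward() and the optimizer step.

Examples:
>>> model = torch.nn.Linear(3, 5)
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
>>> model(torch.randn(4, 3)).sum().backward()
>>> grad_ignore_norm(model.parameters(), max_norm=5.0)
>>> optimizer.step()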

grad_ignore_value

ding.torch_utils.optimizer_helper.grad_ignore_value(parameters, clip_value)[source]
Overview:

Ignore (zero out) the gradients of an iterable of parameters when a gradient value exceeds clip_value.

Arguments:
  • parameters (Iterable): an iterable of torch.Tensor

  • clip_value (float): the value to start clipping

Adam

class ding.torch_utils.optimizer_helper.Adam(params: Iterable, lr: float = 0.001, betas: Tuple[float, float] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0, amsgrad: bool = False, optim_type: str = 'adam', grad_clip_type: str | None = None, clip_value: float | None = None, clip_coef: float = 5, clip_norm_type: float = 2.0, clip_momentum_timestep: int = 100, grad_norm_type: str | None = None, grad_ignore_type: str | None = None, ignore_value: float | None = None, ignore_coef: float = 5, ignore_norm_type: float = 2.0, ignore_momentum_timestep: int = 100)[source]
Overview:

Rewritten Adam optimizer supporting more features.

Interfaces:

__init__, step, _state_init, get_grad

__init__(params: Iterable, lr: float = 0.001, betas: Tuple[float, float] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0, amsgrad: bool = False, optim_type: str = 'adam', grad_clip_type: str | None = None, clip_value: float | None = None, clip_coef: float = 5, clip_norm_type: float = 2.0, clip_momentum_timestep: int = 100, grad_norm_type: str | None = None, grad_ignore_type: str | None = None, ignore_value: float | None = None, ignore_coef: float = 5, ignore_norm_type: float = 2.0, ignore_momentum_timestep: int = 100)[source]
Overview:

Init method of the refactored Adam class.

Arguments:
  • params (Iterable): an iterable of torch.Tensor s or dict s, specifying which tensors should be optimized

  • lr (float): learning rate, default set to 1e-3

  • betas (Tuple[float, float]): coefficients used for computing running averages of gradient and its square, default set to (0.9, 0.999))

  • eps (float): term added to the denominator to improve numerical stability, default set to 1e-8

  • weight_decay (float): weight decay coefficient, default set to 0

  • amsgrad (bool): whether to use the AMSGrad variant of this algorithm from the paper On the Convergence of Adam and Beyond <https://arxiv.org/abs/1904.09237>

  • optim_type (str): support [“adam”, “adamw”]

  • grad_clip_type (str): support [None, ‘clip_momentum’, ‘clip_value’, ‘clip_norm’, ‘clip_momentum_norm’]

  • clip_value (float): the value to start clipping

  • clip_coef (float): the clipping coefficient

  • clip_norm_type (float): 2.0 means use norm2 to clip

  • clip_momentum_timestep (int): after how many steps the momentum clipping should start

  • grad_ignore_type (str): support [None, ‘ignore_momentum’, ‘ignore_value’, ‘ignore_norm’, ‘ignore_momentum_norm’]

  • ignore_value (float): the value to start ignoring

  • ignore_coef (float): the ignoring coefficient

  • ignore_norm_type (float): 2.0 means use norm2 to ignore

  • ignore_momentum_timestep (int): after how many steps the momentum ignoring should start
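
A minimal usage sketch, added for illustration (not from the original docstring); it assumes the optimizer is a drop-in replacement for torch.optim.Adam and that clip_value acts as the threshold for the chosen grad_clip_type.

Examples:
>>> model = torch.nn.Linear(3, 5)
>>> optimizer = Adam(model.parameters(), lr=1e-3, grad_clip_type='clip_norm', clip_value=0.5)
>>> loss = model(torch.randn(4, 3)).sum()
>>> optimizer.zero_grad()
>>> loss.backward()
>>> optimizer.step()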

_optimizer_load_state_dict_post_hooks: OrderedDict[int, Callable[['Optimizer'], None]]
_optimizer_load_state_dict_pre_hooks: OrderedDict[int, Callable[['Optimizer', StateDict], StateDict | None]]
_optimizer_state_dict_post_hooks: OrderedDict[int, Callable[['Optimizer', StateDict], StateDict | None]]
_optimizer_state_dict_pre_hooks: OrderedDict[int, Callable[['Optimizer'], None]]
_optimizer_step_post_hooks: Dict[int, Callable[[Self, Tuple[Any, ...], Dict[str, Any]], None]]
_optimizer_step_pre_hooks: Dict[int, Callable[[Self, Tuple[Any, ...], Dict[str, Any]], Tuple[Tuple[Any, ...], Dict[str, Any]] | None]]
_state_init(p, amsgrad)[source]
Overview:

Initialize the state of the optimizer

Arguments:
  • p (torch.Tensor): the parameter to be optimized

  • amsgrad (bool): whether to use the AMSGrad variant of this algorithm from the paper On the Convergence of Adam and Beyond <https://arxiv.org/abs/1904.09237>

get_grad() float[source]
step(closure: Callable | None = None)[source]
Overview:

Performs a single optimization step

Arguments:
  • closure (callable): A closure that reevaluates the model and returns the loss, default set to None

RMSprop

class ding.torch_utils.optimizer_helper.RMSprop(params: Iterable, lr: float = 0.01, alpha: float = 0.99, eps: float = 1e-08, weight_decay: float = 0, momentum: float = 0, centered: bool = False, grad_clip_type: str | None = None, clip_value: float | None = None, clip_coef: float = 5, clip_norm_type: float = 2.0, clip_momentum_timestep: int = 100, grad_norm_type: str | None = None, grad_ignore_type: str | None = None, ignore_value: float | None = None, ignore_coef: float = 5, ignore_norm_type: float = 2.0, ignore_momentum_timestep: int = 100)[source]
Overview:

Rewritten RMSprop optimizer supporting more features.

Interfaces:

__init__, step, _state_init, get_grad

__init__(params: Iterable, lr: float = 0.01, alpha: float = 0.99, eps: float = 1e-08, weight_decay: float = 0, momentum: float = 0, centered: bool = False, grad_clip_type: str | None = None, clip_value: float | None = None, clip_coef: float = 5, clip_norm_type: float = 2.0, clip_momentum_timestep: int = 100, grad_norm_type: str | None = None, grad_ignore_type: str | None = None, ignore_value: float | None = None, ignore_coef: float = 5, ignore_norm_type: float = 2.0, ignore_momentum_timestep: int = 100)[source]
Overview:

Init method of the refactored RMSprop class.

Arguments:
  • params (Iterable): an iterable of torch.Tensor s or dict s, specifying which tensors should be optimized

  • lr (float): learning rate, default set to 1e-2

  • alpha (float): smoothing constant, default set to 0.99

  • eps (float): term added to the denominator to improve numerical stability, default set to 1e-8

  • weight_decay (float): weight decay coefficient, default set to 0

  • centered (bool): if True, compute the centered RMSprop, where the gradient is normalized by an estimation of its variance

  • grad_clip_type (str): support [None, ‘clip_momentum’, ‘clip_value’, ‘clip_norm’, ‘clip_momentum_norm’]

  • clip_value (float): the value to start clipping

  • clip_coef (float): the clipping coefficient

  • clip_norm_type (float): 2.0 means use norm2 to clip

  • clip_momentum_timestep (int): after how many steps the momentum clipping should start

  • grad_ignore_type (str): support [None, ‘ignore_momentum’, ‘ignore_value’, ‘ignore_norm’, ‘ignore_momentum_norm’]

  • ignore_value (float): the value to start ignoring

  • ignore_coef (float): the ignoring coefficient

  • ignore_norm_type (float): 2.0 means use norm2 to ignore

  • ignore_momentum_timestep (int): after how many steps the momentum ignoring should start

_optimizer_load_state_dict_post_hooks: OrderedDict[int, Callable[['Optimizer'], None]]
_optimizer_load_state_dict_pre_hooks: OrderedDict[int, Callable[['Optimizer', StateDict], StateDict | None]]
_optimizer_state_dict_post_hooks: OrderedDict[int, Callable[['Optimizer', StateDict], StateDict | None]]
_optimizer_state_dict_pre_hooks: OrderedDict[int, Callable[['Optimizer'], None]]
_optimizer_step_post_hooks: Dict[int, Callable[[Self, Tuple[Any, ...], Dict[str, Any]], None]]
_optimizer_step_pre_hooks: Dict[int, Callable[[Self, Tuple[Any, ...], Dict[str, Any]], Tuple[Tuple[Any, ...], Dict[str, Any]] | None]]
_state_init(p, momentum, centered)[source]
Overview:

Initialize the state of the optimizer

Arguments:
  • p (torch.Tensor): the parameter to be optimized

  • momentum (float): the momentum coefficient

  • centered (bool): if True, compute the centered RMSprop, the gradient is normalized by an estimation of its variance

get_grad() float[source]
Overview:

Calculate the gradient norm of the parameters whose gradients are not None in the model.

step(closure: Callable | None = None)[source]
Overview:

Performs a single optimization step

Arguments:
  • closure (callable): A closure that reevaluates the model and returns the loss, default set to None

PCGrad

class ding.torch_utils.optimizer_helper.PCGrad(optimizer, reduction='mean')[source]
Overview:

PCGrad optimizer to support multi-task learning. You can find the paper at the following link: https://arxiv.org/pdf/2001.06782.pdf

Interfaces:

__init__, zero_grad, step, pc_backward

Properties:
  • optimizer (torch.optim): the optimizer to be used

__init__(optimizer, reduction='mean')[source]
Overview:

Initialization of PCGrad optimizer

Arguments:
  • optimizer (torch.optim): the optimizer to be used

  • reduction (str): the reduction method, support [‘mean’, ‘sum’]

_flatten_grad(grads, shapes)[source]
Overview:

flatten the gradient of the parameters of the network

Arguments:
  • grads (list): a list of the gradient of the parameters

  • shapes (list): a list of the shape of the parameters

_pack_grad(objectives)[source]
Overview:

pack the gradient of the parameters of the network for each objective

Arguments:
  • objectives: a list of objectives

Returns:
  • grad: a list of the gradient of the parameters

  • shape: a list of the shape of the parameters

  • has_grad: a list of masks indicating whether each parameter has a gradient

_project_conflicting(grads, has_grads, shapes=None)[source]
Overview:

project the conflicting gradient to the orthogonal space

Arguments:
  • grads (list): a list of the gradient of the parameters

  • has_grads (list): a list of masks indicating whether each parameter has a gradient

  • shapes (list): a list of the shape of the parameters

_retrieve_grad()[source]
Overview:

get the gradient of the parameters of the network with specific objective

Returns:
  • grad: a list of the gradient of the parameters

  • shape: a list of the shape of the parameters

  • has_grad: a list of masks indicating whether each parameter has a gradient

_set_grad(grads)[source]
Overview:

set the modified gradients to the network

Arguments:
  • grads (list): a list of the gradient of the parameters

_unflatten_grad(grads, shapes)[source]
Overview:

unflatten the gradient of the parameters of the network

Arguments:
  • grads (list): a list of the gradient of the parameters

  • shapes (list): a list of the shape of the parameters

property optimizer
Overview:

get the optimizer

pc_backward(objectives)[source]
Overview:

calculate the gradient of the parameters

Arguments:
  • objectives: a list of objectives
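
A minimal usage sketch, added for illustration (not from the original docstring); pc_backward replaces the usual loss.backward() call, with one loss per task sharing the same trunk.

Examples:
>>> model = torch.nn.Linear(3, 2)
>>> optimizer = PCGrad(torch.optim.Adam(model.parameters()))
>>> out = model(torch.randn(4, 3))
>>> losses = [out[:, 0].mean(), out[:, 1].mean()]  # one objective per task
>>> optimizer.pc_backward(losses)
>>> optimizer.step()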

step()[source]
Overview:

update the parameters with the gradient

zero_grad()[source]
Overview:

clear the gradient of the parameters

configure_weight_decay

ding.torch_utils.optimizer_helper.configure_weight_decay(model: Module, weight_decay: float) List[source]
Overview:

Separate all parameters of the model into two buckets: those that will experience weight decay for regularization and those that won’t (biases, and layer-norm or embedding weights).

Arguments:
  • model (nn.Module): The given PyTorch model.

  • weight_decay (float): Weight decay value for optimizer.

Returns:
  • optim groups (List): The parameter groups to be set in the latter optimizer.
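
A minimal usage sketch, added for illustration (not from the original docstring); the returned parameter groups are passed directly to the optimizer constructor.

Examples:
>>> model = torch.nn.Sequential(torch.nn.Linear(3, 5), torch.nn.LayerNorm(5))
>>> groups = configure_weight_decay(model, weight_decay=0.01)
>>> optimizer = torch.optim.AdamW(groups, lr=3e-4)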

parameter

Please refer to ding/torch_utils/parameter for more details.

NonegativeParameter

class ding.torch_utils.parameter.NonegativeParameter(data: Tensor | None = None, requires_grad: bool = True, delta: float = 1e-08)[source]
Overview:

This module will output a non-negative parameter during the forward process.

Interfaces:

__init__, forward, set_data.

__init__(data: Tensor | None = None, requires_grad: bool = True, delta: float = 1e-08)[source]
Overview:

Initialize the NonegativeParameter object using the given arguments.

Arguments:
  • data (Optional[torch.Tensor]): The initial value of generated parameter. If set to None, the default value is 0.

  • requires_grad (bool): Whether this parameter requires grad.

  • delta (float): The small delta used in the log function for numerical stability.
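
A minimal usage sketch, added for illustration (not from the original docstring); calling the module (i.e. its forward) is assumed to yield a tensor whose entries are all non-negative.

Examples:
>>> p = NonegativeParameter(torch.tensor([1.0, 2.0]))
>>> out = p()
>>> assert (out >= 0).all()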

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward() Tensor[source]
Overview:

Output the non-negative parameter during the forward process.

Returns:

parameter (torch.Tensor): The generated parameter.

set_data(data: Tensor) None[source]
Overview:

Set the value of the non-negative parameter.

Arguments:

data (torch.Tensor): The new value of the non-negative parameter.

training: bool

TanhParameter

class ding.torch_utils.parameter.TanhParameter(data: Tensor | None = None, requires_grad: bool = True)[source]
Overview:

This module will output a tanh parameter during the forward process.

Interfaces:

__init__, forward, set_data.

__init__(data: Tensor | None = None, requires_grad: bool = True)[source]
Overview:

Initialize the TanhParameter object using the given arguments.

Arguments:
  • data (Optional[torch.Tensor]): The initial value of generated parameter. If set to None, the default value is 1.

  • requires_grad (bool): Whether this parameter requires grad.
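
A minimal usage sketch, added for illustration (not from the original docstring); the forward output is assumed to lie in the tanh range (-1, 1).

Examples:
>>> p = TanhParameter(torch.tensor([0.5]))
>>> out = p()  # value constrained to (-1, 1)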

_backward_hooks: Dict[int, Callable]
_backward_pre_hooks: Dict[int, Callable]
_buffers: Dict[str, Tensor | None]
_forward_hooks: Dict[int, Callable]
_forward_hooks_always_called: Dict[int, bool]
_forward_hooks_with_kwargs: Dict[int, bool]
_forward_pre_hooks: Dict[int, Callable]
_forward_pre_hooks_with_kwargs: Dict[int, bool]
_is_full_backward_hook: bool | None
_load_state_dict_post_hooks: Dict[int, Callable]
_load_state_dict_pre_hooks: Dict[int, Callable]
_modules: Dict[str, Module | None]
_non_persistent_buffers_set: Set[str]
_parameters: Dict[str, Parameter | None]
_state_dict_hooks: Dict[int, Callable]
_state_dict_pre_hooks: Dict[int, Callable]
forward() Tensor[source]
Overview:

Output the tanh parameter during the forward process.

Returns:

parameter (torch.Tensor): The generated parameter.

set_data(data: Tensor) None[source]
Overview:

Set the value of the tanh parameter.

Arguments:

data (torch.Tensor): The new value of the tanh parameter.

training: bool

reshape_helper

Please refer to ding/torch_utils/reshape_helper for more details.

fold_batch

ding.torch_utils.reshape_helper.fold_batch(x: Tensor, nonbatch_ndims: int = 1) Tuple[Tensor, Size][source]
Overview:

\((T, B, X) \rightarrow (T*B, X)\). Fold the first (ndim - nonbatch_ndims) dimensions of a tensor into the batch dimension. This operation is similar to torch.flatten but provides an inverse function, unfold_batch, to restore the folded dimensions.

Arguments:
  • x (torch.Tensor): the tensor to fold

  • nonbatch_ndims (int): the number of trailing dimensions that are not folded into the batch dimension

Returns:
  • x (torch.Tensor): the folded tensor

  • batch_dims: the folded dimensions of the original tensor, which can be used to reverse the operation

Examples:
>>> x = torch.ones(10, 20, 5, 4, 8)
>>> x, batch_dim = fold_batch(x, 2)
>>> x.shape == (1000, 4, 8)
>>> batch_dim == (10, 20, 5)

unfold_batch

ding.torch_utils.reshape_helper.unfold_batch(x: Tensor, batch_dims: Size | Tuple) Tensor[source]
Overview:

Unfold the batch dimension of a tensor.

Arguments:
  • x (torch.Tensor): the tensor to unfold

  • batch_dims (torch.Size): the dimensions that are folded

Returns:
  • x (torch.Tensor): the original unfolded tensor

Examples:
>>> x = torch.ones(10, 20, 5, 4, 8)
>>> x, batch_dim = fold_batch(x, 2)
>>> x.shape == (1000, 4, 8)
>>> batch_dim == (10, 20, 5)
>>> x = unfold_batch(x, batch_dim)
>>> x.shape == (10, 20, 5, 4, 8)

unsqueeze_repeat

ding.torch_utils.reshape_helper.unsqueeze_repeat(x: Tensor, repeat_times: int, unsqueeze_dim: int = 0) Tensor[source]
Overview:

Unsqueeze the tensor on unsqueeze_dim and then repeat it in this dimension repeat_times times. This is useful for preprocessing the input to a model ensemble.

Arguments:
  • x (torch.Tensor): the tensor to unsqueeze and repeat

  • repeat_times (int): the number of times the tensor is repeated

  • unsqueeze_dim (int): the unsqueezed dimension

Returns:
  • x (torch.Tensor): the unsqueezed and repeated tensor

Examples:
>>> x = torch.ones(64, 6)
>>> x = unsqueeze_repeat(x, 4)
>>> x.shape == (4, 64, 6)
>>> x = torch.ones(64, 6)
>>> x = unsqueeze_repeat(x, 4, -1)
>>> x.shape == (64, 6, 4)