lightrft.strategy.utils.optimizer_utils

PyTorch Optimization Utilities Module

This module provides utility functions for optimizing PyTorch models, particularly focused on parameter grouping for optimizers with customized weight decay settings. It includes support for both regular tensors and distributed tensors (DTensor) with specialized grouping strategies for optimal performance in distributed training scenarios.

_DEFAULT_NO_DECAY_NAME_LIST

lightrft.strategy.utils.optimizer_utils._DEFAULT_NO_DECAY_NAME_LIST = ['bias', 'layer_norm.weight', 'layernorm.weight', 'norm.weight', 'ln_f.weight']

Default list of parameter-name patterns excluded from weight decay. It covers biases and the weight tensors of common normalization layers (LayerNorm variants and final layer norms such as ln_f), which are conventionally trained without weight decay.
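For illustration, the patterns in this list are typically matched as substrings of fully qualified parameter names (the substring semantics and the example names below are assumptions, not taken from the module's source):

```python
_DEFAULT_NO_DECAY_NAME_LIST = [
    "bias", "layer_norm.weight", "layernorm.weight", "norm.weight", "ln_f.weight"
]

def is_no_decay(param_name: str) -> bool:
    # Substring match against the default pattern list (assumed semantics).
    return any(pattern in param_name for pattern in _DEFAULT_NO_DECAY_NAME_LIST)

print(is_no_decay("transformer.h.0.attn.weight"))  # False: regular weight, decays
print(is_no_decay("transformer.h.0.ln_1.bias"))    # True: matches "bias"
print(is_no_decay("transformer.ln_f.weight"))      # True: matches "ln_f.weight"
```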

get_optimizer_grouped_parameters

lightrft.strategy.utils.optimizer_utils.get_optimizer_grouped_parameters(model, weight_decay, no_decay_name_list: List[str] | None = None)[source]

Prepare parameter groups for optimizer with weight decay control.

Splits the model's parameters into two groups:

  • Parameters that should have weight decay applied

  • Parameters that should not have weight decay applied (typically normalization weights and biases)

Parameters:
  • model (torch.nn.Module) – The model whose parameters will be organized.

  • weight_decay (float) – Weight decay value to apply to applicable parameters.

  • no_decay_name_list (Optional[List[str]]) – List of parameter name patterns that should not have weight decay. If None, defaults to _DEFAULT_NO_DECAY_NAME_LIST.

Returns:

List of parameter group dictionaries suitable for passing to a torch.optim optimizer.

Return type:

list

Example:

>>> import torch
>>> import torch.nn as nn
>>> model = nn.Sequential(nn.Linear(10, 10), nn.LayerNorm(10))
>>> grouped_params = get_optimizer_grouped_parameters(model, weight_decay=0.01)
>>> optimizer = torch.optim.AdamW(grouped_params)
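To make the grouping behavior concrete, here is a minimal, self-contained sketch of the standard name-based grouping pattern. This is a generic PyTorch idiom, not the module's actual source; the function name `group_parameters` and the substring-matching semantics are assumptions for illustration:

```python
from typing import List, Optional

import torch
import torch.nn as nn

_DEFAULT_NO_DECAY_NAME_LIST = [
    "bias", "layer_norm.weight", "layernorm.weight", "norm.weight", "ln_f.weight"
]

def group_parameters(model: nn.Module, weight_decay: float,
                     no_decay_name_list: Optional[List[str]] = None) -> list:
    """Split trainable parameters into decay / no-decay groups by name."""
    if no_decay_name_list is None:
        no_decay_name_list = _DEFAULT_NO_DECAY_NAME_LIST
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue  # frozen parameters are excluded from both groups
        if any(pattern in name for pattern in no_decay_name_list):
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]

model = nn.Sequential(nn.Linear(10, 10), nn.LayerNorm(10))
groups = group_parameters(model, weight_decay=0.01)
optimizer = torch.optim.AdamW(groups)
```

Note that inside an `nn.Sequential` the LayerNorm weight is named `1.weight`, so it is not caught by the name patterns here; only the biases land in the no-decay group. In a named model (e.g. a transformer whose norm weights are called `layernorm.weight`), the patterns match as intended.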