lightrft.strategy.deepspeed.deepspeed

DeepSpeed Strategy Module for lightrft.

This module provides a DeepSpeed-based strategy for training and inference in lightrft. It handles model initialization, optimization, parameter management, and checkpoint operations using DeepSpeed’s distributed training capabilities. The strategy supports various DeepSpeed features including ZeRO optimization stages, mixed precision training, and parameter offloading.

ModelOptimPair

lightrft.strategy.deepspeed.deepspeed.ModelOptimPair

alias of Tuple[Module, Optimizer]

ModelOrModelOptimPair

lightrft.strategy.deepspeed.deepspeed.ModelOrModelOptimPair

alias of Union[Module, Tuple[Module, Optimizer]]

DeepspeedStrategy

class lightrft.strategy.deepspeed.deepspeed.DeepspeedStrategy(seed: int = 42, max_norm: float = 0.0, micro_train_batch_size: int = 1, train_batch_size: int = 1, zero_stage: int = 2, bf16: bool = True, args=None)[source]

DeepSpeed implementation of the training strategy.

Modified from https://github.com/OpenRLHF/OpenRLHF, with the following changes:

  1. Inherits from StrategyBase and adds some additional APIs.

  2. Removes the ring-attention-related code.

Parameters:
  • seed (int) – Random seed for reproducibility

  • max_norm (float) – Maximum gradient norm for gradient clipping (0.0 means no clipping)

  • micro_train_batch_size (int) – Batch size for a single GPU/process

  • train_batch_size (int) – Global batch size across all GPUs/processes

  • zero_stage (int) – DeepSpeed ZeRO optimization stage (0, 1, 2, or 3)

  • bf16 (bool) – Whether to use bfloat16 precision

  • args (object) – Additional arguments for configuration
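The batch-size parameters are linked by DeepSpeed's batching constraint: the global train_batch_size must equal the per-GPU micro batch size times the gradient-accumulation steps times the world size. A minimal sketch of that arithmetic (the helper name grad_accum_steps is illustrative, not part of lightrft):

```python
def grad_accum_steps(train_batch_size: int,
                     micro_train_batch_size: int,
                     world_size: int) -> int:
    """Derive gradient-accumulation steps from the batch sizes.

    DeepSpeed requires:
        train_batch_size ==
            micro_batch_per_gpu * gradient_accumulation_steps * world_size
    """
    per_optimizer_step = micro_train_batch_size * world_size
    assert train_batch_size % per_optimizer_step == 0, \
        "global batch size must be divisible by micro_batch * world_size"
    return train_batch_size // per_optimizer_step

# e.g. global batch 128, micro batch 1 per GPU, 8 GPUs -> 16 accumulation steps
print(grad_accum_steps(128, 1, 8))  # → 16
```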

__init__(seed: int = 42, max_norm: float = 0.0, micro_train_batch_size: int = 1, train_batch_size: int = 1, zero_stage: int = 2, bf16: bool = True, args=None) None[source]

Initialize strategy with common parameters.

Parameters:
  • seed (int) – Random seed for reproducibility

  • max_norm (float) – Maximum gradient norm for clipping

  • micro_train_batch_size (int) – Per-GPU batch size for each training step

  • train_batch_size (int) – Global batch size across all GPUs/processes

  • args (Any, usually argparse.Namespace) – Additional configuration arguments

backward(loss: torch.Tensor, model: torch.nn.Module, optimizer: torch.optim.Optimizer, **kwargs) None[source]

Perform backward pass to compute gradients.

Parameters:
  • loss (torch.Tensor) – The loss tensor to backpropagate

  • model (nn.Module) – The model being trained

  • optimizer (optim.Optimizer) – The optimizer for the model

  • kwargs – Additional arguments

create_optimizer(model, **kwargs) torch.optim.Optimizer[source]

Create an optimizer for the given model.

Parameters:
  • model (nn.Module) – The model to create an optimizer for

  • kwargs – Additional arguments for the optimizer

Returns:

The created optimizer

Return type:

Optimizer

get_ds_eval_config(offload=False)[source]

Get the DeepSpeed configuration for evaluation.

Parameters:

offload (bool) – Whether to offload parameters to CPU

Returns:

DeepSpeed configuration dictionary

Return type:

dict

get_ds_train_config(is_actor)[source]

Get the DeepSpeed configuration for training.

Parameters:

is_actor (bool) – Whether the model is an actor model

Returns:

DeepSpeed configuration dictionary

Return type:

dict
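The exact dictionary this method returns depends on the strategy's settings, but a ZeRO-2 bf16 training config typically has the following shape. This is a hand-written sketch using DeepSpeed's JSON config keys; make_ds_train_config is an illustrative helper, not the actual implementation:

```python
def make_ds_train_config(zero_stage: int = 2, bf16: bool = True,
                         micro_batch: int = 1, grad_accum: int = 1,
                         max_norm: float = 0.0) -> dict:
    # Illustrative only: mirrors the shape of DeepSpeed's config schema.
    cfg = {
        "train_micro_batch_size_per_gpu": micro_batch,
        "gradient_accumulation_steps": grad_accum,
        "steps_per_print": 100,
        "zero_optimization": {
            "stage": zero_stage,
            # With offloading enabled this would be {"device": "cpu"}.
            "offload_optimizer": {"device": "none"},
        },
        "bf16": {"enabled": bf16},
    }
    if max_norm > 0.0:
        # max_norm == 0.0 means no gradient clipping.
        cfg["gradient_clipping"] = max_norm
    return cfg

cfg = make_ds_train_config(zero_stage=2, bf16=True, max_norm=1.0)
```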

init_model_context()[source]

Context manager for initializing a model with DeepSpeed configuration.

This sets up the HfDeepSpeedConfig for use with Hugging Face models.

load_ckpt(model, load_dir, tag=None, load_module_strict=True, load_optimizer_states=True, load_lr_scheduler_states=True, load_module_only=False, **_kwargs)[source]

Load a DeepSpeed model checkpoint from the specified directory.

This function wraps DeepSpeed’s checkpoint loading functionality with error handling.

Parameters:
  • model (deepspeed.DeepSpeedEngine) – The DeepSpeed model to load the checkpoint into

  • load_dir (str) – Directory from which to load the checkpoint

  • tag (str, optional) – Optional tag to specify which checkpoint to load

  • load_module_strict (bool, default=True) – Whether to strictly enforce that the keys in the model state dict match the keys in the checkpoint

  • load_optimizer_states (bool, default=True) – Whether to load optimizer states from checkpoint

  • load_lr_scheduler_states (bool, default=True) – Whether to load learning rate scheduler states

  • load_module_only (bool, default=False) – Whether to load only the module weights and not optimizer or scheduler states

Returns:

A tuple containing the checkpoint path and loaded states

Return type:

Tuple[str, dict]

Raises:
  • AssertionError – If model is not a DeepSpeedEngine

  • Exception – If loading the checkpoint fails

load_model(model: torch.nn.Module, path: str, map_location='cpu', strict: bool = False, key_replace_fn=None) None[source]

Load model weights from a file.

Parameters:
  • model (nn.Module) – The model to load weights into

  • path (str) – Path to the saved model weights

  • map_location (str or torch.device) – Device to load the weights to

  • strict (bool) – Whether to strictly enforce that the keys in state_dict match the model

  • key_replace_fn (callable) – Function to modify state dict keys

moving_average(model, model_ema, beta=0.992, device='cpu')[source]

Update model_ema parameters with exponential moving average of model parameters.

Parameters:
  • model (nn.Module) – Source model

  • model_ema (nn.Module) – Target model for EMA

  • beta (float) – EMA decay factor

  • device (str) – Device to perform operations on
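The per-parameter update is the standard exponential moving average, ema ← beta · ema + (1 − beta) · param. A minimal pure-Python sketch of that update (the real method operates on torch tensors and, under ZeRO-3, must gather sharded parameters before updating):

```python
def ema_update(params, ema_params, beta=0.992):
    """In-place EMA update: ema <- beta * ema + (1 - beta) * param."""
    for i, (p, e) in enumerate(zip(params, ema_params)):
        ema_params[i] = beta * e + (1.0 - beta) * p
    return ema_params

# After one update from ema=0.0 toward param=1.0 with beta=0.9,
# the EMA moves a (1 - beta) = 0.1 fraction of the way.
ema = ema_update([1.0], [0.0], beta=0.9)
```

A large beta (such as the default 0.992) makes the EMA model change slowly, smoothing over noisy per-step updates.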

optimizer_step(optimizer: torch.optim.Optimizer, model: torch.nn.Module, scheduler, name='model', **kwargs) None[source]

Perform an optimization step.

Parameters:
  • optimizer (optim.Optimizer) – The optimizer to step

  • model (nn.Module) – The model being trained

  • scheduler – The learning rate scheduler

  • name (str) – Name identifier for the model

  • kwargs – Additional arguments

prepare(*models_or_model_optim_pairs: torch.nn.Module | Tuple[torch.nn.Module, torch.optim.Optimizer], is_rlhf=False) List[torch.nn.Module | Tuple[torch.nn.Module, torch.optim.Optimizer]] | torch.nn.Module | Tuple[torch.nn.Module, torch.optim.Optimizer][source]

Prepare models and optimizers for DeepSpeed training.

Expected input format for RLHF:

((actor, actor_optim, actor_scheduler), (critic, critic_optim, critic_scheduler), reward_models, initial_model)

Parameters:
  • models_or_model_optim_pairs – Models or (model, optimizer, scheduler) tuples

  • is_rlhf (bool) – Whether this is for RLHF training

Returns:

Prepared models or model-optimizer pairs

Return type:

Union[List[ModelOrModelOptimPair], ModelOrModelOptimPair]
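The RLHF nesting described above can be illustrated with placeholder values (strings stand in for the real modules, optimizers, and schedulers; the prepare call itself is shown commented out since it needs a live DeepSpeed environment):

```python
# Placeholders stand in for real nn.Module / Optimizer / scheduler objects.
actor_pack = ("actor", "actor_optim", "actor_scheduler")
critic_pack = ("critic", "critic_optim", "critic_scheduler")
reward_models = ["reward_model_0", "reward_model_1"]
initial_model = "initial_model"

# The single positional argument expected when is_rlhf=True:
rlhf_inputs = (actor_pack, critic_pack, reward_models, initial_model)

# prepared = strategy.prepare(rlhf_inputs, is_rlhf=True)
# -> engines returned in the same layout as the input
```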

prepare_reward_models(reward_models)[source]

Prepare reward models for DeepSpeed evaluation.

Parameters:

reward_models (List[nn.Module]) – List of reward models to prepare

Returns:

List of prepared reward models

Return type:

List[nn.Module]

save_ckpt(model, save_dir, tag=None, max_num=3, max_mem=1000, client_state={}, save_latest=True, **_kwargs)[source]

Save a DeepSpeed model checkpoint with automatic management of checkpoint storage.

This function manages checkpoint storage by limiting the number of checkpoints and total storage size. It automatically removes the oldest checkpoints when limits are exceeded.

Parameters:
  • model (deepspeed.DeepSpeedEngine) – The DeepSpeed model to save

  • save_dir (str) – Directory where checkpoints will be saved

  • tag (str, optional) – Optional tag for the checkpoint (e.g., iteration number)

  • max_num (int, default=3) – Maximum number of checkpoints to keep

  • max_mem (int, default=1000) – Maximum storage size for checkpoints in GB

  • client_state (dict, default={}) – Additional client state to save with the checkpoint

  • save_latest (bool, default=True) – Whether to save a symbolic link to the latest checkpoint

  • _kwargs (dict) – Unused keyword arguments (e.g., optimizer, scheduler), accepted for interface compatibility

Returns:

None

Raises:

AssertionError – If model is not a DeepSpeedEngine
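The retention policy (drop the oldest checkpoints once a limit is exceeded) can be sketched in plain Python. This is an illustrative stand-in for the count-based half of the policy only; the real method also enforces the max_mem size budget:

```python
import os
import shutil
import tempfile

def prune_checkpoints(save_dir: str, max_num: int) -> None:
    """Remove the oldest checkpoint subdirectories until at most max_num remain."""
    ckpts = sorted(
        (d for d in os.listdir(save_dir)
         if os.path.isdir(os.path.join(save_dir, d))),
        key=lambda d: os.path.getmtime(os.path.join(save_dir, d)),
    )
    while len(ckpts) > max_num:
        shutil.rmtree(os.path.join(save_dir, ckpts.pop(0)))

# Demo: create five fake checkpoint dirs with increasing mtimes, keep the newest three.
root = tempfile.mkdtemp()
for i in range(5):
    path = os.path.join(root, f"step_{i}")
    os.makedirs(path)
    os.utime(path, (i, i))  # force a deterministic modification order
prune_checkpoints(root, max_num=3)
print(sorted(os.listdir(root)))  # → ['step_2', 'step_3', 'step_4']
```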

save_model(model: torch.nn.Module, tokenizer, output_dir, **kwargs) None[source]

Save a model, its configuration, and tokenizer to the specified output directory.

Handles special cases for DeepSpeed ZeRO-2/3 parameters and PEFT models. For ZeRO parallelism, it gathers distributed parameters before saving. For PEFT models, it saves adapter weights appropriately based on the DeepSpeed stage.

Parameters:
  • model (nn.Module) – The model to save

  • tokenizer (PreTrainedTokenizer or similar) – The tokenizer to save

  • output_dir (str) – Directory where the model, config, and tokenizer will be saved

  • kwargs – Additional arguments to pass to the model’s save_pretrained method

Returns:

None

unwrap_model(model) torch.nn.Module[source]

Unwrap the model from any wrappers to access the base model.

Parameters:

model (nn.Module) – The model to unwrap

Returns:

The unwrapped model

Return type:

nn.Module