lightrft.strategy.deepspeed.deepspeed_utils¶
DeepSpeed Configuration and Optimization Utilities Module.
This module provides utility functions for configuring DeepSpeed training and evaluation, managing optimizer parameters, and handling DeepSpeed ZeRO stage 3 states. It includes functions for building DeepSpeed configuration dictionaries with various optimization options, grouping model parameters for optimizers with weight-decay control, and offloading and reloading DeepSpeed states to manage GPU memory efficiently.
get_train_ds_config¶
- lightrft.strategy.deepspeed.deepspeed_utils.get_train_ds_config(offload, adam_offload=True, stage=2, bf16=True, max_norm=1.0, zpg=8, grad_accum_dtype=None, overlap_comm=False)[source]¶
Generate a DeepSpeed configuration dictionary for training.
- Parameters:
offload (bool) – Whether to offload parameters to CPU.
adam_offload (bool) – Whether to offload Adam optimizer states to CPU.
stage (int) – ZeRO optimization stage (0, 1, 2, or 3).
bf16 (bool) – Whether to use bfloat16 precision.
max_norm (float) – Maximum norm for gradient clipping.
zpg (int) – ZeRO++ hierarchical partitioning (hpZ) group size.
grad_accum_dtype (str or None) – Data type for gradient accumulation.
overlap_comm (bool) – Whether to overlap communication with computation.
- Returns:
DeepSpeed configuration dictionary for training.
- Return type:
dict
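The parameters above map onto DeepSpeed's documented JSON configuration keys. Below is a minimal sketch of the kind of dictionary such a function plausibly assembles; `sketch_train_ds_config` is a hypothetical stand-in, and the real function's exact keys and defaults may differ:

```python
def sketch_train_ds_config(offload, adam_offload=True, stage=2, bf16=True,
                           max_norm=1.0, zpg=8, grad_accum_dtype=None,
                           overlap_comm=False):
    """Hypothetical stand-in mirroring the documented signature."""
    zero_opt = {
        "stage": stage,
        # "none" disables offloading for that component
        "offload_param": {"device": "cpu" if offload else "none"},
        "offload_optimizer": {"device": "cpu" if adam_offload else "none"},
        "overlap_comm": overlap_comm,
        # ZeRO++ hierarchical partitioning (hpZ) secondary group size
        "zero_hpz_partition_size": zpg,
    }
    cfg = {
        "zero_optimization": zero_opt,
        "bf16": {"enabled": bf16},
        "gradient_clipping": max_norm,
    }
    if grad_accum_dtype is not None:
        cfg["data_types"] = {"grad_accum_dtype": grad_accum_dtype}
    return cfg

cfg = sketch_train_ds_config(offload=False, stage=3, overlap_comm=True)
```

The resulting dictionary is what you would pass as the `config` argument to `deepspeed.initialize`.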
get_eval_ds_config¶
- lightrft.strategy.deepspeed.deepspeed_utils.get_eval_ds_config(offload, stage=0, bf16=True)[source]¶
Generate a DeepSpeed configuration dictionary for evaluation.
- Parameters:
offload (bool) – Whether to offload parameters to CPU.
stage (int) – ZeRO optimization stage (0, 1, 2, or 3).
bf16 (bool) – Whether to use bfloat16 precision.
- Returns:
DeepSpeed configuration dictionary for evaluation.
- Return type:
dict
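Evaluation configurations are typically much leaner, since no optimizer section is needed. A hedged sketch (`sketch_eval_ds_config` is hypothetical; the key names follow DeepSpeed's documented schema, but the real function may emit additional fields):

```python
def sketch_eval_ds_config(offload, stage=0, bf16=True):
    """Hypothetical stand-in; no optimizer states exist at eval time."""
    return {
        "zero_optimization": {
            "stage": stage,
            "offload_param": {"device": "cpu" if offload else "none"},
        },
        "bf16": {"enabled": bf16},
        # DeepSpeed expects a batch-size key even for inference engines;
        # the concrete value here is a placeholder assumption.
        "train_micro_batch_size_per_gpu": 1,
    }

eval_cfg = sketch_eval_ds_config(offload=True, stage=3)
```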
offload_deepspeed_states¶
- lightrft.strategy.deepspeed.deepspeed_utils.offload_deepspeed_states(model, pin_memory=True, non_blocking=True)[source]¶
Offload DeepSpeed optimizer states to CPU to save GPU memory.
This function is particularly useful for ZeRO stage 3 when Adam optimizer offloading is not enabled. It moves the relevant states to CPU, empties the partition cache, and synchronizes devices.
- Parameters:
model (deepspeed.DeepSpeedEngine) – DeepSpeed model with optimizer.
pin_memory (bool) – Whether to use pinned memory for offloaded states.
non_blocking (bool) – Whether to perform non-blocking transfers.
- Raises:
NotImplementedError – If ZeRO stage is not 3.
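The stage restriction can be illustrated with a small stub. `EngineStub` and `sketch_offload_states` are hypothetical stand-ins for a `deepspeed.DeepSpeedEngine` and the real function, which operates on actual optimizer tensors:

```python
class EngineStub:
    """Stands in for a DeepSpeedEngine; tracks where optimizer states live."""
    def __init__(self, zero_stage):
        self.zero_stage = zero_stage
        self.states_device = "gpu"

def sketch_offload_states(model, pin_memory=True, non_blocking=True):
    # Only ZeRO stage 3 partitions states in a way this offload path handles.
    if model.zero_stage != 3:
        raise NotImplementedError("offloading requires ZeRO stage 3")
    # The real version moves states to (optionally pinned) CPU memory,
    # empties the partition cache, and synchronizes devices.
    model.states_device = "cpu"

engine = EngineStub(zero_stage=3)
sketch_offload_states(engine)
```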
reload_deepspeed_states¶
- lightrft.strategy.deepspeed.deepspeed_utils.reload_deepspeed_states(model, non_blocking=True)[source]¶
Reload DeepSpeed optimizer states from CPU back to GPU.
This function is used to restore states previously offloaded with offload_deepspeed_states().
- Parameters:
model (deepspeed.DeepSpeedEngine) – DeepSpeed model with optimizer.
non_blocking (bool) – Whether to perform non-blocking transfers.
- Raises:
NotImplementedError – If ZeRO stage is not 3.
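Together, the two functions support an offload/reload round trip around a memory-heavy phase such as rollout generation. A runnable sketch of that control flow, with a hypothetical `EngineStub` in place of a real `deepspeed.DeepSpeedEngine` and simplified stand-ins for the two functions:

```python
class EngineStub:
    """Hypothetical stand-in for a ZeRO stage 3 DeepSpeedEngine."""
    def __init__(self):
        self.states_device = "gpu"
        self.log = []

def offload_states(model, pin_memory=True, non_blocking=True):
    # Stand-in: the real function moves optimizer states off the GPU.
    model.states_device = "cpu"
    model.log.append("offload")

def reload_states(model, non_blocking=True):
    # Stand-in: the real function restores previously offloaded states.
    model.states_device = "gpu"
    model.log.append("reload")

engine = EngineStub()
offload_states(engine)          # free GPU memory before the heavy phase
engine.log.append("generate")   # e.g. rollout / generation happens here
reload_states(engine)           # restore states before the next optimizer step
```

Reloading before the next optimizer step is required: ZeRO stage 3 expects its partitioned states back on the accelerator when training resumes.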