lightrft.strategy.deepspeed.deepspeed_utils

DeepSpeed Configuration and Optimization Utilities Module.

This module provides utility functions for configuring DeepSpeed for training and evaluation, managing optimizer parameters, and handling DeepSpeed ZeRO stage 3 states. It includes functions for creating DeepSpeed configurations with various optimization options, organizing model parameters for optimizers with weight decay control, and offloading/reloading DeepSpeed states to manage memory efficiently.

get_train_ds_config

lightrft.strategy.deepspeed.deepspeed_utils.get_train_ds_config(offload, adam_offload=True, stage=2, bf16=True, max_norm=1.0, zpg=8, grad_accum_dtype=None, overlap_comm=False)[source]

Generate a DeepSpeed configuration dictionary for training.

Parameters:
  • offload (bool) – Whether to offload parameters to CPU.

  • adam_offload (bool) – Whether to offload Adam optimizer states to CPU.

  • stage (int) – ZeRO optimization stage (0, 1, 2, or 3).

  • bf16 (bool) – Whether to use bfloat16 precision.

  • max_norm (float) – Maximum norm for gradient clipping.

  • zpg (int) – ZeRO++ hierarchical partitioning group size (the number of ranks that share a secondary parameter partition).

  • grad_accum_dtype (str or None) – Data type for gradient accumulation.

  • overlap_comm (bool) – Whether to overlap communication with computation.

Returns:

DeepSpeed configuration dictionary for training.

Return type:

dict
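The exact dictionary this helper emits is internal to lightrft, but the standard DeepSpeed config schema gives a reasonable picture of its shape. The sketch below is an illustration only (not lightrft's implementation), assuming the documented parameters map onto the usual `zero_optimization`, `bf16`, and `gradient_clipping` keys of a DeepSpeed config:

```python
def sketch_train_ds_config(offload, adam_offload=True, stage=2, bf16=True,
                           max_norm=1.0, zpg=8, grad_accum_dtype=None,
                           overlap_comm=False):
    """Illustrative stand-in showing the shape of a DeepSpeed
    training config; key names follow the public DeepSpeed schema."""
    config = {
        "zero_optimization": {
            "stage": stage,
            # parameter / optimizer placement controlled by the flags
            "offload_param": {"device": "cpu" if offload else "none"},
            "offload_optimizer": {"device": "cpu" if adam_offload else "none"},
            # ZeRO++ hierarchical partitioning group size
            "zero_hpz_partition_size": zpg,
            "overlap_comm": overlap_comm,
        },
        "bf16": {"enabled": bf16},
        "gradient_clipping": max_norm,
    }
    if grad_accum_dtype is not None:
        config["data_types"] = {"grad_accum_dtype": grad_accum_dtype}
    return config

cfg = sketch_train_ds_config(offload=False, stage=3)
```

Pass the resulting dict as the `config` argument of `deepspeed.initialize`.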

get_eval_ds_config

lightrft.strategy.deepspeed.deepspeed_utils.get_eval_ds_config(offload, stage=0, bf16=True)[source]

Generate a DeepSpeed configuration dictionary for evaluation.

Parameters:
  • offload (bool) – Whether to offload parameters to CPU.

  • stage (int) – ZeRO optimization stage (0, 1, 2, or 3).

  • bf16 (bool) – Whether to use bfloat16 precision.

Returns:

DeepSpeed configuration dictionary for evaluation.

Return type:

dict
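An evaluation config is typically much smaller than a training one, since no optimizer or gradient sections are needed. The following hypothetical sketch (again using standard DeepSpeed keys, not lightrft's actual output) shows the shape such a dict takes:

```python
def sketch_eval_ds_config(offload, stage=0, bf16=True):
    """Illustration only: evaluation needs just parameter
    placement and precision, no optimizer sections."""
    return {
        "zero_optimization": {
            "stage": stage,
            "offload_param": {"device": "cpu" if offload else "none"},
        },
        "bf16": {"enabled": bf16},
    }

eval_cfg = sketch_eval_ds_config(offload=True)
```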

offload_deepspeed_states

lightrft.strategy.deepspeed.deepspeed_utils.offload_deepspeed_states(model, pin_memory=True, non_blocking=True)[source]

Offload DeepSpeed optimizer states to CPU to save GPU memory.

This function is particularly useful for ZeRO stage 3 when Adam optimizer offloading is disabled. It moves the engine's states to CPU, empties the ZeRO-3 partition cache, and synchronizes devices so GPU memory is freed before a memory-hungry phase.

Parameters:
  • model (deepspeed.DeepSpeedEngine) – DeepSpeed model with optimizer.

  • pin_memory (bool) – Whether to use pinned memory for offloaded states.

  • non_blocking (bool) – Whether to perform non-blocking transfers.

Raises:

NotImplementedError – If ZeRO stage is not 3.

reload_deepspeed_states

lightrft.strategy.deepspeed.deepspeed_utils.reload_deepspeed_states(model, non_blocking=True)[source]

Reload DeepSpeed optimizer states from CPU back to GPU.

This function is used to restore states previously offloaded with offload_deepspeed_states().

Parameters:
  • model (deepspeed.DeepSpeedEngine) – DeepSpeed model with optimizer.

  • non_blocking (bool) – Whether to perform non-blocking transfers.

Raises:

NotImplementedError – If ZeRO stage is not 3.
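Since the two helpers are meant to bracket a memory-hungry phase (e.g. rollout generation), a paired pattern is convenient. The context manager below is a hypothetical wrapper, not a lightrft API; it takes the offload and reload functions as callables and only demonstrates the intended call order, including reload on error:

```python
from contextlib import contextmanager

@contextmanager
def states_on_cpu(model, offload_fn, reload_fn):
    """Offload states before the wrapped phase and reload them
    afterwards, even if the phase raises (hypothetical helper)."""
    offload_fn(model)
    try:
        yield model
    finally:
        reload_fn(model)

# Demonstrate the call order with recording stand-ins.
log = []
with states_on_cpu(object(),
                   lambda m: log.append("offload"),
                   lambda m: log.append("reload")):
    log.append("generate")  # the memory-hungry phase
```

In real use, `offload_deepspeed_states` and `reload_deepspeed_states` (partially applied with `pin_memory`/`non_blocking` as needed) would take the place of the two lambdas.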