lightrft.strategy.deepspeed.deepspeed_utils¶
DeepSpeed Configuration and Optimization Utilities Module.
This module provides utility functions for configuring DeepSpeed training and evaluation, managing optimizer parameters, and handling DeepSpeed ZeRO stage 3 states. It includes functions for building DeepSpeed configuration dictionaries with various optimization options, grouping model parameters for optimizers with weight-decay control, and offloading and reloading DeepSpeed states to manage GPU memory efficiently.
get_train_ds_config¶
- lightrft.strategy.deepspeed.deepspeed_utils.get_train_ds_config(offload, adam_offload=True, stage=2, bf16=True, max_norm=1.0, zpg=8, grad_accum_dtype=None, overlap_comm=False)[source]¶
Generate a DeepSpeed configuration dictionary for training.
- Parameters:
offload (bool) – Whether to offload parameters to CPU.
adam_offload (bool) – Whether to offload Adam optimizer states to CPU.
stage (int) – ZeRO optimization stage (0, 1, 2, or 3).
bf16 (bool) – Whether to use bfloat16 precision.
max_norm (float) – Maximum norm for gradient clipping.
zpg (int) – ZeRO++ hierarchical partitioning (hpZ) group size.
grad_accum_dtype (str or None) – Data type for gradient accumulation.
overlap_comm (bool) – Whether to overlap communication with computation.
- Returns:
DeepSpeed configuration dictionary for training.
- Return type:
dict
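The parameters above map onto DeepSpeed's documented JSON configuration keys. Below is a minimal sketch of the kind of dictionary such a function plausibly assembles; `sketch_train_ds_config` is a hypothetical stand-in, and the real function's exact keys and defaults may differ:

```python
def sketch_train_ds_config(offload, adam_offload=True, stage=2, bf16=True,
                           max_norm=1.0, zpg=8, grad_accum_dtype=None,
                           overlap_comm=False):
    """Hypothetical stand-in mirroring the documented signature."""
    zero_opt = {
        "stage": stage,
        # "none" disables offloading for that component
        "offload_param": {"device": "cpu" if offload else "none"},
        "offload_optimizer": {"device": "cpu" if adam_offload else "none"},
        "overlap_comm": overlap_comm,
        # ZeRO++ hierarchical partitioning (hpZ) secondary group size
        "zero_hpz_partition_size": zpg,
    }
    cfg = {
        "zero_optimization": zero_opt,
        "bf16": {"enabled": bf16},
        "gradient_clipping": max_norm,
    }
    if grad_accum_dtype is not None:
        cfg["data_types"] = {"grad_accum_dtype": grad_accum_dtype}
    return cfg

cfg = sketch_train_ds_config(offload=False, stage=3, overlap_comm=True)
```

The resulting dictionary is what you would pass as the `config` argument to `deepspeed.initialize`.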
get_eval_ds_config¶
- lightrft.strategy.deepspeed.deepspeed_utils.get_eval_ds_config(offload, stage=0, bf16=True)[source]¶
Generate a DeepSpeed configuration dictionary for evaluation.
- Parameters:
offload (bool) – Whether to offload parameters to CPU.
stage (int) – ZeRO optimization stage (0, 1, 2, or 3).
bf16 (bool) – Whether to use bfloat16 precision.
- Returns:
DeepSpeed configuration dictionary for evaluation.
- Return type:
dict
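Evaluation configurations are typically much leaner, since no optimizer section is needed. A hedged sketch (`sketch_eval_ds_config` is hypothetical; the key names follow DeepSpeed's documented schema, but the real function may emit additional fields):

```python
def sketch_eval_ds_config(offload, stage=0, bf16=True):
    """Hypothetical stand-in; no optimizer states exist at eval time."""
    return {
        "zero_optimization": {
            "stage": stage,
            "offload_param": {"device": "cpu" if offload else "none"},
        },
        "bf16": {"enabled": bf16},
        # DeepSpeed expects a batch-size key even for inference engines;
        # the concrete value here is a placeholder assumption.
        "train_micro_batch_size_per_gpu": 1,
    }

eval_cfg = sketch_eval_ds_config(offload=True, stage=3)
```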
offload_deepspeed_states¶
- lightrft.strategy.deepspeed.deepspeed_utils.offload_deepspeed_states(model, pin_memory=True, non_blocking=True)[source]¶
Offload DeepSpeed optimizer states to CPU to save GPU memory.
This function is particularly useful for ZeRO stage 3 when Adam optimizer offloading is not enabled. It moves the relevant states to CPU, empties the partition cache, and synchronizes devices.
- Parameters:
model (deepspeed.DeepSpeedEngine) – DeepSpeed model with optimizer.
pin_memory (bool) – Whether to use pinned memory for offloaded states.
non_blocking (bool) – Whether to perform non-blocking transfers.
- Raises:
NotImplementedError – If ZeRO stage is not 3.
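The stage restriction can be illustrated with a small stub. `EngineStub` and `sketch_offload_states` are hypothetical stand-ins for a `deepspeed.DeepSpeedEngine` and the real function, which operates on actual optimizer tensors:

```python
class EngineStub:
    """Stands in for a DeepSpeedEngine; tracks where optimizer states live."""
    def __init__(self, zero_stage):
        self.zero_stage = zero_stage
        self.states_device = "gpu"

def sketch_offload_states(model, pin_memory=True, non_blocking=True):
    # Only ZeRO stage 3 partitions states in a way this offload path handles.
    if model.zero_stage != 3:
        raise NotImplementedError("offloading requires ZeRO stage 3")
    # The real version moves states to (optionally pinned) CPU memory,
    # empties the partition cache, and synchronizes devices.
    model.states_device = "cpu"

engine = EngineStub(zero_stage=3)
sketch_offload_states(engine)
```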
reload_deepspeed_states¶
- lightrft.strategy.deepspeed.deepspeed_utils.reload_deepspeed_states(model, non_blocking=True)[source]¶
Reload DeepSpeed optimizer states from CPU back to GPU.
This function is used to restore states previously offloaded with offload_deepspeed_states().
- Parameters:
model (deepspeed.DeepSpeedEngine) – DeepSpeed model with optimizer.
non_blocking (bool) – Whether to perform non-blocking transfers.
- Raises:
NotImplementedError – If ZeRO stage is not 3.
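Together, the two functions support an offload/reload round trip around a memory-heavy phase such as rollout generation. A runnable sketch of that control flow, with a hypothetical `EngineStub` in place of a real `deepspeed.DeepSpeedEngine` and simplified stand-ins for the two functions:

```python
class EngineStub:
    """Hypothetical stand-in for a ZeRO stage 3 DeepSpeedEngine."""
    def __init__(self):
        self.states_device = "gpu"
        self.log = []

def offload_states(model, pin_memory=True, non_blocking=True):
    # Stand-in: the real function moves optimizer states off the GPU.
    model.states_device = "cpu"
    model.log.append("offload")

def reload_states(model, non_blocking=True):
    # Stand-in: the real function restores previously offloaded states.
    model.states_device = "gpu"
    model.log.append("reload")

engine = EngineStub()
offload_states(engine)          # free GPU memory before the heavy phase
engine.log.append("generate")   # e.g. rollout / generation happens here
reload_states(engine)           # restore states before the next optimizer step
```

Reloading before the next optimizer step is required: ZeRO stage 3 expects its partitioned states back on the accelerator when training resumes.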