lightrft.strategy.config

Configuration dataclasses for LightRFT strategy module.

This module provides typed configuration objects to replace the use of getattr for accessing configuration parameters, improving type safety and code clarity.

class lightrft.strategy.config.StrategyConfig(seed: int = 42, max_norm: float = 1.0, micro_train_batch_size: int = 1, train_batch_size: int = 128, bf16: bool = True, zero_stage: int = 2, fsdp: bool = False, fsdp_cpu_offload: bool = False, adam_offload: bool = False, zpg: int = 1, grad_accum_dtype: str | None = None, overlap_comm: bool = False, engine_type: str = 'vllm', engine_tp_size: int = 1, enable_engine_sleep: bool = False, local_rank: int = -1, sp_size: int = 1, actor_learning_rate: float = 1e-05, critic_learning_rate: float = 1e-05, adam_betas: tuple = (0.9, 0.95), l2: float = 0.0, lr_warmup_ratio: float = 0.03, critic_pretrain: bool = False, remote_rm_url: str | None = None, pretrain_data: str | None = None, fused_linear_logprob: bool = False, reward_running_norm: bool = False, reward_running_norm_minus_mean: bool = False, advantages_norm: bool = False, advantage_clip: float = 0.0, reward_clip: float = 0.0, micro_rollout_batch_size: int = 2, n_samples_per_prompt: int = 8, overlong_buffer: bool = False, overlong_buffer_len: int = 1024, overlong_buffer_penalty_factor: float = 1.0, dynamic_sampling: bool = False, advantage_estimator: str = 'group_norm', use_kl_loss: bool = False, kl_estimator: str = 'k3', mixed_mm_data: bool = False, use_mp_opt: bool = False, plot_every: int = -1, use_tensorboard: bool = False, extra_args: Dict[str, Any] = <factory>)

Bases: object

Base configuration for all training strategies.
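Because StrategyConfig is a plain dataclass, instances can also be constructed directly with keyword overrides; unspecified fields keep their defaults. A minimal sketch using a trimmed-down stand-in that mirrors only a few of the fields documented below (the real class defines many more):

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Trimmed-down stand-in for illustration; the real StrategyConfig
# defines the full set of fields documented on this page.
@dataclass
class StrategyConfig:
    seed: int = 42
    max_norm: float = 1.0
    train_batch_size: int = 128
    bf16: bool = True
    extra_args: Dict[str, Any] = field(default_factory=dict)

# Override only what differs from the defaults.
config = StrategyConfig(train_batch_size=256, bf16=False)
print(config.seed, config.train_batch_size, config.bf16)  # 42 256 False
```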

actor_learning_rate: float = 1e-05
adam_betas: tuple = (0.9, 0.95)
adam_offload: bool = False
advantage_clip: float = 0.0
advantage_estimator: str = 'group_norm'
advantages_norm: bool = False
bf16: bool = True
critic_learning_rate: float = 1e-05
critic_pretrain: bool = False
dynamic_sampling: bool = False
enable_engine_sleep: bool = False
engine_tp_size: int = 1
engine_type: str = 'vllm'
extra_args: Dict[str, Any]
classmethod from_args(args_dict) → StrategyConfig

Create StrategyConfig from argparse.Namespace or similar object.

This method provides backward compatibility by extracting parameters that were previously accessed via getattr, ensuring smooth migration from legacy configuration systems.

Parameters:

args_dict (object) – Configuration arguments object containing training parameters

Returns:

StrategyConfig instance with extracted parameters

Return type:

StrategyConfig

Example:

# From argparse.Namespace
import argparse

args = argparse.Namespace(
    seed=42,
    max_norm=1.0,
    micro_train_batch_size=1,
    # ... other parameters
)
config = StrategyConfig.from_args(args)

# From dictionary
args_dict = {
    'seed': 42,
    'max_norm': 1.0,
    'micro_train_batch_size': 1,
    # ... other parameters
}
config = StrategyConfig.from_args(args_dict)
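Conceptually, from_args can be thought of as iterating over the dataclass fields and pulling each one from the namespace or mapping, falling back to the field default when the parameter is absent. A hedged sketch of that pattern (not the library's actual implementation), again using a trimmed-down stand-in:

```python
from dataclasses import dataclass, fields
from typing import Any

# Trimmed-down stand-in for illustration only.
@dataclass
class StrategyConfig:
    seed: int = 42
    max_norm: float = 1.0
    train_batch_size: int = 128

    @classmethod
    def from_args(cls, args_dict: Any) -> "StrategyConfig":
        # Accept either a mapping or an attribute-style object
        # such as argparse.Namespace.
        if isinstance(args_dict, dict):
            get = args_dict.get
        else:
            def get(name, default=None):
                return getattr(args_dict, name, default)
        # Keep only the parameters the object actually provides;
        # everything else falls back to the dataclass default.
        kwargs = {}
        for f in fields(cls):
            value = get(f.name, None)
            if value is not None:
                kwargs[f.name] = value
        return cls(**kwargs)

config = StrategyConfig.from_args({"seed": 7})
print(config.seed, config.train_batch_size)  # 7 128
```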
fsdp: bool = False
fsdp_cpu_offload: bool = False
fused_linear_logprob: bool = False
grad_accum_dtype: str | None = None
kl_estimator: str = 'k3'
l2: float = 0.0
local_rank: int = -1
lr_warmup_ratio: float = 0.03
max_norm: float = 1.0
micro_rollout_batch_size: int = 2
micro_train_batch_size: int = 1
mixed_mm_data: bool = False
n_samples_per_prompt: int = 8
overlap_comm: bool = False
overlong_buffer: bool = False
overlong_buffer_len: int = 1024
overlong_buffer_penalty_factor: float = 1.0
plot_every: int = -1
pretrain_data: str | None = None
print_config_summary() → None

Print a summary of the configuration for verification.

This method shows which parameters were overridden from defaults and which are using default values.
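The default-vs-override comparison can be sketched with the dataclasses introspection API; this is an illustrative stand-in, not the library's actual implementation:

```python
from dataclasses import dataclass, fields, MISSING

@dataclass
class StrategyConfig:  # trimmed-down stand-in for illustration
    seed: int = 42
    train_batch_size: int = 128

    def print_config_summary(self) -> None:
        # Report each field's value and whether it differs
        # from the declared default.
        for f in fields(self):
            current = getattr(self, f.name)
            if f.default is not MISSING and current != f.default:
                status = "overridden"
            else:
                status = "default"
            print(f"{f.name} = {current} ({status})")

StrategyConfig(train_batch_size=256).print_config_summary()
# seed = 42 (default)
# train_batch_size = 256 (overridden)
```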

remote_rm_url: str | None = None
reward_clip: float = 0.0
reward_running_norm: bool = False
reward_running_norm_minus_mean: bool = False
seed: int = 42
sp_size: int = 1
train_batch_size: int = 128
use_kl_loss: bool = False
use_mp_opt: bool = False
use_tensorboard: bool = False
zero_stage: int = 2
zpg: int = 1