lightrft.strategy.config¶
Configuration dataclasses for LightRFT strategy module.
This module provides typed configuration objects to replace the use of getattr for accessing configuration parameters, improving type safety and code clarity.
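To illustrate the motivation, here is a minimal sketch of the pattern this module adopts. `MiniConfig` is a trimmed-down stand-in for the real `StrategyConfig`, not the actual class; the legacy-args class is likewise hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Trimmed stand-in for StrategyConfig: typed fields with explicit defaults.
@dataclass
class MiniConfig:
    seed: int = 42
    max_norm: float = 1.0
    extra_args: Dict[str, Any] = field(default_factory=dict)

# Legacy style: getattr with an inline fallback. A misspelled attribute
# name silently falls back to the default instead of raising an error.
class LegacyArgs:
    seed = 7

legacy_norm = getattr(LegacyArgs, "max_norm", 1.0)  # no such attribute -> 1.0

# Typed style: defaults live on the dataclass, and an unknown keyword
# argument raises TypeError instead of being silently ignored.
cfg = MiniConfig(seed=7)
```

With the dataclass, tools like mypy and IDE autocompletion can check field names and types, which the `getattr` pattern cannot offer.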
- class lightrft.strategy.config.StrategyConfig(seed: int = 42, max_norm: float = 1.0, micro_train_batch_size: int = 1, train_batch_size: int = 128, bf16: bool = True, zero_stage: int = 2, fsdp: bool = False, fsdp_cpu_offload: bool = False, adam_offload: bool = False, zpg: int = 1, grad_accum_dtype: str | None = None, overlap_comm: bool = False, engine_type: str = 'vllm', engine_tp_size: int = 1, enable_engine_sleep: bool = False, local_rank: int = -1, sp_size: int = 1, actor_learning_rate: float = 1e-05, critic_learning_rate: float = 1e-05, adam_betas: tuple = (0.9, 0.95), l2: float = 0.0, lr_warmup_ratio: float = 0.03, critic_pretrain: bool = False, remote_rm_url: str | None = None, pretrain_data: str | None = None, fused_linear_logprob: bool = False, reward_running_norm: bool = False, reward_running_norm_minus_mean: bool = False, advantages_norm: bool = False, advantage_clip: float = 0.0, reward_clip: float = 0.0, micro_rollout_batch_size: int = 2, n_samples_per_prompt: int = 8, overlong_buffer: bool = False, overlong_buffer_len: int = 1024, overlong_buffer_penalty_factor: float = 1.0, dynamic_sampling: bool = False, advantage_estimator: str = 'group_norm', use_kl_loss: bool = False, kl_estimator: str = 'k3', mixed_mm_data: bool = False, use_mp_opt: bool = False, plot_every: int = -1, use_tensorboard: bool = False, extra_args: Dict[str, Any] = <factory>)[source]¶
Bases: object

Base configuration for all training strategies.
- actor_learning_rate: float = 1e-05¶
- adam_betas: tuple = (0.9, 0.95)¶
- adam_offload: bool = False¶
- advantage_clip: float = 0.0¶
- advantage_estimator: str = 'group_norm'¶
- advantages_norm: bool = False¶
- bf16: bool = True¶
- critic_learning_rate: float = 1e-05¶
- critic_pretrain: bool = False¶
- dynamic_sampling: bool = False¶
- enable_engine_sleep: bool = False¶
- engine_tp_size: int = 1¶
- engine_type: str = 'vllm'¶
- extra_args: Dict[str, Any]¶
- classmethod from_args(args_dict) → StrategyConfig[source]¶
Create StrategyConfig from argparse.Namespace or similar object.
This method provides backward compatibility by extracting parameters that were previously accessed via getattr, ensuring smooth migration from legacy configuration systems.
- Parameters:
args_dict (object) – Configuration arguments object (e.g. an argparse.Namespace or a dict) containing training parameters
- Returns:
StrategyConfig instance with extracted parameters
- Return type:
StrategyConfig
Example:
# From argparse.Namespace
args = argparse.Namespace(
    seed=42,
    max_norm=1.0,
    micro_train_batch_size=1,
    # ... other parameters
)
config = StrategyConfig.from_args(args)

# From dictionary
args_dict = {
    'seed': 42,
    'max_norm': 1.0,
    'micro_train_batch_size': 1,
    # ... other parameters
}
config = StrategyConfig.from_args(args_dict)
- fsdp: bool = False¶
- fsdp_cpu_offload: bool = False¶
- fused_linear_logprob: bool = False¶
- grad_accum_dtype: str | None = None¶
- kl_estimator: str = 'k3'¶
- l2: float = 0.0¶
- local_rank: int = -1¶
- lr_warmup_ratio: float = 0.03¶
- max_norm: float = 1.0¶
- micro_rollout_batch_size: int = 2¶
- micro_train_batch_size: int = 1¶
- mixed_mm_data: bool = False¶
- n_samples_per_prompt: int = 8¶
- overlap_comm: bool = False¶
- overlong_buffer: bool = False¶
- overlong_buffer_len: int = 1024¶
- overlong_buffer_penalty_factor: float = 1.0¶
- plot_every: int = -1¶
- pretrain_data: str | None = None¶
- print_config_summary() → None[source]¶
Print a summary of the configuration for verification.
This method reports which parameters were overridden from their defaults and which still hold their default values.
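The overridden-vs-default split can be computed generically from dataclass metadata. The sketch below shows one way to do it; `summarize` and `MiniConfig` are hypothetical illustrations of the idea, not the library's actual implementation.

```python
from dataclasses import MISSING, dataclass, fields

# Trimmed stand-in for StrategyConfig, used only for illustration.
@dataclass
class MiniConfig:
    seed: int = 42
    max_norm: float = 1.0

def summarize(cfg) -> dict:
    """Bucket each dataclass field by whether its current value
    differs from the declared default (hypothetical helper)."""
    summary = {"overridden": {}, "default": {}}
    for f in fields(cfg):
        value = getattr(cfg, f.name)
        # Fields declared with default_factory have no plain default.
        default = f.default if f.default is not MISSING else f.default_factory()
        bucket = "default" if value == default else "overridden"
        summary[bucket][f.name] = value
    return summary

summary = summarize(MiniConfig(seed=7))
```

Comparing against `fields(...)` metadata rather than a hand-maintained list keeps the summary in sync as configuration fields are added or removed.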
- remote_rm_url: str | None = None¶
- reward_clip: float = 0.0¶
- reward_running_norm: bool = False¶
- reward_running_norm_minus_mean: bool = False¶
- seed: int = 42¶
- sp_size: int = 1¶
- train_batch_size: int = 128¶
- use_kl_loss: bool = False¶
- use_mp_opt: bool = False¶
- use_tensorboard: bool = False¶
- zero_stage: int = 2¶
- zpg: int = 1¶