lightrft.strategy.config¶
Configuration dataclasses for LightRFT strategy module.
This module provides typed configuration objects to replace the use of getattr for accessing configuration parameters, improving type safety and code clarity.
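To illustrate the motivation, here is a minimal sketch of the pattern this module adopts. `MiniConfig` is a trimmed-down stand-in for the real `StrategyConfig`, not the actual class; the legacy-args class is likewise hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Trimmed stand-in for StrategyConfig: typed fields with explicit defaults.
@dataclass
class MiniConfig:
    seed: int = 42
    max_norm: float = 1.0
    extra_args: Dict[str, Any] = field(default_factory=dict)

# Legacy style: getattr with an inline fallback. A misspelled attribute
# name silently falls back to the default instead of raising an error.
class LegacyArgs:
    seed = 7

legacy_norm = getattr(LegacyArgs, "max_norm", 1.0)  # no such attribute -> 1.0

# Typed style: defaults live on the dataclass, and an unknown keyword
# argument raises TypeError instead of being silently ignored.
cfg = MiniConfig(seed=7)
```

With the dataclass, tools like mypy and IDE autocompletion can check field names and types, which the `getattr` pattern cannot offer.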
- class lightrft.strategy.config.StrategyConfig(seed: int = 42, max_norm: float = 1.0, micro_train_batch_size: int = 1, train_batch_size: int = 128, bf16: bool = True, zero_stage: int = 2, fsdp: bool = False, fsdp_cpu_offload: bool = False, adam_offload: bool = False, zpg: int = 1, grad_accum_dtype: str | None = None, overlap_comm: bool = False, engine_type: str = 'vllm', engine_tp_size: int = 1, enable_engine_sleep: bool = False, local_rank: int = -1, sp_size: int = 1, actor_learning_rate: float = 1e-05, critic_learning_rate: float = 1e-05, adam_betas: tuple = (0.9, 0.95), l2: float = 0.0, lr_warmup_ratio: float = 0.03, critic_pretrain: bool = False, remote_rm_url: str | None = None, pretrain_data: str | None = None, fused_linear_logprob: bool = False, reward_running_norm: bool = False, reward_running_norm_minus_mean: bool = False, advantages_norm: bool = False, advantage_clip: float = 0.0, reward_clip: float = 0.0, micro_rollout_batch_size: int = 2, n_samples_per_prompt: int = 8, overlong_buffer: bool = False, overlong_buffer_len: int = 1024, overlong_buffer_penalty_factor: float = 1.0, dynamic_sampling: bool = False, advantage_estimator: str = 'group_norm', use_kl_loss: bool = False, kl_estimator: str = 'k3', mixed_mm_data: bool = False, use_mp_opt: bool = False, plot_every: int = -1, use_tensorboard: bool = False, extra_args: Dict[str, Any] = <factory>)[source]¶
Bases: object

Base configuration for all training strategies.
- actor_learning_rate: float = 1e-05¶
- adam_betas: tuple = (0.9, 0.95)¶
- adam_offload: bool = False¶
- advantage_clip: float = 0.0¶
- advantage_estimator: str = 'group_norm'¶
- advantages_norm: bool = False¶
- bf16: bool = True¶
- critic_learning_rate: float = 1e-05¶
- critic_pretrain: bool = False¶
- dynamic_sampling: bool = False¶
- enable_engine_sleep: bool = False¶
- engine_tp_size: int = 1¶
- engine_type: str = 'vllm'¶
- extra_args: Dict[str, Any]¶
- classmethod from_args(args_dict) → StrategyConfig[source]¶
Create StrategyConfig from argparse.Namespace or similar object.
This method provides backward compatibility by extracting parameters that were previously accessed via getattr, ensuring smooth migration from legacy configuration systems.
- Parameters:
args_dict (object) – Configuration arguments object (e.g. an argparse.Namespace or a dict) containing training parameters
- Returns:
StrategyConfig instance with extracted parameters
- Return type:
StrategyConfig
Example:
# From argparse.Namespace
args = argparse.Namespace(
    seed=42,
    max_norm=1.0,
    micro_train_batch_size=1,
    # ... other parameters
)
config = StrategyConfig.from_args(args)

# From dictionary
args_dict = {
    'seed': 42,
    'max_norm': 1.0,
    'micro_train_batch_size': 1,
    # ... other parameters
}
config = StrategyConfig.from_args(args_dict)
- fsdp: bool = False¶
- fsdp_cpu_offload: bool = False¶
- fused_linear_logprob: bool = False¶
- grad_accum_dtype: str | None = None¶
- kl_estimator: str = 'k3'¶
- l2: float = 0.0¶
- local_rank: int = -1¶
- lr_warmup_ratio: float = 0.03¶
- max_norm: float = 1.0¶
- micro_rollout_batch_size: int = 2¶
- micro_train_batch_size: int = 1¶
- mixed_mm_data: bool = False¶
- n_samples_per_prompt: int = 8¶
- overlap_comm: bool = False¶
- overlong_buffer: bool = False¶
- overlong_buffer_len: int = 1024¶
- overlong_buffer_penalty_factor: float = 1.0¶
- plot_every: int = -1¶
- pretrain_data: str | None = None¶
- print_config_summary() → None[source]¶
Print a summary of the configuration for verification.
This method reports which parameters were overridden from their defaults and which still hold their default values.
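The overridden-vs-default split can be computed generically from dataclass metadata. The sketch below shows one way to do it; `summarize` and `MiniConfig` are hypothetical illustrations of the idea, not the library's actual implementation.

```python
from dataclasses import MISSING, dataclass, fields

# Trimmed stand-in for StrategyConfig, used only for illustration.
@dataclass
class MiniConfig:
    seed: int = 42
    max_norm: float = 1.0

def summarize(cfg) -> dict:
    """Bucket each dataclass field by whether its current value
    differs from the declared default (hypothetical helper)."""
    summary = {"overridden": {}, "default": {}}
    for f in fields(cfg):
        value = getattr(cfg, f.name)
        # Fields declared with default_factory have no plain default.
        default = f.default if f.default is not MISSING else f.default_factory()
        bucket = "default" if value == default else "overridden"
        summary[bucket][f.name] = value
    return summary

summary = summarize(MiniConfig(seed=7))
```

Comparing against `fields(...)` metadata rather than a hand-maintained list keeps the summary in sync as configuration fields are added or removed.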
- remote_rm_url: str | None = None¶
- reward_clip: float = 0.0¶
- reward_running_norm: bool = False¶
- reward_running_norm_minus_mean: bool = False¶
- seed: int = 42¶
- sp_size: int = 1¶
- train_batch_size: int = 128¶
- use_kl_loss: bool = False¶
- use_mp_opt: bool = False¶
- use_tensorboard: bool = False¶
- zero_stage: int = 2¶
- zpg: int = 1¶