lightrft.strategy.fake_strategy¶
FakeStrategy for testing LightRFT without a distributed environment.
This module provides a FakeStrategy class that mimics the behavior of real distributed training strategies (DeepSpeed, FSDP) but runs in a single process without any actual distributed communication. This is useful for unit testing and for development environments where a distributed setup is unavailable.
- class lightrft.strategy.fake_strategy.FakeStrategy(seed: int = 42, max_norm: float = 1.0, micro_train_batch_size: int = 1, train_batch_size: int = 128, args=None)[source]¶
Bases: StrategyBase
Fake strategy for testing without a distributed environment.
This strategy provides the same API as real distributed strategies but runs everything in a single process without actual distributed communication. It’s useful for unit testing and development.
- Parameters:
seed (int) – Random seed for reproducibility
max_norm (float) – Maximum gradient norm for clipping
micro_train_batch_size (int) – Batch size for each training step
train_batch_size (int) – Total batch size for training
args (Any) – Additional configuration arguments
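Given the documented constructor signature, instantiating the strategy might look like the following sketch (the import path is taken from this page; the try/except guard is only so the sketch degrades gracefully where lightrft is not installed):

```python
# Hypothetical usage sketch based on the documented signature.
# Assumes lightrft is importable; guarded so the sketch still runs
# in environments without it.
try:
    from lightrft.strategy.fake_strategy import FakeStrategy
    strategy = FakeStrategy(seed=42, max_norm=1.0,
                            micro_train_batch_size=1, train_batch_size=8)
except ImportError:
    strategy = None  # lightrft not available here
```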
- all_gather(data)[source]¶
Fake all-gather operation - returns data wrapped in list.
- Parameters:
data (Union[torch.Tensor, dict]) – Data to be gathered
- Returns:
Data wrapped to mimic gathered result
- Return type:
Union[torch.Tensor, dict]
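The documented behavior (the input wrapped in a list to mimic a one-rank gather) can be illustrated with a small stand-in; the per-key dict handling here is an assumption for illustration, not the real implementation:

```python
# Stand-in for the documented fake all_gather: a single-process
# "gather" just wraps the input in a one-element list. Wrapping each
# dict value the same way is an assumption.
def fake_all_gather(data):
    if isinstance(data, dict):
        return {key: [value] for key, value in data.items()}
    return [data]
```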
- all_reduce(data, op='mean')[source]¶
Fake all-reduce operation - returns data unchanged.
- Parameters:
data (Union[torch.Tensor, dict]) – Data to be reduced
op (str) – Reduction operation (ignored in fake mode)
- Returns:
Data unchanged
- Return type:
Union[torch.Tensor, dict]
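Since there is only one process, the fake all-reduce is an identity operation, as a stand-in makes explicit:

```python
# Stand-in for the documented fake all_reduce: with one process there
# is nothing to reduce, so the input comes back unchanged and `op`
# is ignored.
def fake_all_reduce(data, op="mean"):
    return data
```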
- backward(loss: torch.Tensor, model: torch.nn.Module, optimizer: torch.optim.Optimizer, **kwargs) None[source]¶
Perform backward pass using standard PyTorch.
- Parameters:
loss (torch.Tensor) – The loss to backpropagate
model (nn.Module) – The model
optimizer (Optimizer) – The optimizer
kwargs – Additional arguments
- create_optimizer(model: torch.nn.Module, **kwargs) torch.optim.Optimizer[source]¶
Create a standard optimizer for the model.
- Parameters:
model (nn.Module) – The model to optimize
kwargs – Additional optimizer arguments
- Returns:
The created optimizer
- Return type:
Optimizer
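Because the result is described as a standard optimizer, a plain-PyTorch equivalent might look like this (the specific optimizer class and hyperparameters are assumptions, not what create_optimizer necessarily uses):

```python
import torch

# Plain-PyTorch equivalent of what create_optimizer is documented to
# produce; the choice of AdamW and the learning rate are assumptions
# for illustration.
model = torch.nn.Linear(8, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```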
- engine_generate_local(sampling_params, prompt_token_ids=None, multi_modal_inputs=None)[source]¶
Fake generation - returns empty results.
- Parameters:
sampling_params – Parameters for generation (ignored)
prompt_token_ids – Prompt token IDs (ignored)
multi_modal_inputs – Multimodal inputs (ignored)
- Returns:
Empty list
- Return type:
List
- gather_and_generate(sampling_params, all_prompt_token_ids=None, all_prompts=None, all_images=None, sleep_engine=True, images_num=None)[source]¶
Fake gather and generate - returns empty results.
- Parameters:
sampling_params – Parameters for generation (ignored)
all_prompt_token_ids – All prompt token IDs (ignored)
all_prompts – All prompts (ignored)
all_images – All images (ignored)
sleep_engine (bool) – Whether to sleep engine after generation (ignored)
images_num – Number of images (ignored)
- Returns:
Empty list
- Return type:
List
- classmethod is_rank_0() bool[source]¶
Always returns True in fake mode (single process is rank 0).
- Returns:
True
- Return type:
bool
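A common use of this method is gating rank-0-only work such as logging or checkpoint saving; in fake mode the guard always passes, as a stand-in shows:

```python
# Stand-in mirroring the documented behavior: the single process is
# always rank 0, so rank-0-gated work always runs.
def is_rank_0():
    return True

log = []
if is_rank_0():
    log.append("saving checkpoint")
```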
- load_ckpt(model, load_dir: str, tag=None, load_module_strict=True, load_optimizer_states=True, load_lr_scheduler_states=True, load_module_only=False)[source]¶
Load checkpoint using standard PyTorch loading.
- Parameters:
model – The model to load checkpoint into
load_dir (str) – Directory containing the checkpoint
tag – Optional specific checkpoint tag to load
load_module_strict (bool) – Whether to use strict loading for module states
load_optimizer_states (bool) – Whether to load optimizer states
load_lr_scheduler_states (bool) – Whether to load learning rate scheduler states
load_module_only (bool) – Whether to load only the module states
- Returns:
Tuple of (load_path, client_states)
- Return type:
tuple
- maybe_load_optimizer(optimizer, device=torch.cuda.current_device)[source]¶
Fake optimizer loading - returns optimizer unchanged.
- Parameters:
optimizer (torch.optim.Optimizer) – The optimizer to potentially load
device (torch.device) – Target device for loading (ignored)
- Returns:
The original optimizer
- Return type:
torch.optim.Optimizer
- maybe_offload_optimizer(optimizer)[source]¶
Fake optimizer offloading - returns optimizer unchanged.
- Parameters:
optimizer (torch.optim.Optimizer) – The optimizer to potentially offload
- Returns:
The original optimizer
- Return type:
torch.optim.Optimizer
- optimizer_step(optimizer: torch.optim.Optimizer, model: torch.nn.Module, scheduler=None, name='model', **kwargs) None[source]¶
Take optimizer step using standard PyTorch.
- Parameters:
optimizer (Optimizer) – The optimizer
model (nn.Module) – The model
scheduler – The learning rate scheduler (optional)
name (str) – Name for logging purposes
kwargs – Additional arguments
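Both backward() and optimizer_step() are documented as using standard PyTorch, so together they reduce to an ordinary training step; a minimal plain-PyTorch sketch (the exact placement of gradient clipping is an assumption based on the constructor's max_norm parameter):

```python
import torch

# What backward() + optimizer_step() reduce to in fake mode: an
# ordinary PyTorch training step. Clipping with max_norm mirrors the
# constructor's max_norm parameter (placement assumed).
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(2, 4)
loss = model(x).pow(2).mean()

optimizer.zero_grad()
loss.backward()                                              # backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()                                             # optimizer_step()
```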
- prepare(*models_or_model_optim_pairs: torch.nn.Module | Tuple[torch.nn.Module, torch.optim.Optimizer]) List[torch.nn.Module | Tuple[torch.nn.Module, torch.optim.Optimizer]] | torch.nn.Module | Tuple[torch.nn.Module, torch.optim.Optimizer][source]¶
Prepare models and optimizers - returns them as-is in fake mode.
- Parameters:
models_or_model_optim_pairs – Models or (model, optimizer) pairs to prepare
- Returns:
Prepared models/optimizers (unchanged in fake mode)
- Return type:
Union[List[ModelOrModelOptimPair], ModelOrModelOptimPair]
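The pass-through behavior can be sketched with a stand-in; unwrapping a single argument (rather than returning a one-element list) is an assumption inferred from the Union return type, not confirmed by the source:

```python
# Stand-in for the documented pass-through prepare(): inputs come back
# unchanged in fake mode. Unwrapping a lone argument is an assumption
# based on the Union return annotation.
def fake_prepare(*models_or_model_optim_pairs):
    prepared = list(models_or_model_optim_pairs)
    return prepared[0] if len(prepared) == 1 else prepared
```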
- save_ckpt(model, save_dir: str, tag=None, max_num=3, max_mem=1000, client_state={}, save_latest=True) None[source]¶
Save checkpoint using standard PyTorch saving.
- Parameters:
model – The model to save
save_dir (str) – Directory to save the checkpoint
tag – Optional tag for the checkpoint
max_num (int) – Maximum number of checkpoints to keep
max_mem (int) – Maximum memory in MB for checkpoints (ignored)
client_state (dict) – Additional state to save
save_latest (bool) – Whether to save as latest checkpoint
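save_ckpt and load_ckpt are documented as standard PyTorch saving and loading, so a minimal round trip with torch.save/torch.load conveys the idea; the single-file layout and key names here are assumptions, not the real checkpoint format:

```python
import os
import tempfile

import torch

# Minimal checkpoint round trip in the plain-PyTorch style the docs
# describe. The file layout and dict keys are assumptions for
# illustration only.
model = torch.nn.Linear(3, 3)
with tempfile.TemporaryDirectory() as save_dir:
    path = os.path.join(save_dir, "ckpt.pt")
    torch.save({"model": model.state_dict(),
                "client_state": {"step": 7}}, path)
    ckpt = torch.load(path)
    model.load_state_dict(ckpt["model"])
    step = ckpt["client_state"]["step"]
```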
- setup_distributed(timeout=None, num_gpu_per_node=8) None[source]¶
Fake distributed setup - does nothing in single process mode.
- Parameters:
timeout (timedelta, optional) – Maximum time to wait for initialization (ignored)
num_gpu_per_node (int) – Number of GPUs per node (ignored)
- setup_inference_engine(args, engine_type='vllm', actor=None)[source]¶
Fake inference engine setup - returns None.
- Parameters:
args (argparse.Namespace) – Configuration arguments
engine_type (str) – Type of inference engine (ignored)
actor (torch.nn.Module) – The actor module (ignored)
- Returns:
None
- Return type:
None