lightrft.trainer.fast_exp_maker¶
FastExperienceMaker Module
This module provides an optimized experience maker for RLHF (Reinforcement Learning from Human Feedback) that supports high-performance inference backends like VLLM and SGLang. It extends the base NaiveExperienceMaker with enhanced features for multimodal data processing, reward computation, and advantage estimation.
- Key Features:
VLLM/SGLang backend support for efficient text generation
Multimodal (vision-language) data processing
Multiple advantage estimation methods (GAE, RLOO, REINFORCE, Group Norm)
Flexible reward model composition with custom reward functions
Sample packing support for improved training efficiency
Running reward normalization and advantage whitening
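The running reward normalization listed above is commonly implemented with a streaming mean/variance estimate (Welford's algorithm). A minimal sketch of that idea, independent of lightrft's actual internals (class and method names here are illustrative, not the library's API):

```python
import math

class RunningRewardNormalizer:
    """Streams rewards through Welford's algorithm to maintain a running
    mean/variance, then normalizes new rewards with those statistics."""

    def __init__(self, epsilon: float = 1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.epsilon = epsilon

    def update(self, rewards: list) -> None:
        for r in rewards:
            self.count += 1
            delta = r - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (r - self.mean)

    @property
    def std(self) -> float:
        if self.count < 2:
            return 1.0
        return math.sqrt(self.m2 / self.count)

    def normalize(self, rewards: list) -> list:
        return [(r - self.mean) / (self.std + self.epsilon) for r in rewards]

norm = RunningRewardNormalizer()
norm.update([1.0, 2.0, 3.0])
print(norm.normalize([2.0]))  # mean is 2.0, so this is ~[0.0]
```

Because statistics accumulate across batches rather than being recomputed per batch, reward scale stays stable as training progresses.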
- Classes:
MultimodalDataProcessor: Handles preprocessing of mixed text/image data
RewardComputationEngine: Manages reward model inference and aggregation
FastExperienceMaker: Main experience generation class
FastExperienceMaker¶
- class lightrft.trainer.fast_exp_maker.FastExperienceMaker(*args, packing_samples: bool = False, processor=None, **kwargs)[source]¶
Optimized experience maker with VLLM/SGLang support and advanced RL features.
- This class extends NaiveExperienceMaker to provide:
High-performance inference via VLLM or SGLang backends
Multimodal (vision-language) data processing
Multiple advantage estimation algorithms (GAE, RLOO, REINFORCE, Group Norm)
Flexible reward model composition with custom aggregation
Sample packing for improved training efficiency
Running reward normalization and advantage whitening/clipping
- The experience generation pipeline:
Sample Generation: Use inference engine to generate responses
Shard-Parallel Preprocessing: Distribute samples across shards
Model Inference: Batch forward through actor, critic, initial, and reward models
Shard-Parallel Postprocessing: Gather results back
Reward Processing: Apply transformations (normalization, shaping, filtering)
Advantage Estimation: Compute advantages and returns
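The final step above, for the GAE option, can be sketched in plain Python under the standard GAE definitions (this is an illustration of the algorithm, not lightrft's exact code):

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Compute GAE advantages and returns for one trajectory.

    rewards: per-step rewards r_t
    values:  critic value estimates V(s_t), with a trailing bootstrap
             value V(s_T) appended (len(values) == len(rewards) + 1)
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    # Work backwards: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    # Returns are advantages plus the value baseline
    returns = [a + v for a, v in zip(advantages, values[:-1])]
    return advantages, returns

# Sparse terminal reward, as in typical RLHF rollouts
adv, ret = gae_advantages([0.0, 0.0, 1.0], [0.5, 0.5, 0.5, 0.0])
```

In the RLHF setting the reward is usually nonzero only at the final token, so the backward recursion propagates that terminal signal to earlier tokens, discounted by `gamma * lam` per step.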
- Args:
packing_samples: Whether to pack multiple sequences into a single batch
processor: Multimodal processor for vision-language models
*args, **kwargs: Arguments passed to the parent NaiveExperienceMaker
- __init__(*args, packing_samples: bool = False, processor=None, **kwargs)[source]¶
Initialize FastExperienceMaker.
- Parameters:
args (tuple) – Positional arguments for NaiveExperienceMaker
packing_samples (bool) – Enable sample packing for efficiency
processor (Optional[Any]) – Multimodal processor (required for VLM models)
kwargs (dict) – Keyword arguments for NaiveExperienceMaker
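When packing_samples is enabled, variable-length sequences are typically concatenated into one flat row with per-sequence boundaries tracked so attention and position ids do not cross sample boundaries. A hypothetical illustration of that layout (not the library's actual packing code):

```python
def pack_sequences(sequences):
    """Concatenate token sequences into one flat batch, recording each
    sequence's length and restarting position ids per sequence."""
    packed_tokens = []
    position_ids = []
    seq_lens = []
    for seq in sequences:
        packed_tokens.extend(seq)
        position_ids.extend(range(len(seq)))  # positions restart per sample
        seq_lens.append(len(seq))
    # Cumulative offsets (cu_seqlens-style) let attention kernels
    # mask out cross-sequence attention within the packed row.
    offsets = [0]
    for n in seq_lens:
        offsets.append(offsets[-1] + n)
    return packed_tokens, position_ids, offsets

tokens, pos, offs = pack_sequences([[11, 12, 13], [21, 22]])
# tokens == [11, 12, 13, 21, 22]; pos == [0, 1, 2, 0, 1]; offs == [0, 3, 5]
```

Packing avoids padding short sequences up to the batch maximum, which is where the training-efficiency gain comes from.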
- generate_samples(all_prompts: List[str], all_images: List | None = None, all_videos: List | None = None, images_num: List[int] | None = None, videos_num: List[int] | None = None, all_references: List[str] | None = None, all_labels: List | None = None, **generate_kwargs) List[Samples]¶
Generate samples using the inference engine (VLLM or SGLang).
- This method handles:
Sampling parameter configuration
Multimodal data processing
Inference engine invocation
Output processing into Samples format
- Parameters:
all_prompts (List[str]) – List of text prompts
all_images (Optional[List]) – Optional images for VLM inputs
all_videos (Optional[List]) – Optional videos for VLM inputs
images_num (Optional[List[int]]) – Number of images per prompt
videos_num (Optional[List[int]]) – Number of videos per prompt
all_references (Optional[List[str]]) – Optional reference texts
all_labels (Optional[List]) – Optional sample labels
generate_kwargs (dict) – Generation parameters (temperature, max_new_tokens, etc.)
- Returns:
List of Samples or SamplesVL objects
- Return type:
List[Union[Samples, SamplesVL]]
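The images_num/videos_num parameters suggest that multimodal inputs arrive as one flat list with per-prompt counts. A hedged sketch of how such a flat list could be regrouped per prompt (the helper name is hypothetical, not part of the lightrft API):

```python
def split_by_counts(flat_items, counts):
    """Regroup a flat list of media items into per-prompt sublists,
    where counts[i] is how many items belong to prompt i."""
    assert sum(counts) == len(flat_items), "counts must cover every item"
    grouped, start = [], 0
    for n in counts:
        grouped.append(flat_items[start:start + n])
        start += n
    return grouped

# Three prompts: two images, none, one image
print(split_by_counts(["img_a", "img_b", "img_c"], [2, 0, 1]))
# → [['img_a', 'img_b'], [], ['img_c']]
```

This layout lets prompts with zero, one, or many attached images share a single flat batch.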
- make_experience(samples: Samples) Experience¶
Turn samples into experience by calculating log probs, values, rewards, and KL divergence.
- Parameters:
samples (Samples) – Samples object containing sequences and metadata.
- Returns:
Experience object with all computed values.
- Return type:
Experience
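The per-token KL divergence computed here is commonly estimated from actor and reference log-probabilities. A minimal sketch of two standard estimators (lightrft may use a different one; this is an assumption for illustration):

```python
import math

def token_kl(actor_logprobs, ref_logprobs, use_k3=False):
    """Per-token KL estimate between actor and reference policies.

    k1: logp - ref_logp (unbiased, high variance)
    k3: exp(ref - actor) - (ref - actor) - 1 (biased, always non-negative)
    """
    kl = []
    for lp, ref in zip(actor_logprobs, ref_logprobs):
        if use_k3:
            log_ratio = ref - lp
            kl.append(math.exp(log_ratio) - log_ratio - 1.0)
        else:
            kl.append(lp - ref)
    return kl

print(token_kl([-1.0, -2.0], [-1.5, -2.0]))  # → [0.5, 0.0]
```

The resulting per-token KL is typically folded into the reward as a penalty, keeping the actor close to the initial policy.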
- make_experience_list(all_prompts: List[str], all_images: List | None = None, all_videos: List | None = None, all_references: List[str] | None = None, all_labels: List | None = None, **generate_kwargs) List[ExperienceVL]¶
Generate a list of experiences from prompts and optional multimodal inputs.
This is the main entry point for experience generation. It orchestrates the entire pipeline from sampling to advantage computation.
- Parameters:
all_prompts (List[str]) – List of text prompts
all_images (Optional[List]) – Optional images for multimodal generation
all_videos (Optional[List]) – Optional videos for multimodal generation
all_references (Optional[List[str]]) – Optional reference texts for evaluation
all_labels (Optional[List]) – Optional labels for samples
generate_kwargs (dict) – Generation parameters (temperature, max_new_tokens, etc.)
- Returns:
List of Experience or ExperienceVL objects with computed advantages and returns
- Return type:
List[Union[Experience, ExperienceVL]]
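For the Group Norm advantage option listed among the estimation methods, each prompt's group of sampled responses is normalized against its own statistics (the GRPO-style definition). A plain-Python sketch under that standard definition, not lightrft's exact implementation:

```python
import statistics

def group_norm_advantages(rewards, group_size, eps=1e-8):
    """Group-normalized advantages: within each group of responses
    sampled from the same prompt, subtract the group mean and divide
    by the group standard deviation."""
    assert len(rewards) % group_size == 0, "rewards must form whole groups"
    advantages = []
    for i in range(0, len(rewards), group_size):
        group = rewards[i:i + group_size]
        mean = statistics.fmean(group)
        std = statistics.pstdev(group)  # population std over the group
        advantages.extend((r - mean) / (std + eps) for r in group)
    return advantages

# Two prompts, two sampled responses each
print(group_norm_advantages([0.0, 1.0, 1.0, 1.0], group_size=2))
```

Because the baseline comes from sibling samples of the same prompt, no critic model is needed; a group whose rewards are all equal yields zero advantage for every member.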