lightrft.trainer.fast_exp_maker

FastExperienceMaker Module

This module provides an optimized experience maker for RLHF (Reinforcement Learning from Human Feedback) that supports high-performance inference backends like VLLM and SGLang. It extends the base NaiveExperienceMaker with enhanced features for multimodal data processing, reward computation, and advantage estimation.

Key Features:
  • VLLM/SGLang backend support for efficient text generation

  • Multimodal (vision-language) data processing

  • Multiple advantage estimation methods (GAE, RLOO, REINFORCE, Group Norm)

  • Flexible reward model composition with custom reward functions

  • Sample packing support for improved training efficiency

  • Running reward normalization and advantage whitening
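The running reward normalization listed above can be sketched in plain Python. `RunningMoments` is an illustrative name, not a class exported by this module; the real maker operates on reward tensors, but the Welford-style update is the same idea.

```python
import math


class RunningMoments:
    """Track a running mean/std of rewards (Welford's online algorithm)."""

    def __init__(self):
        self.mean = 0.0
        self.m2 = 0.0
        self.count = 0

    def update(self, rewards):
        """Fold a batch of scalar rewards into the running statistics."""
        for x in rewards:
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Sample standard deviation; guard against count < 2.
        return math.sqrt(self.m2 / max(self.count - 1, 1))

    def normalize(self, rewards):
        """Whiten rewards with the statistics accumulated so far."""
        scale = self.std or 1.0
        return [(x - self.mean) / (scale + 1e-8) for x in rewards]
```

Normalizing against statistics accumulated across batches (rather than per-batch) keeps the reward scale stable early in training, when individual batches are small and noisy.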

Classes:

MultimodalDataProcessor: Handles preprocessing of mixed text/image data

RewardComputationEngine: Manages reward model inference and aggregation

FastExperienceMaker: Main experience generation class

FastExperienceMaker

class lightrft.trainer.fast_exp_maker.FastExperienceMaker(*args, packing_samples: bool = False, processor=None, **kwargs)[source]

Optimized experience maker with VLLM/SGLang support and advanced RL features.

This class extends NaiveExperienceMaker to provide:
  • High-performance inference via VLLM or SGLang backends

  • Multimodal (vision-language) data processing

  • Multiple advantage estimation algorithms (GAE, RLOO, REINFORCE, Group Norm)

  • Flexible reward model composition with custom aggregation

  • Sample packing for improved training efficiency

  • Running reward normalization and advantage whitening/clipping

The experience generation pipeline:
  1. Sample Generation: Use inference engine to generate responses

  2. Shard-Parallel Preprocessing: Distribute samples across shards

  3. Model Inference: Batch forward through actor, critic, initial, and reward models

  4. Shard-Parallel Postprocessing: Gather results back

  5. Reward Processing: Apply transformations (normalization, shaping, filtering)

  6. Advantage Estimation: Compute advantages and returns
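Step 6 above supports several estimators; the GAE variant can be sketched as follows for a single trajectory of scalar rewards and critic values. This is a minimal stand-in, not the module's tensor implementation, and the function name is illustrative.

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one finished trajectory.

    rewards[t] and values[t] are per-step scalars; the episode is
    assumed to terminate after the last step (bootstrap value 0).
    """
    advantages = [0.0] * len(rewards)
    last_gae = 0.0
    next_value = 0.0  # no value beyond the final step
    for t in reversed(range(len(rewards))):
        # TD residual: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        last_gae = delta + gamma * lam * last_gae
        advantages[t] = last_gae
        next_value = values[t]
    # Returns used as critic regression targets: A_t + V(s_t)
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

With `gamma=lam=1.0` this degenerates to Monte Carlo returns minus the baseline, which is a quick sanity check when debugging advantage code.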

Args:

packing_samples: Whether to pack multiple sequences into a single batch

processor: Multimodal processor for vision-language models

*args, **kwargs: Arguments passed to parent NaiveExperienceMaker

__init__(*args, packing_samples: bool = False, processor=None, **kwargs)[source]

Initialize FastExperienceMaker.

Parameters:
  • args (tuple) – Positional arguments for NaiveExperienceMaker

  • packing_samples (bool) – Enable sample packing for efficiency

  • processor (Optional[Any]) – Multimodal processor (required for VLM models)

  • kwargs (dict) – Keyword arguments for NaiveExperienceMaker
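The effect of `packing_samples` can be illustrated with plain lists: instead of padding every sequence to the batch maximum, packed mode concatenates sequences and keeps their lengths for later unpacking. These helper names are hypothetical, not part of the module's API.

```python
def pack_samples(sequences):
    """Concatenate variable-length token sequences into one flat
    buffer, recording per-sequence lengths (no padding waste)."""
    packed, lengths = [], []
    for seq in sequences:
        packed.extend(seq)
        lengths.append(len(seq))
    return packed, lengths


def unpack_samples(packed, lengths):
    """Recover the original sequences from a packed buffer."""
    out, offset = [], 0
    for n in lengths:
        out.append(packed[offset:offset + n])
        offset += n
    return out
```

Packing avoids computing attention and loss over padding tokens, which is where the training-efficiency gain comes from when response lengths vary widely.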

generate_samples(all_prompts: List[str], all_images: List | None = None, all_videos: List | None = None, images_num: List[int] | None = None, videos_num: List[int] | None = None, all_references: List[str] | None = None, all_labels: List | None = None, **generate_kwargs) → List[Samples]

Generate samples using the inference engine (VLLM or SGLang).

This method handles:
  • Sampling parameter configuration

  • Multimodal data processing

  • Inference engine invocation

  • Output processing into Samples format

Parameters:
  • all_prompts (List[str]) – List of text prompts

  • all_images (Optional[List]) – Optional images for VLM

  • images_num (Optional[List[int]]) – Number of images per prompt

  • all_references (Optional[List[str]]) – Reference texts

  • all_labels (Optional[List]) – Sample labels

  • all_videos (Optional[List]) – Optional videos for VLM

  • videos_num (Optional[List[int]]) – Number of videos per prompt

  • generate_kwargs (dict) – Generation parameters (temperature, max_new_tokens, etc.)

Returns:

List of Samples or SamplesVL objects

Return type:

List[Union[Samples, SamplesVL]]
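The `all_images` / `images_num` pair (and likewise `all_videos` / `videos_num`) implies a flat-list-plus-counts convention: images for all prompts are passed in one flat list, and `images_num[i]` says how many belong to prompt `i`. A small sketch of that mapping, with a hypothetical helper name:

```python
def split_by_counts(flat_items, counts):
    """Assign a flat list of media items to prompts via per-prompt
    counts, mirroring the all_images / images_num convention."""
    out, offset = [], 0
    for n in counts:
        out.append(flat_items[offset:offset + n])
        offset += n
    return out
```

A count of 0 yields an empty list for text-only prompts, so mixed text/image batches need no special casing.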

make_experience(samples: Samples) → Experience

Turn samples into experience by calculating log probs, values, rewards, and KL divergence.

Parameters:

samples (Samples) – Samples object containing sequences and metadata.

Returns:

Experience object with all computed values.

Return type:

Experience
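The KL divergence mentioned here is typically estimated per token from the actor's and initial (reference) model's log probabilities of the sampled tokens, and folded into the reward as a penalty. A minimal sketch, assuming the common k1 estimator and a terminal scalar reward; the function names and `kl_coef` default are illustrative, not this module's API.

```python
def per_token_kl(logp_actor, logp_ref):
    """k1 estimator: KL(actor || ref) ≈ logp_actor - logp_ref,
    evaluated at each sampled token."""
    return [a - r for a, r in zip(logp_actor, logp_ref)]


def kl_shaped_rewards(final_reward, logp_actor, logp_ref, kl_coef=0.1):
    """Dense per-token reward: -kl_coef * KL at every token, with the
    scalar reward-model score added at the final token."""
    kl = per_token_kl(logp_actor, logp_ref)
    rewards = [-kl_coef * k for k in kl]
    rewards[-1] += final_reward
    return rewards
```

When actor and reference agree exactly, the penalty vanishes and only the terminal reward remains, which is a convenient unit-test invariant.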

make_experience_list(all_prompts: List[str], all_images: List | None = None, all_videos: List | None = None, all_references: List[str] | None = None, all_labels: List | None = None, **generate_kwargs) → List[ExperienceVL]

Generate a list of experiences from prompts and optional multimodal inputs.

This is the main entry point for experience generation. It orchestrates the entire pipeline from sampling to advantage computation.

Parameters:
  • all_prompts (List[str]) – List of text prompts

  • all_images (Optional[List]) – Optional images for multimodal generation

  • all_references (Optional[List[str]]) – Optional reference texts for evaluation

  • all_labels (Optional[List]) – Optional labels for samples

  • all_videos (Optional[List]) – Optional videos for multimodal generation

  • generate_kwargs (dict) – Generation parameters (temperature, max_new_tokens, etc.)

Returns:

List of Experience or ExperienceVL objects with computed advantages and returns

Return type:

List[Union[Experience, ExperienceVL]]
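Among the critic-free estimators listed earlier, Group Norm (as used in GRPO-style training) can be sketched as follows: each prompt is sampled several times, and each response's reward is z-scored against its own group. This is a simplified stand-in for the module's tensor implementation, and the function name is hypothetical.

```python
import statistics


def group_norm_advantages(rewards, group_size):
    """Group-normalized advantages: rewards arrive as consecutive
    groups of `group_size` responses to the same prompt; each reward
    is whitened against its group's mean and (population) std."""
    advantages = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start:start + group_size]
        mu = statistics.mean(group)
        sigma = statistics.pstdev(group) or 1.0  # constant group -> no signal
        advantages.extend([(r - mu) / (sigma + 1e-8) for r in group])
    return advantages
```

Because the baseline is computed within each prompt's group, this estimator needs no critic model, at the cost of requiring multiple samples per prompt.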