Welcome to LightRFT’s Documentation!
LightRFT (Light Reinforcement Fine-Tuning) is a lightweight, efficient reinforcement learning fine-tuning framework for Large Language Models (LLMs), Vision-Language Models (VLMs), and other modalities and tasks. It provides scalable training and evaluation for RLHF (Reinforcement Learning from Human Feedback), RLVR (Reinforcement Learning with Verifiable Rewards), and reward models, and it supports multiple state-of-the-art algorithms and distributed training strategies (FSDP, DeepSpeed, etc.).
Key Features
- 🚀 High-Performance Inference Engines
Integrated vLLM and SGLang for efficient sampling and inference
FP8 inference optimization for reduced latency and memory usage
Flexible engine sleep/wake mechanisms for optimal resource utilization
- 🧠 Rich Algorithm Ecosystem
Policy Optimization: GRPO, GSPO, GMPO, Dr.GRPO
Advantage Estimation: REINFORCE++, CPGD
Reward Processing: Reward Norm/Clip
Sampling Strategy: FIRE Sampling, Token-Level Policy
Stability Enhancement: Clip Higher, select_high_entropy_tokens
- 🔧 Flexible Training Strategies
FSDP (Fully Sharded Data Parallel) support
DeepSpeed ZeRO (Stage 1/2/3) support
Gradient checkpointing and mixed precision training (BF16/FP16)
Adam Offload and memory optimization techniques
- 🌐 Comprehensive Multimodal Support
Native Vision-Language Model (VLM) training
Support for Qwen-VL, LLaVA, and other mainstream VLMs
Multimodal reward modeling with multiple reward models
- 📊 Complete Experimental Toolkit
Weights & Biases (W&B) integration
Math capability benchmarking (GSM8K, Geo3K, etc.)
Trajectory saving and analysis tools
Automatic checkpoint management
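As a rough illustration of a few ideas named in the algorithm list above (Reward Clip, GRPO-style group-relative advantages, the mean-centered variant associated with Dr.GRPO, and Clip Higher), the sketch below uses only the Python standard library. It is not LightRFT's API: the function names, parameters, and default values (`limit`, `eps_low`, `eps_high`) are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def clip_rewards(rewards, limit=5.0):
    """Clamp each scalar reward into [-limit, limit] (hypothetical helper)."""
    return [max(-limit, min(limit, r)) for r in rewards]


def group_relative_advantages(rewards, eps=1e-6, scale_by_std=True):
    """Advantages for one prompt's group of sampled responses.

    Each reward is compared against the group mean. With scale_by_std=True
    the result is also divided by the group standard deviation (GRPO-style);
    with scale_by_std=False it is only mean-centered (in the spirit of Dr.GRPO).
    """
    baseline = mean(rewards)
    centered = [r - baseline for r in rewards]
    if not scale_by_std:
        return centered
    spread = pstdev(rewards)
    return [c / (spread + eps) for c in centered]


def clipped_surrogate(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """PPO-style clipped objective with an asymmetric range.

    Using eps_high > eps_low is the "Clip Higher" idea: the upper bound on
    the probability ratio is looser than the lower bound.
    """
    clipped_ratio = max(1.0 - eps_low, min(1.0 + eps_high, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


# Four sampled responses to the same prompt, with one outlier reward.
rewards = clip_rewards([1.0, 0.0, 0.0, 9.0], limit=5.0)
advantages = group_relative_advantages(rewards)
centered_only = group_relative_advantages(rewards, scale_by_std=False)
```

Per-token probability ratios and batching are omitted here; in a real training loop these quantities would be tensors, and the objective would be averaged over tokens and negated to form a loss.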
Documentation Contents
Getting Started
User Guide & Best Practices
API Documentation
Quick Links
Installation
Quick Start
Supported Algorithms
Strategy Usage Guide
Configuration Parameters
Frequently Asked Questions (FAQ)
Troubleshooting Guide