Welcome to LightRFT’s Documentation!¶

LightRFT (Light Reinforcement Fine-Tuning) is a lightweight, efficient, and versatile reinforcement learning fine-tuning framework designed for the fine-tuning tasks of Large Language Models (LLMs) and Vision-Language Models (VLMs). Its core advantages include:

Comprehensive RLVR + RLHF and Multi-modal Training Support: Native support for RLVR and RLHF training, covering various modalities such as text, image, video, and audio, and supporting the full lifecycle from base models to reward models and reward rules.
Unified Strategy Abstraction Layer: A highly abstract Strategy layer that flexibly controls training (DeepSpeed/FSDPv2) and high-performance inference (vLLM/SGLang) strategies.
Easy-to-use and Efficient Multi-model Co-location Paradigm: Supports flexible multi-model co-location training, enabling scalable algorithm exploration and comparison in large-scale scenarios.

Key Features¶

🚀 High-Performance Inference Engines

Integrated vLLM and SGLang for efficient sampling and inference
FP8 inference optimization for reduced latency and memory usage (Work in Progress)
Flexible engine sleep/wake mechanisms for optimal resource utilization

🧠 Rich Algorithm Ecosystem

Policy Optimization: GRPO, GSPO, GMPO, Dr.GRPO
Advantage Estimation: REINFORCE++, CPGD
Reward Processing: Reward Norm/Clip
Sampling Strategy: FIRE Sampling, Token-Level Policy
Stability Enhancement: Clip Higher, select_high_entropy_tokens

🔧 Flexible Training Strategies

FSDP (Fully Sharded Data Parallel) support
DeepSpeed ZeRO (Stage 1/2/3) support
Gradient checkpointing and mixed precision training (BF16/FP16)
Adam Offload and memory optimization techniques

🌐 Comprehensive Multimodal Support

Native Vision-Language Model (VLM) training
Support for Qwen-VL, LLaVA, and other mainstream VLMs
Multimodal reward modeling with multiple reward models

📊 Complete Experimental Toolkit

Weights & Biases (W&B) integration
Math capability benchmarking (GSM8K, Geo3K, etc.)
Trajectory saving and analysis tools
Automatic checkpoint management

Documentation Contents¶

Getting Started

User Guide & Best Practices

Best Practices

API Documentation

Quick Links¶

Installation - Installation guide
Quick Start - Quick start tutorial
Supported Algorithms - Supported algorithms
Strategy Usage Guide - Strategy usage guide
Configuration Parameters - Configuration parameters
Frequently Asked Questions (FAQ) - Frequently asked questions
Troubleshooting Guide - Troubleshooting guide

Welcome to LightRFT’s Documentation!¶

Key Features¶

Documentation Contents¶

Quick Links¶

Indices and Tables¶