Installation¶
This guide provides instructions for installing and setting up LightRFT, a lightweight and high-performance reinforcement learning fine-tuning framework designed for Large Language Models (LLMs) and Vision-Language Models (VLMs).
Requirements¶
Before installing LightRFT, ensure your environment meets the following requirements:
Python >= 3.10
CUDA >= 11.8
PyTorch >= 2.5.1
CUDA-compatible GPU(s)
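You can quickly confirm your current Python, CUDA, and PyTorch versions with the following commands (this assumes nvcc and PyTorch are already available in your environment):
# Check Python version
python --version
# Check CUDA toolkit version
nvcc --version
# Check PyTorch version and the CUDA version it was built against
python -c "import torch; print(torch.__version__, torch.version.cuda)"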
Docker Images¶
TO BE DONE
Installation¶
Standard Installation¶
Clone and install LightRFT:
# Clone the repository
git clone https://github.com/opendilab/LightRFT.git
cd LightRFT
# Install dependencies
pip install -r requirements.txt
# Install LightRFT
pip install -e .
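If you prefer to keep the installation isolated from your system Python, you can create a virtual environment before running the steps above (optional; any venv or conda workflow works):
# Optional: create and activate a virtual environment first
python -m venv .venv
source .venv/bin/activate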
Documentation Generation (Optional)¶
To install dependencies for generating documentation:
pip install -r requirements-doc.txt
To generate HTML documentation:
make docs
The documentation will be generated in the docs/build directory. Open index.html to view it.
For live browser documentation with auto-reload:
make live
Project Structure¶
LightRFT is organized into several key modules:
LightRFT/
├── lightrft/ # Core library
│ ├── strategy/ # Training & inference strategies
│ │ ├── fsdp/ # FSDP implementation
│ │ ├── deepspeed/ # DeepSpeed implementation
│ │ ├── vllm_utils/ # vLLM utilities
│ │ ├── sglang_utils/ # SGLang utilities
│ │ └── utils/ # Strategy utilities
│ ├── models/ # Model definitions
│ │ ├── actor_al.py # Audio-language model actor
│ │ ├── actor_language.py # Language model actor
│ │ ├── actor_vl.py # Vision-language model actor
│ │ ├── grm_vl.py # Generative reward model (Vision-Language)
│ │ ├── srm_al.py # Scalar reward model (Audio-Language)
│ │ ├── srm_vl.py # Scalar reward model (Vision-Language)
│ │ ├── loss.py # Loss functions
│ │ ├── monkey_patch/ # Model adaptation patches for distributed training
│ │ ├── tests/ # Model tests
│ │ └── utils.py # Model utilities
│ ├── trainer/ # Trainer implementations
│ │ ├── ppo_trainer.py # LLM PPO trainer
│ │ ├── ppo_trainer_vl.py # VLM PPO trainer
│ │ ├── spmd_ppo_trainer.py # SPMD PPO trainer extension (**Core**)
│ │ ├── grm_trainer_vl.py # Generative reward model trainer (Vision-Language)
│ │ ├── srm_trainer_al.py # Scalar reward model trainer (Audio-Language)
│ │ ├── srm_trainer_vl.py # Scalar reward model trainer (Vision-Language)
│ │ ├── fast_exp_maker.py # Fast experience generator (**Core**)
│ │ ├── experience_maker.py # Base experience generator
│ │ ├── experience_maker_vl.py # Base experience generator for VLM
│ │ ├── replay_buffer.py # Replay buffer
│ │ ├── replay_buffer_vl.py # VLM replay buffer
│ │ ├── replay_buffer_utils.py # Replay buffer utilities
│ │ ├── kl_controller.py # KL divergence controller
│ │ └── utils.py # Trainer utilities
│ ├── datasets/ # Dataset processing
│ │ ├── audio_alpaca.py # Audio Alpaca dataset
│ │ ├── grm_dataset.py # Generative reward model dataset
│ │ ├── hpdv3.py # HPDv3 reward model dataset
│ │ ├── image_reward_db.py # Image reward database
│ │ ├── imagegen_cot_reward.py # Image generation CoT generative reward
│ │ ├── omnirewardbench.py # OmniRewardBench dataset
│ │ ├── process_reward_dataset.py # Reward dataset processing
│ │ ├── prompts_dataset.py # LLM Prompts dataset
│ │ ├── prompts_dataset_vl.py # Vision-language prompts dataset
│ │ ├── rapidata.py # Rapidata reward model dataset
│ │ ├── sft_dataset.py # SFT dataset
│ │ ├── sft_dataset_vl.py # VLM SFT dataset
│ │ ├── srm_dataset.py # Scalar reward model base dataset
│ │ └── utils.py # Dataset utilities
│ └── utils/ # Utility functions
│ ├── ckpt_scripts/ # Checkpoint processing scripts
│ ├── cli_args.py # CLI argument parsing
│ ├── distributed_sampler.py # Distributed sampler
│ ├── logging_utils.py # Logging utilities
│ ├── processor.py # Data processor for HF model
│ ├── remote_rm_utils.py # Remote reward model utilities
│ ├── timer.py # Timer utilities
│ ├── trajectory_saver.py # Trajectory saver
│ └── utils.py # General utilities
│
├── examples/ # Usage examples
│ ├── gsm8k_geo3k/ # GSM8K/Geo3K math reasoning training examples
│ ├── grm_training/ # Generative reward model training examples
│ ├── srm_training/ # Scalar reward model training examples
│ └── chat/ # Model dialogue examples
│
├── docs/ # 📚 Sphinx documentation
│ ├── Makefile # Documentation build Makefile
│ ├── make.bat # Documentation build batch file
│ └── source/ # Documentation source
│ ├── _static/ # Static files (CSS, etc.)
│ ├── api_doc/ # API documentation
│ ├── best_practice/ # Best practices & resources
│ ├── installation/ # Installation guides
│ └── quick_start/ # Quick start & user guides
│
├── assets/ # Assets
│ └── logo.png # Project logo
│
├── CHANGELOG.md # Changelog
├── LICENSE # License file
├── Makefile # Project Makefile
├── README.md # Project documentation (English)
├── README_zh.md # Project documentation (Chinese)
├── requirements.txt # Python dependencies
├── requirements-dev.txt # Development dependencies
├── requirements-doc.txt # Documentation dependencies
└── setup.py # Package setup script
Key Directory Descriptions¶
lightrft/: LightRFT core library with five main modules:
datasets/: Dataset implementations for prompts, SFT, and reward modeling (text, vision-language, audio-language)
models/: Actor models (language, vision-language, audio-language), reward models, and loss functions
strategy/: Training strategies, including FSDP, DeepSpeed, and vLLM/SGLang integration
trainer/: Trainer implementations for PPO, experience generation, and replay buffers
utils/: Utility functions for CLI, logging, distributed training, and trajectory saving
examples/: Complete training examples and scripts
gsm8k_geo3k/: GSM8K and Geo3K math reasoning training examples
grm_training/: Generative reward model training examples
srm_training/: Scalar reward model training examples
chat/: Model dialogue examples
docs/: Sphinx documentation with complete user guides and API documentation
Verification¶
To verify your installation, run a simple test:
python -c "import lightrft; print(lightrft)"
You should see the module path without any import errors.
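For a slightly more thorough check, you can also confirm that PyTorch sees your GPUs (this only relies on a working CUDA-enabled PyTorch install, not on any LightRFT-specific API):
# Print the installed package location and confirm GPU visibility
python -c "import lightrft, torch; print(lightrft.__file__); print(torch.cuda.is_available(), torch.cuda.device_count())"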
Quick Start Example¶
After installation, try a basic GRPO training example:
# Single node, 8 GPU training example
cd /path/to/LightRFT
# Run GRPO training (GSM8K math reasoning task)
bash examples/gsm8k_geo3k/run_grpo_gsm8k_qwen2.5_0.5b.sh
# Or run Geo3K geometry problem training (VLM multimodal)
bash examples/gsm8k_geo3k/run_grpo_geo3k_qwen2.5_vl_7b.sh
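If you want to run on a subset of GPUs, the standard CUDA environment variable can be set when launching the script (note that these example scripts target 8 GPUs, so their internal GPU-count settings may also need adjusting):
# Hypothetical: restrict the run to 4 GPUs
CUDA_VISIBLE_DEVICES=0,1,2,3 bash examples/gsm8k_geo3k/run_grpo_gsm8k_qwen2.5_0.5b.sh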
Troubleshooting¶
Common Issues¶
Issue: CUDA errors or version mismatch
Solution: Ensure your CUDA driver and toolkit versions match your PyTorch installation. Check with:
nvcc --version
python -c "import torch; print(torch.version.cuda)"
Issue: Out of memory errors during training
Solution:
Reduce micro_train_batch_size or micro_rollout_batch_size
Enable gradient checkpointing: --gradient_checkpointing
Use FSDP with CPU offload: --fsdp --fsdp_cpu_offload
Adjust engine memory utilization: --engine_mem_util 0.4
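As a rough sketch, these options might be combined on a single training invocation; the entry script name below is a placeholder, and whether each flag is accepted depends on the specific example script you use:
# Hypothetical invocation combining the memory-saving options above
python train_ppo.py \
    --micro_train_batch_size 1 \
    --micro_rollout_batch_size 2 \
    --gradient_checkpointing \
    --fsdp --fsdp_cpu_offload \
    --engine_mem_util 0.4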
Issue: Slow installation of evaluation dependencies
Solution: Use a mirror or proxy for pip:
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple <package>
For Additional Support¶
If you encounter issues not covered here:
Check the project’s GitHub Issues
Review the ../best_practice/strategy_usage guide for training configuration
Consult the example scripts in the examples directory
Next Steps¶
After successful installation:
Review the Quick Start guide to understand basic usage
Explore ../best_practice/strategy_usage for distributed training strategies
Check out the examples directory for complete training examples
Read the algorithm documentation for specific implementation details