# Frequently Asked Questions (FAQ)

Common questions and answers about LightRFT.

## General Questions

### Q: What is LightRFT?

**A**: LightRFT (Light Reinforcement Fine-Tuning) is an advanced reinforcement learning framework for fine-tuning Large Language Models (LLMs) and Vision-Language Models (VLMs). It provides efficient and scalable RLHF training with support for multiple algorithms and distributed training strategies.

### Q: What are the main differences between LightRFT and OpenRLHF?

**A**: LightRFT extends OpenRLHF with:

- Enhanced multimodal (VLM) support
- More RL algorithms (GRPO, GSPO, GMPO, REINFORCE++, CPGD, etc.)
- Better memory optimization (engine sleep, optimizer offload)
- Improved inference engines (vLLM, SGLang with FP8)
- Reward model co-location for efficiency
- More flexible distributed training strategies

### Q: Which models are supported?

**A**: LightRFT supports:

- **LLMs**: Qwen, Qwen2.5, LLaMA, Mistral, and most HuggingFace models
- **VLMs**: Qwen-VL, Qwen2-VL, LLaVA
- **Custom**: Easy to add new models via monkey patching

### Q: What hardware is required?

**A**: Minimum requirements:

- **GPU**: NVIDIA GPUs with CUDA 11.8+
- **Memory**: 40GB+ VRAM per GPU recommended (24GB possible with optimizations)
- **PyTorch**: 2.5.1+
- **Python**: 3.8+

For production: 8× A100/H100 80GB recommended.

## Installation Questions

### Q: How do I install LightRFT?

**A**: Simple installation:

```bash
git clone https://github.com/opendilab/LightRFT.git
cd LightRFT
pip install -r requirements.txt && pip install -e .
```

### Q: Do I need to install vLLM separately?

**A**: No, vLLM is included in the requirements. However, for the latest features, you can install it from source.

## Training Questions

### Q: What's the difference between FSDP and DeepSpeed?

**A**:

- **FSDP**: PyTorch-native, better integration, supports CPU offload
- **DeepSpeed**: More mature, ZeRO-3 optimization, generally faster

Use FSDP for maximum memory efficiency, DeepSpeed for speed.

### Q: How do I choose batch sizes?

**A**: Follow this constraint:

```
train_batch_size >= rollout_batch_size × n_samples_per_prompt
```

Example for 8 GPUs:

- `train_batch_size=256`
- `rollout_batch_size=32`
- `n_samples_per_prompt=8`
- `micro_train_batch_size=1`
- `micro_rollout_batch_size=2`

Here `rollout_batch_size × n_samples_per_prompt = 256`, which satisfies the constraint.

### Q: Which algorithm should I use?

**A**: By task:

- **Math/Coding**: GRPO, Dr.GRPO
- **Instruction Following**: CPGD, GSPO
- **Open-ended**: FIRE Sampling
- **Low Memory**: GRPO (no critic)
- **Research**: GMPO, REINFORCE++

### Q: How many samples per prompt should I use?

**A**: Typical values:

- **4-8**: Standard, good balance
- **16+**: Better quality, slower training
- **32+**: Best-of-N scenarios

More samples give better advantage estimates but slow down training.

### Q: Can I use multiple reward models?

**A**: Yes! LightRFT supports:

- Multiple reward models in parallel
- Reward model co-location (same GPU as training)
- Remote reward model servers
- Weighted reward combination

### Q: How do I enable multimodal (VLM) training?

**A**: Use the VLM training script:

```bash
python train_vl.py \
    --pretrain /path/to/Qwen2-VL \
    --mixed_mm_data \
    --packing_samples
```

## Performance Questions

### Q: How do I reduce memory usage?

**A**: Use these techniques (a combined example command is sketched below):

1. Enable gradient checkpointing: `--gradient_checkpointing`
2. Use FSDP with CPU offload: `--fsdp --fsdp_cpu_offload`
3. Lower engine memory: `--engine_mem_util 0.4`
4. Use ZeRO-3: `--zero_stage 3`
5. Reduce batch sizes
6. Enable engine sleep: `--enable_engine_sleep`
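As a rough illustration, several of the flags above can be combined in one launch command. This is a minimal sketch only: it assumes a `train.py` entry point like the one used elsewhere in this FAQ, and it uses the FSDP options rather than `--zero_stage 3`, since FSDP and DeepSpeed ZeRO are alternative backends. Verify the exact flag names against your script's `--help` output.

```bash
# Hypothetical memory-constrained launch (not a verified configuration):
# FSDP with CPU offload, gradient checkpointing, reduced engine memory,
# engine sleep, and small micro batch sizes.
python train.py \
    --pretrain /path/to/model \
    --fsdp --fsdp_cpu_offload \
    --gradient_checkpointing \
    --engine_mem_util 0.4 \
    --enable_engine_sleep \
    --micro_train_batch_size 1 \
    --micro_rollout_batch_size 2
```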
### Q: How do I speed up training?

**A**:

1. Increase batch sizes (if memory allows)
2. Use FP8 inference (vLLM)
3. Enable Flash Attention: `--flash_attn`
4. Reduce `n_samples_per_prompt` if possible
5. Use tensor parallelism for inference: `--engine_tp_size 2`
6. Optimize NCCL: `export TORCH_NCCL_AVOID_RECORD_STREAMS=1`

### Q: What's the typical training speed?

**A**: On 8× A100 80GB, with FSDP and the optimizations above:

- **7B model**: ~1000 samples/min
- **13B model**: ~500 samples/min
- **34B model**: ~200 samples/min
- **70B model**: ~50 samples/min

### Q: How do I use multiple nodes?

**A**: Use SLURM or Ray:

```bash
# SLURM example
srun -N2 --gres=gpu:8 --ntasks-per-node=8 bash train.sh

# Or use torchrun
torchrun --nproc_per_node=8 \
    --nnodes=2 \
    --node_rank=$NODE_RANK \
    --master_addr=$MASTER_ADDR \
    --master_port=$MASTER_PORT \
    train.py
```

## Algorithm Questions

### Q: What's the difference between GRPO and PPO?

**A**:

- **GRPO**: Group-normalized advantages, no critic network
- **PPO**: Uses a separate value network (critic)

GRPO is simpler and more memory-efficient.

### Q: When should I use CPGD?

**A**: Use CPGD when:

- Fine-tuning pre-trained models
- You want to preserve base capabilities
- You need controlled policy updates
- You want to prevent catastrophic forgetting

### Q: What is Clip Higher?

**A**: An improved clipping scheme with separate upper/lower bounds for positive/negative advantages. Better for:

- Noisy rewards
- Large distribution shifts
- Unstable training

## Debugging Questions

### Q: Training crashes with an OOM error

**A**: See the [Troubleshooting Guide](troubleshooting.md#out-of-memory-oom-errors).

### Q: `num_rollouts_per_episodes = 0` error

**A**: Your `train_batch_size` is too small. Ensure:

```
train_batch_size >= rollout_batch_size × n_samples_per_prompt
```

### Q: Model not improving / reward not increasing

**A**: Check the following:

1. Learning rate (too high or too low)
2. KL penalty (too large)
3. Reward model quality
4. Reward normalization: enable with `--reward_running_norm`
5. Advantage estimator: try a different one

### Q: NCCL timeout or hanging

**A**:

```bash
# Turn on verbose logging to locate the hang
export NCCL_DEBUG=INFO
export TORCH_DISTRIBUTED_DEBUG=DETAIL

# Pin the network interface and disable InfiniBand if the network setup is the culprit
export NCCL_SOCKET_IFNAME=eth0
export NCCL_IB_DISABLE=1
```

### Q: vLLM engine initialization fails

**A**:

1. Check GPU memory: `--engine_mem_util 0.5`
2. Reduce TP size: `--engine_tp_size 1`
3. Check CUDA compatibility
4. Update vLLM: `pip install -U vllm`

## Evaluation Questions

### Q: How do I evaluate on benchmarks?

**A**: For math benchmarks, use the evaluation scripts in the examples directory:

```bash
# Refer to the examples/gsm8k_geo3k directory for evaluation scripts
# See the example training scripts for evaluation configurations
```

### Q: Can I save generation trajectories?

**A**: Yes, use the trajectory saver:

```python
from lightrft.utils import TrajectorySaver

saver = TrajectorySaver(output_dir="./trajectories")
# Automatically saves prompts, responses, rewards
```

### Q: How do I integrate with W&B?

**A**:

```bash
python train.py \
    --use_wandb your-project \
    --wandb_org your-org \
    --wandb_run_name experiment-1
```

## Advanced Questions

### Q: Can I implement custom algorithms?

**A**: Yes! Extend the trainer class:

```python
from lightrft.trainer import SPMDPPOTrainer

class CustomTrainer(SPMDPPOTrainer):
    def compute_advantages(self, ...):
        # Your custom advantage computation
        pass
```
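To make the override above more concrete, here is a minimal, framework-agnostic sketch of a GRPO-style group-normalized advantage computation (rewards normalized within each group of samples drawn from the same prompt, so no critic is needed). The function name, tensor layout, method signature, and `eps` constant are illustrative assumptions, not LightRFT's actual interface.

```python
import torch


def group_normalized_advantages(rewards: torch.Tensor,
                                n_samples_per_prompt: int,
                                eps: float = 1e-8) -> torch.Tensor:
    """Illustrative GRPO-style advantages: normalize rewards within each prompt group.

    ``rewards`` is assumed to be a flat tensor of shape
    ``(num_prompts * n_samples_per_prompt,)``, ordered group by group.
    """
    grouped = rewards.view(-1, n_samples_per_prompt)   # (num_prompts, n_samples)
    mean = grouped.mean(dim=1, keepdim=True)           # per-group mean
    std = grouped.std(dim=1, keepdim=True)             # per-group std
    advantages = (grouped - mean) / (std + eps)        # normalize within each group
    return advantages.view(-1)


# Example usage: 2 prompts × 4 samples each
rewards = torch.tensor([1.0, 0.0, 0.5, 1.0, 0.2, 0.8, 0.4, 0.6])
print(group_normalized_advantages(rewards, n_samples_per_prompt=4))
```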
### Q: How do I add a new model architecture?

**A**: Create a monkey patch in `lightrft/models/monkey_patch/`:

```python
# your_model.py
def patch_your_model(model):
    # Add custom forward methods
    pass

# In apply.py
from .your_model import patch_your_model
```

### Q: Can I use custom reward functions?

**A**: Yes, pass a callable:

```python
def custom_reward_fn(responses, labels):
    # Your reward computation, e.g. exact-match scoring
    rewards = [float(r == l) for r, l in zip(responses, labels)]
    return rewards

trainer = SPMDPPOTrainer(
    ...,
    reward_fn=custom_reward_fn
)
```

### Q: How do I checkpoint during training?

**A**: Checkpoints are saved automatically when you pass:

```bash
--save_path ./checkpoints \
--save_interval 1 \
--max_ckpt_num 3
```

Resume with:

```bash
--load_checkpoint \
--ckpt_path ./checkpoints/episode_5
```

## Contributing Questions

### Q: How can I contribute to LightRFT?

**A**:

1. Fork the repository
2. Create a feature branch
3. Implement your changes
4. Add tests
5. Submit a pull request

See the [Contributing Guide](contributing.md) for details.

### Q: How do I report bugs?

**A**: Open an issue on [GitHub Issues](https://github.com/opendilab/LightRFT/issues) with:

- Environment details (GPU, CUDA, PyTorch versions)
- Full error traceback
- Minimal reproduction script
- Expected vs. actual behavior

### Q: Where can I get help?

**A**:

- GitHub Issues for bugs
- Discussions for questions
- Documentation for guides
- Examples directory for code samples

## Additional Resources

- [Installation Guide](../installation/index.rst)
- [Quick Start](../quick_start/index.rst)
- [Algorithm Guide](algorithms.md)
- [Configuration Reference](configuration.md)
- [Troubleshooting Guide](troubleshooting.md)
- [Best Practices](../best_practice/index.rst)