# Model Testing Guide

Guide for testing and evaluating trained models, particularly Vision-Language Models (VLMs).

## Overview

LightRFT provides tools for interactive model testing with support for:

- **Interactive Chat**: Real-time conversation testing
- **Multimodal Support**: Text and image inputs
- **Batch Testing**: Automated testing with JSON files
- **Performance Optimization**: Flash Attention 2 and bfloat16
- **Command-line Interface**: Convenient testing commands

## Quick Start

### Basic Text Conversation

```bash
python test_chat.py --model_path
```

### Image-based Testing

```bash
# Start interactive mode
python test_chat.py --model_path
```

In interactive mode:

```
[You] /image
✓ Image loaded:
[You] What do you see in this image?
[Assistant] ...
```

### Batch Testing

```bash
python test_chat.py \
    --model_path \
    --batch \
    --output
```

### Custom Generation Parameters

```bash
python test_chat.py \
    --model_path \
    --max_tokens 4096 \
    --temperature 0.5 \
    --top_p 0.9
```

## Interactive Commands

Available commands in interactive mode:

| Command | Description |
|---------|-------------|
| `/image ` | Load image for next query |
| `/clear` | Clear conversation history |
| `/reset` | Reset loaded images |
| `/help` | Show help information |
| `/quit` or `/exit` | Exit program |

## Batch Test File Format

### Text-only Tests

```json
[
  {
    "query": "What is 2 + 2?",
    "expected": "4"
  },
  {
    "query": "Explain the Pythagorean theorem."
  }
]
```

### Image-based Tests

```json
[
  {
    "query": "Describe what you see in this image.",
    "images": [""],
    "expected": "Description of the image"
  },
  {
    "query": "Compare these two images.",
    "images": ["", ""]
  }
]
```

## Configuration Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `--model_path` | - | Model checkpoint path (required) |
| `--device` | `cuda` | Inference device (cuda/cpu) |
| `--max_tokens` | `8192` | Maximum generation tokens |
| `--temperature` | `0.7` | Sampling temperature (0 for greedy) |
| `--top_p` | `0.95` | Top-p sampling parameter |
| `--system_prompt` | (default) | Custom system prompt |
| `--batch` | `None` | Batch test JSON file path |
| `--output` | `None` | Batch test results output file |

## Usage Examples

### Example 1: Math Problem Solving

```bash
python test_chat.py --model_path
```

Example interaction:

```
[You] If a triangle has sides 3, 4, and 5, what is its area?

[Assistant] This is a right triangle since 3² + 4² = 9 + 16 = 25 = 5².
For a right triangle, the area is (1/2) × base × height.
Using the two perpendicular sides: Area = (1/2) × 3 × 4 = 6

The area of the triangle is 6 square units.
```

### Example 2: Geometry Recognition

```bash
python test_chat.py --model_path
```

Example interaction:

```
[You] /image
✓ Image loaded:
[You] Solve the geometry problem shown in this image.

[Assistant] Looking at the diagram, I can see a triangle ABC with...
[detailed reasoning process]
The answer is [solution].
```

### Example 3: Batch Performance Testing

Create test file `test_questions.json`:

```json
[
  {
    "query": "Find the area of triangle with base 6 and height 8.",
    "expected": "24"
  },
  {
    "query": "What is the perimeter of a square with side length 5?",
    "expected": "20"
  }
]
```

Run batch test:

```bash
python test_chat.py \
    --model_path \
    --batch test_questions.json \
    --output test_results.json \
    --temperature 0.0
```

## Performance Optimizations

The testing script includes:

1. **Flash Attention 2**: Accelerated attention computation
2. **BFloat16**: Reduced memory usage and faster inference
3. **Batch Processing**: Improved throughput for batch tests
4. **Memory Management**: Automatic GPU memory cleanup

## Troubleshooting

### Out of Memory (OOM)

If encountering memory issues:

**1. Reduce max_tokens:**

```bash
python test_chat.py --model_path --max_tokens 4096
```

**2. Use CPU inference (slower):**

```bash
python test_chat.py --model_path --device cpu
```

### Image Loading Failure

Ensure image path is correct and format is supported (JPG, PNG, etc.):

```bash
ls -lh
```

### Generation Quality Issues

Adjust sampling parameters:

- **More deterministic**: `--temperature 0.0` (greedy decoding)
- **More diverse**: `--temperature 1.0 --top_p 0.9`
- **Balanced**: `--temperature 0.7 --top_p 0.95` (default)

## Dependencies

Required packages:

```bash
pip install torch transformers pillow flash-attn
```

## Best Practices

1. **Model Loading**: First run requires model loading time
2. **Image Reset**: Images auto-reset after each conversation
3. **History Management**: Use `/clear` to reset conversation history
4. **Batch Independence**: Each batch test runs independently

## Additional Resources

- [Quick Start Guide](../quick_start/index.rst)
- [Configuration Reference](../quick_start/configuration.md)
- [FAQ](faq.md)
- [Troubleshooting Guide](troubleshooting.md)
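
## Appendix: Checking a Batch Test File

Because a malformed batch file only surfaces once the model has been loaded, it can help to check the JSON before starting a run. The following is a minimal sketch that checks a file against the schema shown in the Batch Test File Format section: a top-level list of objects, each with a `query` string and optional `images` and `expected` fields. The function name `validate_batch_tests` is hypothetical and not part of LightRFT, and the rule that `expected` must be a string is an assumption based on the examples in this guide, not on the behavior of `test_chat.py`.

```python
import json
from typing import Any, List


def validate_batch_tests(entries: Any) -> List[str]:
    """Return a list of problems; an empty list means the data matches the schema.

    Hypothetical helper, not part of LightRFT. The checks mirror the examples
    in this guide: "query" is required, "images" and "expected" are optional.
    """
    if not isinstance(entries, list):
        return ["top-level JSON value must be a list of test cases"]

    problems: List[str] = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: must be a JSON object")
            continue

        # "query" appears in every example, so treat it as required.
        query = entry.get("query")
        if not isinstance(query, str) or not query.strip():
            problems.append(f"entry {i}: 'query' must be a non-empty string")

        # "images" is optional; when present it is a list of path strings.
        images = entry.get("images", [])
        if not isinstance(images, list) or not all(isinstance(p, str) for p in images):
            problems.append(f"entry {i}: 'images' must be a list of path strings")

        # Assumption: "expected" is a string whenever it is given.
        if "expected" in entry and not isinstance(entry["expected"], str):
            problems.append(f"entry {i}: 'expected' must be a string when present")

    return problems


# Small in-memory demonstration; the second entry is missing "query".
sample = json.loads('[{"query": "What is 2 + 2?", "expected": "4"}, {"images": [""]}]')
for problem in validate_batch_tests(sample):
    print(problem)
```

To check a file on disk, load it with `json.load` and pass the result to `validate_batch_tests`; an empty return value means every entry has the fields the examples in this guide use.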