Model Testing Guide

This guide covers testing and evaluating trained models, particularly Vision-Language Models (VLMs).

Overview

LightRFT provides tools for interactive model testing with support for:

  • Interactive Chat: Real-time conversation testing

  • Multimodal Support: Text and image inputs

  • Batch Testing: Automated testing with JSON files

  • Performance Optimization: Flash Attention 2 and bfloat16

  • Command-line Interface: Convenient testing commands

Quick Start

Basic Text Conversation

python test_chat.py --model_path <checkpoint-path>

Image-based Testing

# Start interactive mode
python test_chat.py --model_path <checkpoint-path>

In interactive mode:

[You] /image <image-path>
✓ Image loaded: <image-path>
[You] What do you see in this image?
[Assistant] ...

Batch Testing

python test_chat.py \
  --model_path <checkpoint-path> \
  --batch <test-file.json> \
  --output <results.json>

Custom Generation Parameters

python test_chat.py \
  --model_path <checkpoint-path> \
  --max_tokens 4096 \
  --temperature 0.5 \
  --top_p 0.9

Interactive Commands

Available commands in interactive mode:

Command            Description
/image <path>      Load image for next query
/clear             Clear conversation history
/reset             Reset loaded images
/help              Show help information
/quit or /exit     Exit program

Batch Test File Format

Text-only Tests

[
  {
    "query": "What is 2 + 2?",
    "expected": "4"
  },
  {
    "query": "Explain the Pythagorean theorem."
  }
]

Image-based Tests

[
  {
    "query": "Describe what you see in this image.",
    "images": ["<image-path-1>"],
    "expected": "Description of the image"
  },
  {
    "query": "Compare these two images.",
    "images": ["<image-path-1>", "<image-path-2>"]
  }
]
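
For larger suites, a batch file can be generated instead of written by hand. The following is a minimal sketch that uses only the `query`, `images`, and `expected` keys shown above; the helper name `make_case` and the output filename `generated_tests.json` are illustrative and not part of LightRFT.

```python
import json


def make_case(query, images=None, expected=None):
    """Build one test entry using only the keys shown in the examples above."""
    case = {"query": query}
    if images:
        case["images"] = list(images)
    if expected is not None:
        case["expected"] = expected
    return case


cases = [
    make_case("What is 2 + 2?", expected="4"),
    make_case("Compare these two images.",
              images=["<image-path-1>", "<image-path-2>"]),
]

# Write the list as a JSON array, matching the layout of the examples above.
with open("generated_tests.json", "w", encoding="utf-8") as f:
    json.dump(cases, f, indent=2, ensure_ascii=False)
```

The resulting file can then be passed to `--batch` like any hand-written test file.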

Configuration Parameters

Parameter          Default      Description
--model_path       -            Model checkpoint path (required)
--device           cuda         Inference device (cuda/cpu)
--max_tokens       8192         Maximum generation tokens
--temperature      0.7          Sampling temperature (0 for greedy)
--top_p            0.95         Top-p sampling parameter
--system_prompt    (default)    Custom system prompt
--batch            None         Batch test JSON file path
--output           None         Batch test results output file
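
To make the defaults concrete, here is a sketch of how a parser with these flags could be declared using `argparse`. The actual argument handling inside `test_chat.py` is not shown in this guide, so treat this as an illustration of the documented interface only; in particular, the built-in default system prompt is unknown and is represented by `None`.

```python
import argparse


def build_parser():
    """Declare the flags from the table above with their documented defaults."""
    parser = argparse.ArgumentParser(description="Sketch of the documented flags")
    parser.add_argument("--model_path", required=True, help="Model checkpoint path")
    parser.add_argument("--device", default="cuda", choices=["cuda", "cpu"])
    parser.add_argument("--max_tokens", type=int, default=8192)
    parser.add_argument("--temperature", type=float, default=0.7)
    parser.add_argument("--top_p", type=float, default=0.95)
    # Assumption: None stands in for the undocumented built-in system prompt.
    parser.add_argument("--system_prompt", default=None)
    parser.add_argument("--batch", default=None, help="Batch test JSON file path")
    parser.add_argument("--output", default=None, help="Batch results output file")
    return parser


# Only the overridden flag changes; the rest fall back to the table's defaults.
args = build_parser().parse_args(
    ["--model_path", "<checkpoint-path>", "--temperature", "0.5"]
)
print(args.temperature, args.top_p, args.max_tokens)
```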

Usage Examples

Example 1: Math Problem Solving

python test_chat.py --model_path <checkpoint-path>

Example interaction:

[You] If a triangle has sides 3, 4, and 5, what is its area?

[Assistant] <think>
This is a right triangle since 3² + 4² = 9 + 16 = 25 = 5².
For a right triangle, the area is (1/2) × base × height.
Using the two perpendicular sides: Area = (1/2) × 3 × 4 = 6
</think>

The area of the triangle is 6 square units.
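
When evaluating responses like this one, it can help to separate the `<think>` reasoning from the final answer. The sketch below assumes the response contains at most one `<think>...</think>` block, as in the example above; the function name `split_reasoning` is illustrative.

```python
import re

# Non-greedy match across newlines for a single <think>...</think> block.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(response):
    """Return (reasoning, answer); reasoning is '' if no <think> block exists."""
    match = THINK_RE.search(response)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    # The answer is whatever remains once the think block is cut out.
    answer = (response[:match.start()] + response[match.end():]).strip()
    return reasoning, answer


response = (
    "<think>\nArea = (1/2) × 3 × 4 = 6\n</think>\n\n"
    "The area of the triangle is 6 square units."
)
reasoning, answer = split_reasoning(response)
print(answer)
```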

Example 2: Geometry Recognition

python test_chat.py --model_path <checkpoint-path>

Example interaction:

[You] /image <geometry-image-path>
✓ Image loaded: <geometry-image-path>

[You] Solve the geometry problem shown in this image.

[Assistant] <think>
Looking at the diagram, I can see a triangle ABC with...
[detailed reasoning process]
</think>

The answer is [solution].

Example 3: Batch Performance Testing

Create test file test_questions.json:

[
  {
    "query": "Find the area of triangle with base 6 and height 8.",
    "expected": "24"
  },
  {
    "query": "What is the perimeter of a square with side length 5?",
    "expected": "20"
  }
]

Run batch test:

python test_chat.py \
  --model_path <checkpoint-path> \
  --batch test_questions.json \
  --output test_results.json \
  --temperature 0.0
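
Once the run finishes, responses can be compared against the `expected` values. The layout of the results file is not documented in this guide, so the sketch below takes the model responses as a plain list of strings and leaves loading them to the reader. The matching rule (case-insensitive substring) and the sample responses are assumptions chosen for illustration.

```python
def score(cases, responses):
    """Count cases whose expected string occurs in the response (case-insensitive)."""
    checked = passed = 0
    for case, response in zip(cases, responses):
        expected = case.get("expected")
        if expected is None:
            continue  # no reference answer, nothing to compare against
        checked += 1
        if expected.strip().lower() in response.lower():
            passed += 1
    return passed, checked


cases = [
    {"query": "Find the area of triangle with base 6 and height 8.", "expected": "24"},
    {"query": "What is the perimeter of a square with side length 5?", "expected": "20"},
]
# Illustrative responses only, not real model output.
responses = ["The area is 24 square units.", "The perimeter is 25."]

passed, checked = score(cases, responses)
print(f"{passed}/{checked} passed")
```

Substring matching is deliberately crude: an expected value of "24" would also match a response containing "240", so stricter comparison may be needed for numeric answers.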

Performance Optimizations

The testing script includes:

  1. Flash Attention 2: Accelerated attention computation

  2. BFloat16: Reduced memory usage and faster inference

  3. Batch Processing: Improved throughput for batch tests

  4. Memory Management: Automatic GPU memory cleanup
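
For readers who want to reproduce the first two optimizations outside the script, the sketch below shows how Flash Attention 2 and bfloat16 are typically requested through the Hugging Face `transformers` `from_pretrained` call. It is not the script's actual loading code. `AutoModelForCausalLM` is an assumption, since vision-language checkpoints often need a different model class, and the CPU fallback values are likewise illustrative.

```python
def build_load_kwargs(device="cuda"):
    """Keyword arguments for from_pretrained; a sketch, not the script's code."""
    if device == "cuda":
        # Flash Attention 2 needs a CUDA GPU and half-precision weights.
        return {"torch_dtype": "bfloat16", "attn_implementation": "flash_attention_2"}
    # Assumed CPU fallback: full precision with the plain attention path.
    return {"torch_dtype": "float32", "attn_implementation": "eager"}


def load_model(model_path, device="cuda"):
    # Imported lazily so build_load_kwargs works without torch installed.
    import torch
    from transformers import AutoModelForCausalLM  # assumption: text-only class

    kwargs = build_load_kwargs(device)
    # Turn the dtype name into the torch dtype object, e.g. torch.bfloat16.
    kwargs["torch_dtype"] = getattr(torch, kwargs["torch_dtype"])
    model = AutoModelForCausalLM.from_pretrained(model_path, **kwargs)
    return model.to(device).eval()


print(build_load_kwargs("cuda"))
```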

Troubleshooting

Out of Memory (OOM)

If you run into out-of-memory errors, try the following:

1. Reduce max_tokens:

python test_chat.py --model_path <checkpoint-path> --max_tokens 4096

2. Use CPU inference (slower):

python test_chat.py --model_path <checkpoint-path> --device cpu

Image Loading Failure

Ensure the image path is correct and the format is supported (JPG, PNG, etc.):

ls -lh <image-path>
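
A file can exist and still not be a valid image, for example when it has the wrong extension. As a quick check that goes beyond `ls`, the sketch below inspects the file's leading bytes using only the standard library. It recognizes just PNG and JPEG signatures, and the function name `sniff_image` is illustrative.

```python
import os

# Leading bytes of two common formats; other formats are not covered here.
SIGNATURES = {"PNG": b"\x89PNG\r\n\x1a\n", "JPEG": b"\xff\xd8\xff"}


def sniff_image(path):
    """Return 'PNG' or 'JPEG' if the file starts with that signature, else None."""
    if not os.path.isfile(path):
        return None
    with open(path, "rb") as f:
        header = f.read(8)  # the longest signature above is 8 bytes
    for name, magic in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None
```

A `None` result for a file that should be a PNG or JPEG suggests a wrong path, a truncated download, or a mislabeled format.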

Generation Quality Issues

Adjust sampling parameters:

  • More deterministic: --temperature 0.0 (greedy decoding)

  • More diverse: --temperature 1.0 --top_p 0.9

  • Balanced: --temperature 0.7 --top_p 0.95 (default)
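
To see why these settings behave differently, the sketch below applies the standard definitions of temperature scaling and top-p (nucleus) filtering to a toy list of logits. It illustrates the general technique in pure Python rather than the script's internals; the function name `top_p_filter` and the logit values are made up for the example.

```python
import math


def top_p_filter(logits, temperature=0.7, top_p=0.95):
    """Return {token_index: probability} for the tokens kept by top-p sampling."""
    if temperature <= 0:
        # Greedy decoding: all probability mass on the highest-logit token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return {best: 1.0}
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the most likely tokens until their cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}


logits = [2.0, 1.0, 0.0, -1.0]
for temperature in (0.0, 0.7, 1.0):
    kept = top_p_filter(logits, temperature=temperature, top_p=0.9)
    print(temperature, sorted(kept))
```

With these logits and `top_p=0.9`, temperature 1.0 keeps three candidate tokens, 0.7 keeps two, and 0.0 keeps only the top token, which is why lower temperatures give more deterministic output.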

Dependencies

Required packages:

pip install torch transformers pillow flash-attn

Best Practices

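  1. Model Loading: The model is loaded at startup, so expect a delay before the first prompt.

  2. Image Reset: Loaded images are cleared automatically after each conversation; use /reset to clear them manually.

  3. History Management: Use /clear to reset the conversation history.

  4. Batch Independence: Each batch test case runs independently of the others.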

Additional Resources