
lightrft.strategy.vllm_utils

This module provides utilities for initializing and configuring a vLLM engine.

The module simplifies the process of creating a vLLM engine with specific configurations for large language model inference, particularly in reinforcement learning from human feedback (RLHF) contexts. It offers both high-level and low-level functions for engine creation, with support for tensor parallelism, memory optimization, and multimodal capabilities.

get_vllm_engine

lightrft.strategy.vllm_utils.get_vllm_engine(pretrain_name_or_path: str, dtype: str = 'bfloat16', tp_size: int = 1, mem_util: float = 0.5, max_model_len: int = 4096, enable_sleep: bool = True, **kwargs: Any) → vllm.LLM

Create and configure a vLLM engine with specified parameters.

This is the core function for initializing a vLLM engine with custom configurations. It sets up the engine with distributed execution capabilities, memory optimization, and custom worker classes for RLHF training scenarios.

Parameters:
  • pretrain_name_or_path (str) – Path or name of the pretrained model to load.

  • dtype (str) – Data type for model weights, either “bfloat16” or “float16”. Defaults to “bfloat16”.

  • tp_size (int) – Tensor parallel size for distributed inference. Defaults to 1.

  • mem_util (float) – GPU memory utilization ratio (0.0 to 1.0). Defaults to 0.5.

  • max_model_len (int) – Maximum sequence length the model can handle. Defaults to 4096.

  • enable_sleep (bool) – Whether to enable sleep mode for memory efficiency. Defaults to True.

  • kwargs (Any) – Additional keyword arguments passed to the LLM constructor.

Returns:

Configured vLLM engine instance.

Return type:

vllm.LLM

Example:

>>> engine = get_vllm_engine(
...     "Qwen/Qwen2.5-14B-Instruct",
...     dtype="bfloat16",
...     tp_size=2,
...     mem_util=0.8,
...     max_model_len=2048,
...     enable_sleep=True
... )

Note:

Uses external launcher for distributed execution and custom worker class for integration with lightrft strategy components.
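As a rough illustration, the parameters above can be thought of as mapping onto keyword arguments of the vllm.LLM constructor. The sketch below is hypothetical — `build_llm_kwargs` is not part of lightrft, and the exact wiring inside get_vllm_engine (in particular the custom worker class) may differ — but the keyword names used are standard vLLM constructor arguments:

```python
from typing import Any, Dict


def build_llm_kwargs(
    pretrain_name_or_path: str,
    dtype: str = "bfloat16",
    tp_size: int = 1,
    mem_util: float = 0.5,
    max_model_len: int = 4096,
    enable_sleep: bool = True,
    **kwargs: Any,
) -> Dict[str, Any]:
    """Assemble vllm.LLM keyword arguments (illustrative sketch only)."""
    if dtype not in ("bfloat16", "float16"):
        raise ValueError(f"unsupported dtype: {dtype!r}")
    if not 0.0 < mem_util <= 1.0:
        raise ValueError(f"mem_util must be in (0.0, 1.0], got {mem_util}")

    llm_kwargs: Dict[str, Any] = {
        "model": pretrain_name_or_path,
        "dtype": dtype,
        "tensor_parallel_size": tp_size,
        "gpu_memory_utilization": mem_util,
        "max_model_len": max_model_len,
        "enable_sleep_mode": enable_sleep,
        # An external launcher backend lets an already-initialized
        # distributed process group drive execution, as the note describes.
        "distributed_executor_backend": "external_launcher",
    }
    # Extra keyword arguments are forwarded to the LLM constructor verbatim.
    llm_kwargs.update(kwargs)
    return llm_kwargs
    # Actual engine creation would then be: vllm.LLM(**llm_kwargs)
```

The returned engine is a regular vllm.LLM instance, so inference proceeds through its usual generate() API; in recent vLLM versions, sleep mode additionally exposes sleep()/wake_up() calls to release and restore GPU memory between rollout phases.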