lightrft.utils.remote_rm_utils

lightrft.utils.remote_rm_utils.remote_rm_fn(api_url: str, queries: List[str], prompts: List[str], labels: List[Any] | None = None, references: List[str] | None = None, raw_images: List[Image | List[Image] | None] | None = None, score_key: str = 'rewards') → torch.Tensor

Remote reward model API function for scoring text and image inputs.

This function prepares data and sends requests to a remote reward model API, supporting both text-only and multimodal (text + image) scoring scenarios.

Parameters:

api_url (str) – Reward model API endpoint URL
queries (List[str]) – List of query strings with response templates
prompts (List[str]) – List of prompt strings for context
labels (Optional[List[Any]]) – Optional list of labels for supervised scoring (currently unused)
references (Optional[List[str]]) – Optional list of reference responses for comparison scoring
raw_images (Optional[List[Optional[Union[Image.Image, List[Image.Image]]]]]) – Optional list of PIL Image objects or lists of PIL Image objects for multimodal scoring. Each element can be None, a single image, or a list of images.
score_key (str) – Key in the API response that contains the reward scores
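
The three shapes accepted for each raw_images element (None, a single image, or a list of images) can be brought to one uniform shape before a request is built. The following is a minimal sketch under stated assumptions, not the library's own code: normalize_images and its batch_size argument are hypothetical, and images are treated as opaque objects so the snippet runs without Pillow installed.

```python
from typing import Any, List, Optional, Sequence


def normalize_images(raw_images: Optional[Sequence[Any]], batch_size: int) -> List[List[Any]]:
    """Return one list of images per sample (hypothetical helper, not part of lightrft)."""
    if raw_images is None:
        # No images supplied at all: every sample is text-only.
        return [[] for _ in range(batch_size)]
    if len(raw_images) != batch_size:
        raise ValueError("raw_images must have one entry per sample")
    normalized: List[List[Any]] = []
    for item in raw_images:
        if item is None:
            normalized.append([])            # text-only sample
        elif isinstance(item, (list, tuple)):
            normalized.append(list(item))    # several images for one sample
        else:
            normalized.append([item])        # a single image
    return normalized
```

With this shape, every sample carries a (possibly empty) list of images, so downstream code does not need to branch on the element type.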

Returns:

Tensor of reward scores for all input samples

Return type:

torch.Tensor

Raises:

Exception – If the API request still fails after the maximum number of retry attempts

Example:

from PIL import Image

from lightrft.utils.remote_rm_utils import remote_rm_fn

# Text-only scoring
scores = remote_rm_fn(
    api_url="http://localhost:8000/score",
    queries=["What is 2+2?"],
    prompts=["Calculate the following:"],
    score_key="rewards"
)

# Multimodal scoring with images
scores = remote_rm_fn(
    api_url="http://localhost:8000/score",
    queries=["Describe this image"],
    prompts=["Please analyze the image:"],
    raw_images=[Image.open("image.jpg")],
    score_key="rewards"
)

lightrft.utils.remote_rm_utils.request_api_wrapper(url: str, data: Dict[str, Any], score_key: str = 'rewards', try_max_times: int = 5) → float | List[float]

Synchronous request API wrapper for reward model scoring.

This function makes HTTP POST requests to a reward model API endpoint and handles retries with exponential backoff for failed requests.
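
The retry-with-exponential-backoff behaviour described above can be illustrated with a small, network-free sketch. It is not the library's implementation: post_with_backoff, the injectable send and sleep callables, and the base_delay default of one second are all assumptions made for illustration. Only score_key and try_max_times mirror the documented parameters.

```python
import time
from typing import Any, Callable, Dict


def post_with_backoff(
    send: Callable[[Dict[str, Any]], Dict[str, Any]],
    data: Dict[str, Any],
    score_key: str = "rewards",
    try_max_times: int = 5,
    base_delay: float = 1.0,                        # assumed starting delay
    sleep: Callable[[float], None] = time.sleep,    # injectable so tests need not wait
) -> Any:
    """Call send(data) until it succeeds, doubling the wait after each failure."""
    last_error = None
    for attempt in range(try_max_times):
        try:
            response = send(data)
            return response[score_key]
        except Exception as exc:
            # Broad on purpose: transport, decoding and missing-key errors
            # all count as a failed attempt in this sketch.
            last_error = exc
            if attempt < try_max_times - 1:
                sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    raise Exception(f"Request failed after {try_max_times} attempts") from last_error
```

In real use, send would typically wrap an HTTP POST that returns the decoded JSON body; passing it in as a callable keeps the retry logic testable without a running server.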

Parameters:

url (str) – The API endpoint URL to send requests to
data (Dict[str, Any]) – The request payload data as a dictionary
score_key (str) – The key in the response JSON that contains the reward scores
try_max_times (int) – Maximum number of retry attempts for failed requests

Returns:

Reward scores extracted from the API response, either as a single float or a list of floats depending on the API response structure

Return type:

Union[float, List[float]]
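
Because the return value may be a single float or a list of floats, callers that want a uniform shape can normalize it first. A minimal sketch, assuming a hypothetical helper as_score_list that is not part of lightrft:

```python
from typing import List, Sequence, Union


def as_score_list(scores: Union[float, int, Sequence[float]]) -> List[float]:
    """Wrap a scalar score in a list; convert a sequence of scores to floats."""
    if isinstance(scores, (int, float)):
        return [float(scores)]
    return [float(s) for s in scores]
```

A list produced this way always has one entry per score, which makes it straightforward to convert to a tensor or to zip with the input samples.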

Raises:

Exception – If the request still fails after try_max_times attempts

Example:

from lightrft.utils.remote_rm_utils import request_api_wrapper

score = request_api_wrapper(
    url="http://localhost:8000/score",
    data={"text": "Hello world"},
    score_key="rewards",
    try_max_times=5
)