lightrft.trainer.replay_buffer_utils¶
Utility functions for replay buffer operations in reinforcement learning.
This module provides specialized functions for handling both language model experiences and vision-language model experiences. It includes utilities for batch splitting, sequence padding, and experience creation optimized for distributed training.
Key features:
- Automatic detection of experience types
- Efficient batch splitting and creation
- Sequence padding and padding removal
- Support for both packed and unpacked samples
- class lightrft.trainer.replay_buffer_utils.BufferItem(sequences: torch.Tensor, action_log_probs: torch.Tensor, base_action_log_probs: torch.Tensor, values: torch.Tensor, returns: torch.Tensor, advantages: torch.Tensor, attention_mask: torch.LongTensor | None, action_mask: torch.BoolTensor | None, info: dict | None, action_entropy: torch.Tensor | None = None)[source]¶
Bases: object
BufferItem is an item of experience data.
Shapes of each tensor:
- sequences: (S)
- action_log_probs: (A)
- base_action_log_probs: (A)
- values: (1)
- returns: (1)
- advantages: (1)
- attention_mask: (S)
- action_mask: (A)
- action_entropy: (A), entropy values for high-entropy token filtering
“S” is the sequence length and “A” is the number of actions.
- action_entropy: torch.Tensor | None = None¶
- action_log_probs: torch.Tensor¶
- action_mask: torch.BoolTensor | None¶
- advantages: torch.Tensor¶
- attention_mask: torch.LongTensor | None¶
- base_action_log_probs: torch.Tensor¶
- info: dict | None¶
- returns: torch.Tensor¶
- sequences: torch.Tensor¶
- values: torch.Tensor¶
- class lightrft.trainer.replay_buffer_utils.BufferItemVL(sequences: torch.Tensor, pixel_values: torch.Tensor | None = None, image_grid_thws: torch.Tensor | None = None, pixel_values_videos: torch.Tensor | None = None, video_grid_thws: torch.Tensor | None = None, raw_images: List[Image] | None = None, action_log_probs: torch.Tensor = None, base_action_log_probs: torch.Tensor = None, values: torch.Tensor = None, returns: torch.Tensor = None, advantages: torch.Tensor = None, attention_mask: torch.LongTensor | None = None, action_mask: torch.BoolTensor | None = None, info: dict | None = None, action_entropy: torch.Tensor | None = None)[source]¶
Bases: object
BufferItemVL is an item of experience data.
Shapes of each tensor:
- sequences: (S)
- pixel_values: (B*H, W)
- image_grid_thws: (B, 3)
- raw_images: Optional[List[Image.Image]], raw images before processing
- action_log_probs: (A)
- base_action_log_probs: (A)
- values: (1)
- returns: (1)
- advantages: (1)
- attention_mask: (S)
- action_mask: (A)
- action_entropy: (A), entropy values for high-entropy token filtering
“S” is the sequence length and “A” is the number of actions.
- action_entropy: torch.Tensor | None = None¶
- action_log_probs: torch.Tensor = None¶
- action_mask: torch.BoolTensor | None = None¶
- advantages: torch.Tensor = None¶
- attention_mask: torch.LongTensor | None = None¶
- base_action_log_probs: torch.Tensor = None¶
- image_grid_thws: torch.Tensor | None = None¶
- info: dict | None = None¶
- pixel_values: torch.Tensor | None = None¶
- pixel_values_videos: torch.Tensor | None = None¶
- raw_images: List[Image] | None = None¶
- returns: torch.Tensor = None¶
- sequences: torch.Tensor¶
- values: torch.Tensor = None¶
- video_grid_thws: torch.Tensor | None = None¶
- lightrft.trainer.replay_buffer_utils.is_vl_experience(experience: Experience | ExperienceVL) → bool[source]¶
Determine if an experience is a vision-language experience.
Checks for the presence of vision-specific attributes to distinguish between language model experiences and vision-language experiences.
- Parameters:
experience (Union[Experience, ExperienceVL]) – The experience object to check
- Returns:
True if the experience contains vision data, False otherwise
- Return type:
bool
Example:
exp = ExperienceVL(...)
if is_vl_experience(exp):
    print("This is a vision-language experience")
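The detection described above can be sketched as a simple attribute check. This is a hypothetical reimplementation based on the documented behavior; the actual function may inspect a different set of attributes.

```python
from types import SimpleNamespace


def is_vl_experience_sketch(experience) -> bool:
    # Heuristic: treat the experience as vision-language when any
    # vision-specific attribute is present and non-None.
    vision_attrs = (
        "pixel_values",
        "image_grid_thws",
        "pixel_values_videos",
        "video_grid_thws",
    )
    return any(getattr(experience, attr, None) is not None for attr in vision_attrs)


# Stand-ins for Experience / ExperienceVL objects.
lm_exp = SimpleNamespace(sequences=[1, 2, 3])
vl_exp = SimpleNamespace(sequences=[1, 2, 3], pixel_values=[[0.1, 0.2]])
```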
- lightrft.trainer.replay_buffer_utils.make_experience_batch(items: List, packing_samples: bool = False) → Experience | ExperienceVL[source]¶
Create a batch experience from individual items.
This generic function automatically detects the item type and delegates to the appropriate batch creation function. It handles both packed and unpacked samples efficiently.
- Parameters:
items (List) – List of individual experience items to batch
packing_samples (bool) – Whether to pack samples without padding (True) or use padding (False)
- Returns:
Batched experience (either Experience or ExperienceVL)
- Return type:
Union[Experience, ExperienceVL]
- Raises:
ValueError – If items list is empty
Example:
# Create batch from items
items = [BufferItem(...), BufferItem(...)]
batch_exp = make_experience_batch(items, packing_samples=False)

# Create batch from vision-language items
vl_items = [BufferItemVL(...), BufferItemVL(...)]
batch_vl_exp = make_experience_batch(vl_items, packing_samples=True)
- lightrft.trainer.replay_buffer_utils.remove_padding_in_sequences(items: List) → List[source]¶
Remove padding from sequences in experience items.
This generic function automatically detects the item type and delegates to the appropriate padding removal function. It removes both left and right padding from sequences to restore their original lengths.
- Parameters:
items (List) – List of experience items with padded sequences
- Returns:
List of experience items with padding removed
- Return type:
List
Example:
# Remove padding from items
padded_items = [BufferItem(sequences=torch.tensor([0, 0, 1, 2, 3, 0, 0]), ...)]
clean_items = remove_padding_in_sequences(padded_items)
# Result: sequences become torch.tensor([1, 2, 3])

# Remove padding from vision-language items
padded_vl_items = [BufferItemVL(sequences=torch.tensor([0, 0, 4, 5, 6, 0]), ...)]
clean_vl_items = remove_padding_in_sequences(padded_vl_items)
# Result: sequences become torch.tensor([4, 5, 6])
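The core per-sequence operation can be sketched as below. Note that this simplified version keys off zero token ids, matching the example above, whereas the real implementation presumably uses the attention and action masks, since token id 0 can be a legitimate token.

```python
import torch


def strip_zero_padding(seq: torch.Tensor) -> torch.Tensor:
    # Keep only the span between the first and last non-zero entries,
    # removing both left and right padding (simplified sketch).
    nz = torch.nonzero(seq).flatten()
    if nz.numel() == 0:
        return seq[:0]  # all padding: return an empty tensor
    return seq[nz[0] : nz[-1] + 1]
```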
- lightrft.trainer.replay_buffer_utils.split_experience_batch(experience: Experience | ExperienceVL) → List[source]¶
Split a batch of experiences into individual items.
Automatically detects the experience type and delegates to the appropriate splitting function. This is a generic interface that handles both types of experiences.
- Parameters:
experience (Union[Experience, ExperienceVL]) – Batch experience to split into individual items
- Returns:
List of individual experience items
- Return type:
List
Example:
# Split a batch of experiences
batch_experience = make_experience_batch(items)
individual_items = split_experience_batch(batch_experience)

# Process each item individually
for item in individual_items:
    process_item(item)
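The splitting step can be sketched as the inverse of batching: each (B, ...) field is sliced row by row back into per-item tensors. The code below is a hypothetical illustration over two fields only; the real function handles every field of the experience.

```python
import torch
from types import SimpleNamespace


def split_batch_sketch(batch):
    # Slice each batched (B, ...) tensor field row-by-row into
    # per-item objects (hypothetical simplified sketch).
    batch_size = batch.sequences.size(0)
    return [
        SimpleNamespace(
            sequences=batch.sequences[i],
            action_log_probs=batch.action_log_probs[i],
        )
        for i in range(batch_size)
    ]


# Stand-in for a batched experience with two samples.
batch = SimpleNamespace(
    sequences=torch.tensor([[1, 2], [3, 4]]),
    action_log_probs=torch.tensor([[0.1], [0.2]]),
)
items = split_batch_sketch(batch)
```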
- lightrft.trainer.replay_buffer_utils.zero_pad_sequences(sequences: List[torch.Tensor], side: str = 'left') → torch.Tensor[source]¶
Zero-pad a list of sequences to the same length.
This utility function pads sequences to the maximum length in the batch, either on the left or right side. It is used for creating batched tensors from variable-length sequences.
- Parameters:
sequences (List[torch.Tensor]) – List of sequences to pad (each sequence is a 1D tensor)
side (str) – Padding side, either “left” or “right”
- Returns:
Batched tensor of padded sequences
- Return type:
torch.Tensor
- Raises:
AssertionError – If side is not “left” or “right”
Example:
sequences = [
    torch.tensor([1, 2, 3]),
    torch.tensor([4, 5]),
    torch.tensor([6, 7, 8, 9]),
]

# Pad to the right
padded = zero_pad_sequences(sequences, side="right")
# Result: tensor([[1, 2, 3, 0],
#                 [4, 5, 0, 0],
#                 [6, 7, 8, 9]])

# Pad to the left
padded_left = zero_pad_sequences(sequences, side="left")
# Result: tensor([[0, 1, 2, 3],
#                 [0, 0, 4, 5],
#                 [6, 7, 8, 9]])
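For reference, an equivalent implementation fits in a few lines. This is a plausible reconstruction of the documented behavior using torch.nn.functional.pad, not necessarily the library's exact code.

```python
import torch
import torch.nn.functional as F


def zero_pad_sketch(sequences, side="left"):
    # Pad every 1-D sequence with zeros to the batch maximum length on
    # the chosen side, then stack into one (batch, max_len) tensor.
    assert side in ("left", "right")
    max_len = max(s.numel() for s in sequences)
    rows = []
    for s in sequences:
        n = max_len - s.numel()
        # F.pad takes (left_pad, right_pad) for a 1-D tensor.
        rows.append(F.pad(s, (n, 0) if side == "left" else (0, n)))
    return torch.stack(rows)


seqs = [torch.tensor([1, 2, 3]), torch.tensor([4, 5]), torch.tensor([6, 7, 8, 9])]
```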