lightrft.trainer.replay_buffer

class lightrft.trainer.replay_buffer.NaiveReplayBuffer(sample_batch_size: int, limit: int = 0, cpu_offload: bool = True, packing_samples: bool = False)

Bases: ABC

A naive replay buffer that stores experience samples.

Parameters:

sample_batch_size (int) – Batch size when sampling.
limit (int) – Maximum number of experience samples to keep. A value <= 0 means unlimited. Defaults to 0.
cpu_offload (bool) – Whether to offload experience to CPU when sampling. Defaults to True.
packing_samples (bool) – Whether to use the packed samples format. Defaults to False.

append(experience: Experience) → None

Append experience to the replay buffer.

Parameters: experience (Experience) – Experience batch to append.
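
As a rough illustration of how appending could interact with limit, here is a minimal sketch, not LightRFT's implementation. ToyBuffer is a hypothetical stand-in and its items are plain Python values rather than Experience objects. Two behaviours are assumptions: storing each sample of a batch as a separate item, and evicting the oldest items once the limit is exceeded. The documentation above states only that a value <= 0 means unlimited.

```python
from typing import Any, List


class ToyBuffer:
    """Hypothetical stand-in for a replay buffer; not the LightRFT class."""

    def __init__(self, limit: int = 0) -> None:
        self.limit = limit
        self.items: List[Any] = []

    def append(self, batch: List[Any]) -> None:
        # Assumption: each sample of the incoming batch becomes its own item.
        self.items.extend(batch)
        # limit <= 0 means unlimited, as documented above.
        if self.limit > 0:
            overflow = len(self.items) - self.limit
            if overflow > 0:
                # Assumption: the oldest items are evicted first.
                self.items = self.items[overflow:]

    def clear(self) -> None:
        # Remove every stored item.
        self.items.clear()


bounded = ToyBuffer(limit=3)
bounded.append([1, 2])
bounded.append([3, 4, 5])

unbounded = ToyBuffer(limit=0)
unbounded.append(list(range(10)))
```

In this sketch, bounded keeps only the three most recent values, while unbounded keeps all ten.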

clear() → None

Clear all items from the replay buffer.

collate_fn(batch) → Experience

Collate function for DataLoader.

Parameters: batch (List[BufferItem]) – Batch of buffer items.
Returns: Batched experience.
Return type: Experience
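
To show the general shape of such a collate function, the sketch below merges per-sample records into one batched record. ToyItem, ToyBatch, toy_collate_fn, and pad_id are hypothetical names, and the real BufferItem and Experience types have their own fields. Right-padding to the longest sequence is also an assumption; the library may pad differently or pack samples instead.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyItem:
    """Hypothetical per-sample record; not the real BufferItem."""
    sequence: List[int]
    advantage: float


@dataclass
class ToyBatch:
    """Hypothetical batched record standing in for Experience."""
    sequences: List[List[int]]
    advantages: List[float]


def toy_collate_fn(batch: List[ToyItem], pad_id: int = 0) -> ToyBatch:
    # Assumption: sequences are right-padded to the longest one in the batch.
    max_len = max(len(item.sequence) for item in batch)
    padded = [
        item.sequence + [pad_id] * (max_len - len(item.sequence))
        for item in batch
    ]
    # Scalar fields are simply gathered into a list, one entry per item.
    return ToyBatch(sequences=padded, advantages=[item.advantage for item in batch])


batch = toy_collate_fn([ToyItem([5, 6, 7], 0.5), ToyItem([8], -1.0)])
```

A function of this shape can be passed as the collate_fn argument of torch.utils.data.DataLoader, which calls it with a list of dataset items.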

normalize(attribute: str, strategy) → None

Normalize a specified attribute across all items in the buffer.

This method computes the mean and standard deviation of the specified attribute across all items and uses them to normalize each item's values. Only “advantages” is currently supported.

Parameters:

attribute (str) – Name of the attribute to normalize (currently only “advantages” is supported).
strategy (Strategy) – Distributed training strategy for all_reduce operations.
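
The normalization described above can be sketched in a single process as follows. This is an illustration under stated assumptions, not the library's code: normalize_advantages and eps are hypothetical, the variance is the population variance, any action masking is ignored, and the cross-rank reductions that strategy would perform are only marked in comments.

```python
import math
from typing import List


def normalize_advantages(per_item: List[List[float]], eps: float = 1e-8) -> List[List[float]]:
    # Flatten across all items so the statistics are global, not per item.
    flat = [v for item in per_item for v in item]
    total, count = sum(flat), len(flat)
    # In a distributed run, total and count would be summed across ranks here.
    mean = total / count
    sq_dev = sum((v - mean) ** 2 for v in flat)
    # Likewise, sq_dev would be summed across ranks before dividing.
    var = sq_dev / count
    # Assumption: clamp the variance to eps to avoid division by zero.
    rstd = 1.0 / math.sqrt(max(var, eps))
    # Preserve the per-item structure of the input.
    return [[(v - mean) * rstd for v in item] for item in per_item]


normalized = normalize_advantages([[1.0, 2.0], [3.0, 4.0]])
```

After normalization the values taken together have approximately zero mean and unit variance, while each item keeps its original length.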

sample() → Experience

Sample a batch of experiences from the replay buffer.

Returns: Batch of sampled experiences.
Return type: Experience
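
One plausible way to draw a batch is sketched below. The documentation does not say how sampling is performed, so uniform sampling without replacement is an assumption, and toy_sample is a hypothetical helper operating on plain values rather than buffer items.

```python
import random
from typing import Any, List


def toy_sample(items: List[Any], sample_batch_size: int, rng: random.Random) -> List[Any]:
    # Assumption: uniform sampling without replacement.
    # random.Random.sample raises ValueError if the buffer holds fewer
    # than sample_batch_size items.
    return rng.sample(items, sample_batch_size)


rng = random.Random(0)  # Seeded generator so runs are reproducible.
picked = toy_sample(list(range(100)), sample_batch_size=4, rng=rng)
```

The real method returns an Experience, so a collation step would presumably follow the draw; that step is omitted here.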