lightrft.trainer.replay_buffer
- class lightrft.trainer.replay_buffer.NaiveReplayBuffer(sample_batch_size: int, limit: int = 0, cpu_offload: bool = True, packing_samples: bool = False)
Bases: ABC
Naive replay buffer class. It stores experience samples.
- Parameters:
sample_batch_size (int) – Batch size when sampling.
limit (int) – Maximum number of experience samples to keep. A value <= 0 means unlimited. Defaults to 0.
cpu_offload (bool) – Whether to offload experience to CPU when sampling, defaults to True.
packing_samples (bool) – Whether to use packed samples format, defaults to False.
- append(experience: Experience) → None
Append experience to the replay buffer.
- Parameters:
experience (Experience) – Experience batch to append.
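As an illustration of how `limit` interacts with `append`, here is a minimal sketch using a hypothetical `ToyBuffer`. It is not the lightrft implementation. It assumes that an appended batch is split into per-sample items and that the oldest items are dropped first once the limit is exceeded; the documentation above does not state the eviction order.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class ToyBuffer:
    """Hypothetical stand-in for illustration, not the lightrft class."""

    limit: int = 0  # <= 0 means unlimited, as documented for NaiveReplayBuffer
    items: List[Any] = field(default_factory=list)

    def append(self, batch: List[Any]) -> None:
        # Assumption: an experience batch is stored as individual samples.
        self.items.extend(batch)
        # Assumption: when over the limit, the oldest samples are evicted first.
        if self.limit > 0 and len(self.items) > self.limit:
            self.items = self.items[-self.limit:]


unlimited = ToyBuffer(limit=0)
capped = ToyBuffer(limit=3)
for batch in ([1, 2], [3, 4], [5]):
    unlimited.append(batch)
    capped.append(batch)

print(len(unlimited.items))  # 5
print(capped.items)          # [3, 4, 5]
```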
- collate_fn(batch) → Experience
Collate function for DataLoader.
- Parameters:
batch (List[BufferItem]) – Batch of buffer items.
- Returns:
Batched experience.
- Return type:
Experience
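To make the idea of collating per-sample items into one batch concrete, the sketch below uses hypothetical `ToyItem` and `ToyBatch` records in place of `BufferItem` and `Experience`. The field names and the right-padding with `pad_id` are assumptions made for illustration; the real classes may carry different fields and may pad differently, especially with `packing_samples=True`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyItem:
    """Hypothetical per-sample record (stand-in for BufferItem)."""

    sequence: List[int]
    reward: float


@dataclass
class ToyBatch:
    """Hypothetical batched record (stand-in for Experience)."""

    sequences: List[List[int]]
    rewards: List[float]


def toy_collate_fn(batch: List[ToyItem], pad_id: int = 0) -> ToyBatch:
    # Assumption: variable-length sequences are right-padded to the longest one.
    max_len = max(len(item.sequence) for item in batch)
    padded = [
        item.sequence + [pad_id] * (max_len - len(item.sequence))
        for item in batch
    ]
    return ToyBatch(sequences=padded, rewards=[item.reward for item in batch])


collated = toy_collate_fn([ToyItem([5, 6, 7], 1.0), ToyItem([8], -0.5)])
print(collated.sequences)  # [[5, 6, 7], [8, 0, 0]]
print(collated.rewards)    # [1.0, -0.5]
```

A function with this shape (list of items in, one batched object out) is what the `collate_fn` argument of a DataLoader expects.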
- normalize(attribute: str, strategy) → None
Normalize a specified attribute across all items in the buffer.
This method computes the mean and standard deviation of the specified attribute across all items in the buffer, then normalizes each item's values using those statistics. Only “advantages” is currently supported.
- Parameters:
attribute (str) – Name of the attribute to normalize (currently only “advantages” is supported).
strategy (Strategy) – Distributed training strategy for all_reduce operations.
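The normalization described above can be sketched in plain Python. This is an illustrative approximation, not the library's code: the hypothetical `toy_normalize` takes one list of advantages per rank and simulates the all_reduce step by summing over those lists, and the use of the population variance and the `eps` term are assumptions.

```python
import math
from typing import List


def toy_normalize(per_rank: List[List[float]], eps: float = 1e-8) -> List[List[float]]:
    # Stand-in for all_reduce(sum): combine counts and sums from every rank.
    count = sum(len(vals) for vals in per_rank)
    total = sum(sum(vals) for vals in per_rank)
    mean = total / count

    # Second reduction: squared deviations from the global mean.
    sq_dev = sum((v - mean) ** 2 for vals in per_rank for v in vals)
    std = math.sqrt(sq_dev / count)  # assumption: population variance

    # Every rank normalizes its own values with the shared statistics.
    return [[(v - mean) / (std + eps) for v in vals] for vals in per_rank]


ranks = [[1.0, 2.0], [3.0, 4.0, 5.0]]  # advantages held by two simulated ranks
normalized = toy_normalize(ranks)
print(normalized)
```

Because the statistics are global, the normalized values have approximately zero mean and unit variance across all ranks taken together, not within each rank separately.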
- sample() → Experience
Sample a batch of experiences from the replay buffer.
- Returns:
Batch of sampled experiences.
- Return type:
Experience
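A minimal sketch of drawing one batch of `sample_batch_size` items follows. The hypothetical `toy_sample` assumes uniform sampling without replacement, which the documentation above does not confirm, and it omits the collation and the device transfer controlled by `cpu_offload`.

```python
import random
from typing import List


def toy_sample(items: List[int], sample_batch_size: int, rng: random.Random) -> List[int]:
    # Assumption: uniform sampling without replacement within one batch.
    return rng.sample(items, sample_batch_size)


rng = random.Random(0)  # seeded only to make the sketch reproducible
stored = list(range(10))
batch = toy_sample(stored, sample_batch_size=4, rng=rng)
print(len(batch))  # 4
```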