framework.middleware.learner
learner
OffPolicyLearner
- class ding.framework.middleware.learner.OffPolicyLearner(*args, **kwargs)
- Overview:
The class of the off-policy learner, including data fetching and model training. Use the __call__ method to execute the whole learning process. A usage sketch is shown after the argument list below.
- __call__(ctx: OnlineRLContext) -> None
- Output of ctx:
    - train_output (Deque): The training outputs, stored in a deque.
- __init__(cfg: EasyDict, policy: Policy, buffer_: Buffer | List[Tuple[Buffer, float]] | Dict[str, Buffer], reward_model: BaseRewardModel | None = None, log_freq: int = 100) -> None
- Arguments:
    - cfg (EasyDict): Config.
    - policy (Policy): The policy to be trained.
    - buffer_ (Buffer): The replay buffer that stores the data for training.
    - reward_model (BaseRewardModel): An additional reward estimator, such as RND or ICM; defaults to None.
    - log_freq (int): The logging frequency, in training iterations.
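The snippet below is a minimal, hedged sketch of wiring OffPolicyLearner into a DI-engine task pipeline, loosely following the middleware example entry points. The names cfg (a compiled EasyDict config), policy (a policy exposing collect_mode / learn_mode, e.g. DQNPolicy), collector_env (an env manager), and buffer_ (e.g. a DequeBuffer) are assumed to be prepared elsewhere and are not defined here.

```python
# Sketch only: cfg, policy, collector_env, and buffer_ are assumed to be built
# elsewhere (compiled config, policy, env manager, replay buffer).
from ding.framework import task
from ding.framework.context import OnlineRLContext
from ding.framework.middleware import OffPolicyLearner, StepCollector, data_pusher

with task.start(async_mode=False, ctx=OnlineRLContext()):
    # Collect environment steps with the policy's collect mode.
    task.use(StepCollector(cfg, policy.collect_mode, collector_env))
    # Push the collected transitions into the replay buffer.
    task.use(data_pusher(cfg, buffer_))
    # Fetch batches from the buffer and train the policy; log every 50 iterations.
    task.use(OffPolicyLearner(cfg, policy.learn_mode, buffer_, log_freq=50))
    task.run(max_step=10000)
```

Each call to the learner middleware fetches data, runs one or more training steps, and writes the results back to ctx.train_output for downstream middleware.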
HERLearner
- class ding.framework.middleware.learner.HERLearner(cfg: EasyDict, policy, buffer_: Buffer | List[Tuple[Buffer, float]] | Dict[str, Buffer], her_reward_model)
- Overview:
The class of the learner with Hindsight Experience Replay (HER). Use the __call__ method to execute the data fetching and training process. A usage sketch is shown after the argument list below.
- __call__(ctx: OnlineRLContext) -> None
- Output of ctx:
    - train_output (Deque): The deque of training output.
- __init__(cfg: EasyDict, policy, buffer_: Buffer | List[Tuple[Buffer, float]] | Dict[str, Buffer], her_reward_model) -> None
- Arguments:
    - cfg (EasyDict): Config.
    - policy (Policy): The policy to be trained.
    - buffer_ (Buffer): The replay buffer that stores the data for training.
    - her_reward_model (HerRewardModel): The HER reward model.
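The snippet below is a hedged sketch of using HERLearner in place of OffPolicyLearner. It makes the same assumptions as the previous sketch about cfg, policy, collector_env, and buffer_, and additionally assumes her_reward_model is an already-constructed HER reward model instance (e.g. from ding.reward_model); passing policy.learn_mode mirrors the OffPolicyLearner usage and is an assumption here.

```python
# Sketch only: cfg, policy, collector_env, buffer_, and her_reward_model are
# assumed to be built elsewhere; her_reward_model is a HER reward model instance.
from ding.framework import task
from ding.framework.context import OnlineRLContext
from ding.framework.middleware import StepCollector, data_pusher
from ding.framework.middleware.learner import HERLearner

with task.start(async_mode=False, ctx=OnlineRLContext()):
    task.use(StepCollector(cfg, policy.collect_mode, collector_env))
    task.use(data_pusher(cfg, buffer_))
    # Fetch data from the buffer, apply the HER reward model, and train the policy.
    task.use(HERLearner(cfg, policy.learn_mode, buffer_, her_reward_model))
    task.run(max_step=10000)
```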