
framework.middleware.learner

learner

OffPolicyLearner

class ding.framework.middleware.learner.OffPolicyLearner(*args, **kwargs)[source]
Overview:

The class of the off-policy learner, including data fetching and model training. Use the __call__ method to execute the whole learning process.

__call__(ctx: OnlineRLContext) → None[source]
Output of ctx:
  • train_output (Deque): The deque of training outputs.

__init__(cfg: EasyDict, policy: Policy, buffer_: Buffer | List[Tuple[Buffer, float]] | Dict[str, Buffer], reward_model: BaseRewardModel | None = None, log_freq: int = 100) → None[source]
Arguments:
  • cfg (EasyDict): Config.

  • policy (Policy): The policy to be trained.

  • buffer_ (Buffer): The replay buffer that stores the data for training.

  • reward_model (BaseRewardModel): An additional reward estimator such as RND, ICM, etc. Defaults to None.

  • log_freq (int): The logging frequency, in training iterations. Defaults to 100.
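
A minimal usage sketch (not part of the official docstring) of wiring OffPolicyLearner into a DI-engine task pipeline. It assumes the CartPole DQN example config from dizoo; the config path, buffer size, and max_step value are illustrative, and the collector middleware that fills the buffer is omitted.

from ding.config import compile_config
from ding.data import DequeBuffer
from ding.framework import task, OnlineRLContext
from ding.framework.middleware import OffPolicyLearner
from ding.model import DQN
from ding.policy import DQNPolicy
from dizoo.classic_control.cartpole.config.cartpole_dqn_config import main_config, create_config


def main():
    # Compile the example config and build the policy to be trained.
    cfg = compile_config(main_config, create_cfg=create_config, auto=True)
    model = DQN(**cfg.policy.model)
    policy = DQNPolicy(cfg.policy, model=model)
    # The replay buffer the learner samples training batches from.
    buffer_ = DequeBuffer(size=cfg.policy.other.replay_buffer.replay_buffer_size)

    with task.start(ctx=OnlineRLContext()):
        # Collector middleware (omitted here) is expected to push transitions
        # into buffer_; OffPolicyLearner then fetches batches and trains the
        # policy once per pipeline step, writing results to ctx.train_output.
        task.use(OffPolicyLearner(cfg, policy.learn_mode, buffer_))
        task.run(max_step=1000)


if __name__ == "__main__":
    main()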

HERLearner

class ding.framework.middleware.learner.HERLearner(cfg: EasyDict, policy, buffer_: Buffer | List[Tuple[Buffer, float]] | Dict[str, Buffer], her_reward_model)[source]
Overview:

The class of the learner with Hindsight Experience Replay (HER). Use the __call__ method to execute the data fetching and training process.

__call__(ctx: OnlineRLContext) → None[source]
Output of ctx:
  • train_output (Deque): The deque of training outputs.

__init__(cfg: EasyDict, policy, buffer_: Buffer | List[Tuple[Buffer, float]] | Dict[str, Buffer], her_reward_model) → None[source]
Arguments:
  • cfg (EasyDict): Config.

  • policy (Policy): The policy to be trained.

  • buffer_ (Buffer): The replay buffer that stores the data for training.

  • her_reward_model (HerRewardModel): HER reward model.
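
A hedged sketch of plugging HERLearner into the same pipeline pattern. Here cfg and policy are assumed to be compiled and constructed elsewhere for a goal-conditioned task (e.g. bit-flip); the HerRewardModel import path, its constructor arguments, the cfg.policy.other.her key, and the buffer size are illustrative assumptions rather than a documented recipe.

from ding.data import DequeBuffer
from ding.framework import task, OnlineRLContext
from ding.framework.middleware.learner import HERLearner
from ding.reward_model import HerRewardModel


def her_pipeline(cfg, policy):
    # cfg/policy: a compiled config and a goal-conditioned policy, prepared
    # as in the OffPolicyLearner sketch above.
    buffer_ = DequeBuffer(size=4000)
    # Assumed config layout: HER settings under cfg.policy.other.her; adapt
    # the key and constructor arguments to your actual config.
    her_reward_model = HerRewardModel(cfg.policy.other.her, cfg.policy.cuda)

    with task.start(ctx=OnlineRLContext()):
        # Episode collection middleware (omitted) fills buffer_ with full
        # episodes; HERLearner samples them, relabels them via
        # her_reward_model, trains the policy, and writes ctx.train_output.
        task.use(HERLearner(cfg, policy.learn_mode, buffer_, her_reward_model))
        task.run(max_step=1000)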