worker.collector.episode_serial_collector

Please refer to ding/worker/collector/episode_serial_collector.py for usage.

EpisodeSerialCollector

class ding.worker.collector.episode_serial_collector.EpisodeSerialCollector(cfg: EasyDict, env: BaseEnvManager = None, policy: namedtuple = None, tb_logger: SummaryWriter = None, exp_name: str | None = 'default_experiment', instance_name: str | None = 'collector')[source]
Overview:

Episode collector (n_episode), which collects data in units of complete episodes rather than a fixed number of environment steps. A usage sketch is given after the property list below.

Interfaces:

__init__, reset, reset_env, reset_policy, collect, close

Property:

envstep
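
A minimal lifecycle sketch follows. The env manager and policy are hypothetical placeholders (any BaseEnvManager subclass instance and any policy exposing a collect_mode API namedtuple will do), and default_config() is assumed to follow DI-engine's usual default-config convention; the collector calls themselves match the interfaces documented on this page:

    from tensorboardX import SummaryWriter
    from ding.worker.collector.episode_serial_collector import EpisodeSerialCollector

    my_env_manager = ...  # placeholder: an instance of a BaseEnvManager subclass
    my_policy = ...       # placeholder: a policy exposing the collect_mode namedtuple

    cfg = EpisodeSerialCollector.default_config()  # assumed default-config helper
    collector = EpisodeSerialCollector(
        cfg,
        env=my_env_manager,
        policy=my_policy.collect_mode,
        tb_logger=SummaryWriter('./log/collector'),
        exp_name='default_experiment',
        instance_name='collector',
    )
    episodes = collector.collect(n_episode=8, train_iter=0)  # list of whole episodes
    print(collector.envstep)  # total environment steps taken so far
    collector.close()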

__del__() None[source]
Overview:

Execute the close command and close the collector. __del__ is called automatically to destroy the collector instance when the collector finishes its work.

__init__(cfg: EasyDict, env: BaseEnvManager = None, policy: namedtuple = None, tb_logger: SummaryWriter = None, exp_name: str | None = 'default_experiment', instance_name: str | None = 'collector') None[source]
Overview:

Initialization method.

Arguments:
  • cfg (EasyDict): config dict

  • env (BaseEnvManager): an instance of a subclass of the vectorized env manager (BaseEnvManager)

  • policy (namedtuple): the API namedtuple of the collect_mode policy

  • tb_logger (SummaryWriter): TensorBoard logger handle

  • exp_name (str): name of the experiment (default: 'default_experiment')

  • instance_name (str): name of this collector instance (default: 'collector')

_output_log(train_iter: int) None[source]
Overview:

Print the output log information. You can refer to Docs/Best Practice/How to understand training generated folders/Serial mode/log/collector for more details.

Arguments:
  • train_iter (int): the current number of training iterations

_reset_stat(env_id: int) None[source]
Overview:

Reset the collector’s state for a given environment, including its traj_buffer, obs_pool, policy_output_pool and env_info entries, according to env_id. You can refer to base_serial_collector for more details.

Arguments:
  • env_id (int): the id of the environment whose collector state needs to be reset

close() None[source]
Overview:

Close the collector. If end_flag is False, close the environment, then flush and close the tb_logger.
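
Because close() is guarded by end_flag and does nothing once the collector is already closed, it is safe to call defensively, e.g. in a try/finally block (a sketch, assuming a collector built as in the class-level example above):

    try:
        episodes = collector.collect(n_episode=8)
    finally:
        collector.close()  # no-op if the collector was already closed (end_flag guard)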

collect(n_episode: int | None = None, train_iter: int = 0, policy_kwargs: dict | None = None) List[Any][source]
Overview:

Collect n_episode episodes of data using a policy that has already been trained for train_iter iterations, passing policy_kwargs to the policy forward call.

Arguments:
  • n_episode (int): the number of episodes to collect

  • train_iter (int): the current number of training iterations

  • policy_kwargs (dict): keyword arguments passed to the policy forward call (see the sketch below)

Returns:
  • return_data (List): a list of collected episodes if get_train_sample is disabled; otherwise, training samples split by unroll_len
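
As a sketch, an epsilon-greedy policy can receive its current exploration rate through policy_kwargs. The 'eps' key here is illustrative; which keys are actually consumed depends on the policy's forward implementation, and `collector` is assumed to be built as in the class-level example:

    eps = 0.05  # e.g. taken from an epsilon-greedy annealing schedule
    episodes = collector.collect(n_episode=8, train_iter=1000, policy_kwargs={'eps': eps})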

property envstep: int
Overview:

Get the total envstep count.

Returns:
  • envstep (int): the total envstep count
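
The counter is handy as a data budget, e.g. to stop collection after a fixed number of environment steps (a sketch; the budget value is a hypothetical choice and the buffer/learner hand-off is left abstract):

    ENV_STEP_BUDGET = 100_000  # hypothetical data budget
    while collector.envstep < ENV_STEP_BUDGET:
        episodes = collector.collect(n_episode=8)
        # ... hand the collected episodes to a replay buffer / learner here ...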

reset(_policy: namedtuple | None = None, _env: BaseEnvManager | None = None) None[source]
Overview:

Reset the environment and policy. If _env is None, reset the old environment; otherwise, replace the old environment in the collector with the newly passed-in environment and launch it. If _policy is None, reset the old policy; otherwise, replace the old policy in the collector with the newly passed-in policy. A hot-swap sketch follows the argument list below.

Arguments:
  • _policy (Optional[namedtuple]): the API namedtuple of the collect_mode policy

  • _env (Optional[BaseEnvManager]): an instance of a subclass of the vectorized env manager (BaseEnvManager)
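
A sketch of hot-swapping components mid-run (new_env_manager and new_policy are hypothetical objects constructed elsewhere):

    new_env_manager = ...  # placeholder: a freshly built BaseEnvManager subclass instance
    new_policy = ...       # placeholder: a policy exposing the collect_mode namedtuple

    # Swap in replacements; either argument may be left as None to keep and
    # simply reset the existing component.
    collector.reset(_policy=new_policy.collect_mode, _env=new_env_manager)
    collector.reset()  # resets both existing components in place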

reset_env(_env: BaseEnvManager | None = None) None[source]
Overview:

Reset the environment. If _env is None, reset the old environment; otherwise, replace the old environment in the collector with the newly passed-in environment and launch it.

Arguments:
  • _env (Optional[BaseEnvManager]): an instance of a subclass of the vectorized env manager (BaseEnvManager)

reset_policy(_policy: namedtuple | None = None) None[source]
Overview:

Reset the policy. If _policy is None, reset the old policy; otherwise, replace the old policy in the collector with the newly passed-in policy.

Arguments:
  • _policy (Optional[namedtuple]): the API namedtuple of the collect_mode policy