worker.collector.episode_serial_collector¶
episode_serial_collector¶
Please refer to ding/worker/collector/episode_serial_collector.py for usage.
EpisodeSerialCollector¶
- class ding.worker.collector.episode_serial_collector.EpisodeSerialCollector(cfg: EasyDict, env: BaseEnvManager = None, policy: namedtuple = None, tb_logger: SummaryWriter = None, exp_name: str | None = 'default_experiment', instance_name: str | None = 'collector')[source]¶
- Overview:
Episode collector, which collects data in units of complete episodes (n_episode).
- Interfaces:
__init__, reset, reset_env, reset_policy, collect, close
- Property:
envstep
- __del__() None [source]¶
- Overview:
Execute the close command and close the collector. __del__ is called automatically to destroy the collector instance when the collector finishes its work.
- __init__(cfg: EasyDict, env: BaseEnvManager = None, policy: namedtuple = None, tb_logger: SummaryWriter = None, exp_name: str | None = 'default_experiment', instance_name: str | None = 'collector') None [source]¶
- Overview:
Initialization method.
- Arguments:
- cfg (EasyDict): Config dict.
- env (BaseEnvManager): The subclass instance of vectorized env_manager (BaseEnvManager).
- policy (namedtuple): The API namedtuple of collect_mode policy.
- tb_logger (SummaryWriter): TensorBoard handle.
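Below is a minimal construction sketch, not a definitive recipe: collector_config, make_env_manager, and make_collect_policy are hypothetical placeholders for however your pipeline builds the EasyDict config, the BaseEnvManager subclass instance, and the collect_mode policy namedtuple; only the EpisodeSerialCollector signature itself comes from this page.

```python
# Hedged construction sketch. `collector_config`, `make_env_manager` and
# `make_collect_policy` are hypothetical placeholders, not part of this module.
from tensorboardX import SummaryWriter
from ding.worker.collector.episode_serial_collector import EpisodeSerialCollector

cfg = collector_config            # EasyDict holding the collector's config fields
env = make_env_manager()          # instance of a BaseEnvManager subclass
policy = make_collect_policy()    # namedtuple API of a collect_mode policy
tb_logger = SummaryWriter('./log/my_experiment')

collector = EpisodeSerialCollector(
    cfg,
    env=env,
    policy=policy,
    tb_logger=tb_logger,
    exp_name='my_experiment',
    instance_name='collector',
)
```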
- _output_log(train_iter: int) None [source]¶
- Overview:
Print the output log information. You can refer to Docs/Best Practice/How to understand training generated folders/Serial mode/log/collector for more details.
- Arguments:
- train_iter (int): The number of training iterations.
- _reset_stat(env_id: int) None [source]¶
- Overview:
Reset the collector’s state, including the traj_buffer, obs_pool, policy_output_pool, and env_info, for the environment specified by env_id. You can refer to base_serial_collector for more details.
- Arguments:
- env_id (int): The id of the environment whose collector state needs to be reset.
- close() None [source]¶
- Overview:
Close the collector. If end_flag is False, close the environment, then flush and close the tb_logger.
- collect(n_episode: int | None = None, train_iter: int = 0, policy_kwargs: dict | None = None) List[Any] [source]¶
- Overview:
Collect n_episode episodes of data with policy_kwargs, using a policy that has already been trained for train_iter iterations.
- Arguments:
- n_episode (int): The number of episodes of data to collect.
- train_iter (int): The number of training iterations.
- policy_kwargs (dict): The keyword args for policy forward.
- Returns:
- return_data (List): A list of collected episodes if get_train_sample is disabled; otherwise, a list of train samples split by unroll_len.
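A hedged usage sketch of collect follows; collector is assumed to be an already initialized EpisodeSerialCollector, learner_train_iter stands in for the current iteration of your training loop, and the 'eps' entry is only an illustrative keyword argument that some collect_mode policies accept, not a requirement of this API.

```python
# Hedged usage sketch: gather 8 complete episodes with the current policy.
# `learner_train_iter` and the 'eps' key are illustrative assumptions.
episodes = collector.collect(
    n_episode=8,
    train_iter=learner_train_iter,
    policy_kwargs={'eps': 0.1},
)
print(len(episodes))        # number of collected episodes (or train samples, see Returns)
print(collector.envstep)    # total env steps taken so far (see the envstep property below)
```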
- property envstep: int¶
- Overview:
Get the total envstep count.
- Return:
- envstep (int): The total envstep count.
- reset(_policy: namedtuple | None = None, _env: BaseEnvManager | None = None) None [source]¶
- Overview:
Reset the environment and policy. If _env is None, reset the old environment. If _env is not None, replace the old environment in the collector with the newly passed-in environment and launch it. If _policy is None, reset the old policy. If _policy is not None, replace the old policy in the collector with the newly passed-in policy.
- Arguments:
- policy (Optional[namedtuple]): The API namedtuple of collect_mode policy.
- env (Optional[BaseEnvManager]): Instance of the subclass of vectorized env_manager (BaseEnvManager).
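A hedged sketch of replacing both components at once; new_collect_policy and new_env_manager are hypothetical placeholders for objects constructed elsewhere in your pipeline.

```python
# Hedged sketch: `new_collect_policy` and `new_env_manager` are hypothetical
# placeholders for a new collect_mode policy namedtuple and a freshly built
# BaseEnvManager subclass instance.
collector.reset(_policy=new_collect_policy, _env=new_env_manager)  # replace both and relaunch the env
collector.reset()  # or keep both and just reset the existing environment and policy
```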
- reset_env(_env: BaseEnvManager | None = None) None [source]¶
- Overview:
Reset the environment. If _env is None, reset the old environment. If _env is not None, replace the old environment in the collector with the newly passed-in environment and launch it.
- Arguments:
- env (Optional[BaseEnvManager]): Instance of the subclass of vectorized env_manager (BaseEnvManager).
- reset_policy(_policy: namedtuple | None = None) None [source]¶
- Overview:
Reset the policy. If _policy is None, reset the old policy. If _policy is not None, replace the old policy in the collector with the newly passed-in policy.
- Arguments:
- policy (Optional[namedtuple]): The API namedtuple of collect_mode policy.
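A hedged sketch of resetting one component at a time with reset_env and reset_policy; as above, new_env_manager and new_collect_policy are hypothetical placeholders built elsewhere.

```python
# Hedged sketch: reset one component at a time. `new_env_manager` and
# `new_collect_policy` are hypothetical placeholders, not part of this module.
collector.reset_env()                                 # relaunch the current environment
collector.reset_env(_env=new_env_manager)             # or swap in a new env manager
collector.reset_policy(_policy=new_collect_policy)    # swap in a new collect_mode policy
```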