Agent

class lzero.agent.alphazero.AlphaZeroAgent(env_id: str = None, seed: int = 0, exp_name: str = None, model: Module | None = None, cfg: EasyDict | dict | None = None, policy_state_dict: str = None)[source]

Bases: object

Overview:

Agent class for executing the AlphaZero algorithm, which provides methods for training, deployment, and batch evaluation.

Interfaces:

__init__, train, deploy, batch_evaluate

Properties:

best

Note

This agent class is tailored for use with the HuggingFace Model Zoo for LightZero (e.g. https://huggingface.co/OpenDILabCommunity/CartPole-v0-AlphaZero) and provides methods such as “train” and “deploy”.

__init__(env_id: str = None, seed: int = 0, exp_name: str = None, model: Module | None = None, cfg: EasyDict | dict | None = None, policy_state_dict: str = None) None[source]
Overview:

Initialize the AlphaZeroAgent instance with environment parameters, model, and configuration.

Parameters:
  • env_id (str) – Identifier for the environment to be used, registered in gym.

  • seed (int) – Random seed for reproducibility. Defaults to 0.

  • exp_name (str) – Name for the experiment. Defaults to None.

  • model (torch.nn.Module) – PyTorch module to be used as the model. If None, a default model is created. Defaults to None.

  • cfg (EasyDict or dict) – Configuration for the agent. If None, the default configuration is used. Defaults to None.

  • policy_state_dict (str) – Path to a pre-trained model state dictionary. If provided, the state dict is loaded. Defaults to None.

Note

  • If env_id is not specified, it must be included in cfg.

  • The supported_env_list contains all the environment IDs that are supported by this agent.
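For orientation, a minimal usage sketch follows. It assumes the agent classes are re-exported from lzero.agent (otherwise import from lzero.agent.alphazero); the experiment name and step budget are illustrative only.

```python
from lzero.agent import AlphaZeroAgent

# Create an agent for a supported environment (see supported_env_list below).
# exp_name is illustrative; logs and checkpoints are written under this directory.
agent = AlphaZeroAgent(
    env_id='TicTacToe-play-with-bot',
    seed=0,
    exp_name='TicTacToe-play-with-bot-AlphaZero',
)

# Train for a small number of environment steps (the default is 1e7).
agent.train(step=10000)

# Evaluate the trained agent; deploy() returns an EvalReturn with the mean and
# standard deviation of the episode returns.
eval_result = agent.deploy()
print(eval_result)
```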

batch_evaluate(n_evaluator_episode: int = None) EvalReturn[source]
Overview:

Perform a batch evaluation of the agent over a specified number of episodes (n_evaluator_episode).

Parameters:

n_evaluator_episode (int) – Number of episodes to run the evaluation for. If None, the default value from the configuration is used. Defaults to None.

Returns:

  • An EvalReturn object with evaluation results such as mean and standard deviation of returns.

Note

This method evaluates the agent’s performance across multiple episodes to gauge its effectiveness.

property best
Overview:

Provides access to the best model according to evaluation metrics.

Returns:

  • The agent with the best model loaded.

Note

The best model is saved in the path ./exp_name/ckpt/ckpt_best.pth.tar. When this property is accessed, the agent instance will load the best model state.
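A sketch of how best can be combined with the evaluation methods, based on the behaviour described above; agent is assumed to be a previously constructed and trained AlphaZeroAgent.

```python
# Accessing .best reloads the checkpoint at ./exp_name/ckpt/ckpt_best.pth.tar
# into the agent and returns the agent itself, so calls can be chained.
best_result = agent.best.batch_evaluate(n_evaluator_episode=5)
print(best_result)
```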

deploy(enable_save_replay: bool = False, concatenate_all_replay: bool = False, replay_save_path: str = None, seed: int | List | None = None, debug: bool = False) EvalReturn[source]
Overview:

Deploy the agent for evaluation in the environment, with optional replay saving. The agent's performance is evaluated, and the average return and its standard deviation are returned. If enable_save_replay is True, replay videos are saved to the specified replay_save_path.

Parameters:
  • enable_save_replay (bool) – Flag to enable saving of replay footage. Defaults to False.

  • concatenate_all_replay (bool) – Whether to concatenate all replay videos into one file. Defaults to False.

  • replay_save_path (str) – Directory path for saving replay videos. Defaults to None, which uses a default path.

  • seed (int or List) – Seed or list of seeds for environment reproducibility. Defaults to None.

  • debug (bool) – Whether to enable debug mode. Defaults to False.

Returns:

  • An EvalReturn object containing evaluation metrics such as mean and standard deviation of returns.

supported_env_list = ['Gomoku-play-with-bot', 'TicTacToe-play-with-bot']
train(step: int = 10000000) TrainingReturn[source]
Overview:

Train the agent through interactions with the environment.

Parameters:

step (int) – Total number of environment steps to train for. Defaults to 10 million (1e7).

Returns:

  • A TrainingReturn object containing training information, such as logs and potentially a URL to a training dashboard.

Note

The method involves interacting with the environment, collecting experience, and optimizing the model.

class lzero.agent.muzero.MuZeroAgent(env_id: str = None, seed: int = 0, exp_name: str = None, model: Module | None = None, cfg: EasyDict | dict | None = None, policy_state_dict: str = None)[source]

Bases: object

Overview:

Agent class for executing the MuZero algorithm, which provides methods for training, deployment, and batch evaluation.

Interfaces:

__init__, train, deploy, batch_evaluate

Properties:

best

Note

This agent class is tailored for use with the HuggingFace Model Zoo for LightZero (e.g. https://huggingface.co/OpenDILabCommunity/CartPole-v0-MuZero) and provides methods such as “train” and “deploy”.

__init__(env_id: str = None, seed: int = 0, exp_name: str = None, model: Module | None = None, cfg: EasyDict | dict | None = None, policy_state_dict: str = None) None[source]
Overview:

Initialize the MuZeroAgent instance with environment parameters, model, and configuration.

Parameters:
  • env_id (str) – Identifier for the environment to be used, registered in gym.

  • seed (int) – Random seed for reproducibility. Defaults to 0.

  • exp_name (str) – Name for the experiment. Defaults to None.

  • model (torch.nn.Module) – PyTorch module to be used as the model. If None, a default model is created. Defaults to None.

  • cfg (EasyDict or dict) – Configuration for the agent. If None, the default configuration is used. Defaults to None.

  • policy_state_dict (str) – Path to a pre-trained model state dictionary. If provided, the state dict is loaded. Defaults to None.

Note

  • If env_id is not specified, it must be included in cfg.

  • The supported_env_list contains all the environment IDs that are supported by this agent.
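When a pre-trained checkpoint is available, policy_state_dict can point at it instead of training from scratch. A hedged sketch, assuming huggingface_hub is installed; the checkpoint filename is an assumption and may differ in the actual Model Zoo repository.

```python
from huggingface_hub import hf_hub_download
from lzero.agent import MuZeroAgent

# Download a pre-trained checkpoint from the LightZero Model Zoo.
# The repo id follows the note above; the filename is illustrative.
ckpt_path = hf_hub_download(
    repo_id='OpenDILabCommunity/CartPole-v0-MuZero',
    filename='pytorch_model.bin',
)

# Build the agent around the downloaded weights and evaluate it directly.
agent = MuZeroAgent(env_id='CartPole-v0', policy_state_dict=ckpt_path)
agent.deploy()
```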

batch_evaluate(n_evaluator_episode: int = None) EvalReturn[source]
Overview:

Perform a batch evaluation of the agent over a specified number of episodes (n_evaluator_episode).

Parameters:

n_evaluator_episode (int) – Number of episodes to run the evaluation for. If None, the default value from the configuration is used. Defaults to None.

Returns:

  • An EvalReturn object with evaluation results such as mean and standard deviation of returns.

Note

This method evaluates the agent’s performance across multiple episodes to gauge its effectiveness.

property best
Overview:

Provides access to the best model according to evaluation metrics.

Returns:

  • The agent with the best model loaded.

Note

The best model is saved in the path ./exp_name/ckpt/ckpt_best.pth.tar. When this property is accessed, the agent instance will load the best model state.

deploy(enable_save_replay: bool = False, concatenate_all_replay: bool = False, replay_save_path: str = None, seed: int | List | None = None, debug: bool = False) EvalReturn[source]
Overview:

Deploy the agent for evaluation in the environment, with optional replay saving. The agent's performance is evaluated, and the average return and its standard deviation are returned. If enable_save_replay is True, replay videos are saved to the specified replay_save_path.

Parameters:
  • enable_save_replay (bool) – Flag to enable saving of replay footage. Defaults to False.

  • concatenate_all_replay (bool) – Whether to concatenate all replay videos into one file. Defaults to False.

  • replay_save_path (str) – Directory path for saving replay videos. Defaults to None, which uses a default path.

  • seed (int or List) – Seed or list of seeds for environment reproducibility. Defaults to None.

  • debug (bool) – Whether to enable debug mode. Defaults to False.

Returns:

  • An EvalReturn object containing evaluation metrics such as mean and standard deviation of returns.
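A short sketch of replay saving with the parameters above; the output directory and seed list are illustrative, and agent is a previously constructed MuZeroAgent.

```python
# Evaluate the agent while recording replay videos of each episode.
# concatenate_all_replay merges the per-episode videos into a single file.
result = agent.deploy(
    enable_save_replay=True,
    concatenate_all_replay=True,
    replay_save_path='./videos',  # illustrative directory
    seed=[0, 1, 2],               # one evaluation run per seed
)
print(result)
```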

supported_env_list = ['Gomoku-play-with-bot', 'BreakoutNoFrameskip-v4', 'CartPole-v0', 'LunarLander-v2', 'MsPacmanNoFrameskip-v4', 'Pendulum-v1', 'PongNoFrameskip-v4', 'TicTacToe-play-with-bot']
train(step: int = 10000000) TrainingReturn[source]
Overview:

Train the agent through interactions with the environment.

Parameters:

step (int) – Total number of environment steps to train for. Defaults to 10 million (1e7).

Returns:

  • A TrainingReturn object containing training information, such as logs and potentially a URL to a training dashboard.

Note

The method involves interacting with the environment, collecting experience, and optimizing the model.

class lzero.agent.efficientzero.EfficientZeroAgent(env_id: str = None, seed: int = 0, exp_name: str = None, model: Module | None = None, cfg: EasyDict | dict | None = None, policy_state_dict: str = None)[source]

Bases: object

Overview:

Agent class for executing the EfficientZero algorithm, which provides methods for training, deployment, and batch evaluation.

Interfaces:

__init__, train, deploy, batch_evaluate

Properties:

best

Note

This agent class is tailored for use with the HuggingFace Model Zoo for LightZero (e.g. https://huggingface.co/OpenDILabCommunity/CartPole-v0-EfficientZero) and provides methods such as “train” and “deploy”.

__init__(env_id: str = None, seed: int = 0, exp_name: str = None, model: Module | None = None, cfg: EasyDict | dict | None = None, policy_state_dict: str = None) None[source]
Overview:

Initialize the EfficientZeroAgent instance with environment parameters, model, and configuration.

Parameters:
  • env_id (str) – Identifier for the environment to be used, registered in gym.

  • seed (int) – Random seed for reproducibility. Defaults to 0.

  • exp_name (str) – Name for the experiment. Defaults to None.

  • model (torch.nn.Module) – PyTorch module to be used as the model. If None, a default model is created. Defaults to None.

  • cfg (EasyDict or dict) – Configuration for the agent. If None, the default configuration is used. Defaults to None.

  • policy_state_dict (str) – Path to a pre-trained model state dictionary. If provided, the state dict is loaded. Defaults to None.

Note

  • If env_id is not specified, it must be included in cfg.

  • The supported_env_list contains all the environment IDs that are supported by this agent.

batch_evaluate(n_evaluator_episode: int = None) EvalReturn[source]
Overview:

Perform a batch evaluation of the agent over a specified number of episodes (n_evaluator_episode).

Parameters:

n_evaluator_episode (int) – Number of episodes to run the evaluation for. If None, the default value from the configuration is used. Defaults to None.

Returns:

  • An EvalReturn object with evaluation results such as mean and standard deviation of returns.

Note

This method evaluates the agent’s performance across multiple episodes to gauge its effectiveness.
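A minimal evaluation sketch; agent is a previously constructed EfficientZeroAgent, and the episode count is illustrative.

```python
# Run a batch evaluation over an explicit number of episodes and inspect the
# aggregated statistics carried by the returned EvalReturn object.
stats = agent.batch_evaluate(n_evaluator_episode=10)
print(stats)  # e.g. mean and standard deviation of the episode returns
```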

property best
Overview:

Provides access to the best model according to evaluation metrics.

Returns:

  • The agent with the best model loaded.

Note

The best model is saved in the path ./exp_name/ckpt/ckpt_best.pth.tar. When this property is accessed, the agent instance will load the best model state.

deploy(enable_save_replay: bool = False, concatenate_all_replay: bool = False, replay_save_path: str = None, seed: int | List | None = None, debug: bool = False) EvalReturn[source]
Overview:

Deploy the agent for evaluation in the environment, with optional replay saving. The agent's performance is evaluated, and the average return and its standard deviation are returned. If enable_save_replay is True, replay videos are saved to the specified replay_save_path.

Parameters:
  • enable_save_replay (bool) – Flag to enable saving of replay footage. Defaults to False.

  • concatenate_all_replay (bool) – Whether to concatenate all replay videos into one file. Defaults to False.

  • replay_save_path (str) – Directory path for saving replay videos. Defaults to None, which uses a default path.

  • seed (int or List) – Seed or list of seeds for environment reproducibility. Defaults to None.

  • debug (bool) – Whether to enable debug mode. Defaults to False.

Returns:

  • An EvalReturn object containing evaluation metrics such as mean and standard deviation of returns.

supported_env_list = ['BreakoutNoFrameskip-v4', 'CartPole-v0', 'LunarLander-v2', 'MsPacmanNoFrameskip-v4', 'Pendulum-v1', 'PongNoFrameskip-v4']
train(step: int = 10000000) TrainingReturn[source]
Overview:

Train the agent through interactions with the environment.

Parameters:

step (int) – Total number of environment steps to train for. Defaults to 10 million (1e7).

Returns:

  • A TrainingReturn object containing training information, such as logs and potentially a URL to a training dashboard.

Note

The method involves interacting with the environment, collecting experience, and optimizing the model.

class lzero.agent.gumbel_muzero.GumbelMuZeroAgent(env_id: str = None, seed: int = 0, exp_name: str = None, model: Module | None = None, cfg: EasyDict | dict | None = None, policy_state_dict: str = None)[source]

Bases: object

Overview:

Agent class for executing the Gumbel MuZero algorithm, which provides methods for training, deployment, and batch evaluation.

Interfaces:

__init__, train, deploy, batch_evaluate

Properties:

best

Note

This agent class is tailored for use with the HuggingFace Model Zoo for LightZero (e.g. https://huggingface.co/OpenDILabCommunity/CartPole-v0-GumbelMuZero) and provides methods such as “train” and “deploy”.

__init__(env_id: str = None, seed: int = 0, exp_name: str = None, model: Module | None = None, cfg: EasyDict | dict | None = None, policy_state_dict: str = None) None[source]
Overview:

Initialize the GumbelMuZeroAgent instance with environment parameters, model, and configuration.

Parameters:
  • env_id (str) – Identifier for the environment to be used, registered in gym.

  • seed (int) – Random seed for reproducibility. Defaults to 0.

  • exp_name (str) – Name for the experiment. Defaults to None.

  • model (torch.nn.Module) – PyTorch module to be used as the model. If None, a default model is created. Defaults to None.

  • cfg (EasyDict or dict) – Configuration for the agent. If None, the default configuration is used. Defaults to None.

  • policy_state_dict (str) – Path to a pre-trained model state dictionary. If provided, the state dict is loaded. Defaults to None.

Note

  • If env_id is not specified, it must be included in cfg.

  • The supported_env_list contains all the environment IDs that are supported by this agent.

batch_evaluate(n_evaluator_episode: int = None) EvalReturn[source]
Overview:

Perform a batch evaluation of the agent over a specified number of episodes (n_evaluator_episode).

Parameters:

n_evaluator_episode (int) – Number of episodes to run the evaluation for. If None, the default value from the configuration is used. Defaults to None.

Returns:

  • An EvalReturn object with evaluation results such as mean and standard deviation of returns.

Note

This method evaluates the agent’s performance across multiple episodes to gauge its effectiveness.

property best
Overview:

Provides access to the best model according to evaluation metrics.

Returns:

  • The agent with the best model loaded.

Note

The best model is saved in the path ./exp_name/ckpt/ckpt_best.pth.tar. When this property is accessed, the agent instance will load the best model state.

deploy(enable_save_replay: bool = False, concatenate_all_replay: bool = False, replay_save_path: str = None, seed: int | List | None = None, debug: bool = False) EvalReturn[source]
Overview:

Deploy the agent for evaluation in the environment, with optional replay saving. The agent's performance is evaluated, and the average return and its standard deviation are returned. If enable_save_replay is True, replay videos are saved to the specified replay_save_path.

Parameters:
  • enable_save_replay (bool) – Flag to enable saving of replay footage. Defaults to False.

  • concatenate_all_replay (bool) – Whether to concatenate all replay videos into one file. Defaults to False.

  • replay_save_path (str) – Directory path for saving replay videos. Defaults to None, which uses a default path.

  • seed (int or List) – Seed or list of seeds for environment reproducibility. Defaults to None.

  • debug (bool) – Whether to enable debug mode. Defaults to False.

Returns:

  • An EvalReturn object containing evaluation metrics such as mean and standard deviation of returns.

supported_env_list = ['Gomoku-play-with-bot', 'CartPole-v0', 'TicTacToe-play-with-bot']
train(step: int = 10000000) TrainingReturn[source]
Overview:

Train the agent through interactions with the environment.

Parameters:

step (int) – Total number of environment steps to train for. Defaults to 10 million (1e7).

Returns:

  • A TrainingReturn object containing training information, such as logs and potentially a URL to a training dashboard.

Note

The method involves interacting with the environment, collecting experience, and optimizing the model.
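Putting the pieces together for this agent, a hedged end-to-end sketch on CartPole-v0 (listed in supported_env_list above); the experiment name and step budget are illustrative, and the import assumes the re-export from lzero.agent as in the earlier sketches.

```python
from lzero.agent import GumbelMuZeroAgent

# Train briefly, then evaluate the best checkpoint found during training.
agent = GumbelMuZeroAgent(env_id='CartPole-v0', seed=0, exp_name='CartPole-v0-GumbelMuZero')
agent.train(step=50000)
result = agent.best.deploy()
print(result)
```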

class lzero.agent.sampled_efficientzero.SampledEfficientZeroAgent(env_id: str = None, seed: int = 0, exp_name: str = None, model: Module | None = None, cfg: EasyDict | dict | None = None, policy_state_dict: str = None)[source]

Bases: object

Overview:

Agent class for executing the Sampled EfficientZero algorithm, which provides methods for training, deployment, and batch evaluation.

Interfaces:

__init__, train, deploy, batch_evaluate

Properties:

best

Note

This agent class is tailored for use with the HuggingFace Model Zoo for LightZero (e.g. https://huggingface.co/OpenDILabCommunity/CartPole-v0-SampledEfficientZero) and provides methods such as “train” and “deploy”.

__init__(env_id: str = None, seed: int = 0, exp_name: str = None, model: Module | None = None, cfg: EasyDict | dict | None = None, policy_state_dict: str = None) None[source]
Overview:

Initialize the SampledEfficientZeroAgent instance with environment parameters, model, and configuration.

Parameters:
  • env_id (str) – Identifier for the environment to be used, registered in gym.

  • seed (int) – Random seed for reproducibility. Defaults to 0.

  • exp_name (str) – Name for the experiment. Defaults to None.

  • model (torch.nn.Module) – PyTorch module to be used as the model. If None, a default model is created. Defaults to None.

  • cfg (EasyDict or dict) – Configuration for the agent. If None, the default configuration is used. Defaults to None.

  • policy_state_dict (str) – Path to a pre-trained model state dictionary. If provided, the state dict is loaded. Defaults to None.

Note

  • If env_id is not specified, it must be included in cfg.

  • The supported_env_list contains all the environment IDs that are supported by this agent.
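A minimal sketch for a continuous-control environment from supported_env_list; the import and step budget follow the same assumptions as the earlier sketches.

```python
from lzero.agent import SampledEfficientZeroAgent

# Sampled EfficientZero also handles continuous action spaces such as
# LunarLanderContinuous-v2 and Pendulum-v1.
agent = SampledEfficientZeroAgent(env_id='LunarLanderContinuous-v2', seed=0)
agent.train(step=100000)
agent.deploy()
```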

batch_evaluate(n_evaluator_episode: int = None) EvalReturn[source]
Overview:

Perform a batch evaluation of the agent over a specified number of episodes (n_evaluator_episode).

Parameters:

n_evaluator_episode (int) – Number of episodes to run the evaluation for. If None, the default value from the configuration is used. Defaults to None.

Returns:

  • An EvalReturn object with evaluation results such as mean and standard deviation of returns.

Note

This method evaluates the agent’s performance across multiple episodes to gauge its effectiveness.

property best
Overview:

Provides access to the best model according to evaluation metrics.

Returns:

  • The agent with the best model loaded.

Note

The best model is saved in the path ./exp_name/ckpt/ckpt_best.pth.tar. When this property is accessed, the agent instance will load the best model state.

deploy(enable_save_replay: bool = False, concatenate_all_replay: bool = False, replay_save_path: str = None, seed: int | List | None = None, debug: bool = False) EvalReturn[source]
Overview:

Deploy the agent for evaluation in the environment, with optional replay saving. The agent's performance is evaluated, and the average return and its standard deviation are returned. If enable_save_replay is True, replay videos are saved to the specified replay_save_path.

Parameters:
  • enable_save_replay (bool) – Flag to enable saving of replay footage. Defaults to False.

  • concatenate_all_replay (bool) – Whether to concatenate all replay videos into one file. Defaults to False.

  • replay_save_path (str) – Directory path for saving replay videos. Defaults to None, which uses a default path.

  • seed (int or List) – Seed or list of seeds for environment reproducibility. Defaults to None.

  • debug (bool) – Whether to enable debug mode. Defaults to False.

Returns:

  • An EvalReturn object containing evaluation metrics such as mean and standard deviation of returns.

supported_env_list = ['BreakoutNoFrameskip-v4', 'CartPole-v0', 'LunarLanderContinuous-v2', 'MsPacmanNoFrameskip-v4', 'Pendulum-v1', 'PongNoFrameskip-v4']
train(step: int = 10000000) TrainingReturn[source]
Overview:

Train the agent through interactions with the environment.

Parameters:

step (int) – Total number of environment steps to train for. Defaults to 10 million (1e7).

Returns:

  • A TrainingReturn object containing training information, such as logs and potentially a URL to a training dashboard.

Note

The method involves interacting with the environment, collecting experience, and optimizing the model.

class lzero.agent.sampled_alphazero.SampledAlphaZeroAgent(env_id: str = None, seed: int = 0, exp_name: str = None, model: Module | None = None, cfg: EasyDict | dict | None = None, policy_state_dict: str = None)[source]

Bases: object

Overview:

Agent class for executing the Sampled AlphaZero algorithm, which provides methods for training, deployment, and batch evaluation.

Interfaces:

__init__, train, deploy, batch_evaluate

Properties:

best

Note

This agent class is tailored for use with the HuggingFace Model Zoo for LightZero (e.g. https://huggingface.co/OpenDILabCommunity/CartPole-v0-AlphaZero) and provides methods such as “train” and “deploy”.

__init__(env_id: str = None, seed: int = 0, exp_name: str = None, model: Module | None = None, cfg: EasyDict | dict | None = None, policy_state_dict: str = None) None[source]
Overview:

Initialize the SampledAlphaZeroAgent instance with environment parameters, model, and configuration.

Parameters:
  • env_id (str) – Identifier for the environment to be used, registered in gym.

  • seed (int) – Random seed for reproducibility. Defaults to 0.

  • exp_name (str) – Name for the experiment. Defaults to None.

  • model (torch.nn.Module) – PyTorch module to be used as the model. If None, a default model is created. Defaults to None.

  • cfg (EasyDict or dict) – Configuration for the agent. If None, the default configuration is used. Defaults to None.

  • policy_state_dict (str) – Path to a pre-trained model state dictionary. If provided, the state dict is loaded. Defaults to None.

Note

  • If env_id is not specified, it must be included in cfg.

  • The supported_env_list contains all the environment IDs that are supported by this agent.
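A minimal sketch for a board-game environment; the experiment name and step budget are illustrative, with the same import assumptions as the earlier sketches.

```python
from lzero.agent import SampledAlphaZeroAgent

# Gomoku-play-with-bot and TicTacToe-play-with-bot are the environments supported
# by this agent (see supported_env_list below).
agent = SampledAlphaZeroAgent(
    env_id='Gomoku-play-with-bot',
    seed=0,
    exp_name='Gomoku-play-with-bot-SampledAlphaZero',
)
agent.train(step=100000)
agent.batch_evaluate()
```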

batch_evaluate(n_evaluator_episode: int = None) EvalReturn[source]
Overview:

Perform a batch evaluation of the agent over a specified number of episodes (n_evaluator_episode).

Parameters:

n_evaluator_episode (int) – Number of episodes to run the evaluation for. If None, the default value from the configuration is used. Defaults to None.

Returns:

  • An EvalReturn object with evaluation results such as mean and standard deviation of returns.

Note

This method evaluates the agent’s performance across multiple episodes to gauge its effectiveness.

property best
Overview:

Provides access to the best model according to evaluation metrics.

Returns:

  • The agent with the best model loaded.

Note

The best model is saved in the path ./exp_name/ckpt/ckpt_best.pth.tar. When this property is accessed, the agent instance will load the best model state.

deploy(enable_save_replay: bool = False, concatenate_all_replay: bool = False, replay_save_path: str = None, seed: int | List | None = None, debug: bool = False) EvalReturn[source]
Overview:

Deploy the agent for evaluation in the environment, with optional replay saving. The agent's performance is evaluated, and the average return and its standard deviation are returned. If enable_save_replay is True, replay videos are saved to the specified replay_save_path.

Parameters:
  • enable_save_replay (bool) – Flag to enable saving of replay footage. Defaults to False.

  • concatenate_all_replay (bool) – Whether to concatenate all replay videos into one file. Defaults to False.

  • replay_save_path (str) – Directory path for saving replay videos. Defaults to None, which uses a default path.

  • seed (int or List) – Seed or list of seeds for environment reproducibility. Defaults to None.

  • debug (bool) – Whether to enable debug mode. Defaults to False.

Returns:

  • An EvalReturn object containing evaluation metrics such as mean and standard deviation of returns.

supported_env_list = ['Gomoku-play-with-bot', 'TicTacToe-play-with-bot']
train(step: int = 10000000) TrainingReturn[source]
Overview:

Train the agent through interactions with the environment.

Parameters:

step (int) – Total number of environment steps to train for. Defaults to 10 million (1e7).

Returns:

  • A TrainingReturn object containing training information, such as logs and potentially a URL to a training dashboard.

Note

The method involves interacting with the environment, collecting experience, and optimizing the model.