Entry
- lzero.entry.train_alphazero.train_alphazero(input_cfg: Tuple[dict, dict], seed: int = 0, model: Module | None = None, model_path: str | None = None, max_train_iter: int | None = 10000000000, max_env_step: int | None = 10000000000)[source]
- Overview:
The train entry for AlphaZero.
- Parameters:
  - input_cfg (Tuple[dict, dict]): Config in dict type. The Tuple[dict, dict] type means [user_config, create_cfg].
  - seed (int): Random seed.
  - model (Optional[torch.nn.Module]): Instance of torch.nn.Module.
  - model_path (Optional[str]): The pretrained model path, which should point to the ckpt file of the pretrained model; an absolute path is recommended. In LightZero, the path is usually something like exp_name/ckpt/ckpt_best.pth.tar.
  - max_train_iter (Optional[int]): Maximum number of policy update iterations in training.
  - max_env_step (Optional[int]): Maximum number of collected environment interaction steps.
- Returns:
  - policy (Policy): Converged policy.
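For illustration, a minimal launch script in the style of the LightZero zoo configs is sketched below; the config module path is illustrative, and train_alphazero is assumed to be re-exported from lzero.entry.

```python
# Minimal sketch: launch AlphaZero training from a zoo config.
# The config module path below is illustrative; any config that defines
# main_config and create_config works.
from zoo.board_games.tictactoe.config.tictactoe_alphazero_bot_mode_config import (
    main_config,
    create_config,
)
from lzero.entry import train_alphazero

if __name__ == "__main__":
    # Training stops at whichever of max_train_iter / max_env_step is reached first.
    policy = train_alphazero([main_config, create_config], seed=0, max_env_step=int(1e6))
```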
- lzero.entry.eval_alphazero.eval_alphazero(input_cfg: Tuple[dict, dict], seed: int = 0, model: Module | None = None, model_path: str | None = None, num_episodes_each_seed: int = 1, print_seed_details: bool = False)[source]
- Overview:
The eval entry for AlphaZero.
- Parameters:
  - input_cfg (Tuple[dict, dict]): Config in dict type. The Tuple[dict, dict] type means [user_config, create_cfg].
  - seed (int): Random seed.
  - model (Optional[torch.nn.Module]): Instance of torch.nn.Module.
  - model_path (Optional[str]): The pretrained model path, which should point to the ckpt file of the pretrained model; an absolute path is recommended. In LightZero, the path is usually something like exp_name/ckpt/ckpt_best.pth.tar.
  - num_episodes_each_seed (int): Number of evaluation episodes to run for each seed.
  - print_seed_details (bool): Whether to print per-seed evaluation details.
- Returns:
  - policy (Policy): Converged policy.
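A sketch of evaluating a pretrained checkpoint over several seeds follows; the config module and checkpoint path are illustrative.

```python
# Minimal sketch: evaluate a pretrained AlphaZero checkpoint over several seeds.
from zoo.board_games.tictactoe.config.tictactoe_alphazero_bot_mode_config import (
    main_config,
    create_config,
)
from lzero.entry import eval_alphazero

if __name__ == "__main__":
    model_path = "./exp_name/ckpt/ckpt_best.pth.tar"  # illustrative checkpoint path
    for seed in [0, 1, 2]:
        eval_alphazero(
            [main_config, create_config],
            seed=seed,
            model_path=model_path,
            num_episodes_each_seed=5,
            print_seed_details=True,
        )
```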
- lzero.entry.train_muzero.train_muzero(input_cfg: Tuple[dict, dict], seed: int = 0, model: Module | None = None, model_path: str | None = None, max_train_iter: int | None = 10000000000, max_env_step: int | None = 10000000000)[source]
- Overview:
The train entry for MCTS+RL algorithms, including MuZero, EfficientZero, Sampled EfficientZero, and Gumbel MuZero.
- Parameters:
  - input_cfg (Tuple[dict, dict]): Config in dict type. The Tuple[dict, dict] type means [user_config, create_cfg].
  - seed (int): Random seed.
  - model (Optional[torch.nn.Module]): Instance of torch.nn.Module.
  - model_path (Optional[str]): The pretrained model path, which should point to the ckpt file of the pretrained model; an absolute path is recommended. In LightZero, the path is usually something like exp_name/ckpt/ckpt_best.pth.tar.
  - max_train_iter (Optional[int]): Maximum number of policy update iterations in training.
  - max_env_step (Optional[int]): Maximum number of collected environment interaction steps.
- Returns:
  - policy (Policy): Converged policy.
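For illustration, a sketch of a MuZero training launch follows; the Atari config module path is an assumption based on the zoo layout.

```python
# Minimal sketch: launch MuZero training. Which algorithm runs (MuZero,
# EfficientZero, ...) is determined by the config, not by this entry.
from zoo.atari.config.atari_muzero_config import main_config, create_config
from lzero.entry import train_muzero

if __name__ == "__main__":
    # model_path=None (the default) trains from scratch; pass a ckpt path to resume.
    policy = train_muzero(
        [main_config, create_config],
        seed=0,
        model_path=None,
        max_env_step=int(1e6),
    )
```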
- lzero.entry.eval_muzero.eval_muzero(input_cfg: Tuple[dict, dict], seed: int = 0, model: Module | None = None, model_path: str | None = None, num_episodes_each_seed: int = 1, print_seed_details: bool = False)[source]
- Overview:
The eval entry for MCTS+RL algorithms, including MuZero, EfficientZero, Sampled EfficientZero, Stochastic MuZero, Gumbel MuZero, UniZero, etc.
- Parameters:
  - input_cfg (Tuple[dict, dict]): Config in dict type. The Tuple[dict, dict] type means [user_config, create_cfg].
  - seed (int): Random seed.
  - model (Optional[torch.nn.Module]): Instance of torch.nn.Module.
  - model_path (Optional[str]): The pretrained model path, which should point to the ckpt file of the pretrained model; an absolute path is recommended. In LightZero, the path is usually something like exp_name/ckpt/ckpt_best.pth.tar.
  - num_episodes_each_seed (int): Number of evaluation episodes to run for each seed.
  - print_seed_details (bool): Whether to print per-seed evaluation details.
- Returns:
  - policy (Policy): Converged policy.
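A sketch of evaluating a MuZero-family checkpoint is shown below; the config module and checkpoint path are illustrative.

```python
# Minimal sketch: evaluate a MuZero-family checkpoint. The same entry works
# for EfficientZero, Gumbel MuZero, etc., since the config picks the algorithm.
from zoo.atari.config.atari_muzero_config import main_config, create_config
from lzero.entry import eval_muzero

if __name__ == "__main__":
    eval_muzero(
        [main_config, create_config],
        seed=0,
        model_path="./exp_name/ckpt/ckpt_best.pth.tar",  # illustrative path
        num_episodes_each_seed=8,
        print_seed_details=False,
    )
```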
- lzero.entry.train_muzero_with_gym_env.train_muzero_with_gym_env(input_cfg: Tuple[dict, dict], seed: int = 0, model: Module | None = None, model_path: str | None = None, max_train_iter: int | None = 10000000000, max_env_step: int | None = 10000000000)[source]
- Overview:
The train entry for MCTS+RL algorithms, including MuZero, EfficientZero, and Sampled EfficientZero. We create a gym environment using the env_id parameter and then convert it to the format required by LightZero using the LightZeroEnvWrapper class. Please refer to the get_wrappered_env method for more details.
- Parameters:
  - input_cfg (Tuple[dict, dict]): Config in dict type. The Tuple[dict, dict] type means [user_config, create_cfg].
  - seed (int): Random seed.
  - model (Optional[torch.nn.Module]): Instance of torch.nn.Module.
  - model_path (Optional[str]): The pretrained model path, which should point to the ckpt file of the pretrained model; an absolute path is recommended. In LightZero, the path is usually something like exp_name/ckpt/ckpt_best.pth.tar.
  - max_train_iter (Optional[int]): Maximum number of policy update iterations in training.
  - max_env_step (Optional[int]): Maximum number of collected environment interaction steps.
- Returns:
  - policy (Policy): Converged policy.
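For illustration, a sketch of training on a plain gym environment follows; the config module path and the env.env_id config field are assumptions about the config layout.

```python
# Minimal sketch: train MuZero on a plain gym environment. The entry reads
# env_id from the config and wraps the env with LightZeroEnvWrapper internally.
from zoo.classic_control.cartpole.config.cartpole_muzero_config import (
    main_config,
    create_config,
)
from lzero.entry import train_muzero_with_gym_env

if __name__ == "__main__":
    main_config.env.env_id = "CartPole-v1"  # assumed config field consumed by the wrapper
    policy = train_muzero_with_gym_env([main_config, create_config], seed=0, max_env_step=int(2e5))
```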
- lzero.entry.eval_muzero_with_gym_env.eval_muzero_with_gym_env(input_cfg: Tuple[dict, dict], seed: int = 0, model: Module | None = None, model_path: str | None = None, num_episodes_each_seed: int = 1, print_seed_details: bool = False)[source]
- Overview:
The eval entry for MCTS+RL algorithms, including MuZero, EfficientZero, and Sampled EfficientZero. We create a gym environment using the env_id parameter and then convert it to the format required by LightZero using the LightZeroEnvWrapper class. Please refer to the get_wrappered_env method for more details.
- Parameters:
  - input_cfg (Tuple[dict, dict]): Config in dict type. The Tuple[dict, dict] type means [user_config, create_cfg].
  - seed (int): Random seed.
  - model (Optional[torch.nn.Module]): Instance of torch.nn.Module.
  - model_path (Optional[str]): The pretrained model path, which should point to the ckpt file of the pretrained model; an absolute path is recommended. In LightZero, the path is usually something like exp_name/ckpt/ckpt_best.pth.tar.
  - num_episodes_each_seed (int): Number of evaluation episodes to run for each seed.
  - print_seed_details (bool): Whether to print per-seed evaluation details.
- Returns:
  - policy (Policy): Converged policy.
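A sketch of the matching gym-wrapped evaluation call follows; config module and checkpoint path are illustrative, and env_id is read from the config as in the train entry.

```python
# Minimal sketch: gym-wrapped evaluation, mirroring the train entry above.
from zoo.classic_control.cartpole.config.cartpole_muzero_config import (
    main_config,
    create_config,
)
from lzero.entry import eval_muzero_with_gym_env

if __name__ == "__main__":
    eval_muzero_with_gym_env(
        [main_config, create_config],
        seed=0,
        model_path="./exp_name/ckpt/ckpt_best.pth.tar",  # illustrative path
        num_episodes_each_seed=5,
    )
```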
- lzero.entry.train_muzero_with_reward_model.train_muzero_with_reward_model(input_cfg: Tuple[dict, dict], seed: int = 0, model: Module | None = None, model_path: str | None = None, max_train_iter: int | None = 10000000000, max_env_step: int | None = 10000000000)[source]
- Overview:
The train entry for MCTS+RL algorithms augmented with a reward model.
- Parameters:
  - input_cfg (Tuple[dict, dict]): Config in dict type. The Tuple[dict, dict] type means [user_config, create_cfg].
  - seed (int): Random seed.
  - model (Optional[torch.nn.Module]): Instance of torch.nn.Module.
  - model_path (Optional[str]): The pretrained model path, which should point to the ckpt file of the pretrained model; an absolute path is recommended. In LightZero, the path is usually something like exp_name/ckpt/ckpt_best.pth.tar.
  - max_train_iter (Optional[int]): Maximum number of policy update iterations in training.
  - max_env_step (Optional[int]): Maximum number of collected environment interaction steps.
- Returns:
  - policy (Policy): Converged policy.
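For illustration, a sketch of a reward-model-augmented launch follows; the config module below is hypothetical, standing in for any zoo config whose policy section enables a reward model.

```python
# Minimal sketch: MuZero training with an auxiliary reward model (e.g. an
# RND-style intrinsic reward). The config module below is hypothetical;
# substitute a config that actually enables a reward model.
from zoo.classic_control.cartpole.config.cartpole_muzero_rnd_config import (  # hypothetical module
    main_config,
    create_config,
)
from lzero.entry import train_muzero_with_reward_model

if __name__ == "__main__":
    policy = train_muzero_with_reward_model([main_config, create_config], seed=0, max_env_step=int(1e6))
```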