Entry
- lzero.entry.train_alphazero.train_alphazero(input_cfg: Tuple[dict, dict], seed: int = 0, model: Module | None = None, model_path: str | None = None, max_train_iter: int | None = 10000000000, max_env_step: int | None = 10000000000)[source]
- Overview:
The train entry for AlphaZero.
- Parameters:
  - input_cfg (Tuple[dict, dict]): Config in dict type. The Tuple[dict, dict] type means [user_config, create_cfg].
  - seed (int): Random seed.
  - model (Optional[torch.nn.Module]): Instance of torch.nn.Module.
  - model_path (Optional[str]): The pretrained model path, which should point to the ckpt file of the pretrained model; an absolute path is recommended. In LightZero, the path is usually something like exp_name/ckpt/ckpt_best.pth.tar.
  - max_train_iter (Optional[int]): Maximum number of policy update iterations in training.
  - max_env_step (Optional[int]): Maximum number of collected environment interaction steps.
- Returns:
  - policy (Policy): Converged policy.
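For illustration, a minimal launch script in the style of the LightZero zoo configs is sketched below; the config module path is illustrative, and train_alphazero is assumed to be re-exported from lzero.entry.

```python
# Minimal sketch: launch AlphaZero training from a zoo config.
# The config module path below is illustrative; any config that defines
# main_config and create_config works.
from zoo.board_games.tictactoe.config.tictactoe_alphazero_bot_mode_config import (
    main_config,
    create_config,
)
from lzero.entry import train_alphazero

if __name__ == "__main__":
    # Training stops at whichever of max_train_iter / max_env_step is reached first.
    policy = train_alphazero([main_config, create_config], seed=0, max_env_step=int(1e6))
```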
- lzero.entry.eval_alphazero.eval_alphazero(input_cfg: Tuple[dict, dict], seed: int = 0, model: Module | None = None, model_path: str | None = None, num_episodes_each_seed: int = 1, print_seed_details: bool = False)[source]
- Overview:
The eval entry for AlphaZero.
- Parameters:
  - input_cfg (Tuple[dict, dict]): Config in dict type. The Tuple[dict, dict] type means [user_config, create_cfg].
  - seed (int): Random seed.
  - model (Optional[torch.nn.Module]): Instance of torch.nn.Module.
  - model_path (Optional[str]): The pretrained model path, which should point to the ckpt file of the pretrained model; an absolute path is recommended. In LightZero, the path is usually something like exp_name/ckpt/ckpt_best.pth.tar.
  - num_episodes_each_seed (int): Number of evaluation episodes to run for each seed.
  - print_seed_details (bool): Whether to print per-seed evaluation details.
- Returns:
  - policy (Policy): Converged policy.
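A sketch of evaluating a pretrained checkpoint over several seeds follows; the config module and checkpoint path are illustrative.

```python
# Minimal sketch: evaluate a pretrained AlphaZero checkpoint over several seeds.
from zoo.board_games.tictactoe.config.tictactoe_alphazero_bot_mode_config import (
    main_config,
    create_config,
)
from lzero.entry import eval_alphazero

if __name__ == "__main__":
    model_path = "./exp_name/ckpt/ckpt_best.pth.tar"  # illustrative checkpoint path
    for seed in [0, 1, 2]:
        eval_alphazero(
            [main_config, create_config],
            seed=seed,
            model_path=model_path,
            num_episodes_each_seed=5,
            print_seed_details=True,
        )
```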
- lzero.entry.train_muzero.train_muzero(input_cfg: Tuple[dict, dict], seed: int = 0, model: Module | None = None, model_path: str | None = None, max_train_iter: int | None = 10000000000, max_env_step: int | None = 10000000000)[source]
- Overview:
The train entry for MCTS+RL algorithms, including MuZero, EfficientZero, Sampled EfficientZero, and Gumbel MuZero.
- Parameters:
  - input_cfg (Tuple[dict, dict]): Config in dict type. The Tuple[dict, dict] type means [user_config, create_cfg].
  - seed (int): Random seed.
  - model (Optional[torch.nn.Module]): Instance of torch.nn.Module.
  - model_path (Optional[str]): The pretrained model path, which should point to the ckpt file of the pretrained model; an absolute path is recommended. In LightZero, the path is usually something like exp_name/ckpt/ckpt_best.pth.tar.
  - max_train_iter (Optional[int]): Maximum number of policy update iterations in training.
  - max_env_step (Optional[int]): Maximum number of collected environment interaction steps.
- Returns:
  - policy (Policy): Converged policy.
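For illustration, a sketch of a MuZero training launch follows; the Atari config module path is an assumption based on the zoo layout.

```python
# Minimal sketch: launch MuZero training. Which algorithm runs (MuZero,
# EfficientZero, ...) is determined by the config, not by this entry.
from zoo.atari.config.atari_muzero_config import main_config, create_config
from lzero.entry import train_muzero

if __name__ == "__main__":
    # model_path=None (the default) trains from scratch; pass a ckpt path to resume.
    policy = train_muzero(
        [main_config, create_config],
        seed=0,
        model_path=None,
        max_env_step=int(1e6),
    )
```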
- lzero.entry.eval_muzero.eval_muzero(input_cfg: Tuple[dict, dict], seed: int = 0, model: Module | None = None, model_path: str | None = None, num_episodes_each_seed: int = 1, print_seed_details: bool = False)[source]
- Overview:
The eval entry for MCTS+RL algorithms, including MuZero, EfficientZero, Sampled EfficientZero, Stochastic MuZero, Gumbel MuZero, UniZero, etc.
- Parameters:
  - input_cfg (Tuple[dict, dict]): Config in dict type. The Tuple[dict, dict] type means [user_config, create_cfg].
  - seed (int): Random seed.
  - model (Optional[torch.nn.Module]): Instance of torch.nn.Module.
  - model_path (Optional[str]): The pretrained model path, which should point to the ckpt file of the pretrained model; an absolute path is recommended. In LightZero, the path is usually something like exp_name/ckpt/ckpt_best.pth.tar.
  - num_episodes_each_seed (int): Number of evaluation episodes to run for each seed.
  - print_seed_details (bool): Whether to print per-seed evaluation details.
- Returns:
  - policy (Policy): Converged policy.
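A sketch of evaluating a MuZero-family checkpoint is shown below; the config module and checkpoint path are illustrative.

```python
# Minimal sketch: evaluate a MuZero-family checkpoint. The same entry works
# for EfficientZero, Gumbel MuZero, etc., since the config picks the algorithm.
from zoo.atari.config.atari_muzero_config import main_config, create_config
from lzero.entry import eval_muzero

if __name__ == "__main__":
    eval_muzero(
        [main_config, create_config],
        seed=0,
        model_path="./exp_name/ckpt/ckpt_best.pth.tar",  # illustrative path
        num_episodes_each_seed=8,
        print_seed_details=False,
    )
```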
- lzero.entry.train_muzero_with_gym_env.train_muzero_with_gym_env(input_cfg: Tuple[dict, dict], seed: int = 0, model: Module | None = None, model_path: str | None = None, max_train_iter: int | None = 10000000000, max_env_step: int | None = 10000000000)[source]
- Overview:
The train entry for MCTS+RL algorithms, including MuZero, EfficientZero, and Sampled EfficientZero. We create a gym environment using the env_id parameter and then convert it to the format required by LightZero using the LightZeroEnvWrapper class. Please refer to the get_wrappered_env method for more details.
- Parameters:
  - input_cfg (Tuple[dict, dict]): Config in dict type. The Tuple[dict, dict] type means [user_config, create_cfg].
  - seed (int): Random seed.
  - model (Optional[torch.nn.Module]): Instance of torch.nn.Module.
  - model_path (Optional[str]): The pretrained model path, which should point to the ckpt file of the pretrained model; an absolute path is recommended. In LightZero, the path is usually something like exp_name/ckpt/ckpt_best.pth.tar.
  - max_train_iter (Optional[int]): Maximum number of policy update iterations in training.
  - max_env_step (Optional[int]): Maximum number of collected environment interaction steps.
- Returns:
  - policy (Policy): Converged policy.
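For illustration, a sketch of training on a plain gym environment follows; the config module path and the env.env_id config field are assumptions about the config layout.

```python
# Minimal sketch: train MuZero on a plain gym environment. The entry reads
# env_id from the config and wraps the env with LightZeroEnvWrapper internally.
from zoo.classic_control.cartpole.config.cartpole_muzero_config import (
    main_config,
    create_config,
)
from lzero.entry import train_muzero_with_gym_env

if __name__ == "__main__":
    main_config.env.env_id = "CartPole-v1"  # assumed config field consumed by the wrapper
    policy = train_muzero_with_gym_env([main_config, create_config], seed=0, max_env_step=int(2e5))
```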
- lzero.entry.eval_muzero_with_gym_env.eval_muzero_with_gym_env(input_cfg: Tuple[dict, dict], seed: int = 0, model: Module | None = None, model_path: str | None = None, num_episodes_each_seed: int = 1, print_seed_details: bool = False)[source]
- Overview:
The eval entry for MCTS+RL algorithms, including MuZero, EfficientZero, and Sampled EfficientZero. We create a gym environment using the env_id parameter and then convert it to the format required by LightZero using the LightZeroEnvWrapper class. Please refer to the get_wrappered_env method for more details.
- Parameters:
  - input_cfg (Tuple[dict, dict]): Config in dict type. The Tuple[dict, dict] type means [user_config, create_cfg].
  - seed (int): Random seed.
  - model (Optional[torch.nn.Module]): Instance of torch.nn.Module.
  - model_path (Optional[str]): The pretrained model path, which should point to the ckpt file of the pretrained model; an absolute path is recommended. In LightZero, the path is usually something like exp_name/ckpt/ckpt_best.pth.tar.
  - num_episodes_each_seed (int): Number of evaluation episodes to run for each seed.
  - print_seed_details (bool): Whether to print per-seed evaluation details.
- Returns:
  - policy (Policy): Converged policy.
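A sketch of the matching gym-wrapped evaluation call follows; config module and checkpoint path are illustrative, and env_id is read from the config as in the train entry.

```python
# Minimal sketch: gym-wrapped evaluation, mirroring the train entry above.
from zoo.classic_control.cartpole.config.cartpole_muzero_config import (
    main_config,
    create_config,
)
from lzero.entry import eval_muzero_with_gym_env

if __name__ == "__main__":
    eval_muzero_with_gym_env(
        [main_config, create_config],
        seed=0,
        model_path="./exp_name/ckpt/ckpt_best.pth.tar",  # illustrative path
        num_episodes_each_seed=5,
    )
```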
- lzero.entry.train_muzero_with_reward_model.train_muzero_with_reward_model(input_cfg: Tuple[dict, dict], seed: int = 0, model: Module | None = None, model_path: str | None = None, max_train_iter: int | None = 10000000000, max_env_step: int | None = 10000000000)[source]
- Overview:
The train entry for MCTS+RL algorithms augmented with a reward model.
- Parameters:
  - input_cfg (Tuple[dict, dict]): Config in dict type. The Tuple[dict, dict] type means [user_config, create_cfg].
  - seed (int): Random seed.
  - model (Optional[torch.nn.Module]): Instance of torch.nn.Module.
  - model_path (Optional[str]): The pretrained model path, which should point to the ckpt file of the pretrained model; an absolute path is recommended. In LightZero, the path is usually something like exp_name/ckpt/ckpt_best.pth.tar.
  - max_train_iter (Optional[int]): Maximum number of policy update iterations in training.
  - max_env_step (Optional[int]): Maximum number of collected environment interaction steps.
- Returns:
  - policy (Policy): Converged policy.
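For illustration, a sketch of a reward-model-augmented launch follows; the config module below is hypothetical, standing in for any zoo config whose policy section enables a reward model.

```python
# Minimal sketch: MuZero training with an auxiliary reward model (e.g. an
# RND-style intrinsic reward). The config module below is hypothetical;
# substitute a config that actually enables a reward model.
from zoo.classic_control.cartpole.config.cartpole_muzero_rnd_config import (  # hypothetical module
    main_config,
    create_config,
)
from lzero.entry import train_muzero_with_reward_model

if __name__ == "__main__":
    policy = train_muzero_with_reward_model([main_config, create_config], seed=0, max_env_step=int(1e6))
```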