How to Set Configuration Files in LightZero

In the LightZero framework, running a specific algorithm in a specific environment requires setting the corresponding configuration file. A configuration file consists of two parts: main_config and create_config. main_config defines the main parameters for running the algorithm, such as the environment and policy settings, while create_config specifies the concrete environment class and policy class to use and their import paths.

To run a specific algorithm in a custom environment, first find the default config file for that algorithm <algo> on an existing environment <env> under the path zoo/<env>/config/<env>_<algo>_config.py; for example, MuZero on Atari is configured in zoo/atari/config/atari_muzero_config.py. Then, starting from that file, modify mainly the env part, and debug and optimize from there.

Below, we use atari_muzero_config.py as an example to explain the configuration file settings in detail.

1. main_config

The main_config dictionary contains the main parameter settings for running the algorithm, which are mainly divided into two parts: env and policy.

1.1 Main Parameters in the env Part

  • env_id: Specifies the environment to be used.

  • obs_shape: The dimension of the environment observation.

  • collector_env_num: The number of parallel environments the collector uses to gather experience data.

  • evaluator_env_num: The number of parallel environments used to evaluate policy performance in the evaluator.

  • n_evaluator_episode: The total number of episodes run during each evaluation.

  • manager: Specifies the type of environment manager, mainly used to control the parallelization mode of the environment.
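
Putting these together, the env part of atari_muzero_config.py looks roughly as follows; the exact values are illustrative and may differ in the current repository:

env=dict(
    env_id='PongNoFrameskip-v4',  # which Atari game to run
    obs_shape=(4, 96, 96),  # 4 stacked frames of 96x96
    collector_env_num=8,  # parallel environments for data collection
    evaluator_env_num=3,  # parallel environments for evaluation
    n_evaluator_episode=3,  # episodes per evaluation round
    manager=dict(shared_memory=False, ),  # environment manager / parallelization options
),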

1.2 Main Parameters in the policy Part

  • model: Specifies the neural network model used by the policy, including the input observation shape, the number of stacked frames, the action space size of the model output, whether to use downsampling, whether to use a self-supervised learning auxiliary loss, the action encoding type, the normalization mode used in the network, etc.

  • cuda: Specifies whether to move the model to the GPU for training.

  • reanalyze_noise: Whether to introduce noise during MCTS reanalysis, which can increase exploration.

  • env_type: Marks the type of environment the MuZero algorithm is dealing with; some implementation details of the algorithm differ according to the environment type.

  • game_segment_length: The length of the sequence (game segment) used for self-play.

  • random_collect_episode_num: The number of randomly collected episodes, providing initial data for exploration.

  • eps: Epsilon-greedy exploration settings, including whether epsilon-greedy collection is enabled, the decay schedule type, the start value, the end value, the decay rate, etc.

  • use_augmentation: Whether to use data augmentation.

  • update_per_collect: The number of gradient updates performed after each data collection.

  • batch_size: The batch size sampled during the update.

  • optim_type: Optimizer type.

  • piecewise_decay_lr_scheduler: Whether to use piecewise constant learning rate decay.

  • learning_rate: Initial learning rate.

  • num_simulations: The number of simulations used in the MCTS algorithm.

  • reanalyze_ratio: The reanalysis coefficient, controlling the proportion of sampled targets that are recomputed (reanalyzed) with the latest model.

  • ssl_loss_weight: The weight of the self-supervised learning loss function.

  • n_episode: The total number of episodes collected per collection cycle across the parallel collector environments.

  • eval_freq: Policy evaluation frequency (measured by training steps).

  • replay_buffer_size: The capacity of the experience replay buffer.
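
For orientation, a condensed policy block in the style of atari_muzero_config.py is sketched below; the values are indicative, and the exact field names and defaults should be checked against the config file in your version of the repository:

policy=dict(
    model=dict(
        observation_shape=(4, 96, 96),  # input shape fed to the representation network
        frame_stack_num=4,  # number of stacked frames
        action_space_size=6,  # e.g. Pong has 6 discrete actions
        downsample=True,  # downsample the observation before encoding
        self_supervised_learning_loss=True,  # enable the SSL auxiliary loss
        discrete_action_encoding_type='one_hot',  # action encoding type
        norm_type='BN',  # normalization mode used in the network
    ),
    cuda=True,
    env_type='not_board_games',
    game_segment_length=400,
    use_augmentation=True,
    update_per_collect=1000,
    batch_size=256,
    optim_type='SGD',
    piecewise_decay_lr_scheduler=True,  # flag name follows the parameter list above
    learning_rate=0.2,
    num_simulations=50,
    reanalyze_ratio=0.,
    ssl_loss_weight=2,
    n_episode=8,
    eval_freq=int(2e3),
    replay_buffer_size=int(1e6),
),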

The most frequently changed parameters are also specially marked in the config file by a pair of comment banners:

# ==============================================================
# begin of the most frequently changed config specified by the user
# ==============================================================
# These are parameters that need to be adjusted frequently based on the actual situation
# ==============================================================
# end of the most frequently changed config specified by the user
# ==============================================================

to remind users that these parameters, such as collector_env_num, num_simulations, update_per_collect, batch_size, and max_env_step, often need to be adjusted to the situation at hand. Tuning them can improve algorithm performance and speed up training.
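
In atari_muzero_config.py, these values are typically defined as module-level variables at the top of the file, inside the comment banners shown above, and then referenced in main_config (values indicative):

# ==============================================================
# begin of the most frequently changed config specified by the user
# ==============================================================
collector_env_num = 8
n_episode = 8
evaluator_env_num = 3
num_simulations = 50  # MCTS simulations per move
update_per_collect = 1000  # gradient updates per collection
batch_size = 256
max_env_step = int(1e6)  # total environment steps before training stops
reanalyze_ratio = 0.
# ==============================================================
# end of the most frequently changed config specified by the user
# ==============================================================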

2. create_config

The create_config dictionary specifies the specific environment class and policy class to be used and their reference paths, mainly containing two parts: env and policy.

2.1 Settings in the env Part

env=dict(
    type='atari_lightzero',
    import_names=['zoo.atari.envs.atari_lightzero_env'],
),

Here, type specifies the registered name of the environment to use, and import_names specifies the module path where the environment class is defined. The predefined atari_lightzero_env is used here. If you want to use a custom environment class, change type to the registered name of your custom environment class and modify import_names accordingly, as sketched below.
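
If you write your own environment, its type string must be registered so that create_config can resolve it. A minimal sketch, assuming DI-engine's ENV_REGISTRY decorator (which the bundled LightZero environments use); the module path, the registry name my_lightzero_env, and the class MyLightZeroEnv are placeholders:

# hypothetical file: zoo/my_env/envs/my_lightzero_env.py
from ding.envs import BaseEnv
from ding.utils import ENV_REGISTRY

@ENV_REGISTRY.register('my_lightzero_env')  # this string becomes the value of type in create_config
class MyLightZeroEnv(BaseEnv):
    ...  # implement reset/step/seed etc. per the LightZero environment interface

# corresponding create_config entry:
env=dict(
    type='my_lightzero_env',
    import_names=['zoo.my_env.envs.my_lightzero_env'],
),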

2.2 Settings in the policy Part

policy=dict(
    type='muzero',
    import_names=['lzero.policy.muzero'],
),

Here, type specifies the registered name of the policy to use, and import_names specifies the module path where the policy class is defined. The predefined MuZero algorithm in LightZero is used here. If you want to use a custom policy class, change type to the registered name of your custom policy and set import_names to the module path where the custom policy is defined.
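
In the existing config files, both dictionaries are wrapped in EasyDict near the bottom of the file so that their entries can be accessed as attributes, roughly as follows:

from easydict import EasyDict

# main_config and create_config are the two dictionaries defined above
main_config = EasyDict(main_config)
create_config = EasyDict(create_config)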

3. Running the Algorithm

After completing the configuration, call the training entry in the main block of the config file:

if __name__ == "__main__":
    from lzero.entry import train_muzero
    # main_config and create_config are the dictionaries defined above;
    # max_env_step is usually defined near the top of the config file, e.g. max_env_step = int(1e6)
    train_muzero([main_config, create_config], seed=0, max_env_step=max_env_step)

This will train the MuZero algorithm on the configured environment. [main_config, create_config] passes the configuration used for training, seed sets the random seed, and max_env_step sets the maximum number of environment interaction steps.
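
In practice, the config file is usually launched directly, for example: python zoo/atari/config/atari_muzero_config.py.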

4. Notes

The above briefly introduces how to configure algorithms for custom environments in the LightZero framework; we hope it is helpful. Please keep the following points in mind during configuration:

  • When using a custom environment, be sure to write the environment class according to the environment interface standards defined by the LightZero framework, otherwise errors may occur.

  • Different algorithms and environments require different configuration parameters. Before configuring, make sure you understand the principles of the algorithm and the characteristics of the environment; relevant academic papers can help you set parameters sensibly.

  • If you want to run an algorithm supported by LightZero on a custom environment, you can first use the default policy configuration of the algorithm, and then optimize and adjust according to the actual training situation.

  • When configuring the number of parallel environments, set it according to your available computing resources; too many parallel environments can exhaust memory.

  • You can use tools such as tensorboard to monitor training and catch problems early. For details, please refer to the Log System Documentation.

  • Wish you a smooth experience using the LightZero framework!