LightZero’s Logging and Monitoring System
LightZero is a powerful MCTS and reinforcement learning framework that generates comprehensive log files and model checkpoints during the training process. In this article, we will take an in-depth look at LightZero’s logging and monitoring system, focusing on the file directory structure after running the framework and the contents of each log file.
File Directory Structure
When we conduct an experiment using LightZero, such as training a MuZero agent in the CartPole environment, the framework organizes the output files as follows:
cartpole_muzero
├── ckpt
│   ├── ckpt_best.pth.tar
│   ├── iteration_0.pth.tar
│   └── iteration_10000.pth.tar
├── log
│   ├── buffer
│   │   └── buffer_logger.txt
│   ├── collector
│   │   └── collector_logger.txt
│   ├── evaluator
│   │   └── evaluator_logger.txt
│   ├── learner
│   │   └── learner_logger.txt
│   └── serial
│       └── events.out.tfevents.1626453528.CN0014009700M.local
├── formatted_total_config.py
└── total_config.py
As we can see, the main body of the output consists of two folders, log and ckpt, which store detailed log information and model checkpoints, respectively. The total_config.py and formatted_total_config.py files record the configuration of this experiment; for details on their specific meanings, please refer to the Configuration System Documentation.
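For reference, a directory tree like the one above is produced by launching a training entry with an experiment configuration. Below is a minimal sketch; the config module path (zoo.classic_control.cartpole.config.cartpole_muzero_config), the entry function train_muzero, and the max_env_step argument are assumptions about a typical LightZero installation and may differ in your version.

# Minimal sketch: launch MuZero training on CartPole so that LightZero writes the
# cartpole_muzero/{ckpt, log} tree shown above. The import paths and arguments are
# assumptions about a typical LightZero installation; adjust them to your version.
from zoo.classic_control.cartpole.config.cartpole_muzero_config import main_config, create_config
from lzero.entry import train_muzero

if __name__ == "__main__":
    # The experiment name in main_config determines the output directory (here: cartpole_muzero).
    train_muzero([main_config, create_config], seed=0, max_env_step=int(1e5))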
Log File Analysis
Collector Logs
The log/collector/collector_logger.txt file records various metrics of the collector's interaction with the environment during the current collection stage. A short example relating the derived metrics to the raw counters follows the list.

- episode_count: The number of episodes collected in this stage
- envstep_count: The number of environment interaction steps collected in this stage
- train_sample_count: The number of training samples collected in this stage
- avg_envstep_per_episode: The average number of environment interaction steps per episode
- avg_sample_per_episode: The average number of training samples per episode
- avg_envstep_per_sec: The average number of environment interaction steps collected per second
- avg_train_sample_per_sec: The average number of training samples collected per second
- avg_episode_per_sec: The average number of episodes collected per second
- collect_time: The total time spent on data collection in this stage
- reward_mean: The average reward obtained during collection in this stage
- reward_std: The standard deviation of the rewards collected in this stage
- each_reward: The reward value of each episode's interaction with the environment
- reward_max: The maximum reward collected in this stage
- reward_min: The minimum reward collected in this stage
- total_envstep_count: The cumulative number of environment interaction steps collected by the collector
- total_train_sample_count: The cumulative number of training samples collected by the collector
- total_episode_count: The cumulative number of episodes collected by the collector
- total_duration: The total running time of the collector
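The averaged and per-second metrics above are simple derived quantities of the raw counters. The following snippet is illustrative Python, not LightZero's internal code, and all numbers in it are invented; it only shows how the quantities relate.

# Illustrative only: how the derived collector metrics relate to the raw counters.
# This is not LightZero's internal implementation; the values are invented.
import statistics

episode_rewards = [12.0, 9.0, 15.0]      # each_reward: one return per collected episode
episode_count = len(episode_rewards)     # episode_count
envstep_count = 200                      # envstep_count
train_sample_count = 180                 # train_sample_count
collect_time = 2.5                       # collect_time, in seconds

avg_envstep_per_episode = envstep_count / episode_count
avg_sample_per_episode = train_sample_count / episode_count
avg_envstep_per_sec = envstep_count / collect_time
avg_train_sample_per_sec = train_sample_count / collect_time
avg_episode_per_sec = episode_count / collect_time

reward_mean = statistics.mean(episode_rewards)
reward_std = statistics.pstdev(episode_rewards)  # the exact std convention may differ
reward_max = max(episode_rewards)
reward_min = min(episode_rewards)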
Evaluator Logs
The log/evaluator/evaluator_logger.txt file records various metrics of the evaluator's interaction with the environment during the current evaluation stage. A sketch for extracting the evaluation curve from this file follows the list.

- [INFO]: Log messages emitted each time the evaluator completes an episode, including the final reward and the current episode count
- train_iter: The number of training iterations the model has completed
- ckpt_name: The path of the model checkpoint used in this evaluation
- episode_count: The number of episodes in this evaluation
- envstep_count: The total number of environment interaction steps in this evaluation
- evaluate_time: The total time spent on this evaluation
- avg_envstep_per_episode: The average number of environment interaction steps per evaluation episode
- avg_envstep_per_sec: The average number of environment interaction steps per second in this evaluation
- avg_time_per_episode: The average time per episode in this evaluation
- reward_mean: The average reward obtained in this evaluation
- reward_std: The standard deviation of the rewards in this evaluation
- each_reward: The reward value of each episode evaluated by the evaluator
- reward_max: The maximum reward obtained in this evaluation
- reward_min: The minimum reward obtained in this evaluation
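Because the evaluator writes one reward_mean entry per evaluation, this file can be scanned to obtain a rough evaluation curve. The sketch below is hedged: the regular expression encodes an assumption about the log's text layout, so adapt it to the lines your version of LightZero actually writes.

# Hedged sketch: extract reward_mean values from evaluator_logger.txt.
# The line format is an assumption; adapt the regular expression to your log.
import re

pattern = re.compile(r"reward_mean.*?(-?\d+(?:\.\d+)?)")
reward_means = []
with open("cartpole_muzero/log/evaluator/evaluator_logger.txt", encoding="utf-8") as f:
    for line in f:
        match = pattern.search(line)
        if match:
            reward_means.append(float(match.group(1)))

print(f"parsed {len(reward_means)} evaluations; "
      f"last reward_mean: {reward_means[-1] if reward_means else 'n/a'}")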
Learner Logs
The log/learner/learner_logger.txt file records various information about the learner during model training, including:

- Neural network structure: Describes the overall architecture of the MuZero model, including the representation network, dynamics network, prediction network, etc. (see the toy illustration after this list)
- Learner status: Displays the current learning rate, loss function values, optimizer monitoring metrics, and so on, in a tabular format
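The network-structure block is essentially the textual description of the model as a PyTorch module. The toy illustration below uses a generic module rather than the actual MuZero model, just to show the kind of output involved.

# Toy illustration: printing a torch.nn.Module produces an architecture description
# similar in spirit to the (much larger) MuZero entry in the learner log.
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 64),   # stand-in for a representation network
    nn.ReLU(),
    nn.Linear(64, 2),   # stand-in for a prediction head
)
print(model)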
TensorBoard Log Files
To facilitate experiment management, LightZero also aggregates the scattered log information into a single TensorBoard event file under the log/serial folder, named in the format events.out.tfevents.<timestamp>.<hostname>. Through TensorBoard, users can monitor the trends of the various metrics in real time during training.
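The usual way to view these curves is to point TensorBoard at that folder, for example with tensorboard --logdir cartpole_muzero/log/serial. The event file can also be read programmatically; the sketch below uses the EventAccumulator utility bundled with TensorBoard, and the scalar tags it prints depend on your particular run.

# Minimal sketch: read scalar curves from the events.out.tfevents.* file.
# Alternatively, run `tensorboard --logdir cartpole_muzero/log/serial` and use the web UI.
from tensorboard.backend.event_processing import event_accumulator

ea = event_accumulator.EventAccumulator("cartpole_muzero/log/serial")
ea.Reload()                      # parse the event file(s) in the directory
scalar_tags = ea.Tags()["scalars"]
print(scalar_tags)               # the available scalar tags depend on your run

if scalar_tags:
    for event in ea.Scalars(scalar_tags[0]):
        print(event.step, event.value)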
Checkpoint Files
The ckpt folder stores the checkpoint files of the model parameters:

- ckpt_best.pth.tar: The model parameters that achieved the best performance during evaluation
- iteration_<iteration_number>.pth.tar: The model parameters saved periodically during training

If you need to load a saved model, you can read it with methods like torch.load('ckpt_best.pth.tar').
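A hedged example of reading a checkpoint is shown below; the internal layout of the saved dictionary (for instance, whether the weights sit under a 'model' key) depends on the framework version, so inspect the keys first.

# Hedged sketch: load a checkpoint and inspect its contents. Whether the weights
# are stored under a 'model' key is an assumption; print the keys to check.
import torch

ckpt = torch.load("cartpole_muzero/ckpt/ckpt_best.pth.tar", map_location="cpu")
print(list(ckpt.keys()))

# If the state_dict is stored under a 'model' key, it can be restored with:
# model.load_state_dict(ckpt["model"])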
Conclusion
LightZero provides users with a comprehensive logging and monitoring system that helps researchers and developers gain deep insight into the entire training process of reinforcement learning agents. By analyzing the metrics of the collector, evaluator, and learner, we can track the progress and effectiveness of the algorithm in real time and adjust the training strategy accordingly. At the same time, the standardized organization of checkpoint files ensures the reproducibility of experiments. Together, these facilities make LightZero a practical aid for both algorithm research and real-world applications.