LightZero’s Logging and Monitoring System
LightZero is a framework that combines Monte Carlo Tree Search (MCTS) with deep reinforcement learning, and it generates detailed log files and model checkpoints during training. In this article, we take an in-depth look at LightZero's logging and monitoring system, focusing on the directory structure produced by a training run and the contents of each log file.
File Directory Structure
When we conduct an experiment using LightZero, such as training a MuZero agent in the CartPole environment, the framework organizes the output files as follows:
cartpole_muzero
├── ckpt
│   ├── ckpt_best.pth.tar
│   ├── iteration_0.pth.tar
│   └── iteration_10000.pth.tar
├── log
│   ├── buffer
│   │   └── buffer_logger.txt
│   ├── collector
│   │   └── collector_logger.txt
│   ├── evaluator
│   │   └── evaluator_logger.txt
│   ├── learner
│   │   └── learner_logger.txt
│   └── serial
│       └── events.out.tfevents.1626453528.CN0014009700M.local
├── formatted_total_config.py
└── total_config.py
As we can see, the output consists mainly of two folders, log and ckpt, which store detailed log information and model checkpoints, respectively. The total_config.py and formatted_total_config.py files record the configuration of this experiment. For details on their meaning, please refer to the Configuration System Documentation.
Log File Analysis
Collector Logs
The log/collector/collector_logger.txt file records metrics of the collector's interaction with the environment during the current collection stage, including:
episode_count: The number of episodes collected in this stage
envstep_count: The number of environment steps collected in this stage
avg_envstep_per_episode: The average number of environment steps per episode
avg_envstep_per_sec: The average number of environment steps collected per second
avg_episode_per_sec: The average number of episodes collected per second
collect_time: The total time spent on data collection in this stage
reward_mean: The mean episode return (the sum of rewards over an episode) of the episodes collected in this stage
reward_std: The standard deviation of those episode returns
reward_max: The highest episode return collected in this stage
reward_min: The lowest episode return collected in this stage
total_envstep_count: The cumulative number of environment steps collected by the collector
total_episode_count: The cumulative number of episodes collected by the collector
total_duration: The total running time of the collector
visit_entropy: The entropy of the visit-count distribution at the MCTS root node; higher values mean visits are spread more evenly across actions
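The sketch below recomputes several of these metrics from raw per-episode data, to show how they relate. It is an illustration, not LightZero's collector code. `summarize_collection` and `visit_entropy` are hypothetical names. The sketch also makes three assumptions:

- the reward_* fields are statistics over episode returns;
- reward_std is a population standard deviation (numpy's default);
- the entropy uses the natural logarithm.

```python
import math
import statistics


def summarize_collection(episode_returns, episode_lengths, collect_time):
    """Recompute a subset of the collector metrics from per-episode data.

    Assumptions: reward_* are statistics over episode returns, and
    reward_std is the population standard deviation (numpy.std's default).
    """
    episode_count = len(episode_returns)
    envstep_count = sum(episode_lengths)
    return {
        "episode_count": episode_count,
        "envstep_count": envstep_count,
        "avg_envstep_per_episode": envstep_count / episode_count,
        "avg_envstep_per_sec": envstep_count / collect_time,
        "avg_episode_per_sec": episode_count / collect_time,
        "reward_mean": statistics.fmean(episode_returns),
        "reward_std": statistics.pstdev(episode_returns),
        "reward_max": max(episode_returns),
        "reward_min": min(episode_returns),
    }


def visit_entropy(visit_counts):
    """Shannon entropy (natural log, an assumption) of the root visit-count distribution."""
    total = sum(visit_counts)
    if total == 0:
        raise ValueError("visit counts must not all be zero")
    probs = [count / total for count in visit_counts if count > 0]
    return -sum(p * math.log(p) for p in probs)
```

Take returns [10, 20, 30], lengths [10, 20, 30] and a collection time of 2 seconds. This gives avg_envstep_per_sec = 30.0 and reward_mean = 20.0. A visit distribution concentrated on a single action has entropy 0. A uniform distribution over n actions reaches the maximum, ln n.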
Evaluator Logs
The log/evaluator/evaluator_logger.txt file records metrics of the evaluator's interaction with the environment during the current evaluation stage, including:
[INFO]: Log messages printed each time the evaluator finishes an episode, reporting that episode's final return and the current episode count
train_iter: The number of training iterations the model had completed at the time of this evaluation
ckpt_name: The path of the model checkpoint used in this evaluation
episode_count: The number of episodes in this evaluation
envstep_count: The total number of environment steps in this evaluation
evaluate_time: The total time spent on this evaluation
avg_envstep_per_episode: The average number of environment steps per evaluation episode
avg_envstep_per_sec: The average number of environment steps per second in this evaluation
avg_time_per_episode: The average time per episode in this evaluation
reward_mean: The mean episode return in this evaluation
reward_std: The standard deviation of episode returns in this evaluation
eval_episode_return: The return of each episode the evaluator ran in this evaluation
reward_max: The highest episode return in this evaluation
reward_min: The lowest episode return in this evaluation
eval_episode_return_mean: The mean of eval_episode_return, i.e. the average episode return in this evaluation
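The evaluator's timing fields are simple ratios of the raw totals, and its per-episode returns feed the summary statistics. The hypothetical `EvalTracker` below sketches this bookkeeping. It also flags when an evaluation beats all previous ones. That flag assumes the best checkpoint is chosen by mean evaluation return, which is consistent with the description of ckpt_best.pth.tar under Checkpoint Files. LightZero's exact selection rule may differ.

```python
import statistics


class EvalTracker:
    """Hypothetical tracker mirroring several evaluator_logger.txt fields.

    Assumption: a new "best" evaluation is one whose mean episode return
    exceeds that of every previous evaluation.
    """

    def __init__(self):
        self.best_return_mean = float("-inf")

    def record(self, train_iter, episode_returns, episode_lengths, evaluate_time):
        episode_count = len(episode_returns)
        envstep_count = sum(episode_lengths)
        return_mean = statistics.fmean(episode_returns)

        # Check whether this evaluation improves on the best mean return seen so far.
        is_new_best = return_mean > self.best_return_mean
        if is_new_best:
            self.best_return_mean = return_mean

        return {
            "train_iter": train_iter,
            "episode_count": episode_count,
            "envstep_count": envstep_count,
            "evaluate_time": evaluate_time,
            "avg_envstep_per_episode": envstep_count / episode_count,
            "avg_envstep_per_sec": envstep_count / evaluate_time,
            "avg_time_per_episode": evaluate_time / episode_count,
            "reward_std": statistics.pstdev(episode_returns),
            "reward_max": max(episode_returns),
            "reward_min": min(episode_returns),
            "eval_episode_return": list(episode_returns),
            "eval_episode_return_mean": return_mean,
            "is_new_best": is_new_best,
        }
```

Tracking the best value on the tracker object keeps each `record` call independent of how often evaluation runs.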
Learner Logs
The log/learner/learner_logger.txt file records information about the learner during model training, including:
Neural network structure: Describes the overall architecture of the MuZero model, including the representation network, dynamics network, prediction network, etc.
Learner status: Displays the current learning rate, loss function values, optimizer monitoring metrics, etc., in a tabular format:
analysis/dormant_ratio_encoder_avg: The average dormant ratio of the encoder (representation network), i.e. the fraction of its neurons that are essentially inactive
analysis/dormant_ratio_dynamics_avg: The average dormant ratio of the dynamics network
analysis/latent_state_l2_norms_avg: The average L2 norm of the latent states
collect_mcts_temperature_avg: The average temperature applied to MCTS visit counts when selecting actions during data collection; higher values encourage exploration
cur_lr_avg: The current learning rate
weighted_total_loss_avg: The average total loss after each sample is weighted by its replay (importance-sampling) weight
total_loss_avg: The average total loss
policy_loss_avg: The average policy loss
policy_entropy_avg: The average entropy of the predicted policy
target_policy_entropy_avg: The average entropy of the target policy
reward_loss_avg: The average reward loss
value_loss_avg: The average value loss
consistency_loss_avg: The average consistency loss between predicted and target latent states
value_priority_avg: The average replay priority computed from the value prediction error, used by prioritized experience replay
target_reward_avg: The average target reward
target_value_avg: The average target value
predicted_rewards_avg: The average predicted reward
predicted_values_avg: The average predicted value
transformed_target_reward_avg: The average target reward after the scalar transform applied before it is converted into a categorical target
transformed_target_value_avg: The average target value after the same transform
total_grad_norm_before_clip_avg: The average total gradient norm, measured before gradient clipping
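Several of these diagnostics follow standard definitions. The sketch below implements three of them in plain Python, under stated assumptions.

- **Dormant ratio.** This follows Sokar et al. (2023), "The Dormant Neuron Phenomenon in Deep Reinforcement Learning". A neuron's score is its mean absolute activation divided by the layer-wide average of those means. The neuron counts as dormant when its score is at most a threshold tau.
- **Scalar transform.** This is the value-rescaling function h(x) = sign(x)(sqrt(|x| + 1) - 1) + eps*x with eps = 0.001, as commonly used in MuZero-style implementations.
- **Gradient norm.** This is the global L2 norm over all gradient entries. It is also what `torch.nn.utils.clip_grad_norm_` returns, computed before clipping.

Whether LightZero uses exactly these thresholds, layers, and constants is an assumption. The function names are hypothetical.

```python
import math


def dormant_ratio(activations, tau=0.0):
    """Fraction of dormant neurons in one layer (after Sokar et al., 2023).

    activations: per-sample activation vectors, shape [batch][neurons].
    The default tau=0.0 is an assumption; LightZero's threshold may differ.
    """
    num_neurons = len(activations[0])
    # Mean absolute activation of each neuron over the batch.
    mean_abs = [
        sum(abs(sample[i]) for sample in activations) / len(activations)
        for i in range(num_neurons)
    ]
    layer_avg = sum(mean_abs) / num_neurons
    if layer_avg == 0:
        return 1.0  # every neuron is silent on this batch
    dormant = sum(1 for m in mean_abs if m / layer_avg <= tau)
    return dormant / num_neurons


def scalar_transform(x, eps=0.001):
    """Value rescaling h(x) = sign(x) * (sqrt(|x| + 1) - 1) + eps * x (eps is assumed)."""
    return math.copysign(math.sqrt(abs(x) + 1) - 1, x) + eps * x


def total_grad_norm(grads):
    """Global L2 norm over all gradient entries, measured before any clipping."""
    return math.sqrt(sum(g * g for grad in grads for g in grad))
```

For instance, h(3) = 1.003 and h(-3) = -1.003. The transform compresses large targets while preserving their sign.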
TensorBoard Log Files
To make experiments easier to manage, LightZero also aggregates the metrics recorded by the collector, evaluator, and learner into a single TensorBoard event file in the log/serial folder. The file is named in the format events.out.tfevents.<timestamp>.<hostname>. With TensorBoard, users can monitor how each metric evolves during training in real time.
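The event file can also be read without the TensorBoard UI. This sketch does two things:

1. It parses the file-name format described above.
2. If the `tensorboard` package is installed, it loads one scalar series through tensorboard's `EventAccumulator` class.

The helper names are hypothetical. The metric tag names LightZero writes are not listed in this article, so inspect `EventAccumulator.Tags()['scalars']` before calling `read_scalars`.

```python
import datetime
from pathlib import Path


def parse_event_filename(path):
    """Split events.out.tfevents.<timestamp>.<hostname> into (UTC time, host part).

    The hostname may itself contain dots (e.g. "CN0014009700M.local"), and some
    writers append extra suffixes, so everything after the timestamp is
    returned unchanged as one string.
    """
    parts = Path(path).name.split(".", 4)
    if len(parts) < 5 or parts[:3] != ["events", "out", "tfevents"]:
        raise ValueError(f"not a TensorBoard event file name: {path}")
    created = datetime.datetime.fromtimestamp(int(parts[3]), tz=datetime.timezone.utc)
    return created, parts[4]


def read_scalars(event_path, tag):
    """Return (step, value) pairs for one scalar tag.

    Requires the `tensorboard` package; it is imported lazily so that
    parse_event_filename works without it.
    """
    from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

    acc = EventAccumulator(str(event_path))
    acc.Reload()  # parse the event file from disk
    return [(event.step, event.value) for event in acc.Scalars(tag)]
```

For the example file in the directory tree, `parse_event_filename` returns a creation time of 2021-07-16 16:38:48 UTC and the host string CN0014009700M.local.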
Checkpoint Files
The ckpt folder stores checkpoint files of the model parameters:
ckpt_best.pth.tar: The model parameters that achieved the best performance during evaluation
iteration_<iteration_number>.pth.tar: The model parameters saved periodically during training
To load a saved checkpoint, you can use methods such as torch.load('ckpt_best.pth.tar').
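Two practical details apply when working with these files:

- Iteration numbers should be compared as integers. Compared as strings, iteration_10000 sorts before iteration_2000.
- Loading onto the CPU avoids needing the GPU that was used during training.

The sketch below assumes only the file naming shown in the ckpt tree. The helper names are hypothetical. The keys inside a loaded checkpoint depend on LightZero/DI-engine internals, so the loader reports them rather than assuming names.

```python
import re
from pathlib import Path

_ITER_PATTERN = re.compile(r"iteration_(\d+)\.pth\.tar$")


def latest_iteration_checkpoint(ckpt_dir):
    """Return the iteration_<n>.pth.tar file with the largest n, or None.

    The number is parsed as an integer, so iteration_10000 ranks above
    iteration_2000 (a plain string sort would put it first).
    """
    candidates = []
    for path in Path(ckpt_dir).glob("iteration_*.pth.tar"):
        match = _ITER_PATTERN.search(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]


def load_checkpoint(path):
    """Load a checkpoint onto the CPU and report its top-level keys (requires PyTorch)."""
    import torch

    state = torch.load(path, map_location="cpu")
    keys = list(state.keys()) if isinstance(state, dict) else None
    return state, keys
```

On PyTorch 2.6 and later, torch.load defaults to weights_only=True. If a checkpoint you trust contains other Python objects, you may need to pass weights_only=False.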
Conclusion
LightZero's logging and monitoring system gives researchers and developers detailed insight into the training process of reinforcement learning agents. By analyzing the collector, evaluator, and learner metrics, we can track the algorithm's progress and effectiveness as training runs and adjust the training strategy accordingly. The consistent organization of checkpoint files also helps make experiments reproducible. Together, these features make the logging system a practical aid for both algorithm research and applied work.