Learn From DI-zoo
===============================

What is DI-zoo
-------------------------------

DI-zoo is a collection of reinforcement learning environments wrapped with DI-engine. It covers most commonly used reinforcement learning environments, from basic ones like `OpenAI Gym `_ to more complex ones such as `SMAC `_. In addition, for each environment, DI-zoo provides entry points for different algorithms together with their optimal configurations.

The structure of DI-zoo
-------------------------------

For a given environment/policy pair, DI-zoo mainly provides two files needed to run an RL experiment in DI-engine:

- the ``config.py`` file, which includes the key configuration required as well as the entry point to run the RL experiment;
- the ``env.py`` file, which contains the encapsulation of the environment so that it can run in DI-engine.

.. note::

    Some environment/policy pairs also have a ``main.py`` entry file, which is a training pipeline left over from a previous version.

Here we briefly show the structure of DI-zoo using the CartPole environment and the DQN algorithm:

.. code-block::

    dizoo/
        classic_control/
            cartpole/
                config/cartpole_dqn_config.py  # Config
                entry/cartpole_dqn_main.py     # Main
                envs/cartpole_env.py           # Env

How to use DI-zoo
-------------------------------

You can directly execute the ``config.py`` file provided by DI-zoo to train a given environment/policy pair. For CartPole/DQN, you can run the RL experiment with the following command:

.. code-block:: bash

    python dizoo/classic_control/cartpole/config/cartpole_dqn_config.py

DI-engine also provides a CLI tool. You can type the following command in your terminal:

.. code-block:: bash

    ding -v

If the terminal returns the correct version information, you can use this CLI tool for common training and evaluation; type ``ding -h`` for further usage. To train CartPole/DQN, you can directly type the following command in the terminal:

.. code-block:: bash

    ding -m serial -c cartpole_dqn_config.py -s 0

where ``-m serial`` means that the training pipeline you call is ``serial_pipeline``, ``-c cartpole_dqn_config.py`` means that the config file you use is ``cartpole_dqn_config.py``, and ``-s 0`` means that the ``seed`` is 0.
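Equivalently, you can launch the same experiment from Python instead of the CLI. The following minimal sketch mirrors the ``__main__`` entry of the config file shown in the next section; it assumes you run it from a directory where the ``dizoo`` package is importable (e.g., the DI-engine repository root):

.. code-block:: python

    # A minimal sketch equivalent to `ding -m serial -c cartpole_dqn_config.py -s 0`.
    from ding.entry import serial_pipeline
    from dizoo.classic_control.cartpole.config.cartpole_dqn_config import main_config, create_config

    if __name__ == "__main__":
        # seed=0 corresponds to the `-s 0` flag of the CLI command above
        serial_pipeline((main_config, create_config), seed=0)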
Customization of DI-zoo
-------------------------------

You can customize your training process or tune the performance of your RL experiment by changing the configuration in ``config.py``. Here we use ``cartpole_dqn_config.py`` as an example:

.. code-block:: python

    from easydict import EasyDict

    cartpole_dqn_config = dict(
        exp_name='cartpole_dqn_seed0',
        env=dict(
            collector_env_num=8,
            evaluator_env_num=5,
            n_evaluator_episode=5,
            stop_value=195,
            replay_path='cartpole_dqn_seed0/video',
        ),
        policy=dict(
            cuda=False,
            load_path='cartpole_dqn_seed0/ckpt/ckpt_best.pth.tar',  # necessary for eval
            model=dict(
                obs_shape=4,
                action_shape=2,
                encoder_hidden_size_list=[128, 128, 64],
                dueling=True,
            ),
            nstep=1,
            discount_factor=0.97,
            learn=dict(
                batch_size=64,
                learning_rate=0.001,
            ),
            collect=dict(n_sample=8),
            eval=dict(evaluator=dict(eval_freq=40, )),
            other=dict(
                eps=dict(
                    type='exp',
                    start=0.95,
                    end=0.1,
                    decay=10000,
                ),
                replay_buffer=dict(replay_buffer_size=20000, ),
            ),
        ),
    )
    cartpole_dqn_config = EasyDict(cartpole_dqn_config)
    main_config = cartpole_dqn_config

    cartpole_dqn_create_config = dict(
        env=dict(
            type='cartpole',
            import_names=['dizoo.classic_control.cartpole.envs.cartpole_env'],
        ),
        env_manager=dict(type='base'),
        policy=dict(type='dqn'),
        replay_buffer=dict(
            type='deque',
            import_names=['ding.data.buffer.deque_buffer_wrapper']
        ),
    )
    cartpole_dqn_create_config = EasyDict(cartpole_dqn_create_config)
    create_config = cartpole_dqn_create_config

    if __name__ == "__main__":
        # or you can enter `ding -m serial -c cartpole_dqn_config.py -s 0`
        from ding.entry import serial_pipeline
        serial_pipeline((main_config, create_config), seed=0)

The two dictionary objects ``cartpole_dqn_config`` and ``cartpole_dqn_create_config`` contain the key configurations required for CartPole/DQN training. You can change the behavior of the training pipeline by editing them. For example, by changing ``cartpole_dqn_config.policy.cuda``, you can choose whether to run the entire training process on your CUDA device.

If you want to use other training pipelines provided by DI-engine, or your own customized training pipelines, you only need to change the ``__main__`` block of ``config.py`` that calls the training pipeline. For example, you can change ``serial_pipeline`` in the example to ``parallel_pipeline`` to call the parallel training pipeline. For the CLI tool ``ding``, you can likewise change the previous command to

.. code-block:: bash

    ding -m parallel -c cartpole_dqn_config.py -s 0

to call ``parallel_pipeline``.

.. note::

    To customize the training pipeline, you can refer to `serial_pipeline `_, or refer to the `DQN example `_, which uses the `middleware <../03_system/middleware.html>`_ provided by DI-engine to build the pipeline.

If you want to use your own environment in DI-engine, you can simply inherit from the ``BaseEnv`` class implemented by DI-engine. For this part you can refer to `How to migrate your environment to DI-engine <../04_best_practice/ding_env.html>`_; a minimal sketch is also given at the end of this page.

List of algorithms and environments supported by DI-zoo
---------------------------------------------------------

- `The algorithm documentation of DI-engine <../12_policies/index.html>`_
- `The environment documentation of DI-engine <../13_envs/index.html>`_
- `List of supported algorithms `_
- `List of supported environments `_
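As promised above, here is a minimal sketch of a custom environment inheriting from ``BaseEnv``. ``MyToyEnv`` is a hypothetical toy environment, not part of DI-zoo; it illustrates the core ``reset``/``step``/``seed``/``close`` interface plus the space properties, but the authoritative requirements are those in the migration guide linked above:

.. code-block:: python

    # A hypothetical sketch, not part of DI-zoo: reward 1 whenever the agent picks action 1.
    import gym
    import numpy as np
    from ding.envs import BaseEnv, BaseEnvTimestep


    class MyToyEnv(BaseEnv):

        def __init__(self, cfg: dict = None) -> None:
            self._cfg = cfg or {}
            self._step_count = 0
            self._observation_space = gym.spaces.Box(low=0.0, high=1.0, shape=(4, ), dtype=np.float32)
            self._action_space = gym.spaces.Discrete(2)
            self._reward_space = gym.spaces.Box(low=0.0, high=1.0, shape=(1, ), dtype=np.float32)

        def reset(self) -> np.ndarray:
            self._step_count = 0
            # DI-engine passes observations around as numpy arrays
            return np.zeros(4, dtype=np.float32)

        def step(self, action) -> BaseEnvTimestep:
            self._step_count += 1
            act = int(np.asarray(action).item())  # actions may arrive as numpy arrays
            obs = np.random.rand(4).astype(np.float32)
            # by convention, rewards are float32 arrays of shape (1, )
            reward = np.array([1.0 if act == 1 else 0.0], dtype=np.float32)
            done = self._step_count >= 10  # a fixed 10-step episode, for illustration only
            return BaseEnvTimestep(obs, reward, done, {})

        def random_action(self) -> np.ndarray:
            return np.array([self._action_space.sample()])

        def seed(self, seed: int, dynamic_seed: bool = True) -> None:
            self._seed = seed
            np.random.seed(self._seed)

        def close(self) -> None:
            pass

        def __repr__(self) -> str:
            return "MyToyEnv"

        @property
        def observation_space(self) -> gym.spaces.Space:
            return self._observation_space

        @property
        def action_space(self) -> gym.spaces.Space:
            return self._action_space

        @property
        def reward_space(self) -> gym.spaces.Space:
            return self._reward_space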