Simple Reinforcement Learning
##############################

.. toctree::
    :maxdepth: 2

**DI-drive** + **DI-engine** make RL for Autonomous Driving very easy. Here we show how to use
**DI-drive** to run a simple Reinforcement Learning driving policy with Carla and MetaDrive
separately.

Prerequisites
=====================

Ubuntu 16.04 + Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz + 32G memory + 1060 GPU

Carla RL Tutorial
=======================

DI-drive supports several RL policies and provides a simple RL environment running with Carla
servers. The policy takes a small Bird-eye View image together with the current speed as input,
and directly outputs control signals.

RL training using DI-engine
--------------------------------------

We provide simple RL demos that can run various RL algorithms with the aforementioned simple
environment setting. All the code can be found in ``demo/simple_rl``, including training,
evaluating and testing. Here we show how to run the DQN demo. It follows the standard deployment
of a DI-engine RL entry. Other RL policies run in the same way.

.. code:: bash

    cd demo/simple_rl
    python simple_rl_train.py -p dqn

The config part defines the env and policy settings. Note that you need to change the Carla
server host and port, and modify the number of environments according to your setup. By default
it uses 8 Carla servers on `localhost` with ports from 9000 to 9016.

.. code:: python

    train_config = dict(
        exp_name=...,
        env=dict(
            ...
        ),
        server=[
            dict(carla_host='localhost', carla_ports=[9000, 9016, 2]),
        ],
        policy=dict(
            ...
        ),
    )

For more details about how to tune DQN parameters, see **DI-engine**'s documentation. Usually you
will be most concerned with the replay buffer size and the number of samples per collection.

When the terminal prints information like the following picture, training has started.

.. figure:: ../../figs/rl_tutorial_log.png
    :alt: rl_tutorial_log
    :align: center
    :width: 1000px

During training, you can use TensorBoard as a monitor. The default log path is in your working
directory.

.. code:: bash

    tensorboard --logdir='./log'

After running for about 24 hours, you will get:

.. figure:: ../../figs/rl_tutorial_tb.png
    :alt: rl_tutorial_tb
    :align: center
    :width: 800px

Evaluate and test the trained model
---------------------------------------

After training, you can evaluate the trained model on a benchmark suite. Simply run the following
code.

.. code:: bash

    python simple_rl_eval.py -p dqn -c PATH_TO_YOUR_CKPT

You may need to change the Carla server number and settings, change the suite you want to
evaluate, and add your pre-trained weights in the policy's config (a sketch for launching the
matching Carla servers follows the config).

.. code:: python

    eval_config = dict(
        env=dict(
            env_num=5,
            ...
        ),
        server=[dict(
            carla_host='localhost',
            carla_ports=[9000, 9010, 2]
        )],
        policy=dict(
            cuda=True,
            ckpt_path='path/to/your/model',
            eval=dict(
                evaluator=dict(
                    suite='FullTown02-v1',
                    ...
                ),
            ),
            ...
        ),
        ...
    )
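The ``carla_ports`` entry is a ``[start, stop, step]`` range (exclusive of ``stop``), so the
config above expects 5 running Carla servers on ports 9000-9008 for the 5 evaluation
environments. If your servers are not already running, the following is only a rough sketch for
starting them; it assumes a standard Carla 0.9.x binary (``CarlaUE4.sh``) whose launcher accepts
the ``-carla-rpc-port`` and ``-quality-level`` flags, so adjust the path and flags to your
installation.

.. code:: bash

    # Sketch only: start 5 Carla servers on ports 9000-9008 (step 2), matching
    # env_num=5 and carla_ports=[9000, 9010, 2] above. The binary path and the
    # -carla-rpc-port / -quality-level flags depend on your Carla version.
    for port in 9000 9002 9004 9006 9008; do
        ./CarlaUE4.sh -carla-rpc-port=$port -quality-level=Low &
    done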
The default DQN policy has a good chance of completing navigation in `FullTown02-v2`, with
traffic lights ignored.

Also, you can test the policy on a town route with a visualized screen. Simply run the following
code.

.. code:: bash

    python simple_rl_test.py -p dqn -c PATH_TO_YOUR_CKPT

You may need to change the Carla server settings, switch visualization on/off or save a replay
gif/video, and add your pre-trained weights in the policy's config.

.. code:: python

    test_config = dict(
        env=dict(
            ...
            visualize=dict(
                type='birdview',
                outputs=['show'],  # or 'gif', 'video'
                save_dir='',
                frame_skip=3,  # skip frames to avoid the saved file being too large
            ),
        ),
        server=[dict(
            carla_host='localhost',
            carla_ports=[9000, 9002, 2]
        )],
        policy=dict(
            cuda=True,
            ckpt_path='path/to/your/model',
            eval=dict(
                evaluator=dict(
                    render=True,
                    ...
                ),
            ),
            ...
        ),
    )

MetaDrive RL Tutorial
===========================

DI-drive provides a simple entry that runs MetaDrive's default environments with DI-engine
policies. The training entry can be found in ``demo/metadrive``.

.. code:: bash

    cd demo/metadrive
    python basic_env_train.py

MetaDrive provides standard `gym` environments. By adding an ``EnvWrapper`` together with other
DI-engine components and pipelines, RL experiments can be run (a minimal wrapper sketch is shown
after the config below). Here, we use "MetaDrive-1000envs-v0" to train and
"MetaDrive-validation-v0" to evaluate an on-policy PPO policy. You can modify the env num in the
config to suit your device.

.. code:: python

    metadrive_basic_config = dict(
        env=dict(
            ...
            collector_env_num=4,
            evaluator_env_num=1,
        ),
        ...
    )
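The wrapping step itself is small. Below is a hedged sketch (not the actual ``basic_env_train.py``
entry) of turning a MetaDrive gym environment into a DI-engine-compatible env with
``DingEnvWrapper``. It assumes that importing ``metadrive`` registers the ``MetaDrive-*`` gym ids
and that your DI-engine version's ``DingEnvWrapper`` accepts a plain ``gym.Env``; both may differ
across versions.

.. code:: python

    # Minimal sketch: wrap a MetaDrive gym env so it exposes DI-engine's BaseEnv
    # interface. Assumes `import metadrive` registers the MetaDrive-* gym ids;
    # otherwise construct MetaDriveEnv directly and pass it to the wrapper.
    import gym
    import metadrive  # noqa: F401  (registers MetaDrive-* envs on import)
    from ding.envs import DingEnvWrapper

    env = DingEnvWrapper(gym.make('MetaDrive-validation-v0'))
    obs = env.reset()
    timestep = env.step(env.random_action())
    print(obs.shape, timestep.reward, timestep.done)
    env.close()

The same wrapped env is what the collector and evaluator in the DI-engine pipeline consume;
``collector_env_num`` and ``evaluator_env_num`` in the config simply control how many such
envs run in parallel.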