Reinforcement Learning Environments ######################################## .. toctree:: :maxdepth: 2 SUMO Environment ==================== Configuration ----------------- The configuration of sumo env is stored in a config ``.yaml`` file. You can look at the default config file to see how to modify env settings. .. code:: python import yaml from easy_dict import EasyDict from smartcross.env import SumoEnv with open('smartcross/envs/sumo_wj3_default_config.yaml') as f: cfg = yaml.safe_load(f) cfg = EasyDict(cfg) env = SumoEnv(config=cfg.env) The env configuration consists of basic definition and observation\\action\\reward settings. The basic definition includes the cumo config file, episode length and light duration. The obs-action-reward define the detail setting of each contains. .. code:: yaml env: sumocfg_path: 'wj3/rl_wj.sumocfg' max_episode_steps: 1500 green_duration: 10 yellow_duration: 3 obs: ... action: ... reward: ... Observation ---------------- We provide several types of observations of a traffic cross. If `use_centrolized_obs` is set to `True`, the observation of each cross will be concatenated into one vector. The contents of the observation can be modified by setting `obs_type`. The following observation is supported now. - phase: One-hot phase vector of current cross signal - lane_pos_vec: Lane occupancy in each grid position. The grid num can be set with `lane_grid_num` - traffic_volume: Traffic volume of each lane. Vehicle num / lane length * volume ratio - queue_len: Vehicle waiting queue length of each lane. Waiting num / lane length * volume ratio Action ------------- Sumo environment supports changing cross signal to target phase. The action space is set to multi-discrete for each cross to reduce action num. Reward ------------- The reward can be set with `reward_type`. Reward of each cross is calculated separately. If `use_centrolized_obs` is set True, the reward of each cross will be summed up. - queue_len: Vehicle waiting queue num of each lane - wait_time: Wait time increment of vehicles in each lane - delay_time: Delay time of all vehicles in incomming and outgoing lanes - pressure: Pressure of a cross Multi-agent --------------- **DI-smartcross** supports a one-step configurable multi-agent RL training. It is only necessary to add ``multi_agent`` in **DI-engine** config file to convert common PPO into MAPPO, and change the ``use_centrolized_obs`` in environment config into ``True``. The policy and observations can be automatically changed to run individual agent for each cross. CityFlow Environment ============================= Configuration ----------------- CityFlow simulator has its own config `json` file, with roadnet file, flow file and replay file defined in it. DI-smartcross adds some extra configs together with CityFlow's config file path in DI-engine's env config. .. code:: python main_config = dict( env=dict( obs_type=['phase', 'lane_vehicle_num', 'lane_waiting_vehicle_num'], max_episode_duration=1000, green_duration=30, yellow_duration=5, red_duration=0, ... ), ... ) Observation ---------------- We provide several types of observations of each cross. - phase: One-hot phase vector of current cross signal - lane_vehicle_num: vehicle nums of each incoming lane - lane_waiting_vehicle_num: waiting vehicle nums of each incoming lane Action ------------- CityFlow environment supports changing cross signal to target phase. The action space is set to multi-discrete for each cross to reduce action num. Reward ------------- CityFlow environment uses pressure of each cross as reward Roadnets ============== .. toctree:: :maxdepth: 1 envs/wj3_env envs/rl_arterial7_env envs/cf_grid_env .. `Beijing Wangjing 3 Crossings <./envs/wj3_env.html>`_ .. ----------------------------------------------------------------- .. `RL Arterial 7 Crossings <./envs/rl_arterial7_env.html>`_ .. ----------------------------------------------------------------- .. `CityFlow Grid Env <./envs/cf_grid_env.html>`_ .. -----------------------------------------------------------------