Reinforcement Learning Environments¶
SUMO Environment¶
Configuration¶
The configuration of the SUMO environment is stored in a YAML config
file. You can look at the default config file to see how to modify env settings.
import yaml
from easydict import EasyDict
from smartcross.env import SumoEnv

with open('smartcross/envs/sumo_wj3_default_config.yaml') as f:
    cfg = yaml.safe_load(f)  # parse the default YAML config
cfg = EasyDict(cfg)  # wrap for attribute-style access (cfg.env, ...)
env = SumoEnv(config=cfg.env)
The env configuration consists of a basic definition and observation/action/reward settings. The basic definition includes the SUMO config file, the episode length, and the light durations. The obs, action, and reward entries define the detailed settings of each part.
env:
    sumocfg_path: 'wj3/rl_wj.sumocfg'
    max_episode_steps: 1500
    green_duration: 10
    yellow_duration: 3
    obs:
        ...
    action:
        ...
    reward:
        ...
Observation¶
We provide several types of observations of a traffic cross. If use_centrolized_obs is set to True, the observations of all crosses will be concatenated into one vector. The contents of the observation can be modified by setting obs_type. The following observation types are currently supported (a small normalization sketch follows the list):
phase: One-hot phase vector of the current cross signal
lane_pos_vec: Lane occupancy at each grid position. The number of grids can be set with lane_grid_num
traffic_volume: Traffic volume of each lane, computed as vehicle num / lane length * volume ratio
queue_len: Vehicle waiting queue length of each lane, computed as waiting num / lane length * volume ratio
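As a concrete illustration of the normalization above, the sketch below computes traffic_volume and queue_len for one lane. The helper names, signatures, and example values are illustrative assumptions, not DI-smartcross's actual API.

def traffic_volume(vehicle_num, lane_length, volume_ratio):
    # Traffic volume: vehicle num / lane length * volume ratio
    return vehicle_num / lane_length * volume_ratio

def queue_len(waiting_num, lane_length, volume_ratio):
    # Queue length: waiting num / lane length * volume ratio
    return waiting_num / lane_length * volume_ratio

# Example: a 150 m lane with 12 vehicles, 4 of them waiting, volume ratio 10
print(traffic_volume(12, 150.0, 10.0))  # 0.8
print(queue_len(4, 150.0, 10.0))        # ~0.27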
Action¶
The SUMO environment supports switching the signal of each cross to a target phase. The action space is multi-discrete, with one discrete dimension per cross, to keep the number of actions small.
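For intuition, the sketch below builds such a space with gym for three crosses with four candidate phases each. The counts are made-up example values for illustration only.

from gym import spaces

# One discrete dimension per cross, each choosing one of 4 target phases.
# This gives 4 + 4 + 4 choices instead of a single Discrete(4 ** 3) joint
# space with 64 actions, which is why multi-discrete keeps the action num low.
action_space = spaces.MultiDiscrete([4, 4, 4])
action = action_space.sample()  # e.g. array([2, 0, 3]): one phase per cross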
Reward¶
The reward can be set with reward_type. The reward of each cross is calculated separately. If use_centrolized_obs is set to True, the rewards of all crosses will be summed up, as sketched after the list below.
queue_len: Vehicle waiting queue length of each lane
wait_time: Wait time increment of vehicles in each lane
delay_time: Delay time of all vehicles in incoming and outgoing lanes
pressure: Pressure of a cross
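A minimal sketch of the reward aggregation described above; the dict layout and values are assumptions for illustration.

# Per-cross rewards are computed separately, e.g. negative queue lengths.
cross_rewards = {'cross_0': -3.0, 'cross_1': -1.5, 'cross_2': -2.0}

use_centrolized_obs = True
if use_centrolized_obs:
    reward = sum(cross_rewards.values())   # one scalar for all crosses: -6.5
else:
    reward = list(cross_rewards.values())  # one reward per cross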
Multi-agent¶
DI-smartcross supports one-step configurable multi-agent RL training. It is only necessary to add multi_agent in the DI-engine config file to convert common PPO into MAPPO, and to change use_centrolized_obs in the environment config to True. The policy and observations are then automatically adapted to run an individual agent for each cross.
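Put together, the two switches look roughly like this in a DI-engine style config. This is a sketch based on the description above; the surrounding keys are omitted.

main_config = dict(
    env=dict(
        # ... other env settings ...
        use_centrolized_obs=True,  # one observation/reward per cross
    ),
    policy=dict(
        multi_agent=True,  # converts the common PPO policy into MAPPO
        # ... other policy settings ...
    ),
)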
CityFlow Environment¶
Configuration¶
The CityFlow simulator has its own JSON config file, in which the roadnet file, flow file, and replay file are defined. DI-smartcross adds some extra settings, together with the path of CityFlow's config file, in DI-engine's env config.
main_config = dict(
    env=dict(
        obs_type=['phase', 'lane_vehicle_num', 'lane_waiting_vehicle_num'],
        max_episode_duration=1000,
        green_duration=30,
        yellow_duration=5,
        red_duration=0,
        ...
    ),
    ...
)
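As with the SUMO example above, the nested dict is typically wrapped in an EasyDict for attribute-style access. A small usage sketch, assuming the elided fields above have been filled in:

from easydict import EasyDict

cfg = EasyDict(main_config)
print(cfg.env.obs_type)  # ['phase', 'lane_vehicle_num', 'lane_waiting_vehicle_num']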
Observation¶
We provide several types of observations for each cross (a small assembly sketch follows the list):
phase: One-hot phase vector of the current cross signal
lane_vehicle_num: Number of vehicles in each incoming lane
lane_waiting_vehicle_num: Number of waiting vehicles in each incoming lane
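For intuition, a flat observation for one cross could be assembled as below. The phase count and lane counts are made-up example values, and the concatenation is an illustration rather than the library's exact layout.

import numpy as np

num_phases = 4
phase = np.eye(num_phases)[2]                      # one-hot current phase
lane_vehicle_num = np.array([5, 2, 7, 1])          # vehicles per incoming lane
lane_waiting_vehicle_num = np.array([2, 0, 3, 0])  # waiting vehicles per lane

obs = np.concatenate([phase, lane_vehicle_num, lane_waiting_vehicle_num])
print(obs.shape)  # (12,)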
Action¶
The CityFlow environment supports switching the signal of each cross to a target phase. As in the SUMO environment, the action space is multi-discrete, with one discrete dimension per cross, to keep the number of actions small.
Reward¶
The CityFlow environment uses the pressure of each cross as the reward.
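In the traffic-signal literature, pressure is usually defined as the difference between vehicle counts on incoming and outgoing lanes; a minimal sketch under that assumption (not necessarily the library's exact formula):

def pressure(incoming_vehicle_nums, outgoing_vehicle_nums):
    # Standard definition: total incoming minus total outgoing vehicles.
    return sum(incoming_vehicle_nums) - sum(outgoing_vehicle_nums)

# A reward would typically be the negative pressure, so lower pressure is better.
reward = -pressure([5, 2, 7, 1], [3, 1, 2, 0])  # -(15 - 6) = -9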