Gym-Super-Mario-Bros¶

Overview¶

Here is the “Super Mario Bros” series of games, in which players need to control Mario to move and jump, avoid the pits and enemies in the process of leading to the end, gain more gold coins to get higher scores. This game also has many interesting props to enhance player experiences. gym-super-mario-bros , this environment is encapsulated from “Super Mario Bros” of Nintendo after OpenAI Gym. Here is the screeshot of game:

Installation¶

Installation Method¶

pip install gym-super-mario-bros

Verify Installation¶

Run the following Python program, and if no errors are reported, the installation is successful.

from nes_py.wrappers import JoypadSpace
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
env = gym_super_mario_bros.make('SuperMarioBros-v0')
env = JoypadSpace(env, SIMPLE_MOVEMENT)

done = True
for step in range(5000):
    if done:
        state = env.reset()
    state, reward, done, info = env.step(env.action_space.sample())
    env.render()

env.close()

Solution for Installation Failure¶

A common error:

Traceback (most recent call last):
File "test_mario.py", line 13, in <module>
    state, reward, done, info = env.step(env.action_space.sample())
File "/Users/wangzilin/opt/anaconda3/envs/mario_test/lib/python3.8/site-packages/nes_py/wrappers/joypad_space.py", line 74, in step
    return self.env.step(self._action_map[action])
File "/Users/wangzilin/opt/anaconda3/envs/mario_test/lib/python3.8/site-packages/gym/wrappers/time_limit.py", line 50, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
ValueError: not enough values to unpack (expected 5, got 4)

Due to the updates of gym-super-mario-bros code base cannot keep up with the updates of gym code base sometimes, while executing pip install gym-super-mario-bros, the latest gym would be installed by default.The solution is to downgrade gym. Here gym-super-mario-bros version is 7.4.0, gym version is 0.26.2. We may choose to downgrade gym version to 0.25.1 to solve problems.

pip install gym==0.25.1

Environment Introduction¶

Game Rule¶

The simulator has two built-in games, Super Mario Bros. and Super Mario Bros.2 . For detailed gameplay and rules, please refer to the wikipedia link at the end of the text. For Super Mario Bros, in addition to the 32 level, the game also offers the option to play any individual level with one life, one random level (not currently supported in Super Mario Bros 2.).

# Super Mario Bros. 3 lifes from 1-1 to 8-4
env = gym_super_mario_bros.make('SuperMarioBros-v0')
# Super Mario Bros 2. 3 lifes from 1-1 to 8-4
env = gym_super_mario_bros.make('SuperMarioBros2-v0')
# 1 life 3-2
env = gym_super_mario_bros.make('SuperMarioBros-3-2-v0')
# 1 life Random level 1-4 2-4 3-4 4-4 (Game end after death,environment would choose another level to begin a new game randomly.)
env = gym.make('SuperMarioBrosRandomStages-v0', stages=['1-4', '2-4', '3-4', '4-4'])

Keyboard Interaction¶

When you have a display device for rendering, you can try to operate with the keyboard. The environment provides a command line interface, which starts as follows:

# Start 1-4 level
gym_super_mario_bros -e 'SuperMarioBrosRandomStages-v0' -m 'human' --stages '1-4'

Action Space¶

The action space of gym-super-mario-bros contains the whole 256 discrete actions from Nintendo. To compress this size (and to facilitate learning by the intelligences), the environment provides the action wrapper JoypadSpace by default to reduce the action dimension: the optional set of actions and their meanings are as follows:

# actions for the simple run right environment
RIGHT_ONLY = [
    ['NOOP'],
    ['right'],
    ['right', 'A'],
    ['right', 'B'],
    ['right', 'A', 'B'],
]


# actions for very simple movement
SIMPLE_MOVEMENT = [
    ['NOOP'],
    ['right'],
    ['right', 'A'],
    ['right', 'B'],
    ['right', 'A', 'B'],
    ['A'],
    ['left'],
]


# actions for more complex movement
COMPLEX_MOVEMENT = [
    ['NOOP'],
    ['right'],
    ['right', 'A'],
    ['right', 'B'],
    ['right', 'A', 'B'],
    ['A'],
    ['left'],
    ['left', 'A'],
    ['left', 'B'],
    ['left', 'A', 'B'],
    ['down'],
    ['up'],
]

for instance:

env = gym_super_mario_bros.make('SuperMarioBros-v0')
# use SIMPLE_MOVEMENT
env = JoypadSpace(env, SIMPLE_MOVEMENT)

# or set your own action space to choose actions like jump to the left and to the right side
env = JoypadSpace(env, [["right"], ["right", "A"]])

For the 7-dimensional discrete action space represented by SIMPLE_MOVEMENT, the definition using the gym environment space can be expressed as:

action_space = gym.spaces.Discrete(7)

State Space¶

The state space input to gym-super-mario-bros is the image information, and the tensor matrix in three dimensions (datatype=uint8). In addition, the different versions of the game correspond to the same image resolution 240*256*3, but the higher the version, the more abbreviated the image is (pixel blocking), as follows:

>>> # View observation space
>>> gym_super_mario_bros.make('SuperMarioBros-v3').observation_space
Box([[[0 0 0]
[0 0 0]
[0 0 0]
...
[0 0 0]
[0 0 0]
[0 0 0]]], [[[255 255 255]
[255 255 255]
[255 255 255]
...
[255 255 255]
[255 255 255]
[255 255 255]]], (240, 256, 3), uint8)

The corresponding game screenshots of v3 are as follows:

Reward Space¶

We hope Mario could more likely to move to the right side , and move faster to the end successfully, the setting of the reward for each frame consists of three parts as follows:

v:represents the difference in Mario’s x-coordinate (which can be interpreted as the velocity to the right) between two consecutive frames, with positive and negative.
c:represents the time used per frame, simply understood as a negative REVERSE for each frame, is used to push the intelligence to reach the end faster.
d:represents penalty for death, giving a high penalty of -15 if Mario dies.

Total reward r = v + c + d

Reward is being clipped to (-15,15)

Termination Conditions¶

For gym-super-mario-bros ,the termination condition for each episode of the environment is that any of the following conditions are encountered.

Mario wins
Mario is dead
Countdown ends

Additional information contained in info¶

At each step of interaction with the environment , the environment returns the info dictionary, which contains information about the coins acquired, the current accumulated score, the time remaining, and Mario’s current coordinates. The details are as follows:

More Information¶
Key	Type	Description
coins	int	The number of collected coins
flag_get	bool	True if Mario reached a flag or ax
life	int	The number of lives left, i.e., {3, 2, 1}
score	int	The cumulative in-game score
stage	int	The current stage, i.e., {1, …, 4}
status	str	Mario’s status, i.e., {‘small’, ‘tall’, ‘fireball’}
time	int	The time left on the clock
world	int	The current world, i.e., {1, …, 8}
x_pos	int	Mario’s x position in the stage (from the left)
y_pos	int	Mario’s y position in the stage (from the bottom)

Built-in Environment¶

There are several built-in environments, including "SuperMarioBros-v0"、 "SuperMarioBros-v1"、 "SuperMarioBros-v2"、"SuperMarioBros-v3" for Super Mario Bros. And “"SuperMarioBros2-v0"、 "SuperMarioBros2-v1" for Super Mario Bros. 2. In addition, Super Mario Bros. also allows you to select specific levels to break into, such as "SuperMarioBros-1-1-v0" .

Video Store¶

gym.wrappers.RecordVideo class is used to store video:

import gym
import time
from nes_py.wrappers import JoypadSpace
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT

video_dir_path = 'mario_videos'
env = gym_super_mario_bros.make('SuperMarioBros-v0')
env = JoypadSpace(env, SIMPLE_MOVEMENT)
env = gym.wrappers.RecordVideo(
    env,
    video_folder=video_dir_path,
    episode_trigger=lambda episode_id: True,
    name_prefix='mario-video-{}'.format(time.ctime())
)

# run 1 episode
env.reset()
while True:
    state, reward, done, info = env.step(env.action_space.sample())
    if done or info['time'] < 250:
        break
print("Your mario video is saved in {}".format(video_dir_path))
try:
    # There is a problem with the destructor of the environment, so an exception is needed to avoid error reporting
    del env
except Exception:
    pass

DI-zoo Runnable Code Example¶

Offers a complete gym-super-mario-bros environment config, use DQN as baseline. Please run mario_dqn_main.py doc under DI-engine/dizoo/mario catalogue.

from easydict import EasyDict

mario_dqn_config = dict(
    exp_name='mario_dqn_seed0',
    env=dict(
        collector_env_num=8,
        evaluator_env_num=8,
        n_evaluator_episode=8,
        stop_value=100000,
        replay_path='mario_dqn_seed0/video',
    ),
    policy=dict(
        cuda=True,
        model=dict(
            obs_shape=[4, 84, 84],
            action_shape=2,
            encoder_hidden_size_list=[128, 128, 256],
            dueling=True,
        ),
        nstep=3,
        discount_factor=0.99,
        learn=dict(
            update_per_collect=10,
            batch_size=32,
            learning_rate=0.0001,
            target_update_freq=500,
        ),
        collect=dict(n_sample=96, ),
        eval=dict(evaluator=dict(eval_freq=2000, )),
        other=dict(
            eps=dict(
                type='exp',
                start=1.,
                end=0.05,
                decay=250000,
            ),
            replay_buffer=dict(replay_buffer_size=100000, ),
        ),
    ),
)
mario_dqn_config = EasyDict(mario_dqn_config)
main_config = mario_dqn_config
mario_dqn_create_config = dict(
    env_manager=dict(type='subprocess'),
    policy=dict(type='dqn'),
)
mario_dqn_create_config = EasyDict(mario_dqn_create_config)
create_config = mario_dqn_create_config
# you can run `python3 -u mario_dqn_main.py`

Benchmark Algorithm Performance¶

SuperMarioBros-x-x-v0
- SuperMarioBros-1-1-v0 + DQN
- SuperMarioBros-1-2-v0 + DQN
- SuperMarioBros-1-3-v0 + DQN

References¶

gym-super-mario-bros source code
Super Mario Bros. wikipedia-Super Mario Bros.
Super Mario Bros 2. wikipedia-Super Mario Bros 2.