Envs

class lzero.envs.wrappers.lightzero_env_wrapper.LightZeroEnvWrapper(env: Env, cfg: EasyDict)[source]

Bases: Wrapper

Overview:

Wrap classic_control and box2d environments into the format required by LightZero: the observation is wrapped as a dict containing the keys obs, action_mask and to_play. A usage sketch follows the properties list below.

Interface:

__init__, reset, step

Properties:
  • env (gym.Env): the environment to wrap.
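
A minimal usage sketch (not taken from the LightZero source): the environment id and the EasyDict fields below are illustrative assumptions; only the wrapper class and the obs/action_mask/to_play keys come from the overview above.

    # Hypothetical usage sketch; cfg fields such as is_train and env_id are
    # assumptions -- consult the wrapper's source for the exact configuration.
    import gym
    from easydict import EasyDict
    from lzero.envs.wrappers.lightzero_env_wrapper import LightZeroEnvWrapper

    env = LightZeroEnvWrapper(gym.make('CartPole-v1'), EasyDict(is_train=True, env_id='CartPole-v1'))
    obs = env.reset()
    # The wrapped observation is a dict with the keys described in the overview.
    assert set(obs.keys()) >= {'obs', 'action_mask', 'to_play'}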

__init__(env: Env, cfg: EasyDict) None[source]
Overview:

Initialize self and set up the wrapper's properties; see help(type(self)) for the accurate signature.

Parameters:

env (gym.Env) – the environment to wrap.

_is_protocol = False
property _np_random
property action_space: Space[ActType]

Returns the action space of the environment.

classmethod class_name()

Returns the class name of the wrapper.

close()

Closes the environment.

property metadata: dict

Returns the environment metadata.

property np_random: RandomNumberGenerator

Returns the environment np_random.

property observation_space: Space

Returns the observation space of the environment.

render(*args: Tuple[Any], **kwargs: Dict[str, Any]) RenderFrame | List[RenderFrame] | None
property render_mode: str | None

Returns the environment render_mode.

reset(**kwargs)[source]
Overview:

Resets the state of the environment and the wrapper's properties.

Parameters:

kwargs (-) – the keyword arguments to reset the environment with.

Returns:

New observation after reset

Return type:

  • observation (Any)

property reward_range: Tuple[SupportsFloat, SupportsFloat]

Returns the reward range of the environment.

seed(seed=None)

Seeds the environment.

property spec

Returns the environment specification.

step(action)[source]
Overview:

Step the environment with the given action and wrap the returned observation into the LightZero dict format described in the class overview.

Parameters:

action (-) – the action to step the environment with.

Returns:

  • observation (Any): the wrapped observation (a dict with keys obs, action_mask and to_play) after the input action.
  • reward (Any): the amount of reward returned after the previous action.
  • done (bool): whether the episode has ended, in which case further step() calls will return undefined results.
  • info (dict): auxiliary diagnostic information (helpful for debugging, and sometimes learning).
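
A sketch of consuming the step() output described above, continuing the construction sketch given after the class overview; the tuple-style unpacking and the random legal-action sampling are assumptions, not part of the documented API.

    import numpy as np

    obs = env.reset()
    done = False
    while not done:
        # Pick a random legal action using the action_mask provided by the wrapper.
        legal_actions = np.flatnonzero(obs['action_mask'])
        action = int(np.random.choice(legal_actions))
        # Assumes an (obs, reward, done, info)-style return, as described above.
        obs, reward, done, info = env.step(action)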

property unwrapped: Env

Returns the base environment of the wrapper.

class lzero.envs.wrappers.action_discretization_env_wrapper.ActionDiscretizationEnvWrapper(env: Env, cfg: EasyDict)[source]

Bases: Wrapper

Overview:

Wrap the environment with a manually discretized action space: each dimension of the original continuous action space is equally divided into each_dim_disc_size bins, and the Cartesian product of the per-dimension bins forms the handcrafted discrete action set. A sketch of this construction follows the properties list below.

Interface:

__init__, reset, step

Properties:
  • env (gym.Env): the environment to wrap.
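
The Cartesian-product construction described above can be sketched independently of the wrapper; the action bounds and bin count below are illustrative, not taken from the source.

    import itertools
    import numpy as np

    # Illustrative example: a 2-dimensional continuous action in [-1, 1]^2, 3 bins per dimension.
    each_dim_disc_size = 3
    low, high = np.array([-1.0, -1.0]), np.array([1.0, 1.0])

    # Equally divide each dimension into each_dim_disc_size values ...
    per_dim_values = [np.linspace(low[i], high[i], each_dim_disc_size) for i in range(len(low))]
    # ... and take their Cartesian product to obtain the handcrafted discrete actions.
    discrete_actions = list(itertools.product(*per_dim_values))

    print(len(discrete_actions))  # 3 ** 2 = 9 discrete actions
    print(discrete_actions[4])    # (0.0, 0.0), the centre action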

__init__(env: Env, cfg: EasyDict) None[source]
Overview:

Initialize self and set up the discretized action space; see help(type(self)) for the accurate signature.

Parameters:

env (gym.Env) – the environment to wrap.

_is_protocol = False
property _np_random
property action_space: Space[ActType]

Returns the action space of the environment.

classmethod class_name()

Returns the class name of the wrapper.

close()

Closes the environment.

property metadata: dict

Returns the environment metadata.

property np_random: RandomNumberGenerator

Returns the environment np_random.

property observation_space: Space

Returns the observation space of the environment.

render(*args: Tuple[Any], **kwargs: Dict[str, Any]) RenderFrame | List[RenderFrame] | None
property render_mode: str | None

Returns the environment render_mode.

reset(**kwargs)[source]
Overview:

Resets the state of the environment and the wrapper's properties.

Parameters:

kwargs (-) – the keyword arguments to reset the environment with.

Returns:

New observation after reset

Return type:

  • observation (Any)

property reward_range: Tuple[SupportsFloat, SupportsFloat]

Returns the reward range of the environment.

seed(seed=None)

Seeds the environment.

property spec

Returns the environment specification.

step(action)[source]
Overview:

Step the environment with the given discrete action, which indexes into the handcrafted discrete action set described in the class overview, and pass the corresponding continuous action to the wrapped environment.

Parameters:

action (-) – the discrete action to step the environment with.

Returns:

  • observation (Any): the observation after the input action.
  • reward (Any): the amount of reward returned after the previous action.
  • done (bool): whether the episode has ended, in which case further step() calls will return undefined results.
  • info (dict): auxiliary diagnostic information (helpful for debugging, and sometimes learning).
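
A minimal usage sketch, assuming a continuous-control gym environment; the cfg fields (e.g. each_dim_disc_size, is_train) and the integer-indexed step call are illustrative assumptions based on the class overview, not a documented API.

    # Hypothetical usage sketch; consult the wrapper's source for the exact cfg fields.
    import gym
    from easydict import EasyDict
    from lzero.envs.wrappers.action_discretization_env_wrapper import ActionDiscretizationEnvWrapper

    cfg = EasyDict(each_dim_disc_size=5, is_train=True)
    env = ActionDiscretizationEnvWrapper(gym.make('Pendulum-v1'), cfg)

    obs = env.reset()
    # step() now takes a discrete action index into the Cartesian-product action set.
    timestep = env.step(0)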

property unwrapped: Env

Returns the base environment of the wrapper.