Envs

class lzero.envs.wrappers.lightzero_env_wrapper.LightZeroEnvWrapper(env: Env, cfg: EasyDict)[source]

Bases: Wrapper

Overview:

Wrap classic_control and box2d environments into the format required by LightZero: the observation is wrapped as a dict containing the keys obs, action_mask and to_play. A usage sketch follows the properties list below.

Interface:

__init__, reset, step

Properties:
  • env (gym.Env): the environment to wrap.
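
A minimal usage sketch (not taken from the LightZero source): the environment id and the EasyDict fields below are illustrative assumptions; only the wrapper class and the obs/action_mask/to_play keys come from the overview above.

    # Hypothetical usage sketch; cfg fields such as is_train and env_id are
    # assumptions -- consult the wrapper's source for the exact configuration.
    import gym
    from easydict import EasyDict
    from lzero.envs.wrappers.lightzero_env_wrapper import LightZeroEnvWrapper

    env = LightZeroEnvWrapper(gym.make('CartPole-v1'), EasyDict(is_train=True, env_id='CartPole-v1'))
    obs = env.reset()
    # The wrapped observation is a dict with the keys described in the overview.
    assert set(obs.keys()) >= {'obs', 'action_mask', 'to_play'}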

__init__(env: Env, cfg: EasyDict) None[source]
Overview:

Initialize self and set up the wrapper's properties; see help(type(self)) for the accurate signature.

Parameters:

env (gym.Env) – the environment to wrap.

_is_protocol = False
property _np_random
property action_space: Space[ActType]

Returns the action space of the environment.

classmethod class_name()

Returns the class name of the wrapper.

close()

Closes the environment.

property metadata: dict

Returns the environment metadata.

property np_random: RandomNumberGenerator

Returns the environment np_random.

property observation_space: Space

Returns the observation space of the environment.

render(*args: Tuple[Any], **kwargs: Dict[str, Any]) RenderFrame | List[RenderFrame] | None
property render_mode: str | None

Returns the environment render_mode.

reset(**kwargs)[source]
Overview:

Resets the state of the environment and the wrapper's properties.

Parameters:

kwargs (-) – the keyword arguments to reset the environment with.

Returns:

New observation after reset

Return type:

  • observation (Any)

property reward_range: Tuple[SupportsFloat, SupportsFloat]

Returns the reward range of the environment.

seed(seed=None)

Seeds the environment.

property spec

Returns the environment specification.

step(action)[source]
Overview:

Step the environment with the given action and wrap the returned observation into the LightZero dict format described in the class overview.

Parameters:

action (-) – the action to step the environment with.

Returns:

  • observation (Any): the wrapped observation (a dict with keys obs, action_mask and to_play) after the input action.
  • reward (Any): the amount of reward returned after the previous action.
  • done (bool): whether the episode has ended, in which case further step() calls will return undefined results.
  • info (dict): auxiliary diagnostic information (helpful for debugging, and sometimes learning).
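
A sketch of consuming the step() output described above, continuing the construction sketch given after the class overview; the tuple-style unpacking and the random legal-action sampling are assumptions, not part of the documented API.

    import numpy as np

    obs = env.reset()
    done = False
    while not done:
        # Pick a random legal action using the action_mask provided by the wrapper.
        legal_actions = np.flatnonzero(obs['action_mask'])
        action = int(np.random.choice(legal_actions))
        # Assumes an (obs, reward, done, info)-style return, as described above.
        obs, reward, done, info = env.step(action)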

property unwrapped: Env

Returns the base environment of the wrapper.

class lzero.envs.wrappers.action_discretization_env_wrapper.ActionDiscretizationEnvWrapper(env: Env, cfg: EasyDict)[source]

Bases: Wrapper

Overview:

Wrap the environment with a manually discretized action space: each dimension of the original continuous action space is equally divided into each_dim_disc_size bins, and the Cartesian product of the per-dimension bins forms the handcrafted discrete action set. A sketch of this construction follows the properties list below.

Interface:

__init__, reset, step

Properties:
  • env (gym.Env): the environment to wrap.
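
The Cartesian-product construction described above can be sketched independently of the wrapper; the action bounds and bin count below are illustrative, not taken from the source.

    import itertools
    import numpy as np

    # Illustrative example: a 2-dimensional continuous action in [-1, 1]^2, 3 bins per dimension.
    each_dim_disc_size = 3
    low, high = np.array([-1.0, -1.0]), np.array([1.0, 1.0])

    # Equally divide each dimension into each_dim_disc_size values ...
    per_dim_values = [np.linspace(low[i], high[i], each_dim_disc_size) for i in range(len(low))]
    # ... and take their Cartesian product to obtain the handcrafted discrete actions.
    discrete_actions = list(itertools.product(*per_dim_values))

    print(len(discrete_actions))  # 3 ** 2 = 9 discrete actions
    print(discrete_actions[4])    # (0.0, 0.0), the centre action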

__init__(env: Env, cfg: EasyDict) None[source]
Overview:

Initialize self and set up the discretized action space; see help(type(self)) for the accurate signature.

Parameters:

env (gym.Env) – the environment to wrap.

_is_protocol = False
property _np_random
property action_space: Space[ActType]

Returns the action space of the environment.

classmethod class_name()

Returns the class name of the wrapper.

close()

Closes the environment.

property metadata: dict

Returns the environment metadata.

property np_random: RandomNumberGenerator

Returns the environment np_random.

property observation_space: Space

Returns the observation space of the environment.

render(*args: Tuple[Any], **kwargs: Dict[str, Any]) RenderFrame | List[RenderFrame] | None
property render_mode: str | None

Returns the environment render_mode.

reset(**kwargs)[source]
Overview:

Resets the state of the environment and the wrapper's properties.

Parameters:

kwargs (-) – the keyword arguments to reset the environment with.

Returns:

New observation after reset

Return type:

  • observation (Any)

property reward_range: Tuple[SupportsFloat, SupportsFloat]

Returns the reward range of the environment.

seed(seed=None)

Seeds the environment.

property spec

Returns the environment specification.

step(action)[source]
Overview:

Step the environment with the given discrete action, which indexes into the handcrafted discrete action set described in the class overview, and pass the corresponding continuous action to the wrapped environment.

Parameters:

action (-) – the discrete action to step the environment with.

Returns:

  • observation (Any): the observation after the input action.
  • reward (Any): the amount of reward returned after the previous action.
  • done (bool): whether the episode has ended, in which case further step() calls will return undefined results.
  • info (dict): auxiliary diagnostic information (helpful for debugging, and sometimes learning).
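
A minimal usage sketch, assuming a continuous-control gym environment; the cfg fields (e.g. each_dim_disc_size, is_train) and the integer-indexed step call are illustrative assumptions based on the class overview, not a documented API.

    # Hypothetical usage sketch; consult the wrapper's source for the exact cfg fields.
    import gym
    from easydict import EasyDict
    from lzero.envs.wrappers.action_discretization_env_wrapper import ActionDiscretizationEnvWrapper

    cfg = EasyDict(each_dim_disc_size=5, is_train=True)
    env = ActionDiscretizationEnvWrapper(gym.make('Pendulum-v1'), cfg)

    obs = env.reset()
    # step() now takes a discrete action index into the Cartesian-product action set.
    timestep = env.step(0)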

property unwrapped: Env

Returns the base environment of the wrapper.