Envs
- class lzero.envs.wrappers.lightzero_env_wrapper.LightZeroEnvWrapper(env: Env, cfg: EasyDict)[source]
Bases: Wrapper
- Overview:
Package the classic_control and box2d environments into the format required by LightZero: the observation is wrapped as a dict containing the keys obs, action_mask and to_play.
- Interface:
__init__, reset, step
- Properties:
env (gym.Env): the environment to wrap.
- __init__(env: Env, cfg: EasyDict) None [source]
- Overview:
Initialize self. See help(type(self)) for an accurate signature; set up the properties according to the running mean and std.
- Parameters:
env (gym.Env) – the environment to wrap.
- property action_space: Space[ActType]
Returns the action space of the environment.
- classmethod class_name()
Returns the class name of the wrapper.
- close()
Closes the environment.
- property metadata: dict
Returns the environment metadata.
- property np_random: RandomNumberGenerator
Returns the environment np_random.
- property observation_space: Space
Returns the observation space of the environment.
- render(*args: Tuple[Any], **kwargs: Dict[str, Any]) RenderFrame | List[RenderFrame] | None
Renders the environment.
- property render_mode: str | None
Returns the environment render_mode.
- reset(**kwargs)[source]
- Overview:
Resets the state of the environment and resets the wrapper's properties.
- Parameters:
kwargs (Dict) – keyword arguments for the reset.
- Returns:
New observation after reset.
- Return type:
observation (Any)
- property reward_range: Tuple[SupportsFloat, SupportsFloat]
Returns the reward range of the environment.
- seed(seed=None)
Seeds the environment.
- property spec
Returns the environment specification.
- step(action)[source]
- Overview:
Step the environment with the given action: repeat the action, sum the reward, update data_count, and update the self.rms property once after integrating the input action.
- Parameters:
action (Any) – the given action to step with.
- Returns:
- observation (Any): the normalized observation after the input action, computed with the updated self.rms.
- reward (Any): amount of reward returned after the previous action.
- done (Bool): whether the episode has ended, in which case further step() calls will return undefined results.
- info (Dict): auxiliary diagnostic information (helpful for debugging, and sometimes learning).
- Return type:
self.observation(observation)
- property unwrapped: Env
Returns the base environment of the wrapper.
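As a rough illustration of the dict packaging described in the overview above, the sketch below builds the three-key observation by hand. This is a hypothetical helper, not the wrapper's actual code; the key names follow the overview, while the all-ones action_mask and the to_play value of -1 are assumptions for a single-agent classic_control/box2d setting.

```python
def package_obs(obs, action_space_size):
    """Package a raw observation into a LightZero-style dict.

    'obs' holds the raw observation, 'action_mask' marks legal actions
    (all assumed legal here), and 'to_play' is assumed to be a
    single-agent placeholder value.
    """
    return {
        'obs': obs,
        'action_mask': [1] * action_space_size,  # every action assumed legal
        'to_play': -1,  # assumed single-agent placeholder
    }

# Example: a CartPole-like 4-dim observation with 2 discrete actions.
packed = package_obs([0.0, 0.0, 0.0, 0.0], action_space_size=2)
print(sorted(packed))  # ['action_mask', 'obs', 'to_play']
```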
- class lzero.envs.wrappers.action_discretization_env_wrapper.ActionDiscretizationEnvWrapper(env: Env, cfg: EasyDict)[source]
Bases: Wrapper
- Overview:
The modified environment with a manually discretized action space. For each dimension, the original continuous action is equally divided into each_dim_disc_size bins, and the Cartesian product of the per-dimension bins yields the handcrafted discrete actions.
- Interface:
__init__, reset, step
- Properties:
env (gym.Env): the environment to wrap.
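The equal-division scheme described in the overview can be sketched as follows. This is an illustrative stdlib-only helper with a hypothetical name, not the wrapper's actual implementation; it only shows how bins per dimension combine via a Cartesian product.

```python
import itertools

def build_discrete_actions(low, high, each_dim_disc_size):
    """Equally divide each continuous dimension into each_dim_disc_size
    values, then take the Cartesian product across dimensions."""
    per_dim = [
        [lo + i * (hi - lo) / (each_dim_disc_size - 1)
         for i in range(each_dim_disc_size)]
        for lo, hi in zip(low, high)
    ]
    # Each element of the product is one handcrafted discrete action.
    return [list(a) for a in itertools.product(*per_dim)]

# Example: a 2-D action space in [-1, 1]^2 with 3 bins per dimension
# gives 3**2 = 9 discrete actions.
actions = build_discrete_actions(low=[-1.0, -1.0], high=[1.0, 1.0],
                                 each_dim_disc_size=3)
print(len(actions))  # 9
print(actions[0])    # [-1.0, -1.0]
```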
- __init__(env: Env, cfg: EasyDict) None [source]
- Overview:
Initialize self. See help(type(self)) for an accurate signature; set up the properties according to the running mean and std.
- Parameters:
env (gym.Env) – the environment to wrap.
- property action_space: Space[ActType]
Returns the action space of the environment.
- classmethod class_name()
Returns the class name of the wrapper.
- close()
Closes the environment.
- property metadata: dict
Returns the environment metadata.
- property np_random: RandomNumberGenerator
Returns the environment np_random.
- property observation_space: Space
Returns the observation space of the environment.
- render(*args: Tuple[Any], **kwargs: Dict[str, Any]) RenderFrame | List[RenderFrame] | None
Renders the environment.
- property render_mode: str | None
Returns the environment render_mode.
- reset(**kwargs)[source]
- Overview:
Resets the state of the environment and resets the wrapper's properties.
- Parameters:
kwargs (Dict) – keyword arguments for the reset.
- Returns:
New observation after reset.
- Return type:
observation (Any)
- property reward_range: Tuple[SupportsFloat, SupportsFloat]
Returns the reward range of the environment.
- seed(seed=None)
Seeds the environment.
- property spec
Returns the environment specification.
- step(action)[source]
- Overview:
Step the environment with the given action: repeat the action, sum the reward, update data_count, and update the self.rms property once after integrating the input action.
- Parameters:
action (Any) – the given action to step with.
- Returns:
- observation (Any): the normalized observation after the input action, computed with the updated self.rms.
- reward (Any): amount of reward returned after the previous action.
- done (Bool): whether the episode has ended, in which case further step() calls will return undefined results.
- info (Dict): auxiliary diagnostic information (helpful for debugging, and sometimes learning).
- Return type:
self.observation(observation)
- property unwrapped: Env
Returns the base environment of the wrapper.
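Once the handcrafted discrete actions exist, a discrete policy emits an index into that set, and step can look up the continuous vector before forwarding it to the wrapped env. The sketch below shows that lookup under the equal-division scheme above; the class and method names are hypothetical, not the wrapper's actual API.

```python
import itertools

class DiscretizedActionLookup:
    """Illustrative helper: maps a discrete action index back to the
    continuous vector the underlying env's step would receive."""

    def __init__(self, low, high, each_dim_disc_size):
        # Equally divide each dimension, then take the Cartesian product.
        per_dim = [
            [lo + i * (hi - lo) / (each_dim_disc_size - 1)
             for i in range(each_dim_disc_size)]
            for lo, hi in zip(low, high)
        ]
        self.disc_to_cont = [list(a) for a in itertools.product(*per_dim)]

    def continuous_action(self, index):
        # The wrapped env.step would be called with this vector.
        return self.disc_to_cont[index]

# Example: a 1-D action in [-2, 2] with 5 bins.
lookup = DiscretizedActionLookup(low=[-2.0], high=[2.0], each_dim_disc_size=5)
print(lookup.continuous_action(0))  # [-2.0]
```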