Basic Usage#
Simple Example#
The following code performs a deterministic action on the click-test-2 environment (Instruction: Click button ONE.).
import time

import gymnasium
from miniwob.action import ActionTypes

env = gymnasium.make('miniwob/click-test-2-v1', render_mode='human')

# Wrap the code in try-finally to ensure proper cleanup.
try:
    # Start a new episode.
    obs, info = env.reset()
    time.sleep(2)  # Only here to let you look at the environment.

    # Find the HTML element with text "ONE".
    for element in obs["dom_elements"]:
        if element["text"] == "ONE":
            break

    # Click on the element.
    action = env.action_space.sample()  # Template for the action.
    action["action_type"] = env.action_space_config.action_types.index(
        ActionTypes.CLICK_ELEMENT
    )
    action["ref"] = element["ref"]
    obs, reward, terminated, _, _ = env.step(action)

    # Check if the action was correct.
    assert reward >= 0  # Should be around 0.8 since 2 seconds have passed.
    assert terminated is True
    time.sleep(2)
finally:
    env.close()
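The loop in the example above relies on `break`: if no element has the text "ONE", `element` silently ends up as the last DOM element. A more defensive lookup could be sketched as below. `find_element_by_text` is a hypothetical helper, not part of MiniWoB++, and the sample dicts are stand-ins for a real `obs["dom_elements"]` that contain only the two fields used in the example.

```python
from typing import Any, Iterable, Mapping, Optional


def find_element_by_text(
    dom_elements: Iterable[Mapping[str, Any]], text: str
) -> Optional[Mapping[str, Any]]:
    """Return the first element whose "text" field equals `text`, or None."""
    for element in dom_elements:
        if element.get("text") == text:
            return element
    return None


# Stand-in data (hypothetical): real feature dicts carry more fields.
sample_dom_elements = [
    {"ref": 1, "text": ""},
    {"ref": 2, "text": "ONE"},
    {"ref": 3, "text": "TWO"},
]

match = find_element_by_text(sample_dom_elements, "ONE")
missing = find_element_by_text(sample_dom_elements, "THREE")
```

Checking the result for `None` before building the action makes a missing button fail loudly instead of clicking an unrelated element.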
Environment Initialization#
env = gymnasium.make('miniwob/click-test-2-v1', render_mode='human')
Common arguments include:
- render_mode: Render mode. Supported values are:
    - None (default): Headless Chrome, which does not show the browser window.
    - 'human': Show the browser window.
- action_space_config: Configuration for the action space. See the Action Space section for details. Supported values are:
    - An ActionSpaceConfig object.
    - A preset name, which will instantiate an ActionSpaceConfig object.
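As a minimal sketch of switching between the two render modes listed above, the helpers below build the keyword arguments separately from the call to `gymnasium.make`. Both `make_kwargs` and `make_env` are hypothetical names introduced here, and the sketch assumes that importing the `miniwob` package registers the environments, as the import in the simple example suggests.

```python
from typing import Any, Dict


def make_kwargs(show_browser: bool) -> Dict[str, Any]:
    """Map a boolean flag to the render_mode values described above."""
    return {"render_mode": "human" if show_browser else None}


def make_env(env_id: str, show_browser: bool = False):
    """Create an environment; requires gymnasium, miniwob, and Chrome."""
    # Imported lazily so make_kwargs stays usable without these packages.
    import gymnasium
    import miniwob  # noqa: F401  (assumption: the import registers the envs)

    return gymnasium.make(env_id, **make_kwargs(show_browser))


headless_kwargs = make_kwargs(show_browser=False)
visible_kwargs = make_kwargs(show_browser=True)
```

With the packages installed, `make_env('miniwob/click-test-2-v1', show_browser=True)` would be equivalent to the call shown at the top of this section.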
Observation Space#
Observation Object#
In all MiniWoB++ environments, an observation is a dict with the following fields:

| Key | Type | Description |
|---|---|---|
|  |  | Task instruction string. |
|  |  | Environment-specific task field keys. (TODO: Implement this in code) |
|  |  | Task field values extracted from the task instruction. (TODO: Implement this in code) |
|  |  | Screenshot as RGB values for each pixel. Note that some elements, such as an opened dropdown, may not be captured in the screenshot. |
| dom_elements |  | List of feature dicts, each describing a DOM element (see below). |
DOM Element Features#
Each feature dict in dom_elements has the following fields:

| Key | Type | Description |
|---|---|---|
| ref |  | Non-zero integer ID. |
|  |  |  |
|  |  | Left coordinate relative to the screen (can be negative). |
|  |  | Top coordinate relative to the screen (can be negative). |
|  |  | Element width. |
|  |  | Element height. |
|  |  | HTML tag. |
| text |  | Text content, which is non-empty only for leaf elements.[1] |
|  |  | Value of |
|  |  | HTML |
|  |  | HTML |
|  |  | Background color as RGBA value. |
|  |  | Foreground color as RGBA value. |
|  |  | Binary flags: |
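The position and size fields can be combined to find where an element sits on the screen, for example to target it with a coordinate-based action. The sketch below computes the center of an element's bounding box. The key names `left`, `top`, `width`, and `height` are assumptions made for illustration (the key column is not shown above), and `element_center` is a hypothetical helper rather than a MiniWoB++ function.

```python
from typing import Mapping, Tuple


def element_center(element: Mapping[str, float]) -> Tuple[float, float]:
    """Return the (x, y) center of an element's bounding box.

    Assumes the keys "left", "top", "width", and "height" (hypothetical names).
    """
    x = element["left"] + element["width"] / 2
    y = element["top"] + element["height"] / 2
    return x, y


# Stand-in feature dict (hypothetical values).
sample_element = {"ref": 4, "left": 20.0, "top": 60.0, "width": 40.0, "height": 30.0}
center = element_center(sample_element)
```

Because the left and top coordinates can be negative, the computed center may fall outside the visible screen and should be checked before use.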
Action Space#
Supported Actions#
MiniWoB++ environments support the following actions.
| Name | Description |
|---|---|
|  | Do nothing for the current step. |
|  | Click on the specified coordinates. |
|  | Double-click on the specified coordinates. |
|  | Start dragging on the specified coordinates. |
|  | Stop dragging on the specified coordinates. |
| CLICK_ELEMENT | Click on the specified element. |
|  | Double-click on the specified element. |
|  | Start dragging on the specified element. |
|  | Stop dragging on the specified element. |
|  | Scroll up on the mouse wheel. |
|  | Scroll down on the mouse wheel. |
|  | Press the specified key or key combination. |
|  | Type the specified string. |
|  | Type the value of the specified task field. |
|  | Click on the specified element, and then type the specified string. |
|  | Click on the specified element, and then type the value of the specified task field. |
Action Configs#
The list of selected actions, along with other configurations, can be customized by passing a miniwob.action.ActionSpaceConfig object to the action_space_config argument during environment construction.

The ActionSpaceConfig object has the following fields:

| Key | Type | Description |
|---|---|---|
| action_types |  | An ordered sequence of action types to include. |
|  |  | Screen width. Will be overridden by the environment constructor. |
|  |  | Screen height. Will be overridden by the environment constructor. |
|  |  | If specified, bin the x and y coordinates to these numbers of bins. Mouse actions will be executed at the middle of the specified partition. |
|  |  | An ordered sequence of allowed keys and key combinations for the |
|  |  | Maximum text length for the |
|  |  | Character set for the |
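To make the coordinate binning concrete, the arithmetic for "the middle of the specified partition" can be sketched as follows. This is an illustration of the idea under the assumption of equal-width bins, not the library's implementation; `bin_center` is a hypothetical helper and the screen sizes are example values only.

```python
def bin_center(bin_index: int, n_bins: int, screen_size: float) -> float:
    """Return the coordinate at the middle of an equal-width bin."""
    if not 0 <= bin_index < n_bins:
        raise ValueError(f"bin_index must be in [0, {n_bins}), got {bin_index}")
    bin_width = screen_size / n_bins
    return (bin_index + 0.5) * bin_width


# Hypothetical example: a 160 x 210 screen with 8 x-bins and 10 y-bins.
x = bin_center(3, n_bins=8, screen_size=160)
y = bin_center(0, n_bins=10, screen_size=210)
```

Here each x-bin is 20 pixels wide, so bin 3 covers 60 to 80 and maps to 70; each y-bin is 21 pixels tall, so bin 0 maps to 10.5.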
Action Object#
An action is a dict whose field inclusion depends on the selected actions:

| Key | Type | Description |
|---|---|---|
| action_type |  | Action type index from the |
|  |  | Coordinates. Included when any |
| ref |  | Element |
|  |  | Key index from the |
|  |  | Text to type. Included when any |
|  |  | Task field index. Included when any |
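Since the set of fields depends on the configuration, one way to build a valid action is to start from a template that already has every required field (the simple example uses `env.action_space.sample()` for this) and overwrite only the relevant ones. The sketch below does that with plain Python data: `fill_click_element_action` is a hypothetical helper, and the strings in `sample_action_types` are stand-ins for `ActionTypes` members.

```python
from typing import Any, Dict, Mapping, Sequence


def fill_click_element_action(
    template: Mapping[str, Any],
    action_types: Sequence[Any],
    click_type: Any,
    ref: int,
) -> Dict[str, Any]:
    """Copy `template` and set the fields needed to click one element."""
    action = dict(template)  # Shallow copy so the template is left untouched.
    # The action type is stored as an index into the configured sequence.
    action["action_type"] = list(action_types).index(click_type)
    action["ref"] = ref
    return action


# Stand-ins (hypothetical): a real template would come from the action space.
sample_action_types = ["SOME_OTHER_ACTION", "CLICK_ELEMENT"]
sample_template = {"action_type": 0, "ref": 0}

action = fill_click_element_action(
    sample_template, sample_action_types, "CLICK_ELEMENT", ref=7
)
```

Any other fields present in the template are carried over unchanged, which keeps the dict consistent with the action space even when they are irrelevant to the chosen action type.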
Presets#
The following preset names can be specified in place of the ActionSpaceConfig object:
(TODO: Implement this in code)

- all_supported: Select all supported actions, including redundant ones.
- shi17: The action space from (Shi et al., 2017) World of Bits: An Open-Domain Platform for Web-Based Agents.
- liu18: The action space from (Liu et al., 2018) Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration.
- humphreys22: The action space from (Humphreys et al., 2022) A data-driven approach for learning to control computers.