Basic Usage
Simple Example
The following code performs a deterministic action on the click-test-2 environment (instruction: "Click button ONE."):
import time
import gymnasium
from miniwob.action import ActionTypes
env = gymnasium.make('miniwob/click-test-2-v1', render_mode='human')
# Wrap the code in try-finally to ensure proper cleanup.
try:
  # Start a new episode.
  obs, info = env.reset()
  time.sleep(2)       # Only here to let you look at the environment.
  
  # Find the HTML element with text "ONE".
  for element in obs["dom_elements"]:
    if element["text"] == "ONE":
      break
  else:
    # Without this guard, `element` would silently be the last DOM element.
    raise ValueError('No element with text "ONE" was found.')
  # Click on the element.
  action = env.action_space.sample()     # Template for the action.
  action["action_type"] = env.action_space_config.action_types.index(
      ActionTypes.CLICK_ELEMENT
  )
  action["ref"] = element["ref"]
  obs, reward, terminated, truncated, info = env.step(action)
  # Check if the action was correct.
  assert reward >= 0      # Should be around 0.8 since 2 seconds have passed.
  assert terminated is True
  time.sleep(2)
finally:
  env.close()
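
The element lookup in the example can be factored into a small helper. The sketch below is not part of MiniWoB++: `find_element_by_text` is a hypothetical name, and it assumes only that each entry of obs["dom_elements"] is a dict with "text" and "ref" keys, as in the example. It returns None when nothing matches, so the caller can handle a missing element explicitly. The sample data is made up so the block runs without a browser.

```python
from typing import Any, Mapping, Optional, Sequence


def find_element_by_text(
    dom_elements: Sequence[Mapping[str, Any]], text: str
) -> Optional[Mapping[str, Any]]:
    """Return the first DOM element whose "text" equals `text`, or None."""
    for element in dom_elements:
        if element["text"] == text:
            return element
    return None


# Stand-in data shaped like obs["dom_elements"]; the values are invented.
fake_dom_elements = [
    {"ref": 1, "text": ""},
    {"ref": 4, "text": "ONE"},
    {"ref": 5, "text": "TWO"},
]
element = find_element_by_text(fake_dom_elements, "ONE")
```

In a real episode, the same call would take obs["dom_elements"] from env.reset() or env.step().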
Environment Initialization
env = gymnasium.make('miniwob/click-test-2-v1', render_mode='human')
Common arguments include:
- render_mode: Render mode. Supported values are:
  - None (default): Headless Chrome, which does not show the browser window.
  - 'human': Show the browser window.
- action_space_config: Configuration for the action space. See the Action Space section for details. Supported values are:
  - An ActionSpaceConfig object.
  - A preset name, which will instantiate an ActionSpaceConfig object.
Observation Space
Observation Object
In all MiniWoB++ environments, an observation is a dict with the following fields:
| Key | Type | Description |
|---|---|---|
|  |  | Task instruction string. |
|  |  | Environment-specific task field keys. (TODO: Implement this in code) |
|  |  | Task field values extracted from the task instruction. (TODO: Implement this in code) |
|  |  | Screenshot as RGB values for each pixel. Note that some elements, such as an opened dropdown, may not be captured in the screenshot. |
| dom_elements |  | List of feature dicts, each describing a DOM element (see below). |
DOM Element Features
Each feature dict in dom_elements has the following fields:
| Key | Type | Description |
|---|---|---|
| ref |  | Non-zero integer ID. |
|  |  |  |
|  |  | Left coordinate relative to the screen (can be negative). |
|  |  | Top coordinate relative to the screen (can be negative). |
|  |  | Element width. |
|  |  | Element height. |
|  |  | HTML tag. |
| text |  | Text content, which is non-empty only for leaf elements.[1] |
|  |  | Value of |
|  |  | HTML |
|  |  | HTML |
|  |  | Background color as RGBA value. |
|  |  | Foreground color as RGBA value. |
|  |  | Binary flags: |
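
For coordinate-based actions, it can help to turn an element's bounding box into a single click point. The sketch below is only an illustration of that arithmetic: `element_center` and `is_on_screen` are hypothetical helpers, they take the left, top, width, and height features as plain numbers rather than assuming how those values are keyed in the feature dict, and the 160 x 210 screen size is an example value. Because left and top can be negative, the computed center may fall outside the screen, which the second helper checks.

```python
from typing import Tuple


def element_center(
    left: float, top: float, width: float, height: float
) -> Tuple[float, float]:
    """Return the (x, y) center of an element's bounding box."""
    return (left + width / 2.0, top + height / 2.0)


def is_on_screen(
    x: float, y: float, screen_width: float, screen_height: float
) -> bool:
    """Return True if the point lies inside the screen rectangle."""
    return 0 <= x < screen_width and 0 <= y < screen_height


# Example values only; a real element's features come from the observation.
center = element_center(left=10, top=20, width=40, height=30)
visible = is_on_screen(center[0], center[1], screen_width=160, screen_height=210)
```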
Action Space
Supported Actions
MiniWoB++ environments support the following actions.
| Name | Description |
|---|---|
|  | Do nothing for the current step. |
|  | Click on the specified coordinates. |
|  | Double-click on the specified coordinates. |
|  | Start dragging on the specified coordinates. |
|  | Stop dragging on the specified coordinates. |
| CLICK_ELEMENT | Click on the specified element. |
|  | Double-click on the specified element. |
|  | Start dragging on the specified element. |
|  | Stop dragging on the specified element. |
|  | Scroll up on the mouse wheel. |
|  | Scroll down on the mouse wheel. |
|  | Press the specified key or key combination. |
|  | Type the specified string. |
|  | Type the value of the specified task field. |
|  | Click on the specified element, and then type the specified string. |
|  | Click on the specified element, and then type the value of the specified task field. |
Action Configs
The list of selected actions, along with other configurations, can be customized
by passing a miniwob.action.ActionSpaceConfig object to the action_space_config argument
during environment construction.
The ActionSpaceConfig object has the following fields:
| Key | Type | Description |
|---|---|---|
| action_types |  | An ordered sequence of action types to include. |
|  |  | Screen width. Will be overridden by the environment constructor. |
|  |  | Screen height. Will be overridden by the environment constructor. |
|  |  | If specified, bin the x and y coordinates to these numbers of bins. Mouse actions will be executed at the middle of the specified partition. |
|  |  | An ordered sequence of allowed keys and key combinations for the |
|  |  | Maximum text length for the |
|  |  | Character set for the |
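
To make the coordinate binning concrete, the sketch below shows how a bin index could map to the middle of its partition. It is an illustration under stated assumptions, not the library's implementation: `bin_center` is a hypothetical helper, the partitions are assumed to have equal width, and the screen size and bin count are example values. The library's exact rounding may differ.

```python
def bin_center(bin_index: int, num_bins: int, screen_size: float) -> float:
    """Return the pixel coordinate at the middle of the given bin.

    Assumes the axis of length `screen_size` is split into `num_bins`
    equal-width partitions, indexed from 0.
    """
    if not 0 <= bin_index < num_bins:
        raise ValueError(f"bin_index must be in [0, {num_bins}), got {bin_index}")
    bin_width = screen_size / num_bins
    return (bin_index + 0.5) * bin_width


# Example: a 160-pixel-wide axis split into 4 bins of 40 pixels each.
first = bin_center(0, num_bins=4, screen_size=160)
last = bin_center(3, num_bins=4, screen_size=160)
```

Applying the same function to the x and y axes separately gives the point where a binned mouse action would land under these assumptions.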
Action Object
An action is a dict; which fields it includes depends on the selected actions:
| Key | Type | Description |
|---|---|---|
| action_type |  | Action type index from the |
|  |  | Coordinates. Included when any |
| ref |  | Element |
|  |  | Key index from the |
|  |  | Text to type. Included when any |
|  |  | Task field index. Included when any |
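
Because the action type is stored as an index into the configured sequence of action types, the same action can have a different index under a different configuration. The sketch below illustrates this with plain strings standing in for the ActionTypes members; `make_element_action` is a hypothetical helper, and it fills in only the "action_type" and "ref" fields used in the simple example. In a real environment, starting from env.action_space.sample() as in that example keeps any remaining fields populated.

```python
from typing import Any, Dict, Sequence


def make_element_action(
    action_types: Sequence[str], action_type: str, ref: int
) -> Dict[str, Any]:
    """Build a minimal element action for the given ordered action types."""
    return {
        # Position of the action type within the configured sequence.
        "action_type": action_types.index(action_type),
        # ID of the target DOM element.
        "ref": ref,
    }


# Two hypothetical configurations that order the action types differently.
config_a = ["NONE", "CLICK_ELEMENT"]
config_b = ["CLICK_ELEMENT", "NONE"]
action_a = make_element_action(config_a, "CLICK_ELEMENT", ref=4)
action_b = make_element_action(config_b, "CLICK_ELEMENT", ref=4)
```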
Presets
The following preset names can be specified in place of the ActionSpaceConfig object:
(TODO: Implement this in code)
- all_supported: Select all supported actions, including redundant ones.
- shi17: The action space from (Shi et al., 2017) World of Bits: An Open-Domain Platform for Web-Based Agents.
- liu18: The action space from (Liu et al., 2018) Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration.
- humphreys22: The action space from (Humphreys et al., 2022) A data-driven approach for learning to control computers.