UI element grounding for improved action accuracy.
Repository: OpenAdaptAI/openadapt-grounding
```bash
pip install openadapt[grounding]
# or
pip install openadapt-grounding
```

The grounding package provides UI element detection and grounding to improve:
- Click accuracy by targeting element centers
- Robustness to UI changes
- Visual understanding of interfaces
Detect UI elements in screenshots:
- Buttons
- Text fields
- Links
- Icons
- Menus
Get precise coordinates for UI elements.
Overlay numbered markers on detected elements for LMM prompting.
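The first two points above can be made concrete: given an element's bounding box as `(left, top, right, bottom)` (the format shown in the CLI output below), the click target is the box center. This is an illustrative helper, not part of the package API:

```python
# Illustrative sketch (not a package export): grounding improves click
# accuracy by targeting element centers instead of raw predicted pixels.
def bbox_center(bbox):
    """Return the integer center of a (left, top, right, bottom) box."""
    left, top, right, bottom = bbox
    return ((left + right) // 2, (top + bottom) // 2)

# Center of the sample "Submit" button box from the CLI output below:
print(bbox_center((450, 320, 520, 350)))  # → (485, 335)
```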
```python
from openadapt_grounding import ElementDetector, SoMPrompt

# Detect elements in a screenshot
detector = ElementDetector()
elements = detector.detect(screenshot_path)
for element in elements:
    print(f"{element.label}: {element.bbox}")

# Create Set-of-Mark prompt
som = SoMPrompt(screenshot_path)
marked_image, element_map = som.create()
# element_map: {1: "Submit button", 2: "Email field", ...}
```

```python
from openadapt_ml import AgentPolicy
from openadapt_grounding import ElementDetector

# Create policy with grounding
policy = AgentPolicy.from_checkpoint(
    "model.pt",
    grounding=ElementDetector(),
)

# Actions will use grounded coordinates
observation = load_screenshot()
action = policy.predict(observation)
```

```bash
openadapt ground detect screenshot.png
```

Output:
```
Found 12 elements:
1. Button: "Submit" at (450, 320, 520, 350)
2. TextField: "Email" at (200, 200, 400, 230)
...
```
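One way grounding makes actions robust to small UI shifts is to snap a model-predicted click point to the nearest detected element center. The helper below is a hypothetical sketch, not a package export; it uses the two sample bounding boxes from the output above:

```python
import math

# Hypothetical sketch: snap a predicted click point to the nearest detected
# element center, so slightly-off predictions still land on the element.
def snap_to_nearest(point, bboxes):
    """bboxes are (left, top, right, bottom) tuples; returns a center."""
    centers = [((l + r) / 2, (t + b) / 2) for l, t, r, b in bboxes]
    return min(centers, key=lambda c: math.dist(point, c))

# A prediction near the "Submit" button snaps to its exact center:
print(snap_to_nearest((440, 330), [(450, 320, 520, 350),
                                   (200, 200, 400, 230)]))  # → (485.0, 335.0)
```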
```bash
openadapt ground som screenshot.png --output marked.png
```

| Export | Description |
|---|---|
| `ElementDetector` | Detects UI elements |
| `SoMPrompt` | Creates Set-of-Mark prompts |
| `BoundingBox` | Element coordinates |
| `Element` | Detected element data |
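For orientation, here is a plausible shape for the `Element` and `BoundingBox` exports, inferred from the usage above (`element.label`, `element.bbox`). These definitions are assumptions for illustration; the package's actual classes may differ:

```python
from dataclasses import dataclass

# Assumed data shapes, inferred from the examples above — not the
# package's actual definitions.
@dataclass
class BoundingBox:
    left: int
    top: int
    right: int
    bottom: int

@dataclass
class Element:
    label: str
    bbox: BoundingBox
```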
| Model | Size | Accuracy | Speed |
|---|---|---|---|
| `omniparser` | 1.2GB | High | Medium |
| `som-base` | 500MB | Medium | Fast |
| `custom` | - | - | - |
- Set-of-Mark Paper
- OpenAdaptAI/SoM - SoM implementation
- openadapt-ml - Use grounding in policy learning and execution
- openadapt-capture - Apply grounding to demonstrations