UI element grounding for improved action accuracy.
Repository: OpenAdaptAI/openadapt-grounding
```bash
pip install openadapt[grounding]
# or
pip install openadapt-grounding
```

The grounding package provides UI element detection and grounding to improve:
- Click accuracy by targeting element centers
- Robustness to UI changes
- Visual understanding of interfaces
Detect UI elements in screenshots:
- Buttons
- Text fields
- Links
- Icons
- Menus
Get precise coordinates for UI elements.
Overlay numbered markers on detected elements for LMM prompting.
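The first two points above can be made concrete: given an element's bounding box as `(left, top, right, bottom)` (the format shown in the CLI output below), the click target is the box center. This is an illustrative helper, not part of the package API:

```python
# Illustrative sketch (not a package export): grounding improves click
# accuracy by targeting element centers instead of raw predicted pixels.
def bbox_center(bbox):
    """Return the integer center of a (left, top, right, bottom) box."""
    left, top, right, bottom = bbox
    return ((left + right) // 2, (top + bottom) // 2)

# Center of the sample "Submit" button box from the CLI output below:
print(bbox_center((450, 320, 520, 350)))  # → (485, 335)
```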
```python
from openadapt_grounding import ElementDetector, SoMPrompt

# Detect elements in a screenshot
detector = ElementDetector()
elements = detector.detect(screenshot_path)
for element in elements:
    print(f"{element.label}: {element.bbox}")

# Create Set-of-Mark prompt
som = SoMPrompt(screenshot_path)
marked_image, element_map = som.create()
# element_map: {1: "Submit button", 2: "Email field", ...}
```

```python
from openadapt_ml import AgentPolicy
from openadapt_grounding import ElementDetector

# Create policy with grounding
policy = AgentPolicy.from_checkpoint(
    "model.pt",
    grounding=ElementDetector(),
)

# Actions will use grounded coordinates
observation = load_screenshot()
action = policy.predict(observation)
```

```bash
openadapt ground detect screenshot.png
```

Output:
```
Found 12 elements:
1. Button: "Submit" at (450, 320, 520, 350)
2. TextField: "Email" at (200, 200, 400, 230)
...
```
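One way grounding makes actions robust to small UI shifts is to snap a model-predicted click point to the nearest detected element center. The helper below is a hypothetical sketch, not a package export; it uses the two sample bounding boxes from the output above:

```python
import math

# Hypothetical sketch: snap a predicted click point to the nearest detected
# element center, so slightly-off predictions still land on the element.
def snap_to_nearest(point, bboxes):
    """bboxes are (left, top, right, bottom) tuples; returns a center."""
    centers = [((l + r) / 2, (t + b) / 2) for l, t, r, b in bboxes]
    return min(centers, key=lambda c: math.dist(point, c))

# A prediction near the "Submit" button snaps to its exact center:
print(snap_to_nearest((440, 330), [(450, 320, 520, 350),
                                   (200, 200, 400, 230)]))  # → (485.0, 335.0)
```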
```bash
openadapt ground som screenshot.png --output marked.png
```

| Export | Description |
|---|---|
| `ElementDetector` | Detects UI elements |
| `SoMPrompt` | Creates Set-of-Mark prompts |
| `BoundingBox` | Element coordinates |
| `Element` | Detected element data |
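For orientation, here is a plausible shape for the `Element` and `BoundingBox` exports, inferred from the usage above (`element.label`, `element.bbox`). These definitions are assumptions for illustration; the package's actual classes may differ:

```python
from dataclasses import dataclass

# Assumed data shapes, inferred from the examples above — not the
# package's actual definitions.
@dataclass
class BoundingBox:
    left: int
    top: int
    right: int
    bottom: int

@dataclass
class Element:
    label: str
    bbox: BoundingBox
```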
| Model | Size | Accuracy | Speed |
|---|---|---|---|
| `omniparser` | 1.2GB | High | Medium |
| `som-base` | 500MB | Medium | Fast |
| `custom` | - | - | - |
- Set-of-Mark Paper
- OpenAdaptAI/SoM - SoM implementation
- openadapt-ml - Use grounding in policy learning and execution
- openadapt-capture - Apply grounding to demonstrations