
Commit f9d06a2

abrichr and claude authored
feat: initial schemas — ComputerState, Action, UINode, Episode (#1)
Canonical Pydantic v2 schemas for computer-use agents, converging three existing schema formats (openadapt-ml, openadapt-evals, omnimcp) into one shared package with zero ML dependencies.

Includes:
- ComputerState: screen state with UI element graph
- UINode: element with role, bbox, hierarchy, platform anchors
- Action + ActionTarget: typed actions with node_id > description > coords
- ActionResult: explicit execution outcomes with error taxonomy
- Episode + Step: complete task trajectories
- FailureRecord: classified failures for dataset pipelines
- _compat: converters from all 3 existing formats
- 43 tests passing

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 057335f commit f9d06a2
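The commit message gives the targeting priority as `node_id` > `description` > coordinates. Below is a minimal sketch of how a runtime might resolve a target under that priority. It assumes that a node's click point is the center of its bounding box and that description grounding is passed in as a callback. `Box`, `Target`, `resolve_target`, and `ground` are hypothetical names and are not part of `openadapt_types`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical stand-ins for BoundingBox / ActionTarget; the field names mirror
# the README examples, but this is not the package's implementation.
@dataclass
class Box:
    x: float
    y: float
    width: float
    height: float


@dataclass
class Target:
    node_id: Optional[str] = None
    description: Optional[str] = None
    x: Optional[float] = None
    y: Optional[float] = None
    is_normalized: bool = False


def resolve_target(
    target: Target,
    boxes: dict[str, Box],
    viewport: tuple[int, int],
    ground: Optional[Callable[[str], tuple[float, float]]] = None,
) -> tuple[float, float]:
    """Return pixel coordinates, trying node_id, then description, then (x, y)."""
    # 1. Element-based: use the center of the node's bounding box.
    #    An unknown node_id falls through to the next strategy.
    if target.node_id is not None and target.node_id in boxes:
        b = boxes[target.node_id]
        return (b.x + b.width / 2, b.y + b.height / 2)
    # 2. Description-based: delegate to a grounding callback (assumed, e.g. a VLM).
    if target.description is not None and ground is not None:
        return ground(target.description)
    # 3. Coordinate fallback: scale normalized [0, 1] values by the viewport size.
    if target.x is not None and target.y is not None:
        if target.is_normalized:
            width, height = viewport
            return (target.x * width, target.y * height)
        return (target.x, target.y)
    raise ValueError("target cannot be resolved to coordinates")
```

Take the README's Submit button (bbox at x=500, y=400, 100 by 40 pixels). Its `node_id` path resolves to (550, 420), which is the same point the README uses in its coordinate-based example.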

15 files changed: 2079 additions & 1 deletion


.gitignore

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
```
__pycache__/
*.pyc
*.pyo
.venv/
dist/
build/
*.egg-info/
.pytest_cache/
.env
```

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
```
MIT License

Copyright (c) 2026 OpenAdapt

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```

README.md

Lines changed: 116 additions & 1 deletion
@@ -1,2 +1,117 @@
```diff
 # openadapt-types
-Canonical Pydantic schemas for computer-use agents: ComputerState, Action, ActionResult, UINode
+
+Canonical Pydantic schemas for computer-use agents.
+
+```
+pip install openadapt-types
+```
+
+## What's in the box
+
+| Schema | Purpose |
+|--------|---------|
+| `ComputerState` | Screen state: screenshot + UI element graph + window context |
+| `UINode` | Single UI element with role, bbox, hierarchy, platform anchors |
+| `Action` | Agent action with typed action space + flexible targeting |
+| `ActionTarget` | Where to act: `node_id` > `description` > `(x, y)` coordinates |
+| `ActionResult` | Execution outcome with error taxonomy + state delta |
+| `Episode` / `Step` | Complete task trajectory (observation → action → result) |
+| `FailureRecord` | Classified failure for dataset pipelines |
+
+## Quick start
+
+```python
+from openadapt_types import (
+    Action, ActionTarget, ActionType,
+    ComputerState, UINode, BoundingBox,
+)
+
+# Describe what's on screen
+state = ComputerState(
+    viewport=(1920, 1080),
+    nodes=[
+        UINode(node_id="n0", role="window", name="My App", children_ids=["n1"]),
+        UINode(node_id="n1", role="button", name="Submit", parent_id="n0",
+               bbox=BoundingBox(x=500, y=400, width=100, height=40)),
+    ],
+)
+
+# Agent decides what to do
+action = Action(
+    type=ActionType.CLICK,
+    target=ActionTarget(node_id="n1"),
+    reasoning="Click Submit to proceed",
+)
+
+# Render element tree for LLM prompts
+print(state.to_text_tree())
+# [n0] window: My App
+# [n1] button: Submit
+```
+
+## Action targeting
+
+`ActionTarget` supports three grounding strategies (in priority order):
+
+```python
+# 1. Element-based (preferred — most robust)
+ActionTarget(node_id="n1")
+
+# 2. Description-based (resolved by grounding module)
+ActionTarget(description="the blue submit button")
+
+# 3. Coordinate-based (fallback)
+ActionTarget(x=550, y=420)
+ActionTarget(x=0.29, y=0.39, is_normalized=True)
+```
+
+Agents SHOULD produce `node_id` or `description`. The runtime resolves these to coordinates.
+
+## Compatibility with existing schemas
+
+Converters for three existing OpenAdapt schema formats:
+
+```python
+from openadapt_types._compat import (
+    from_benchmark_observation,    # openadapt-evals BenchmarkObservation
+    from_benchmark_action,         # openadapt-evals BenchmarkAction
+    from_ml_observation,           # openadapt-ml Observation
+    from_ml_action,                # openadapt-ml Action
+    from_omnimcp_screen_state,     # omnimcp ScreenState
+    from_omnimcp_action_decision,  # omnimcp ActionDecision
+)
+
+# Convert existing data (obs, act: existing openadapt-evals objects)
+state = from_benchmark_observation(obs.__dict__)
+action = from_benchmark_action(act.__dict__)
+```
+
+## JSON Schema
+
+Export for language-agnostic tooling:
+
+```python
+import json
+from openadapt_types import ComputerState, Action, Episode
+
+# Get JSON Schema
+schema = ComputerState.model_json_schema()
+print(json.dumps(schema, indent=2))
+```
+
+## Design principles
+
+- **Pydantic v2**: runtime validation, JSON Schema export, fast serialization
+- **Pixels + structure**: always capture both visual and semantic UI state
+- **Node graph**: full element tree, not just focused element
+- **Platform-agnostic**: same schema for Windows, macOS, Linux, web
+- **Extension-friendly**: `raw`, `attributes`, `metadata` fields everywhere
+- **Backward compatible**: `_compat` converters for gradual migration
+
+## Dependencies
+
+Just `pydantic>=2.0`. No ML libraries, no heavy deps.
+
+## License
+
+MIT
```
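The README's quick start calls `state.to_text_tree()` to render the node graph for LLM prompts. Below is a minimal sketch of that kind of rendering. It assumes a depth-first walk over `children_ids` with two-space indentation per level. `Node` and this `to_text_tree` function are hypothetical stand-ins, and the package's real output format may differ from this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical stand-in for UINode, keeping only the fields used here.
@dataclass
class Node:
    node_id: str
    role: str
    name: str = ""
    parent_id: Optional[str] = None
    children_ids: list[str] = field(default_factory=list)


def to_text_tree(nodes: list[Node], indent: str = "  ") -> str:
    """Render nodes depth-first as '[id] role: name' lines, indenting children."""
    by_id = {n.node_id: n for n in nodes}
    lines: list[str] = []

    def visit(node: Node, depth: int) -> None:
        label = f"[{node.node_id}] {node.role}"
        if node.name:
            label += f": {node.name}"
        lines.append(indent * depth + label)
        # Follow child references; ids missing from the list are skipped.
        for child_id in node.children_ids:
            if child_id in by_id:
                visit(by_id[child_id], depth + 1)

    # Roots are nodes with no parent, or whose parent is not in the list.
    for node in nodes:
        if node.parent_id is None or node.parent_id not in by_id:
            visit(node, 0)
    return "\n".join(lines)
```

Short bracketed ids such as `[n1]` keep the prompt compact, and they let a model answer with a `node_id`. That id is the targeting mode the README recommends.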

openadapt_types/__init__.py

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
```python
"""openadapt-types: Canonical Pydantic schemas for computer-use agents.

This package provides the shared type definitions used across the OpenAdapt
ecosystem and designed for adoption by any computer-use agent project.

Quick start::

    from openadapt_types import ComputerState, Action, ActionTarget, ActionType, UINode

    state = ComputerState(
        viewport=(1920, 1080),
        nodes=[
            UINode(node_id="n0", role="button", name="Submit"),
        ],
    )

    action = Action(
        type=ActionType.CLICK,
        target=ActionTarget(node_id="n0"),
    )
"""

from openadapt_types.action import (
    Action,
    ActionResult,
    ActionTarget,
    ActionType,
)
from openadapt_types.computer_state import (
    BoundingBox,
    ComputerState,
    ElementRole,
    ProcessInfo,
    UINode,
)
from openadapt_types.episode import Episode, Step
from openadapt_types.failure import FailureCategory, FailureRecord

__version__ = "0.1.0"

__all__ = [
    # computer_state
    "BoundingBox",
    "ComputerState",
    "ElementRole",
    "ProcessInfo",
    "UINode",
    # action
    "Action",
    "ActionResult",
    "ActionTarget",
    "ActionType",
    # episode
    "Episode",
    "Step",
    # failure
    "FailureCategory",
    "FailureRecord",
]
```
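The package exports `Episode`, `Step`, `FailureCategory`, and `FailureRecord` for trajectories and failure classification. The sketch below illustrates the observation → action → result structure that the README describes. It scans a trajectory for its first failed step, the kind of lookup a dataset pipeline might run before it builds a failure record. Every class, field, and label here (`EpisodeSketch`, `StepSketch`, `Result`, `first_failure`, `"element_not_found"`) is hypothetical, not the package's actual model.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical, simplified shapes; the real Episode/Step/ActionResult fields
# live in openadapt_types and are not reproduced here.
@dataclass
class Result:
    success: bool
    error: Optional[str] = None  # e.g. a label from an error taxonomy


@dataclass
class StepSketch:
    observation: str  # stands in for a ComputerState
    action: str       # stands in for an Action
    result: Result


@dataclass
class EpisodeSketch:
    task: str
    steps: list[StepSketch] = field(default_factory=list)


def first_failure(episode: EpisodeSketch) -> Optional[tuple[int, str]]:
    """Return (step index, error label) of the first failed step, or None."""
    for index, step in enumerate(episode.steps):
        if not step.result.success:
            return (index, step.result.error or "unknown")
    return None
```

Each step carries an explicit success flag and error label, so classifying a failure becomes a lookup. The pipeline does not have to infer the failure from screenshots after the fact.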
