docs: reframe positioning with multi-pillar strategy (#991)
* docs: reframe positioning with multi-pillar strategy and honest scoping
- README: Replace "Demo-Conditioned Prompting" with "Trajectory-Conditioned
Disambiguation" showing the 2x2 experimental matrix (prompting validated,
fine-tuning in progress). Add OpenCUA industry validation.
- Landing page strategy: Lead with capture-to-deployment pipeline, add
specialization pillar, update competitor table for March 2026 landscape
(Agent S3, OpenCUA, Browser Use, CUA/Bytebot). Add honesty notes for
proof points.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: correct OpenCUA attribution to macOS a11y code reuse
OpenCUA reused OpenAdapt's macOS accessibility tree capture code
(AX API traversal functions + oa_atomacos dependency), not the full
capture-to-deployment pipeline. The recorder architecture came from
DuckTrack. Updated README, landing page strategy, competitor table,
and proof points to reflect this accurately.
Evidence: arxiv.org/html/2508.09123v3 Section 2.2, OpenCUA README
"Acknowledge" section.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: review fixes — accuracy, claims, and add builders section
- Use 46.7% consistently (not 33-47% range)
- Change "core goal" to "planned" in 2x2 matrix
- Drop "superhuman" for Agent S3 (barely above human baseline)
- Fix possessive "our" to "OpenAdapt's" in competitor table
- Add "Built for Builders" section for non-technical users
- Renumber subsequent sections
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Zero-shot VLMs fail on GUI tasks not due to lack of capability, but due to **ambiguity in UI affordances**. OpenAdapt resolves this by conditioning agents on human demonstrations — "show, don't tell."
| Traditional Agent | OpenAdapt Agent |
|-------------------|-----------------|
| User writes prompts | User records demonstration |
| Ambiguous instructions | Grounded in actual UI |
| | Standard | Demo-conditioned |
|---|---|---|
| **Prompting** | Zero-shot (baseline) | **Demo-conditioned prompting** (validated) |
| **Fine-tuning** | Standard SFT (baseline) | **Demo-conditioned FT** (planned) |
**Retrieval powers BOTH training AND evaluation**: similar demonstrations are retrieved as context for the VLM. In early experiments on a controlled macOS benchmark, this improved first-action accuracy from 46.7% to 100%, though all 45 tasks in that benchmark share the same navigation entry point. See the [publication roadmap](docs/publication-roadmap.md) for methodology and limitations.
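A minimal sketch of the retrieval step. The similarity function and library shape are deliberate simplifications, not OpenAdapt's actual implementation:

```python
# Simplified demo retrieval: pick the recorded demonstration whose task
# description best matches the new task (token-set Jaccard similarity).
def similarity(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def retrieve_demo(task: str, demo_library: dict[str, str]) -> str:
    best = max(demo_library, key=lambda name: similarity(task, name))
    return demo_library[best]

demos = {
    "open System Settings and enable Night Shift": "click('Apple menu'); open Displays; ...",
    "open System Settings and change the wallpaper": "click('Apple menu'); open Wallpaper; ...",
}
# The retrieved demo is then prepended to the VLM prompt as context.
demo = retrieve_demo("open System Settings and enable True Tone", demos)
```

In practice a learned embedding would replace the token-overlap heuristic, but the control flow is the same: retrieve the nearest demonstration, then condition the model on it.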
The bottom-right cell is OpenAdapt's unique value: training models to **use** demonstrations they haven't seen before, combining retrieval with fine-tuning for maximum accuracy. Phase 2 (retrieval-only prompting) is validated; Phase 3 (demo-conditioned fine-tuning) is in progress.
**Validated result**: On a controlled macOS benchmark (45 System Settings tasks sharing a common navigation entry point), demo-conditioned prompting improved first-action accuracy from 46.7% to 100%. A length-matched control (only +11.1 pp) confirms the benefit is semantic, not a token-length effect. See the [research thesis](https://github.com/OpenAdaptAI/openadapt-ml/blob/main/docs/research_thesis.md) for methodology and the [publication roadmap](docs/publication-roadmap.md) for limitations.
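For reference, first-action accuracy is the fraction of tasks whose first predicted action matches the reference trajectory. A sketch of the metric (assumed from the numbers above, not copied from the benchmark harness):

```python
def first_action_accuracy(predicted, reference):
    # Fraction of tasks where the first predicted action matches the
    # first action of the reference trajectory.
    hits = sum(p[0] == r[0] for p, r in zip(predicted, reference))
    return hits / len(reference)

# 21 of 45 tasks correct reproduces the 46.7% baseline figure.
round(21 / 45, 3)  # 0.467
```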
**Industry validation**: [OpenCUA](https://github.com/xlang-ai/OpenCUA) (NeurIPS 2025 Spotlight, XLANG Lab) [reused OpenAdapt's macOS accessibility capture code](https://arxiv.org/html/2508.09123v3) in their AgentNetTool, but uses demos only for model training — not runtime conditioning. No open-source CUA framework currently does demo-conditioned inference, which remains OpenAdapt's architectural differentiator.
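The reused capture code walks the macOS accessibility (AX) tree. In outline it is a recursive traversal like the following sketch; the node attributes here are generic stand-ins, not the real AX API or oa_atomacos names:

```python
from dataclasses import dataclass, field

@dataclass
class A11yNode:
    # Generic stand-in for a macOS AX element (the real code reads AX
    # attributes such as role and title via the AX API / oa_atomacos).
    role: str
    title: str = ""
    children: list["A11yNode"] = field(default_factory=list)

def capture_tree(node: A11yNode, depth: int = 0, max_depth: int = 10) -> dict:
    # Serialize the accessibility tree to a nested dict, bounding depth
    # so pathological UI hierarchies cannot recurse forever.
    entry = {"role": node.role, "title": node.title, "children": []}
    if depth < max_depth:
        entry["children"] = [
            capture_tree(c, depth + 1, max_depth) for c in node.children
        ]
    return entry

window = A11yNode("AXWindow", "System Settings",
                  children=[A11yNode("AXButton", "Night Shift")])
tree = capture_tree(window)
```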
docs/design/landing-page-strategy.md (77 additions, 35 deletions)
@@ -40,7 +40,9 @@ OpenAdapt has evolved from a monolithic application (v0.46.0) to a **modular met
4. **Open Source (MIT License)**: Full transparency, no vendor lock-in
**Key Innovation**:
- **Trajectory-conditioned disambiguation of UI affordances** — the only open-source CUA framework that conditions agents on recorded demonstrations at runtime (validated: 46.7% → 100% first-action accuracy)
- **Specialization over scale** — fine-tuned Qwen3-VL-2B outperforms Claude Sonnet 4.5 and GPT-5.1 on action accuracy (42.9% vs 11.2% vs 23.2%) on an internal benchmark
- **Capture-to-deployment pipeline** — record → retrieve → train → deploy. [OpenCUA](https://github.com/xlang-ai/OpenCUA) (NeurIPS 2025 Spotlight) [reused OpenAdapt's macOS accessibility capture code](https://arxiv.org/html/2508.09123v3) in their AgentNetTool
- **Set-of-Marks (SoM) mode**: 100% accuracy on synthetic benchmarks using element IDs instead of coordinates
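Set-of-Marks overlays numbered marks on detected UI elements so the model can output a mark ID instead of raw pixel coordinates; resolving an ID back to a click point is then trivial. A sketch with invented names:

```python
# Sketch of Set-of-Marks (SoM) action resolution. The model predicts an
# element ID; the harness maps it back to the element's bounding box.
def resolve_som_click(element_id: int, boxes: dict[int, tuple]) -> tuple:
    x, y, w, h = boxes[element_id]          # (left, top, width, height)
    return (x + w / 2, y + h / 2)           # click the element's center

boxes = {1: (10, 20, 100, 30), 2: (10, 60, 100, 30)}
resolve_som_click(2, boxes)  # -> (60.0, 75.0)
```

This is why SoM sidesteps coordinate regression errors: the model only has to pick the right ID, and localization is handled by the element detector.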
   - But: "Agents conditioned on relevant demos — at inference AND during training"
   - Proof: 46.7% → 100% first-action accuracy with demo conditioning (validated, n=45). No other open-source CUA framework does runtime demo conditioning.
   - Note: This is first-action accuracy on tasks sharing a navigation entry point. Multi-step and cross-domain evaluation is ongoing on Windows Agent Arena.

3. **Specialization Over Scale**
   - Not: "Use the biggest model available"
   - But: "A 2B model fine-tuned on your workflows outperforms frontier models"
   - Proof: Qwen3-VL-2B (42.9%) vs Claude Sonnet 4.5 (11.2%) on action accuracy (internal benchmark, synthetic login task)
| Competitor | Strengths | Limitations | OpenAdapt Advantage |
|---|---|---|---|
| **Anthropic Computer Use** | 72.5% OSWorld (near-human), simple API | Proprietary, cloud-only, no customization, per-action cost | Open source, model-agnostic, trainable, runs locally |
| **Agent S3 (Simular)** | 72.6% OSWorld, open source | Zero-shot only, no demo conditioning, no fine-tuning pipeline | Demo-conditioned agents, capture-to-train pipeline |
| **OpenCUA (XLANG Lab)** | NeurIPS Spotlight, 45% OSWorld, open models (7B-72B) | Zero-shot at inference — demos used only for training, not runtime | Runtime demo conditioning (unique); OpenCUA reused OpenAdapt's macOS a11y code |
| **Browser Use** | 50k+ GitHub stars, 89% WebVoyager | Browser-only, no desktop, no training pipeline | Full desktop support, fine-tuning, demo library |
| **UI-TARS (ByteDance)** | Local models (2B-72B), Apache 2.0 | No demo conditioning, no capture pipeline | End-to-end record→train→deploy, demo retrieval |
| **CUA / Bytebot** | Container infra, YC-backed | Infrastructure-only, no ML training pipeline | Full pipeline: capture + train + eval + deploy |
| **Traditional RPA (UiPath, etc.)** | Enterprise-proven, UiPath Screen Agent #1 on OSWorld | Brittle selectors, expensive ($10K+/yr), requires scripting | AI-first, learns from demos, open source |
### 4.2 Positioning Statement
@@ -352,24 +363,46 @@ Show it once. Let it handle the rest.
```
## Why OpenAdapt?
### Record Once, Automate Forever

Capture any workflow. OpenAdapt retrieves relevant demos to guide agents AND trains specialized models on your recordings.

[Stat: 46.7% → 100% first-action accuracy with demo conditioning]
### Small Models, Big Results
A 2B model fine-tuned on your workflows outperforms frontier models.