Skip to content

Latest commit

 

History

History
256 lines (178 loc) · 9.53 KB

File metadata and controls

256 lines (178 loc) · 9.53 KB

OpenAdapt

AI-First Process Automation with Large Multimodal Models

Build Status PyPI version Downloads License: MIT Python 3.10+ Discord

OpenAdapt is the open-source adapter between Large Multimodal Models (LMMs) and traditional desktop and web interfaces. Transform GUI automation through demonstration-based learning rather than complex programming.

🎯 Show, don't tell → Record demonstrations, train intelligent agents, and deploy automation that adapts to any software environment.

Quick Links

🚀 Get Started | 💬 Join Discord | 📖 Documentation | 🌐 Website


✨ What Makes OpenAdapt Different

Traditional Automation OpenAdapt
❌ Complex scripting required ✅ Record demonstrations visually
❌ Brittle, breaks with UI changes ✅ AI adapts to interface variations
❌ Limited to predefined workflows ✅ Learns from human expertise
❌ Programming knowledge needed ✅ Anyone can create automations

Architecture

OpenAdapt v1.0+ uses a modular meta-package architecture. The main openadapt package provides a unified CLI and depends on focused sub-packages via PyPI:

Core Platform Components

Package Description Repository
openadapt Meta-package with unified CLI This repo
openadapt-capture Event recording and storage openadapt-capture
openadapt-ml ML engine, training, inference openadapt-ml
openadapt-evals Benchmark evaluation openadapt-evals
openadapt-viewer HTML visualization openadapt-viewer
openadapt-grounding UI element localization openadapt-grounding
openadapt-retrieval Multimodal demo retrieval openadapt-retrieval
openadapt-privacy PII/PHI scrubbing openadapt-privacy
openadapt-agent Production execution engine openadapt-agent

Applications & Tools

Package Description Repository
openadapt-tray System tray application openadapt-tray
openadapt-wright AI-powered dev automation openadapt-wright
openadapt-consilium Multi-LLM consensus system openadapt-consilium
openadapt-web Marketing website openadapt-web
openadapt-telemetry Error tracking and analytics openadapt-telemetry

Installation

Install what you need:

pip install openadapt              # Minimal CLI only
pip install openadapt[capture]     # GUI capture/recording
pip install openadapt[ml]          # ML training and inference
pip install openadapt[evals]       # Benchmark evaluation
pip install openadapt[privacy]     # PII/PHI scrubbing
pip install openadapt[all]         # Everything

Requirements: Python 3.10+


Quick Start

1. Record a demonstration

openadapt capture start --name my-task
# Perform actions in your GUI, then press Ctrl+C to stop

2. Train a model

openadapt train start --capture my-task --model qwen3vl-2b

3. Evaluate

openadapt eval run --checkpoint training_output/model.pt --benchmark waa

4. View recordings

openadapt capture view my-task

CLI Reference

openadapt capture start --name <name>    Start recording
openadapt capture stop                    Stop recording
openadapt capture list                    List captures
openadapt capture view <name>             Open capture viewer

openadapt train start --capture <name>    Train model on capture
openadapt train status                    Check training progress
openadapt train stop                      Stop training

openadapt eval run --checkpoint <path>    Evaluate trained model
openadapt eval run --agent api-claude     Evaluate API agent
openadapt eval mock --tasks 10            Run mock evaluation

openadapt serve --port 8080               Start dashboard server
openadapt version                         Show installed versions
openadapt doctor                          Check system requirements

How It Works

OpenAdapt transforms GUI automation through a three-phase demo-conditioned approach that learns from human demonstrations rather than relying solely on programmatic instructions.

The Three-Phase Pipeline

1. DEMONSTRATE → 2. LEARN → 3. EXECUTE
    ↓               ↓           ↓
  Record         Train       Deploy
  Actions        Models      Agents

Phase 1: Demonstrate

Record human demonstrations of GUI tasks using openadapt-capture. All recordings are processed through openadapt-privacy for PII/PHI scrubbing before storage.

Phase 2: Learn

Choose your learning approach:

  • Retrieval Path: Index demonstrations with openadapt-retrieval for runtime context
  • Training Path: Fine-tune vision-language models using openadapt-ml
  • Hybrid: Combine both for maximum effectiveness

Phase 3: Execute

Deploy intelligent agents via openadapt-agent that:

  • Observe the current screen state
  • Apply learned policies with demonstration context
  • Ground actions to specific UI elements via openadapt-grounding
  • Execute actions with built-in safety validation

🧠 Core Innovation: Demo-Conditioned Automation

Instead of complex prompts, OpenAdapt learns from visual demonstrations:

Traditional Approach Demo-Conditioned
Write detailed prompts Record demonstration once
Debug when things break AI adapts to UI changes
Program every edge case Learn from human intuition
Maintain complex scripts Visual examples as documentation

Results: In controlled benchmarks, demonstration context improved first-action accuracy from 46.7% to 100%. Similar demonstrations provide rich context that helps Vision Language Models understand both the what and how of GUI interactions.

🔑 Key Concepts

  • Smart Decision Making: AI decides what to do, precise grounding determines where to click
  • Built-in Safety: Actions are validated before execution to prevent unintended consequences
  • Progressive Learning: From exact replay to intelligent adaptation as the system learns
  • Self-Improving: Successful automations become training data for even better performance

Terminology

Term Description
Observation What the agent perceives (screenshot, accessibility tree)
Action What the agent does (click, type, scroll, etc.)
Trajectory Sequence of observation-action pairs
Demonstration Human-provided example trajectory
Policy Decision-making component that maps observations to actions
Grounding Mapping intent to specific UI elements (coordinates)

Demos


Permissions

macOS: Grant Accessibility, Screen Recording, and Input Monitoring permissions to your terminal. See permissions guide.

Windows: Run as Administrator if needed for input capture.


Legacy Version

The monolithic OpenAdapt codebase (v0.46.0) is preserved in the legacy/ directory.

To use the legacy version:

pip install openadapt==0.46.0

See docs/LEGACY_FREEZE.md for migration guide and details.


Contributing

  1. Join Discord
  2. Pick an issue from the relevant sub-package repository
  3. Submit a PR

For sub-package development:

git clone https://github.com/OpenAdaptAI/openadapt-ml  # or other sub-package
cd openadapt-ml
pip install -e ".[dev]"

Related Projects


Support


License

MIT License - see LICENSE for details.