# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

**AskUI Vision Agent** is a Python desktop and mobile automation framework that enables AI agents to control computers (Windows, macOS, Linux), mobile devices (Android, iOS), and HMI systems. It supports both programmatic UI automation (RPA-like single-step commands) and agentic, intent-based instructions powered by vision models.

**Tech Stack:** Python 3.10+, Pydantic 2, Anthropic SDK, OpenTelemetry, Model Context Protocol (MCP), PDM

## Common Commands

### Development Setup
```bash
# Install dependencies
pdm install
```

### Testing
```bash
# Run all tests (parallel execution)
pdm run test

# Run specific test suites
pdm run test:unit          # Unit tests only
pdm run test:integration   # Integration tests only
pdm run test:e2e           # End-to-end tests only

# Run tests with coverage
pdm run test:cov           # All tests with coverage report
pdm run test:cov:view      # View coverage report in browser
```

### Code Quality
```bash
# Quick QA: type check, format, and fix linting issues (run before commits)
pdm run qa:fix

# Individual commands
pdm run typecheck:all   # Type checking with mypy
pdm run format          # Format code with ruff
pdm run lint            # Lint code with ruff
pdm run lint:fix        # Auto-fix linting issues
```

### Code Generation
```bash
# Regenerate gRPC client code from .proto files
pdm run grpc:gen

# Regenerate Pydantic models from JSON schemas
pdm run json:gen
```

## High-Level Architecture

### Core SDK Architecture

```
ComputerAgent (Main SDK Entry Point)
    ↓
Agent (Abstract base class for all agents)
    ├── ComputerAgent (Desktop automation)
    ├── AndroidAgent (Mobile Android automation)
    ├── WebVisionAgent (Web-specific automation)
    └── WebTestingAgent (Web testing framework)

    Uses:
    ├── ModelRouter → Model selection/composition
    ├── AgentToolbox → Tool & OS abstraction
    └── Locators → UI element identification
```

**Key Flow:**
1. User calls `agent.click("Submit button")` on `ComputerAgent`
2. `AgentBase.locate()` routes to the appropriate model via `ModelRouter`
3. Model receives screenshot + locator → returns coordinates
4. `AgentToolbox.os.click()` → gRPC call to Agent OS
5. Agent OS performs the actual mouse click
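
The flow above can be sketched with stand-in classes. This is a hypothetical, self-contained illustration. `FakeLocateModel`, `FakeAgentOs`, and `MiniAgent` are invented names and not the real askui classes. The real SDK goes through `ModelRouter` and a gRPC-backed `AgentOs`, and this sketch skips both.

```python
from dataclasses import dataclass, field


@dataclass
class FakeLocateModel:
    """Hypothetical stand-in for a locate model: maps a locator to coordinates."""

    known_elements: dict[str, tuple[int, int]]

    def locate(self, locator: str, screenshot: bytes) -> tuple[int, int]:
        # A real model would analyze the screenshot; this stub uses a lookup table.
        return self.known_elements[locator]


@dataclass
class FakeAgentOs:
    """Hypothetical stand-in for `AgentOs`: records clicks instead of moving the mouse."""

    clicks: list[tuple[int, int]] = field(default_factory=list)

    def screenshot(self) -> bytes:
        return b"fake-png-bytes"

    def click(self, x: int, y: int) -> None:
        self.clicks.append((x, y))


class MiniAgent:
    """Hypothetical agent wiring: screenshot -> locate -> click."""

    def __init__(self, model: FakeLocateModel, agent_os: FakeAgentOs) -> None:
        self._model = model
        self._agent_os = agent_os

    def click(self, locator: str) -> None:
        screenshot = self._agent_os.screenshot()        # capture the current screen
        x, y = self._model.locate(locator, screenshot)  # step 3: model returns coordinates
        self._agent_os.click(x, y)                      # steps 4-5: OS layer performs the click


agent_os = FakeAgentOs()
agent = MiniAgent(FakeLocateModel({"Submit button": (640, 480)}), agent_os)
agent.click("Submit button")
print(agent_os.clicks)  # [(640, 480)]
```

Because the model and the OS layer are passed in through the constructor, either one can be replaced with a test double. The same constructor-based injection appears under Important Patterns.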

### Chat API Architecture

```
FastAPI Chat API (Experimental)
    ├── Assistants (AI agent configurations)
    ├── Threads (Conversation sessions)
    ├── Messages (Chat history)
    ├── Runs (Agent execution iterations)
    ├── Files (Attachments & resources)
    ├── MCP Configs (Tool providers)
    └── Workflows & Scheduled Jobs (Automation triggers)
```

**Key Flow:**
1. User → Chat UI (hub.askui.com) → Chat API (FastAPI)
2. Threads and messages are stored in a database via SQLAlchemy
3. Runs execute agent steps in a loop
4. Agent uses ModelRouter → Tools (MCP servers or direct) → AgentOS

### Model Router & Composition

The `ModelRouter` provides a flexible abstraction for AI model selection:

```python
# Single model for all tasks
model = "askui"

# Task-specific models (ActModel, GetModel, LocateModel)
model = {
    "act": "claude-sonnet-4-20250514",
    "get": "askui",
    "locate": "askui-combo"
}

# Custom registry
models = ModelRegistry()
models.register("my-model", custom_model_instance)
```
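
The following hypothetical resolver illustrates the two `model` forms. A single string applies to every task. A dict selects a model per task, and the chosen name is then looked up in a registry dict. `resolve_model_name()` and the fallback to `"askui"` are assumptions for illustration and are not taken from the actual `ModelRouter` code.

```python
from __future__ import annotations

from typing import Literal

Task = Literal["act", "get", "locate"]


def resolve_model_name(model: str | dict[str, str], task: Task, default: str = "askui") -> str:
    """Pick the model name for `task` from a string or a per-task mapping (hypothetical)."""
    if isinstance(model, str):
        return model  # one model for all tasks
    return model.get(task, default)  # per-task choice with an assumed fallback


# Hypothetical registry: model name -> description (a real one would hold model objects).
registry: dict[str, str] = {
    "askui": "AskUI hosted model",
    "askui-combo": "AskUI combo locator",
    "claude-sonnet-4-20250514": "Anthropic Claude",
}

per_task = {"act": "claude-sonnet-4-20250514", "get": "askui", "locate": "askui-combo"}
for task in ("act", "get", "locate"):
    print(task, "->", registry[resolve_model_name(per_task, task)])
```

Keeping the choice as plain names means a user-supplied dict can mix built-in and custom entries. The registry only has to map each name to an implementation.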

**Supported Model Providers:**
- **AskUI Models** (Primary - internally hosted)
- **Anthropic Claude** (Computer Use, Messages API)
- **Google Gemini** (via OpenRouter)
- **Hugging Face Spaces** (Community models)

### Agent OS Abstraction

`AgentOs` provides an abstraction layer for OS-level operations:

```
AgentOs (Abstract Interface)
    ├── AskUiControllerClient (gRPC to AskUI Agent OS - primary)
    ├── PlaywrightAgentOs (Web browser automation)
    └── AndroidAgentOs (Android ADB)
```

### Locator System

Locators identify UI elements in multiple ways:

- **Text**: Match by text content (exact/similar/contains/regex)
- **Image**: Match by image file or base64
- **Prompt**: Natural language description
- **Coordinate**: Absolute (x, y) position
- **Relatable**: Positional relationships (right_of, below, etc.)

Locators are serialized differently depending on the model type (VLM vs. traditional models).
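
The toy model below shows how relatable locators could chain together and then be serialized into a natural-language description for a VLM. The classes `Locator`, `Text`, and `Prompt` and the method `to_vlm_prompt()` are hypothetical. The real classes live in `src/askui/locators/`, and the serializers in `serializers.py` may produce different output.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Locator:
    """Hypothetical base locator that records positional relations to other locators."""

    relations: list[tuple[str, Locator]] = field(default_factory=list, init=False)

    def right_of(self, other: Locator) -> Locator:
        self.relations.append(("right of", other))
        return self  # returning self allows chaining

    def below(self, other: Locator) -> Locator:
        self.relations.append(("below", other))
        return self

    def _describe(self) -> str:
        raise NotImplementedError

    def to_vlm_prompt(self) -> str:
        # Describe this element, then append every relation (recursively serialized).
        parts = [self._describe()]
        parts += [f"{name} {other.to_vlm_prompt()}" for name, other in self.relations]
        return " ".join(parts)


@dataclass
class Text(Locator):
    text: str = ""

    def _describe(self) -> str:
        return f'text "{self.text}"'


@dataclass
class Prompt(Locator):
    prompt: str = ""

    def _describe(self) -> str:
        return self.prompt


locator = Text("Password").below(Text("Username")).right_of(Prompt("the company logo"))
print(locator.to_vlm_prompt())
# text "Password" below text "Username" right of the company logo
```

A traditional (non-VLM) model would need a structured serialization of the same tree rather than a sentence. For that reason, the serializers are chosen per model type.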

### Tool System (MCP)

Tools follow the Model Context Protocol (MCP) for extensibility:

```
Tools (MCP Servers)
    ├── Computer: screenshot, click, type, mouse, clipboard
    ├── Android: device control via ADB
    ├── Testing: scenario & feature management
    └── Utility: file ops, data extraction
```

Tools are auto-discovered and can be dynamically loaded via MCP configurations.

## Key Code Locations

### Core SDK
- `src/askui/agent.py` - Main `ComputerAgent` class (user-facing API)
- `src/askui/agent_base.py` - Abstract `Agent` (base) with shared agent logic
- `src/askui/android_agent.py` - Android-specific agent
- `src/askui/web_agent.py` - Web-specific agent

### Models & AI
- `src/askui/models/` - AI model providers & router factory
- `src/askui/models/shared/` - Shared abstractions (`Agent`, `Tool`, `MessagesApi`)
- `src/askui/models/{provider}/` - Provider implementations
- `src/askui/prompts/` - System prompts for different models

### Tools & OS
- `src/askui/tools/agent_os.py` - Abstract `AgentOs` interface
- `src/askui/tools/askui/` - gRPC client for AskUI Agent OS
- `src/askui/tools/android/` - Android-specific tools
- `src/askui/tools/playwright/` - Web automation tools
- `src/askui/tools/mcp/` - MCP client/server implementations
- `src/askui/tools/testing/` - Test scenario tools

### Locators
- `src/askui/locators/` - UI element selectors
- `src/askui/locators/serializers.py` - Locator serialization for models

### Chat API
- `src/askui/chat/` - FastAPI-based Chat API
- `src/askui/chat/api/` - REST API routes
- `src/askui/chat/migrations/` - Alembic migrations & ORM models

### Utilities
- `src/askui/utils/` - Image processing, API utilities, caching, annotations
- `src/askui/reporting.py` - Reporting & logging
- `src/askui/retry.py` - Retry logic with exponential backoff
- `src/askui/telemetry/` - OpenTelemetry tracing & analytics

## Code Style & Conventions

### General Python Style
- **Private members**: Use the `_` prefix for all private variables, functions, methods, etc. Mark everything private that doesn't need external access.
- **Type hints**: Required everywhere. Use built-in types (`list`, `dict`, `str | None`) instead of `typing` module types (`List`, `Dict`, `Optional`).
- **Overrides**: Use the `@override` decorator from `typing_extensions` for all overridden methods.
- **Exceptions**: Never pass literals to exceptions. Assign the message to a variable first:
  ```python
  # Good
  error_msg = f"Thread {thread_id} not found"
  raise FileNotFoundError(error_msg)

  # Bad
  raise FileNotFoundError(f"Thread {thread_id} not found")
  ```
- **File operations**: Always specify `encoding="utf-8"` for file read/write operations.
- **Init files**: Create an `__init__.py` in each folder.

### FastAPI Specific
- Declare the response type as the route function's return annotation instead of passing `response_model` to the route decorator.
- Dependencies without defaults should come before arguments with defaults.

### Testing
- Use `pytest-mock` for mocking wherever possible.
- Test files in `tests/` follow this structure: `test_*.py` files with `Test*` classes and `test_*` functions.
- Timeout: 60 seconds per test (configured in `pyproject.toml`).

### Git Conventions
- **Never** use `git add .`; explicitly add only the files related to the task.
- Use the conventional commits format: `feat:`, `fix:`, `docs:`, `style:`, `refactor:`, `test:`, `chore:`.
- **Before committing**, always run `pdm run qa:fix` (or individually: `typecheck:all`, `format`, `lint:fix`).

### Docstrings
- All public functions, classes, and constants require docstrings.
- Document constructor args in the class docstring and omit the `__init__` docstring.
- Use backticks for code references (variables, types, functions).
- Function references: `click()`, class references: `ComputerAgent`, method references: `VisionAgent.click()`.
- Include sections: `Args`, `Returns`, `Raises`, `Example`, `Notes`, `See Also` as needed.
- Document parameter types in parentheses, and add `, optional` for parameters that have defaults.
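
Here is an illustration of a docstring that follows these rules. `scale_point()` is a made-up function. The Google-style section layout is inferred from the rules above, not copied from the codebase. The function body also applies the exception-message convention from the style section.

```python
def scale_point(point: tuple[int, int], factor: float = 1.0) -> tuple[int, int]:
    """Scale a screen coordinate, e.g. after downsampling a screenshot.

    Args:
        point (tuple[int, int]): The `(x, y)` coordinate to scale.
        factor (float, optional): Multiplier applied to both axes. Defaults to `1.0`.

    Returns:
        tuple[int, int]: The scaled coordinate, rounded to whole pixels.

    Raises:
        ValueError: If `factor` is not positive.

    Example:
        >>> scale_point((100, 50), factor=2.0)
        (200, 100)
    """
    if factor <= 0:
        error_msg = f"factor must be positive, got {factor}"
        raise ValueError(error_msg)
    return round(point[0] * factor), round(point[1] * factor)
```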

### Documentation (docs/)
When writing or updating documentation in `docs/`:
- **Never show setting environment variables in Python code** (e.g., `os.environ["ASKUI_WORKSPACE_ID"] = "..."`). This is bad practice. Always instruct users to set environment variables via their shell or system settings.
- Keep examples concise and focused on the feature being documented.
- Test all code examples before including them.
- Use `ComputerAgent` (not `VisionAgent`) in examples.

## Important Patterns

### Composition over Inheritance
- `AgentToolbox` wraps `AgentOs` implementations
- `ModelRouter` composes multiple model providers
- `CompositeReporter` aggregates multiple reporters
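
The sketch below applies composition in the spirit of `CompositeReporter`. The `Reporter` protocol and its `add_message()` method are assumptions made for illustration and are not the actual askui reporter interface.

```python
from __future__ import annotations

from typing import Protocol


class Reporter(Protocol):
    """Hypothetical reporter interface."""

    def add_message(self, role: str, content: str) -> None: ...


class ListReporter:
    """Collects messages in memory (hypothetical concrete reporter)."""

    def __init__(self) -> None:
        self.messages: list[tuple[str, str]] = []

    def add_message(self, role: str, content: str) -> None:
        self.messages.append((role, content))


class CompositeReporter:
    """Forwards every call to each wrapped reporter instead of inheriting from them."""

    def __init__(self, reporters: list[Reporter] | None = None) -> None:
        self._reporters = reporters or []

    def add_message(self, role: str, content: str) -> None:
        for reporter in self._reporters:
            reporter.add_message(role, content)


first, second = ListReporter(), ListReporter()
reporter = CompositeReporter([first, second])
reporter.add_message("agent", "Clicked 'Submit button'")
```

Callers only see a single reporter. New outputs such as files, HTML, or telemetry can be added by wrapping more objects, and no existing class needs a new subclass.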

### Factory Pattern
- `ModelRouter.initialize_default_model_registry()` creates the default model registry
- Model providers use factory functions for lazy-loading

### Strategy Pattern
- Truncation strategies for message history
- Different locator serializers for model types
- Retry strategies with exponential backoff
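
As a strategy-pattern sketch for message history, here are two interchangeable truncation strategies behind one interface. The names `TruncationStrategy`, `KeepLastN`, and `KeepFirstAndLastN` are hypothetical, and the real strategies may count tokens rather than messages.

```python
from typing import Protocol


class TruncationStrategy(Protocol):
    def truncate(self, messages: list[str]) -> list[str]: ...


class KeepLastN:
    """Keep only the most recent `n` messages."""

    def __init__(self, n: int) -> None:
        self._n = n

    def truncate(self, messages: list[str]) -> list[str]:
        # Guard n == 0: messages[-0:] would return the whole list.
        return messages[-self._n:] if self._n > 0 else []


class KeepFirstAndLastN:
    """Keep the first message (e.g. the task) plus the most recent `n`."""

    def __init__(self, n: int) -> None:
        self._n = n

    def truncate(self, messages: list[str]) -> list[str]:
        if len(messages) <= self._n + 1:
            return list(messages)
        return [messages[0], *messages[-self._n:]] if self._n > 0 else [messages[0]]


def build_context(messages: list[str], strategy: TruncationStrategy) -> list[str]:
    # The caller depends only on the interface, so strategies can be swapped freely.
    return strategy.truncate(messages)


history = ["task", "step 1", "step 2", "step 3", "step 4"]
print(build_context(history, KeepLastN(2)))          # ['step 3', 'step 4']
print(build_context(history, KeepFirstAndLastN(2)))  # ['task', 'step 3', 'step 4']
```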

### Adapter Pattern
- `AgentOs` abstraction bridges OS implementations (gRPC, Playwright, ADB)
- `ModelFacade` adapts models to `ActModel`/`GetModel`/`LocateModel` interfaces

### Dependency Injection
- Constructor-based DI throughout
- FastAPI dependencies for Chat API routes
- `@auto_inject_agent_os` decorator for tools
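
Decorator-based injection can be sketched as follows. Here `auto_inject` is a hypothetical stand-in for `@auto_inject_agent_os`, and the real decorator's behavior may differ. In this sketch, a default dependency is filled in only when the caller does not pass one, so tests can still inject a fake.

```python
from __future__ import annotations

import functools
from typing import Any, Callable, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])


class FakeAgentOs:
    """Hypothetical dependency."""

    def __init__(self, name: str) -> None:
        self.name = name


def auto_inject(factory: Callable[[], FakeAgentOs]) -> Callable[[F], F]:
    """Inject `agent_os=factory()` when the caller did not supply one (hypothetical)."""

    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            if kwargs.get("agent_os") is None:
                kwargs["agent_os"] = factory()  # build the default dependency on demand
            return func(*args, **kwargs)

        return cast(F, wrapper)

    return decorator


@auto_inject(lambda: FakeAgentOs("default"))
def screenshot_tool(*, agent_os: FakeAgentOs | None = None) -> str:
    if agent_os is None:  # not reached when decorated; narrows the type for checkers
        error_msg = "agent_os was not injected"
        raise RuntimeError(error_msg)
    return f"screenshot via {agent_os.name}"


print(screenshot_tool())                                     # screenshot via default
print(screenshot_tool(agent_os=FakeAgentOs("test double")))  # screenshot via test double
```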

### Template Method Pattern
- `Agent._step()` orchestrates the tool-calling loop
- `Agent` provides a common structure for all agents
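
Here is a minimal template-method sketch of a tool-calling loop. The base class fixes the loop, and subclasses supply only the model call. Everything here (`ToolCall`, `LoopingAgent`, `_call_model()`, the stop rule, `max_steps`) is hypothetical and much simpler than the real `Agent._step()`.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolCall:
    name: str
    argument: str


class LoopingAgent(ABC):
    """Template method: `run()` fixes the algorithm, subclasses implement `_call_model()`."""

    def __init__(self, tools: dict[str, Callable[[str], str]], max_steps: int = 10) -> None:
        self._tools = tools
        self._max_steps = max_steps

    def run(self, task: str) -> list[str]:
        history = [task]
        for _ in range(self._max_steps):
            tool_call = self._call_model(history)  # hook implemented by subclasses
            if tool_call is None:                  # model requests no tool: stop
                break
            result = self._tools[tool_call.name](tool_call.argument)
            history.append(result)                 # feed the tool output back to the model
        return history

    @abstractmethod
    def _call_model(self, history: list[str]) -> ToolCall | None: ...


class ScriptedAgent(LoopingAgent):
    """Replays a fixed script instead of calling a real model."""

    def __init__(self, script: list[ToolCall], tools: dict[str, Callable[[str], str]]) -> None:
        super().__init__(tools)
        self._script = list(script)

    def _call_model(self, history: list[str]) -> ToolCall | None:
        return self._script.pop(0) if self._script else None


tools = {"click": lambda target: f"clicked {target}", "type": lambda text: f"typed {text}"}
agent = ScriptedAgent([ToolCall("click", "Login"), ToolCall("type", "hello")], tools)
history = agent.run("log in")
print(history)  # ['log in', 'clicked Login', 'typed hello']
```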

## Database & Observability

### Alembic Migrations
- Schema versioning in `src/askui/chat/migrations/`
- ORM models in `migrations/shared/{entity}/models.py`
- Auto-migration on startup (configurable)
- SQLAlchemy with async support

### Telemetry
- OpenTelemetry integration (FastAPI, HTTPX, SQLAlchemy)
- Structured logging with structlog
- Correlation IDs for request tracing
- Prometheus metrics via FastAPI instrumentator
- Segment Analytics for usage tracking

## Extending the Framework

### Adding Custom Models
1. Inherit from `ActModel`, `GetModel`, or `LocateModel`
2. Implement message creation via `MessagesApi`
3. Register the model in `ModelRegistry`
4. Use the appropriate locator serializer

### Adding Custom Tools
1. Implement the `Tool` protocol from `models/shared/tools.py`
2. Register the tool in the appropriate MCP server (`api/mcp_servers/{type}.py`)
3. Use `@auto_inject_agent_os` for the `AgentOs` dependency
4. Validate tool input with Pydantic schemas
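
The shape of a custom tool can be sketched without any askui imports. `EchoTool` and its `to_params()` method are hypothetical. The dict that `to_params()` returns mirrors the `name`/`description`/`input_schema` layout of Anthropic tool definitions. A real tool would implement the `Tool` protocol and validate input with Pydantic rather than by hand.

```python
import json
from typing import Any


class EchoTool:
    """Hypothetical tool that echoes text back, with a JSON Schema describing its input."""

    name = "echo"
    description = "Echo the given text back to the model."
    input_schema: dict[str, Any] = {
        "type": "object",
        "properties": {"text": {"type": "string", "description": "Text to echo."}},
        "required": ["text"],
    }

    def to_params(self) -> dict[str, Any]:
        # Tool definition in the shape sent to the model.
        return {"name": self.name, "description": self.description, "input_schema": self.input_schema}

    def __call__(self, text: str) -> str:
        # Minimal manual validation; a real tool would rely on Pydantic instead.
        if not isinstance(text, str):
            error_msg = f"Expected str, got {type(text).__name__}"
            raise TypeError(error_msg)
        return text


tool = EchoTool()
print(json.dumps(tool.to_params(), indent=2))
print(tool(text="hello"))  # hello
```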

### Adding New Agent Types
1. Inherit from `Agent`
2. Implement the required abstract methods
3. Provide an appropriate `AgentOs` implementation
4. Register the agent in the agent factory if needed

## Performance & Caching

- Screenshot caching for multi-step operations
- Token counting before API calls
- Cached trajectory execution (replay previous interactions)
- Image downsampling & compression
- Lazy model initialization (`@functools.cache`)
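
With `@functools.cache`, lazy initialization means an expensive object is built on first use and reused afterward. `get_model()` and `ExpensiveModel` are hypothetical names used only to show the mechanism.

```python
import functools


class ExpensiveModel:
    """Stand-in for a model client that is costly to construct."""

    instances_created = 0

    def __init__(self, name: str) -> None:
        ExpensiveModel.instances_created += 1
        self.name = name


@functools.cache
def get_model(name: str) -> ExpensiveModel:
    # Built only on the first call for each `name`; later calls return the cached instance.
    return ExpensiveModel(name)


first = get_model("askui")
second = get_model("askui")
print(first is second, ExpensiveModel.instances_created)  # True 1
```

The cache is keyed on the arguments, so each distinct model name gets exactly one instance. Nothing is constructed for models that are never requested.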

## Error Handling

Custom exceptions:
- `ElementNotFoundError` - UI element not found
- `WaitUntilError` - Timeout waiting for condition
- `MaxTokensExceededError` - Token limit exceeded
- `ModelRefusalError` - Model refused to execute

Retry logic with configurable strategies is implemented in `src/askui/retry.py`.
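
Here is a generic retry helper with exponential backoff. It is a sketch of the technique, not the contents of `src/askui/retry.py`, and `retry_with_backoff()` and its parameters are assumptions. The `sleep` function is injectable, so the example runs instantly.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[Exception], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, waiting `base_delay * 2**attempt` seconds between failed attempts."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2**attempt)  # 0.5s, 1.0s, 2.0s, ...
    error_msg = "max_attempts must be at least 1"
    raise ValueError(error_msg)


delays: list[float] = []
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        error_msg = "temporary failure"
        raise ConnectionError(error_msg)
    return "ok"


result = retry_with_backoff(flaky, max_attempts=5, sleep=delays.append)
print(result, delays)  # ok [0.5, 1.0]
```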

## Documentation References

Additional documentation in `docs/`:
- `chat.md` - Chat API usage
- `direct-tool-use.md` - Direct tool usage
- `extracting-data.md` - Data extraction
- `mcp.md` - MCP servers
- `observability.md` - Logging and reporting
- `telemetry.md` - Telemetry data
- `using-models.md` - Model usage and custom models

Official docs: https://docs.askui.com
Discord: https://discord.gg/Gu35zMGxbx

## Coding Standards
### Anti-Patterns and Bad Examples
1) Setting Env Variables In-Code
```python
import os
os.environ["ANTHROPIC_API_KEY"] = "..."  # Bad: never set env variables in-code
```
=> Never set environment variables from within the process itself. We expect them to be set directly in the environment, so setting them explicitly is unnecessary. If a value is still needed in code, pass it directly to the client (or other object) that requires it.

2) Don't Use Lazy Imports
=> Keep imports at the top of files. Use lazy loading only in rare edge cases, e.g. when you have to check with a try-except whether a package is available (in that case, the package should be an optional dependency).

3) Client Config
All lazily initialized clients should be configurable in the `__init__` method.

4) Be consistent with variable naming within a class (and its subclasses)!
For example, if a parameter is named `client`, the member variable it is assigned to should also be named `client` (with a `_` prefix if private).
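
Rules 1 to 4 combined, as a sketch. The configuration is passed in explicitly, and the environment is only read as a fallback, never written. The import sits at the top of the file. The lazily created client is configurable through `__init__`, and names stay consistent (`client` parameter, `_client` attribute, `client` property). `ApiClient`, `LocateService`, and the `EXAMPLE_API_KEY` variable are hypothetical.

```python
from __future__ import annotations

import os


class ApiClient:
    """Hypothetical API client; reads its key from the environment only if none is passed."""

    def __init__(self, api_key: str | None = None) -> None:
        # Read-only fallback: never write to os.environ from inside the process.
        self.api_key = api_key if api_key is not None else os.environ.get("EXAMPLE_API_KEY")


class LocateService:
    """Hypothetical consumer whose lazily created client is configurable via `__init__`."""

    def __init__(self, client: ApiClient | None = None) -> None:
        self._client = client  # same name as the parameter, `_` prefix because it is private

    @property
    def client(self) -> ApiClient:
        # Lazy initialization: build the default client only on first access.
        if self._client is None:
            self._client = ApiClient()
        return self._client


service = LocateService(client=ApiClient(api_key="test-key"))
print(service.client.api_key)  # test-key
```

Tests can inject a preconfigured client without touching the environment. Production code can rely on the default created on first use.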