Commit 61e42df

Merge pull request #224 from askui/chore/modelrouter
- Removes ModelRouter, ModelRegistry, and model_store
- Introduces a provider-based configuration system (AgentSettings)
- Renames VisionAgent → ComputerAgent and AndroidVisionAgent → AndroidAgent
- Updates docs and examples
2 parents 04e961b + 1251cbe commit 61e42df

266 files changed

Lines changed: 6447 additions & 21879 deletions

.env.template

Lines changed: 3 additions & 9 deletions
@@ -1,23 +1,17 @@
11
# Anthropic API
22
ANTHROPIC_API_KEY=
33

4+
# Google API
5+
GOOGLE_API_KEY=
6+
47
# AskUI API
58
ASKUI_INFERENCE_ENDPOINT=
69
ASKUI_TOKEN=
710
ASKUI_WORKSPACE_ID=
811

9-
# TARS API
10-
TARS_URL=
11-
TARS_API_KEY=
12-
1312
# OpenRouter
1413
OPEN_ROUTER_API_KEY=
1514

1615
# Telemetry
1716
ASKUI__VA__TELEMETRY__ENABLED=True # Set to "False" to disable telemetry
1817

19-
# OpenTelemetry Tracing Configuration
20-
#ASKUI__CHAT_API__OTEL__ENABLED=False
21-
#ASKUI__CHAT_API__OTEL__ENDPOINT=http://localhost/v1/traces
22-
#ASKUI__CHAT_API__OTEL__SECRET=
23-
#ASKUI__CHAT_API__OTEL__SERVICE_NAME=chat-api

CLAUDE.md

Lines changed: 359 additions & 0 deletions
@@ -0,0 +1,359 @@
1+
# CLAUDE.md
2+
3+
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
4+
5+
## Project Overview
6+
7+
**AskUI Vision Agent** is a Python desktop and mobile automation framework that enables AI agents to control computers (Windows, macOS, Linux), mobile devices (Android, iOS), and HMI systems. It supports both programmatic UI automation (RPA-like single-step commands) and agentic intent-based instructions using vision/computer vision models.
8+
9+
**Tech Stack:** Python 3.10+, Pydantic 2, Anthropic SDK, OpenTelemetry, Model Context Protocol (MCP), PDM
10+
11+
## Common Commands
12+
13+
### Development Setup
14+
```bash
15+
# Install dependencies
16+
pdm install
17+
```
18+
19+
### Testing
20+
```bash
21+
# Run all tests (parallel execution)
22+
pdm run test
23+
24+
# Run specific test suites
25+
pdm run test:unit # Unit tests only
26+
pdm run test:integration # Integration tests only
27+
pdm run test:e2e # End-to-end tests only
28+
29+
# Run tests with coverage
30+
pdm run test:cov # All tests with coverage report
31+
pdm run test:cov:view # View coverage report in browser
32+
```
33+
34+
### Code Quality
35+
```bash
36+
# Quick QA: type check, format, and fix linting issues (run before commits)
37+
pdm run qa:fix
38+
39+
# Individual commands
40+
pdm run typecheck:all # Type checking with mypy
41+
pdm run format # Format code with ruff
42+
pdm run lint # Lint code with ruff
43+
pdm run lint:fix # Auto-fix linting issues
44+
```
45+
46+
### Code Generation
47+
```bash
48+
# Regenerate gRPC client code from .proto files
49+
pdm run grpc:gen
50+
51+
# Regenerate Pydantic models from JSON schemas
52+
pdm run json:gen
53+
```
54+
55+
## High-Level Architecture
56+
57+
### Core SDK Architecture
58+
59+
```
60+
ComputerAgent (Main SDK Entry Point)
61+
62+
Agent (Abstract base class for all agents)
63+
├── ComputerAgent (Desktop automation)
64+
├── AndroidAgent (Mobile Android automation)
65+
├── WebVisionAgent (Web-specific automation)
66+
└── WebTestingAgent (Web testing framework)
67+
68+
Uses:
69+
├── ModelRouter → Model selection/composition
70+
├── AgentToolbox → Tool & OS abstraction
71+
└── Locators → UI element identification
72+
```
73+
74+
**Key Flow:**
75+
1. User calls `agent.click("Submit button")` on `ComputerAgent`
76+
2. `AgentBase.locate()` routes to appropriate model via `ModelRouter`
77+
3. Model receives screenshot + locator → returns coordinates
78+
4. `AgentToolbox.os.click()` → gRPC call to Agent OS
79+
5. Agent OS performs actual mouse click
80+
81+
### Chat API Architecture
82+
83+
```
84+
FastAPI Chat API (Experimental)
85+
├── Assistants (AI agent configurations)
86+
├── Threads (Conversation sessions)
87+
├── Messages (Chat history)
88+
├── Runs (Agent execution iterations)
89+
├── Files (Attachments & resources)
90+
├── MCP Configs (Tool providers)
91+
└── Workflows & Scheduled Jobs (Automation triggers)
92+
```
93+
94+
**Key Flow:**
95+
1. User → Chat UI (hub.askui.com) → Chat API (FastAPI)
96+
2. Thread/Messages stored in SQLAlchemy database
97+
3. Runs execute agent steps in a loop
98+
4. Agent uses ModelRouter → Tools (MCP servers or direct) → AgentOS
99+
100+
### Model Router & Composition
101+
102+
The `ModelRouter` provides a flexible abstraction for AI model selection:
103+
104+
```python
105+
# Single model for all tasks
106+
model = "askui"
107+
108+
# Task-specific models (ActModel, GetModel, LocateModel)
109+
model = {
110+
"act": "claude-sonnet-4-20250514",
111+
"get": "askui",
112+
"locate": "askui-combo"
113+
}
114+
115+
# Custom registry
116+
models = ModelRegistry()
117+
models.register("my-model", custom_model_instance)
118+
```
119+
120+
**Supported Model Providers:**
121+
- **AskUI Models** (Primary - internally hosted)
122+
- **Anthropic Claude** (Computer Use, Messages API)
123+
- **Google Gemini** (via OpenRouter)
124+
- **Hugging Face Spaces** (Community models)
125+
126+
### Agent OS Abstraction
127+
128+
`AgentOs` provides an abstraction layer for OS-level operations:
129+
130+
```
131+
AgentOs (Abstract Interface)
132+
├── AskUiControllerClient (gRPC to AskUI Agent OS - primary)
133+
├── PlaywrightAgentOs (Web browser automation)
134+
└── AndroidAgentOs (Android ADB)
135+
```
136+
137+
### Locator System
138+
139+
Locators identify UI elements in multiple ways:
140+
141+
- **Text**: Match by text content (exact/similar/contains/regex)
142+
- **Image**: Match by image file or base64
143+
- **Prompt**: Natural language description
144+
- **Coordinate**: Absolute (x, y) position
145+
- **Relatable**: Positional relationships (right_of, below, etc.)
146+
147+
Serialization differs by model type (VLM vs. traditional).
148+
149+
### Tool System (MCP)
150+
151+
Tools follow the Model Context Protocol (MCP) for extensibility:
152+
153+
```
154+
Tools (MCP Servers)
155+
├── Computer: screenshot, click, type, mouse, clipboard
156+
├── Android: device control via ADB
157+
├── Testing: scenario & feature management
158+
└── Utility: file ops, data extraction
159+
```
160+
161+
Tools are auto-discovered and can be dynamically loaded via MCP configurations.
162+
163+
## Key Code Locations
164+
165+
### Core SDK
166+
- `src/askui/agent.py` - Main `ComputerAgent` class (user-facing API)
167+
- `src/askui/agent_base.py` - Abstract `Agent` (base) with shared agent logic
168+
- `src/askui/android_agent.py` - Android-specific agent
169+
- `src/askui/web_agent.py` - Web-specific agent
170+
171+
### Models & AI
172+
- `src/askui/models/` - AI model providers & router factory
173+
- `src/askui/models/shared/` - Shared abstractions (`Agent`, `Tool`, `MessagesApi`)
174+
- `src/askui/models/{provider}/` - Provider implementations
175+
- `src/askui/prompts/` - System prompts for different models
176+
177+
### Tools & OS
178+
- `src/askui/tools/agent_os.py` - Abstract `AgentOs` interface
179+
- `src/askui/tools/askui/` - gRPC client for AskUI Agent OS
180+
- `src/askui/tools/android/` - Android-specific tools
181+
- `src/askui/tools/playwright/` - Web automation tools
182+
- `src/askui/tools/mcp/` - MCP client/server implementations
183+
- `src/askui/tools/testing/` - Test scenario tools
184+
185+
### Locators
186+
- `src/askui/locators/` - UI element selectors
187+
- `src/askui/locators/serializers.py` - Locator serialization for models
188+
189+
### Chat API
190+
- `src/askui/chat/` - FastAPI-based Chat API
191+
- `src/askui/chat/api/` - REST API routes
192+
- `src/askui/chat/migrations/` - Alembic migrations & ORM models
193+
194+
### Utilities
195+
- `src/askui/utils/` - Image processing, API utilities, caching, annotations
196+
- `src/askui/reporting.py` - Reporting & logging
197+
- `src/askui/retry.py` - Retry logic with exponential backoff
198+
- `src/askui/telemetry/` - OpenTelemetry tracing & analytics
199+
200+
## Code Style & Conventions
201+
202+
### General Python Style
203+
- **Private members**: Use `_` prefix for all private variables, functions, methods, etc. Mark everything private that doesn't need external access.
204+
- **Type hints**: Required everywhere. Use built-in types (`list`, `dict`, `str | None`) instead of `typing` module types (`List`, `Dict`, `Optional`).
205+
- **Overrides**: Use `@override` decorator from `typing_extensions` for all overridden methods.
206+
- **Exceptions**: Never pass literals to exceptions. Assign to variables first:
207+
```python
208+
# Good
209+
error_msg = f"Thread {thread_id} not found"
210+
raise FileNotFoundError(error_msg)
211+
212+
# Bad
213+
raise FileNotFoundError(f"Thread {thread_id} not found")
214+
```
215+
- **File operations**: Always specify `encoding="utf-8"` for file read/write operations.
216+
- **Init files**: Create `__init__.py` in each folder.
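The style rules above can be illustrated with a minimal sketch. The names `read_thread` and `_format_missing` are hypothetical and chosen only for this example; they are not taken from the repository.

```python
from __future__ import annotations

from pathlib import Path


def _format_missing(thread_id: str) -> str:
    """Build the error message for a missing thread (private helper, `_` prefix)."""
    return f"Thread {thread_id} not found"


def read_thread(base_dir: Path, thread_id: str) -> dict[str, str | None]:
    """Read a thread file and return its id and content.

    Uses built-in generics (`dict`, `str | None`) instead of `typing` aliases.
    """
    path = base_dir / f"{thread_id}.txt"
    if not path.exists():
        # Assign the message to a variable before raising (no literals in exceptions).
        error_msg = _format_missing(thread_id)
        raise FileNotFoundError(error_msg)
    # Always pass the encoding explicitly for file operations.
    content = path.read_text(encoding="utf-8")
    return {"id": thread_id, "content": content or None}
```

An empty file yields `None` for `content` here; that is a choice made for the sketch, not a project convention.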
217+
218+
### FastAPI Specific
219+
- Use response type in function signature instead of `response_model` in route annotation.
220+
- Dependencies without defaults should come before arguments with defaults.
221+
222+
### Testing
223+
- Use `pytest-mock` for mocking wherever possible.
224+
- Test files in `tests/` follow structure: `test_*.py` with `Test*` classes and `test_*` functions.
225+
- Timeout: 60 seconds per test (configured in `pyproject.toml`).
226+
227+
### Git Conventions
228+
- **Never** use `git add .` - explicitly add files related to the task.
229+
- Use conventional commits format: `feat:`, `fix:`, `docs:`, `style:`, `refactor:`, `test:`, `chore:`.
230+
- **Before committing**, always run: `pdm run qa:fix` (or individually: `typecheck:all`, `format`, `lint:fix`).
231+
232+
### Docstrings
233+
- All public functions, classes, and constants require docstrings.
234+
- Document constructor args in class docstring, omit `__init__` docstring.
235+
- Use backticks for code references (variables, types, functions).
236+
- Function references: `click()`, Class references: `ComputerAgent`, Method references: `ComputerAgent.click()`
237+
- Include sections: `Args`, `Returns`, `Raises`, `Example`, `Notes`, `See Also` as needed.
238+
- Document parameter types in parentheses, add `, optional` for defaults.
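A docstring written to these conventions might look like the following sketch. `Clicker` is a hypothetical class used only to show the layout (constructor args in the class docstring, types in parentheses, `, optional` for defaults); it is not part of the SDK.

```python
class Clicker:
    """
    Click helper used to illustrate the docstring conventions.

    Args:
        retries (int, optional): How often a failed click is retried.
            Defaults to `3`.
    """

    def __init__(self, retries: int = 3) -> None:
        self._retries = retries

    def click(self, target: str, button: str = "left") -> str:
        """
        Describe a click on `target`.

        Args:
            target (str): Natural-language description of the element.
            button (str, optional): Mouse button to use. Defaults to `"left"`.

        Returns:
            str: A human-readable description of the click.

        Raises:
            ValueError: If `button` is neither `"left"` nor `"right"`.

        Example:
            Clicker().click("Submit button")
        """
        if button not in ("left", "right"):
            error_msg = f"Unsupported button: {button}"
            raise ValueError(error_msg)
        return f"{button} click on {target} (up to {self._retries} retries)"
```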
239+
240+
### Documentation (docs/)
241+
When writing or updating documentation in `docs/`:
242+
- **Never show setting environment variables in Python code** (e.g., `os.environ["ASKUI_WORKSPACE_ID"] = "..."`). This is bad practice. Always instruct users to set environment variables via their shell or system settings.
243+
- Keep examples concise and focused on the feature being documented.
244+
- Test all code examples before including them.
245+
- Use `ComputerAgent` (not `VisionAgent`) in examples.
246+
247+
## Important Patterns
248+
249+
### Composition over Inheritance
250+
- `AgentToolbox` wraps `AgentOs` implementations
251+
- `ModelRouter` composes multiple model providers
252+
- `CompositeReporter` aggregates multiple reporters
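As a rough illustration of the composition idea, a reporter that fans out to several child reporters could be sketched as below. The `Reporter` protocol, its `add_message` method, and both classes are assumptions made for this example; the actual `CompositeReporter` interface is not shown in this file.

```python
from typing import Protocol


class Reporter(Protocol):
    """Hypothetical reporter interface (assumed for this sketch)."""

    def add_message(self, role: str, content: str) -> None: ...


class ListReporter:
    """Collects messages in memory."""

    def __init__(self) -> None:
        self.messages: list[tuple[str, str]] = []

    def add_message(self, role: str, content: str) -> None:
        self.messages.append((role, content))


class CompositeReporterSketch:
    """Forwards every message to all wrapped reporters (composition, not inheritance)."""

    def __init__(self, reporters: list[Reporter]) -> None:
        self._reporters = reporters

    def add_message(self, role: str, content: str) -> None:
        for reporter in self._reporters:
            reporter.add_message(role, content)
```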
253+
254+
### Factory Pattern
255+
- `ModelRouter.initialize_default_model_registry()` creates model registry
256+
- Model providers use factory functions for lazy-loading
257+
258+
### Strategy Pattern
259+
- Truncation strategies for message history
260+
- Different locator serializers for model types
261+
- Retry strategies with exponential backoff
262+
263+
### Adapter Pattern
264+
- `AgentOs` abstraction bridges OS implementations (gRPC, Playwright, ADB)
265+
- `ModelFacade` adapts models to `ActModel`/`GetModel`/`LocateModel` interfaces
266+
267+
### Dependency Injection
268+
- Constructor-based DI throughout
269+
- FastAPI dependencies for Chat API routes
270+
- `@auto_inject_agent_os` decorator for tools
271+
272+
### Template Method Pattern
273+
- `Agent._step()` orchestrates tool-calling loop
274+
- `Agent` provides common structure for all agents
275+
276+
## Database & Observability
277+
278+
### Alembic Migrations
279+
- Schema versioning in `src/askui/chat/migrations/`
280+
- ORM models in `migrations/shared/{entity}/models.py`
281+
- Auto-migration on startup (configurable)
282+
- SQLAlchemy with async support
283+
284+
### Telemetry
285+
- OpenTelemetry integration (FastAPI, HTTPX, SQLAlchemy)
286+
- Structured logging with structlog
287+
- Correlation IDs for request tracing
288+
- Prometheus metrics via FastAPI instrumentator
289+
- Segment Analytics for usage tracking
290+
291+
## Extending the Framework
292+
293+
### Adding Custom Models
294+
1. Inherit from `ActModel`, `GetModel`, or `LocateModel`
295+
2. Implement message creation via `MessagesApi`
296+
3. Register in `ModelRegistry`
297+
4. Use appropriate locator serializer
298+
299+
### Adding Custom Tools
300+
1. Implement `Tool` protocol in `models/shared/tools.py`
301+
2. Register in appropriate MCP server (`api/mcp_servers/{type}.py`)
302+
3. Use `@auto_inject_agent_os` for AgentOs dependency
303+
4. Follow Pydantic schema validation
304+
305+
### Adding New Agent Types
306+
1. Inherit from `Agent`
307+
2. Implement required abstract methods
308+
3. Provide appropriate `AgentOs` implementation
309+
4. Register in agent factory if needed
310+
311+
## Performance & Caching
312+
313+
- Screenshot caching for multi-step operations
314+
- Token counting before API calls
315+
- Cached trajectory execution (replay previous interactions)
316+
- Image downsampling & compression
317+
- Lazy model initialization (`@functools.cache`)
318+
319+
## Error Handling
320+
321+
Custom exceptions:
322+
- `ElementNotFoundError` - UI element not found
323+
- `WaitUntilError` - Timeout waiting for condition
324+
- `MaxTokensExceededError` - Token limit exceeded
325+
- `ModelRefusalError` - Model refused to execute
326+
327+
Retry logic with configurable strategies via `src/askui/retry.py`.
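For orientation, retrying with exponential backoff can be sketched as below. This is a generic illustration under assumed names (`retry_with_backoff`, `base_delay`, `factor`) and does not reproduce the API of `src/askui/retry.py`.

```python
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func` until it succeeds, multiplying the wait by `factor` after each failure."""
    if max_attempts < 1:
        error_msg = "max_attempts must be at least 1"
        raise ValueError(error_msg)
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error unchanged.
            sleep(delay)
            delay *= factor
    error_msg = "unreachable"
    raise RuntimeError(error_msg)
```

Injecting `sleep` keeps the function testable without real waiting.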
328+
329+
## Documentation References
330+
331+
Additional documentation in `docs/`:
332+
- `chat.md` - Chat API usage
333+
- `direct-tool-use.md` - Direct tool usage
334+
- `extracting-data.md` - Data extraction
335+
- `mcp.md` - MCP servers
336+
- `observability.md` - Logging and reporting
337+
- `telemetry.md` - Telemetry data
338+
- `using-models.md` - Model usage and custom models
339+
340+
Official docs: https://docs.askui.com
341+
Discord: https://discord.gg/Gu35zMGxbx
342+
343+
344+
## Coding Standards
345+
### Anti-Patterns and Bad Examples
346+
1) Setting Env Variables In-Code
347+
```python
348+
os.environ["ANTHROPIC_API_KEY"] = "..."  # Bad: sets an env variable in-code
349+
```
350+
=> Never set environment variables in-code from within the process. We expect them to be set in the environment directly, so setting them explicitly is unnecessary. If a value must still be supplied, pass it directly to the client (or other object) that requires it.
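A preferred shape, sketched under assumptions: the value is read from the environment or passed explicitly to the object that needs it, and the process environment is never mutated. `ApiClientSketch` is a hypothetical client, not a class from the SDK.

```python
from __future__ import annotations

import os


class ApiClientSketch:
    """Hypothetical client that accepts its key directly instead of relying on in-code env mutation."""

    def __init__(self, api_key: str | None = None) -> None:
        if api_key is None:
            # Reading the environment is fine; writing to it is what we avoid.
            api_key = os.environ.get("ANTHROPIC_API_KEY")
        if not api_key:
            error_msg = "No API key provided and ANTHROPIC_API_KEY is not set"
            raise ValueError(error_msg)
        self._api_key = api_key

    def masked_key(self) -> str:
        """Return a redacted form of the key that is safe to log."""
        return f"{self._api_key[:2]}***"
```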
351+
352+
2) Don't Use Lazy Loading
353+
=> Keep imports at the top of files. Use lazy loading only in rare edge cases, e.g. when a try-except must check whether a package is available (in which case the package should be an optional dependency).
354+
355+
3) Client Config
356+
All lazily initialized clients should be configurable in the `__init__` method.
357+
358+
4) Be consistent with variable naming within a class (and its subclasses)!
359+
For example, if a parameter is named `client`, then the member variable it is assigned to should also be named `client`.
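Points 3 and 4 can be sketched together: a lazily created client that callers can still override via `__init__`, stored under the same name as the parameter. `HttpClientSketch`, `Service`, and the `base_url` default are hypothetical and exist only for this example.

```python
from __future__ import annotations


class HttpClientSketch:
    """Stand-in for a real client; only records its configuration."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url


class Service:
    """Holds a lazily initialized client that callers may override."""

    def __init__(
        self,
        client: HttpClientSketch | None = None,
        base_url: str = "https://example.invalid",
    ) -> None:
        # Parameter `client` maps to member `_client`: same name, private prefix.
        self._client = client
        self._base_url = base_url

    @property
    def client(self) -> HttpClientSketch:
        """Create the default client on first access if none was injected."""
        if self._client is None:
            self._client = HttpClientSketch(self._base_url)
        return self._client
```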
