Commit 8f3a9ea

chore: update docs
1 parent 6217b11 commit 8f3a9ea

14 files changed

Lines changed: 2300 additions & 1143 deletions

README.md

Lines changed: 61 additions & 185 deletions
@@ -7,245 +7,121 @@
77

88
Join the [AskUI Discord](https://discord.gg/Gu35zMGxbx).
99

10-
## Table of Contents
11-
12-
- [🤖 AskUI Vision Agent](#-askui-vision-agent)
13-
- [Table of Contents](#table-of-contents)
14-
- [📖 Introduction](#-introduction)
15-
- [📦 Installation](#-installation)
16-
- [AskUI Python Package](#askui-python-package)
17-
- [AskUI Agent OS](#askui-agent-os)
18-
- [AMD64](#amd64)
19-
- [ARM64](#arm64)
20-
- [AMD64](#amd64-1)
21-
- [ARM64](#arm64-1)
22-
- [ARM64](#arm64-2)
23-
- [🚀 Quickstart](#-quickstart)
24-
- [🧑 Control your devices](#-control-your-devices)
25-
- [🤖 Let AI agents control your devices](#-let-ai-agents-control-your-devices)
26-
- [🔐 Sign up with AskUI](#-sign-up-with-askui)
27-
- [⚙️ Configure environment variables](#️-configure-environment-variables)
28-
- [💻 Example](#-example)
29-
- [🛠️ Extending Agents with Tool Store](#️-extending-agents-with-tool-store)
30-
- [📚 Further Documentation](#-further-documentation)
31-
- [🤝 Contributing](#-contributing)
32-
- [📜 License](#-license)
33-
34-
## 📖 Introduction
35-
36-
AskUI Vision Agent is a powerful automation framework that enables you and AI agents to control your desktop, mobile, and HMI devices and automate tasks. With support for multiple AI models, multi-platform compatibility, and enterprise-ready features,
37-
38-
https://github.com/user-attachments/assets/a74326f2-088f-48a2-ba1c-4d94d327cbdf
39-
40-
**🎯 Key Features**
41-
42-
- Support for Windows, Linux, MacOS, Android and iOS device automation (Citrix supported)
43-
- Support for single-step UI automation commands (RPA like) as well as agentic intent-based instructions
44-
- In-background automation on Windows machines (agent can create a second session; you do not have to watch it take over mouse and keyboard)
45-
- Flexible model use (hot swap of models) and infrastructure for reteaching of models (available on-premise)
46-
- Secure deployment of agents in enterprise environments
47-
48-
## 📦 Installation
49-
50-
### AskUI Python Package
51-
52-
```shell
53-
pip install askui[all]
54-
```
55-
56-
**Requires Python >=3.10**
57-
58-
### AskUI Agent OS
59-
60-
Agent OS is a device controller that allows agents to take screenshots, move the mouse, click, and type on the keyboard across any operating system. It is installed on a desktop OS but can also control connected mobile and HMI devices.
10+
## Why AskUI Vision Agent?
6111

62-
It offers powerful features like
12+
Traditional UI automation is fragile. Every time a button moves, a label changes, or a layout shifts, your scripts break. You're stuck maintaining brittle selectors, writing conditional logic for edge cases, and constantly updating tests.
6313

64-
- multi-screen support,
65-
- support for all major operating systems (incl. Windows, MacOS and Linux),
66-
- process visualizations,
67-
- real Unicode character typing
68-
- and more: application selection, in-background automation, and video streaming will be released soon.
14+
**AskUI Vision Agent solves this by combining two powerful approaches:**
6915

70-
<details>
71-
<summary>Windows</summary>
16+
1. **Vision-based automation** - Find UI elements by what they look like or say, not by brittle XPath or CSS selectors
17+
2. **AI-powered agents** - Give high-level instructions and let AI figure out the steps
7218

73-
#### AMD64
74-
[AskUI Installer for AMD64](https://files.askui.com/releases/Installer/Latest/AskUI-Suite-Latest-User-Installer-Win-AMD64-Web.exe)
19+
Whether you're automating desktop apps, testing mobile applications, or building RPA workflows, AskUI adapts to UI changes automatically—saving you hours of maintenance work.
7520

76-
#### ARM64
77-
[AskUI Installer for ARM64](https://files.askui.com/releases/Installer/Latest/AskUI-Suite-Latest-User-Installer-Win-ARM64-Web.exe)
21+
## Key Features
7822

79-
</details>
23+
- **Multi-platform** - Works on Windows, Linux, MacOS, and Android
24+
- **Two modes** - Single-step UI commands or agentic intent-based instructions
25+
- **Vision-first** - Find elements by text, images, or natural language descriptions
26+
- **Model flexibility** - Anthropic Claude, Google Gemini, AskUI models, or bring your own
27+
- **Extensible** - Add custom tools and capabilities via Model Context Protocol (MCP)
28+
- **Caching** - Avoid expensive AI API calls by making them only when necessary
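
The caching idea can be pictured with a minimal sketch. This is not AskUI's caching API (the Caching guide in the docs covers that); `LocateCache`, `fake_model`, and the keying scheme are hypothetical names chosen for illustration, assuming a result can be reused when the same instruction is issued against an identical screenshot.

```python
import hashlib


class LocateCache:
    """Hypothetical cache: reuse a model answer for an identical screenshot and instruction."""

    def __init__(self, locate_fn):
        self._locate_fn = locate_fn  # the expensive call, e.g. a request to an AI model
        self._results = {}
        self.misses = 0  # how many times the expensive call actually ran

    def locate(self, screenshot: bytes, instruction: str):
        # Key on the screenshot content and the instruction text.
        key = (hashlib.sha256(screenshot).hexdigest(), instruction)
        if key not in self._results:
            self.misses += 1
            self._results[key] = self._locate_fn(screenshot, instruction)
        return self._results[key]


def fake_model(screenshot: bytes, instruction: str):
    """Stand-in for a model call; always returns the same coordinates."""
    return (100, 200)


cache = LocateCache(fake_model)
cache.locate(b"screen-1", "Submit")
cache.locate(b"screen-1", "Submit")  # served from the cache, no second model call
print(f"Model calls made: {cache.misses}")
```

A real cache would also need an invalidation strategy for when the UI changes; this sketch only shows the lookup-before-call pattern.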
8029

81-
<details>
82-
<summary>Linux</summary>
83-
<br>
84-
85-
**⚠️ Warning:** Agent OS currently does not work on Wayland. Switch to XOrg to use it.
86-
87-
#### AMD64
88-
```shell
89-
curl -L -o /tmp/AskUI-Suite-Latest-User-Installer-Linux-AMD64-Web.run https://files.askui.com/releases/Installer/Latest/AskUI-Suite-Latest-User-Installer-Linux-AMD64-Web.run
90-
bash /tmp/AskUI-Suite-Latest-User-Installer-Linux-AMD64-Web.run
91-
```
92-
93-
#### ARM64
94-
```shell
95-
curl -L -o /tmp/AskUI-Suite-Latest-User-Installer-Linux-ARM64-Web.run https://files.askui.com/releases/Installer/Latest/AskUI-Suite-Latest-User-Installer-Linux-ARM64-Web.run
96-
bash /tmp/AskUI-Suite-Latest-User-Installer-Linux-ARM64-Web.run
97-
```
30+
## Quick Examples
9831

99-
</details>
32+
### Programmatic UI Automation
10033

101-
<details>
102-
<summary>MacOS</summary>
103-
<br>
104-
105-
**⚠️ Warning:** Agent OS currently does not work on MacOS with Intel chips (x86_64/amd64 architecture). Switch to a Mac with Apple Silicon (arm64 architecture), e.g., M1, M2, M3, etc.
106-
107-
#### ARM64
108-
```shell
109-
curl -L -o /tmp/AskUI-Suite-Latest-User-Installer-MacOS-ARM64-Web.run https://files.askui.com/releases/Installer/Latest/AskUI-Suite-Latest-User-Installer-MacOS-ARM64-Web.run
110-
bash /tmp/AskUI-Suite-Latest-User-Installer-MacOS-ARM64-Web.run
111-
```
112-
113-
</details>
114-
115-
## 🚀 Quickstart
116-
117-
### 🧑 Control your devices
118-
119-
Double-click wherever the cursor currently is:
34+
Control your devices with simple, vision-based commands:
12035

12136
```python
12237
from askui import VisionAgent
12338

12439
with VisionAgent() as agent:
125-
    agent.click(button="left", repeat=2)
126-
```
127-
128-
By default, the agent works within the context of the selected display, which defaults to the primary display.
129-
130-
Run the script with `python <file path>`, e.g., `python test.py`, to see if it works.
40+
    # Click on a button by its text
41+
    agent.click("Submit")
13142

132-
### 🤖 Let AI agents control your devices
43+
    # Type into the currently focused field
44+
    agent.type("hello@example.com")
13345

134-
In order to let AI agents control your devices, you need to be able to connect to an AI model (provider). We host some models ourselves and support several others out of the box, e.g., Anthropic, OpenRouter, and Hugging Face. If you want to use a model provider or model that is not supported, you can easily plug in your own (see [Custom Models](docs/custom-models.md)).
46+
    # Click at specific coordinates
47+
    agent.click(x=100, y=200)
13548

136-
For this example, we will use AskUI as the model provider to easily get started.
137-
138-
#### 🔐 Sign up with AskUI
139-
140-
Sign up at [hub.askui.com](https://hub.askui.com) to:
141-
- Activate your **free trial** by signing up (no credit card required)
142-
- Get your workspace ID and access token
143-
144-
#### ⚙️ Configure environment variables
145-
146-
<details>
147-
<summary>Linux & MacOS</summary>
148-
149-
```shell
150-
export ASKUI_WORKSPACE_ID=<your-workspace-id-here>
151-
export ASKUI_TOKEN=<your-token-here>
49+
    # Find and click an element by image
50+
    agent.click(image="./assets/login-button.png")
15251
```
153-
</details>
15452

155-
<details>
156-
<summary>Windows PowerShell</summary>
53+
### Agentic UI Automation
15754

158-
```shell
159-
$env:ASKUI_WORKSPACE_ID="<your-workspace-id-here>"
160-
$env:ASKUI_TOKEN="<your-token-here>"
161-
```
162-
163-
</details>
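
Before running an agent, it can help to confirm that both variables are visible to Python. The following is a minimal sketch using only the standard library; the variable names come from the shell snippets above, while the helper `missing_askui_vars` is a hypothetical name and not part of the `askui` package.

```python
import os

# Variable names taken from the shell snippets above.
REQUIRED_VARS = ("ASKUI_WORKSPACE_ID", "ASKUI_TOKEN")


def missing_askui_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_askui_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("AskUI credentials found in the environment.")
```

Running this in the same terminal session where the variables were exported shows whether they reached the Python process.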
164-
165-
#### 💻 Example
55+
Give high-level instructions and let AI handle the details:
16656

16757
```python
16858
from askui import VisionAgent
16959

17060
with VisionAgent() as agent:
171-
    # Give complex instructions to the agent (may have problems with virtual displays out of the box, so make sure there is no browser opened on a virtual display that the agent may not see)
61+
    # Complex multi-step instruction
17262
    agent.act(
173-
        "Look for a browser on the current device (checking all available displays, "
174-
        "making sure window has focus),"
175-
        " open a new window or tab and navigate to https://docs.askui.com"
176-
        " and click on 'Search...' to open search panel. If the search panel is already "
177-
        "opened, empty the search field so I can start a fresh search."
178-
    )
179-
    agent.type("Introduction")
180-
    # Locates elements by text (you can also use images, natural language descriptions, coordinates, etc. to
181-
    # describe what to click on)
182-
    agent.click(
183-
        "Documentation > Tutorial > Introduction",
63+
        "Open a browser, navigate to GitHub, search for 'askui vision-agent', "
64+
        "and star the repository"
18465
    )
185-
    first_paragraph = agent.get(
186-
        "What does the first paragraph of the introduction say?"
187-
    )
188-
    print("\n--------------------------------")
189-
    print("FIRST PARAGRAPH:\n")
190-
    print(first_paragraph)
191-
    print("--------------------------------\n\n")
192-
```
19366

194-
Run the script with `python <file path>`, e.g., `python test.py`.
67+
    # Extract information from the screen
68+
    balance = agent.get("What is my current account balance?")
69+
    print(f"Balance: {balance}")
19570

196-
If you see a lot of logs and the first paragraph of the introduction in the console, congratulations! You've successfully let AI agents control your device to automate a task! If you have any issues, please check the [documentation](https://docs.askui.com/01-tutorials/01-your-first-agent#common-issues-and-solutions) or join our [Discord](https://discord.gg/Gu35zMGxbx) for support.
71+
    # Combine both approaches
72+
    agent.act("Find the login form")
73+
    agent.type("user@example.com")
74+
    agent.click("Next")
75+
```
19776

198-
### 🛠️ Extending Agents with Tool Store
77+
### Extend with Custom Tools
19978

200-
The Tool Store provides optional tools to extend your agents' capabilities. Import tools from `askui.tools.store` and pass them to `agent.act()` or pass them to the agent constructor as `act_tools`.
79+
Add new capabilities to your agents:
20180

202-
**Example passing tools to `agent.act()`:**
20381
```python
20482
from askui import VisionAgent
20583
from askui.tools.store.computer import ComputerSaveScreenshotTool
20684
from askui.tools.store.universal import PrintToConsoleTool
20785

20886
with VisionAgent() as agent:
20987
    agent.act(
210-
        "Take a screenshot and save it as demo/demo.png, then print a status message",
88+
        "Take a screenshot of the current screen and save it, then confirm",
21189
        tools=[
21290
            ComputerSaveScreenshotTool(base_dir="./screenshots"),
21391
            PrintToConsoleTool()
21492
        ]
21593
    )
21694
```
21795

218-
**Example passing tools to the agent constructor:**
219-
```python
220-
from askui import VisionAgent
221-
from askui.tools.store.computer import ComputerSaveScreenshotTool
222-
from askui.tools.store.universal import PrintToConsoleTool
96+
## Getting Started
22397

224-
with VisionAgent(act_tools=[
225-
    ComputerSaveScreenshotTool(base_dir="./screenshots"),
226-
    PrintToConsoleTool()
227-
]) as agent:
228-
    agent.act("Take a screenshot and save it as demo/demo.png, then print a status message")
229-
```
98+
Ready to build your first agent? Check out our documentation:
23099

231-
Tools are organized by category: `universal/` (works with any agent), `computer/` (requires `AgentOs`; works only with `VisionAgent`), and `android/` (requires `AndroidAgentOs`; works only with `AndroidVisionAgent`).
100+
1. **[Start Here](docs/00_Overview.md)** - Overview and core concepts
101+
2. **[Setup](docs/01_Setup.md)** - Installation and configuration
102+
3. **[System Prompts](docs/02_Prompting.md)** - How to write effective instructions
103+
4. **[Models](docs/03_Using-Models-and-BYOM.md)** - Using and customizing AI models
104+
5. **[Caching](docs/04_Caching.md)** - Optimize performance and costs
105+
6. **[Tools](docs/05_Tools.md)** - Extend agent capabilities
106+
7. **[Observability](docs/06_Observability-Telemetry-Tracing.md)** - Monitor and debug agents
232107

233-
## 📚 Further Documentation
108+
**Additional guides:**
109+
- [Extracting Data](docs/extracting-data.md) - Extract structured data from screens and documents
110+
- [File Support](docs/file-support.md) - Work with PDFs, images, and other file types
234111

235-
Aside from our [official documentation](https://docs.askui.com), we also have some additional guides and examples under the [docs](docs) folder that you may find useful, for example:
112+
**Official documentation:** [docs.askui.com](https://docs.askui.com)
236113

237-
- **[Direct Tool Use](docs/direct-tool-use.md)** - How to use the tools, e.g., clipboard, the Agent OS etc.
238-
- **[Extracting Data](docs/extracting-data.md)** - How to extract data from the screen and documents
239-
- **[MCP](docs/mcp.md)** - How to use MCP servers to extend the capabilities of an agent
240-
- **[Observability](docs/observability.md)** - Logging and reporting
241-
- **[Telemetry](docs/telemetry.md)** - Which data we gather and how to disable it
242-
- **[Using Models](docs/using-models.md)** - How to use different models including how to register your own custom models
114+
## Quick Install
243115

244-
## 🤝 Contributing
116+
```bash
117+
pip install askui[all]
118+
```
119+
120+
**Requires Python >=3.10**
245121

246-
We'd love your help! Contributions, ideas, and feedback are always welcome. A proper contribution guide is coming soon—stay tuned!
122+
You'll also need to install AskUI Agent OS for device control. See [Setup Guide](docs/01_Setup.md) for detailed instructions.
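
If an install fails, the interpreter version is a quick thing to rule out. The following is a minimal sketch, assuming only the `>=3.10` requirement stated above; `python_is_supported` is a hypothetical helper name and not part of the `askui` package.

```python
import sys

# Minimum version taken from the requirement stated above.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=None):
    """Return True if the given (or running) interpreter meets the minimum version."""
    version = sys.version_info if version_info is None else version_info
    return tuple(version[:2]) >= MIN_PYTHON


if python_is_supported():
    print("Python version is new enough for askui.")
else:
    print("askui requires Python >= 3.10; found " + sys.version.split()[0])
```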
247123

248124

249-
## 📜 License
125+
## License
250126

251127
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
