|
7 | 7 |
|
8 | 8 | Join the [AskUI Discord](https://discord.gg/Gu35zMGxbx). |
9 | 9 |
|
10 | | -## Table of Contents |
11 | | - |
12 | | -- [🤖 AskUI Vision Agent](#-askui-vision-agent) |
13 | | - - [Table of Contents](#table-of-contents) |
14 | | - - [📖 Introduction](#-introduction) |
15 | | - - [📦 Installation](#-installation) |
16 | | - - [AskUI Python Package](#askui-python-package) |
17 | | - - [AskUI Agent OS](#askui-agent-os) |
18 | | - - [AMD64](#amd64) |
19 | | - - [ARM64](#arm64) |
20 | | - - [AMD64](#amd64-1) |
21 | | - - [ARM64](#arm64-1) |
22 | | - - [ARM64](#arm64-2) |
23 | | - - [🚀 Quickstart](#-quickstart) |
24 | | - - [🧑 Control your devices](#-control-your-devices) |
25 | | - - [🤖 Let AI agents control your devices](#-let-ai-agents-control-your-devices) |
26 | | - - [🔐 Sign up with AskUI](#-sign-up-with-askui) |
27 | | - - [⚙️ Configure environment variables](#️-configure-environment-variables) |
28 | | - - [💻 Example](#-example) |
29 | | - - [🛠️ Extending Agents with Tool Store](#️-extending-agents-with-tool-store) |
30 | | - - [📚 Further Documentation](#-further-documentation) |
31 | | - - [🤝 Contributing](#-contributing) |
32 | | - - [📜 License](#-license) |
33 | | - |
34 | | -## 📖 Introduction |
35 | | - |
36 | | -AskUI Vision Agent is a powerful automation framework that enables you and AI agents to control your desktop, mobile, and HMI devices and automate tasks. With support for multiple AI models, multi-platform compatibility, and enterprise-ready features, |
37 | | - |
38 | | -https://github.com/user-attachments/assets/a74326f2-088f-48a2-ba1c-4d94d327cbdf |
39 | | - |
40 | | -**🎯 Key Features** |
41 | | - |
42 | | -- Support for Windows, Linux, MacOS, Android and iOS device automation (Citrix supported) |
43 | | -- Support for single-step UI automation commands (RPA like) as well as agentic intent-based instructions |
44 | | -- In-background automation on Windows machines (agent can create a second session; you do not have to watch it take over mouse and keyboard) |
45 | | -- Flexible model use (hot swap of models) and infrastructure for reteaching of models (available on-premise) |
46 | | -- Secure deployment of agents in enterprise environments |
47 | | - |
48 | | -## 📦 Installation |
49 | | - |
50 | | -### AskUI Python Package |
51 | | - |
52 | | -```shell |
53 | | -pip install askui[all] |
54 | | -``` |
55 | | - |
56 | | -**Requires Python >=3.10** |
57 | | - |
58 | | -### AskUI Agent OS |
59 | | - |
60 | | -Agent OS is a device controller that allows agents to take screenshots, move the mouse, click, and type on the keyboard across any operating system. It is installed on a desktop OS but can also control connected mobile and HMI devices. |
| 10 | +## Why AskUI Vision Agent? |
61 | 11 |
|
62 | | -It offers powerful features like |
| 12 | +Traditional UI automation is fragile. Every time a button moves, a label changes, or a layout shifts, your scripts break. You're stuck maintaining brittle selectors, writing conditional logic for edge cases, and constantly updating tests. |
63 | 13 |
|
64 | | -- multi-screen support, |
65 | | -- support for all major operating systems (incl. Windows, MacOS and Linux), |
66 | | -- process visualizations, |
67 | | -- real Unicode character typing, |
68 | | -- and more: application selection, in-background automation, and video streaming are to be released soon. |
| 14 | +**AskUI Vision Agent solves this by combining two powerful approaches:** |
69 | 15 |
|
70 | | -<details> |
71 | | -<summary>Windows</summary> |
| 16 | +1. **Vision-based automation** - Find UI elements by what they look like or say, not by brittle XPath or CSS selectors |
| 17 | +2. **AI-powered agents** - Give high-level instructions and let AI figure out the steps |
72 | 18 |
|
73 | | -#### AMD64 |
74 | | -[AskUI Installer for AMD64](https://files.askui.com/releases/Installer/Latest/AskUI-Suite-Latest-User-Installer-Win-AMD64-Web.exe) |
| 19 | +Whether you're automating desktop apps, testing mobile applications, or building RPA workflows, AskUI adapts to UI changes automatically—saving you hours of maintenance work. |
75 | 20 |
|
76 | | -#### ARM64 |
77 | | -[AskUI Installer for ARM64](https://files.askui.com/releases/Installer/Latest/AskUI-Suite-Latest-User-Installer-Win-ARM64-Web.exe) |
| 21 | +## Key Features |
78 | 22 |
|
79 | | -</details> |
| 23 | +- **Multi-platform** - Works on Windows, Linux, MacOS, and Android |
| 24 | +- **Two modes** - Single-step UI commands or agentic intent-based instructions |
| 25 | +- **Vision-first** - Find elements by text, images, or natural language descriptions |
| 26 | +- **Model flexibility** - Anthropic Claude, Google Gemini, AskUI models, or bring your own |
| 27 | +- **Extensible** - Add custom tools and capabilities via Model Context Protocol (MCP) |
| 28 | +- **Caching** - Reduce costly AI API calls by making them only when necessary |
80 | 29 |
|
81 | | -<details> |
82 | | -<summary>Linux</summary> |
83 | | -<br> |
84 | | - |
85 | | -**⚠️ Warning:** Agent OS currently does not work on Wayland. Switch to XOrg to use it. |
86 | | - |
87 | | -#### AMD64 |
88 | | -```shell |
89 | | -curl -L -o /tmp/AskUI-Suite-Latest-User-Installer-Linux-AMD64-Web.run https://files.askui.com/releases/Installer/Latest/AskUI-Suite-Latest-User-Installer-Linux-AMD64-Web.run |
90 | | -bash /tmp/AskUI-Suite-Latest-User-Installer-Linux-AMD64-Web.run |
91 | | -``` |
92 | | - |
93 | | -#### ARM64 |
94 | | -```shell |
95 | | -curl -L -o /tmp/AskUI-Suite-Latest-User-Installer-Linux-ARM64-Web.run https://files.askui.com/releases/Installer/Latest/AskUI-Suite-Latest-User-Installer-Linux-ARM64-Web.run |
96 | | -bash /tmp/AskUI-Suite-Latest-User-Installer-Linux-ARM64-Web.run |
97 | | -``` |
| 30 | +## Quick Examples |
98 | 31 |
|
99 | | -</details> |
| 32 | +### Programmatic UI Automation |
100 | 33 |
|
101 | | -<details> |
102 | | -<summary>MacOS</summary> |
103 | | -<br> |
104 | | - |
105 | | -**⚠️ Warning:** Agent OS currently does not work on MacOS with Intel chips (x86_64/amd64 architecture). Switch to a Mac with Apple Silicon (arm64 architecture), e.g., M1, M2, M3, etc. |
106 | | - |
107 | | -#### ARM64 |
108 | | -```shell |
109 | | -curl -L -o /tmp/AskUI-Suite-Latest-User-Installer-MacOS-ARM64-Web.run https://files.askui.com/releases/Installer/Latest/AskUI-Suite-Latest-User-Installer-MacOS-ARM64-Web.run |
110 | | -bash /tmp/AskUI-Suite-Latest-User-Installer-MacOS-ARM64-Web.run |
111 | | -``` |
112 | | - |
113 | | -</details> |
114 | | - |
115 | | -## 🚀 Quickstart |
116 | | - |
117 | | -### 🧑 Control your devices |
118 | | - |
119 | | -Double-click wherever the cursor currently is: |
| 34 | +Control your devices with simple, vision-based commands: |
120 | 35 |
|
121 | 36 | ```python |
122 | 37 | from askui import VisionAgent |
123 | 38 |
|
124 | 39 | with VisionAgent() as agent: |
125 | | - agent.click(button="left", repeat=2) |
126 | | -``` |
127 | | - |
128 | | -By default, the agent works within the context of a display that is selected which defaults to the primary display. |
129 | | - |
130 | | -Run the script with `python <file path>`, e.g., `python test.py`, to see if it works. |
| 40 | + # Click on a button by its text |
| 41 | + agent.click("Submit") |
131 | 42 |
|
132 | | -### 🤖 Let AI agents control your devices |
| 43 | + # Type into the currently focused field |
| 44 | + agent.type("hello@example.com") |
133 | 45 |
|
134 | | -To let AI agents control your devices, you need to be able to connect to an AI model (provider). We host some models ourselves and support several others, e.g., Anthropic, OpenRouter, and Hugging Face, out of the box. If you want to use a model provider or model that is not supported, you can easily plug in your own (see [Custom Models](docs/custom-models.md)). |
| 46 | + # Click at specific coordinates |
| 47 | + agent.click(x=100, y=200) |
135 | 48 |
|
136 | | -For this example, we will use AskUI as the model provider to get started easily. |
137 | | - |
138 | | -#### 🔐 Sign up with AskUI |
139 | | - |
140 | | -Sign up at [hub.askui.com](https://hub.askui.com) to: |
141 | | -- Activate your **free trial** by signing up (no credit card required) |
142 | | -- Get your workspace ID and access token |
143 | | - |
144 | | -#### ⚙️ Configure environment variables |
145 | | - |
146 | | -<details> |
147 | | -<summary>Linux & MacOS</summary> |
148 | | - |
149 | | -```shell |
150 | | -export ASKUI_WORKSPACE_ID=<your-workspace-id-here> |
151 | | -export ASKUI_TOKEN=<your-token-here> |
| 49 | + # Find and click an element by image |
| 50 | + agent.click(image="./assets/login-button.png") |
152 | 51 | ``` |
153 | | -</details> |
154 | 52 |
|
155 | | -<details> |
156 | | -<summary>Windows PowerShell</summary> |
| 53 | +### Agentic UI Automation |
157 | 54 |
|
158 | | -```shell |
159 | | -$env:ASKUI_WORKSPACE_ID="<your-workspace-id-here>" |
160 | | -$env:ASKUI_TOKEN="<your-token-here>" |
161 | | -``` |
162 | | - |
163 | | -</details> |
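A script can fail early when these variables are missing instead of surfacing an authentication error later. The following is a minimal sketch using only the Python standard library; `missing_askui_vars` and `require_askui_vars` are hypothetical helper names, not part of the `askui` package, and only the two variable names come from the configuration step above.

```python
import os

# Variable names taken from the configuration step above.
REQUIRED_VARS = ("ASKUI_WORKSPACE_ID", "ASKUI_TOKEN")


def missing_askui_vars(env=None):
    """Return the required variable names that are unset or blank.

    Hypothetical helper: `env` defaults to the process environment and
    can be replaced by a plain dict, which makes the check easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def require_askui_vars(env=None):
    """Raise RuntimeError naming every missing variable (hypothetical helper)."""
    missing = missing_askui_vars(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `require_askui_vars()` at the top of a script, before any agent is created, would report both missing variables in one message rather than one at a time.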
164 | | - |
165 | | -#### 💻 Example |
| 55 | +Give high-level instructions and let AI handle the details: |
166 | 56 |
|
167 | 57 | ```python |
168 | 58 | from askui import VisionAgent |
169 | 59 |
|
170 | 60 | with VisionAgent() as agent: |
171 | | - # Give complex instructions to the agent (may have problems with virtual displays out of the box, so make sure there is no browser opened on a virtual display that the agent may not see) |
| 61 | + # Complex multi-step instruction |
172 | 62 | agent.act( |
173 | | - "Look for a browser on the current device (checking all available displays, " |
174 | | - "making sure window has focus)," |
175 | | - " open a new window or tab and navigate to https://docs.askui.com" |
176 | | - " and click on 'Search...' to open search panel. If the search panel is already " |
177 | | - "opened, empty the search field so I can start a fresh search." |
178 | | - ) |
179 | | - agent.type("Introduction") |
180 | | - # Locates elements by text (you can also use images, natural language descriptions, coordinates, etc. to |
181 | | - # describe what to click on) |
182 | | - agent.click( |
183 | | - "Documentation > Tutorial > Introduction", |
| 63 | + "Open a browser, navigate to GitHub, search for 'askui vision-agent', " |
| 64 | + "and star the repository" |
184 | 65 | ) |
185 | | - first_paragraph = agent.get( |
186 | | - "What does the first paragraph of the introduction say?" |
187 | | - ) |
188 | | - print("\n--------------------------------") |
189 | | - print("FIRST PARAGRAPH:\n") |
190 | | - print(first_paragraph) |
191 | | - print("--------------------------------\n\n") |
192 | | -``` |
193 | 66 |
|
194 | | -Run the script with `python <file path>`, e.g., `python test.py`. |
| 67 | + # Extract information from the screen |
| 68 | + balance = agent.get("What is my current account balance?") |
| 69 | + print(f"Balance: {balance}") |
195 | 70 |
|
196 | | -If you see a lot of logs and the first paragraph of the introduction in the console, congratulations! You've successfully let AI agents control your device to automate a task! If you have any issues, please check the [documentation](https://docs.askui.com/01-tutorials/01-your-first-agent#common-issues-and-solutions) or join our [Discord](https://discord.gg/Gu35zMGxbx) for support. |
| 71 | + # Combine both approaches |
| 72 | + agent.act("Find the login form") |
| 73 | + agent.type("user@example.com") |
| 74 | + agent.click("Next") |
| 75 | +``` |
197 | 76 |
|
198 | | -### 🛠️ Extending Agents with Tool Store |
| 77 | +### Extend with Custom Tools |
199 | 78 |
|
200 | | -The Tool Store provides optional tools to extend your agents' capabilities. Import tools from `askui.tools.store` and pass them to `agent.act()` or pass them to the agent constructor as `act_tools`. |
| 79 | +Add new capabilities to your agents: |
201 | 80 |
|
202 | | -**Example passing tools to `agent.act()`:** |
203 | 81 | ```python |
204 | 82 | from askui import VisionAgent |
205 | 83 | from askui.tools.store.computer import ComputerSaveScreenshotTool |
206 | 84 | from askui.tools.store.universal import PrintToConsoleTool |
207 | 85 |
|
208 | 86 | with VisionAgent() as agent: |
209 | 87 | agent.act( |
210 | | - "Take a screenshot and save it as demo/demo.png, then print a status message", |
| 88 | + "Take a screenshot of the current screen and save it, then confirm", |
211 | 89 | tools=[ |
212 | 90 | ComputerSaveScreenshotTool(base_dir="./screenshots"), |
213 | 91 | PrintToConsoleTool() |
214 | 92 | ] |
215 | 93 | ) |
216 | 94 | ``` |
217 | 95 |
|
218 | | -**Example passing tools to the agent constructor:** |
219 | | -```python |
220 | | -from askui import VisionAgent |
221 | | -from askui.tools.store.computer import ComputerSaveScreenshotTool |
222 | | -from askui.tools.store.universal import PrintToConsoleTool |
| 96 | +## Getting Started |
223 | 97 |
|
224 | | -with VisionAgent(act_tools=[ |
225 | | - ComputerSaveScreenshotTool(base_dir="./screenshots"), |
226 | | - PrintToConsoleTool() |
227 | | -]) as agent: |
228 | | - agent.act("Take a screenshot and save it as demo/demo.png, then print a status message") |
229 | | -``` |
| 98 | +Ready to build your first agent? Check out our documentation: |
230 | 99 |
|
231 | | -Tools are organized by category: `universal/` (works with any agent), `computer/` (requires `AgentOs`; works only with `VisionAgent`), and `android/` (requires `AndroidAgentOs`; works only with `AndroidVisionAgent`). |
| 100 | +1. **[Start Here](docs/00_Overview.md)** - Overview and core concepts |
| 101 | +2. **[Setup](docs/01_Setup.md)** - Installation and configuration |
| 102 | +3. **[System Prompts](docs/02_Prompting.md)** - How to write effective instructions |
| 103 | +4. **[Models](docs/03_Using-Models-and-BYOM.md)** - Using and customizing AI models |
| 104 | +5. **[Caching](docs/04_Caching.md)** - Optimize performance and costs |
| 105 | +6. **[Tools](docs/05_Tools.md)** - Extend agent capabilities |
| 106 | +7. **[Observability](docs/06_Observability-Telemetry-Tracing.md)** - Monitor and debug agents |
232 | 107 |
|
233 | | -## 📚 Further Documentation |
| 108 | +**Additional guides:** |
| 109 | +- [Extracting Data](docs/extracting-data.md) - Extract structured data from screens and documents |
| 110 | +- [File Support](docs/file-support.md) - Work with PDFs, images, and other file types |
234 | 111 |
|
235 | | -Aside from our [official documentation](https://docs.askui.com), we also have some additional guides and examples under the [docs](docs) folder that you may find useful, for example: |
| 112 | +**Official documentation:** [docs.askui.com](https://docs.askui.com) |
236 | 113 |
|
237 | | -- **[Direct Tool Use](docs/direct-tool-use.md)** - How to use the tools, e.g., clipboard, the Agent OS etc. |
238 | | -- **[Extracting Data](docs/extracting-data.md)** - How to extract data from the screen and documents |
239 | | -- **[MCP](docs/mcp.md)** - How to use MCP servers to extend the capabilities of an agent |
240 | | -- **[Observability](docs/observability.md)** - Logging and reporting |
241 | | -- **[Telemetry](docs/telemetry.md)** - Which data we gather and how to disable it |
242 | | -- **[Using Models](docs/using-models.md)** - How to use different models including how to register your own custom models |
| 114 | +## Quick Install |
243 | 115 |
|
244 | | -## 🤝 Contributing |
| 116 | +```bash |
| 117 | +pip install askui[all] |
| 118 | +``` |
| 119 | + |
| 120 | +**Requires Python >=3.10** |
245 | 121 |
|
246 | | -We'd love your help! Contributions, ideas, and feedback are always welcome. A proper contribution guide is coming soon—stay tuned! |
| 122 | +You'll also need to install AskUI Agent OS for device control. See [Setup Guide](docs/01_Setup.md) for detailed instructions. |
247 | 123 |
|
248 | 124 |
|
249 | | -## 📜 License |
| 125 | +## License |
250 | 126 |
|
251 | 127 | This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. |