
Commit 483bc4a

Merge pull request #220 from askui/feat/refactor-tools-auto-inject-agent-os

refactor(tools): auto-inject agent OS and split computer tool

- Split monolithic Computer20250124Tool into 16 individual tool classes
- Add ComputerBaseTool/AndroidBaseTool with automatic agent_os injection
- Enhance ToolCollection to auto-inject agent_os via required_tags
- Introduce ComputerAgentOsFacade for coordinate scaling
- Simplify agent initialization by removing beta flag logic
- Reorganize geometry types to askui.models.types.geometry
- Introduce tool store (askui.tools.store) with optional tools organized by category (computer, android, universal)

Improves maintainability and type safety while maintaining backward compatibility.

2 parents 5c5d846 + cbadfa7 commit 483bc4a

62 files changed

Lines changed: 1956 additions & 973 deletions


README.md

Lines changed: 54 additions & 8 deletions
@@ -9,16 +9,27 @@ Join the [AskUI Discord](https://discord.gg/Gu35zMGxbx).
 
 ## Table of Contents
 
-- [📖 Introduction](#-introduction)
-- [📦 Installation](#-installation)
-- [AskUI Python Package](#askui-python-package)
-- [AskUI Agent OS](#askui-agent-os)
-- [🚀 Quickstart](#-quickstart)
+- [🤖 AskUI Vision Agent](#-askui-vision-agent)
+- [Table of Contents](#table-of-contents)
+- [📖 Introduction](#-introduction)
+- [📦 Installation](#-installation)
+- [AskUI Python Package](#askui-python-package)
+- [AskUI Agent OS](#askui-agent-os)
+- [AMD64](#amd64)
+- [ARM64](#arm64)
+- [AMD64](#amd64-1)
+- [ARM64](#arm64-1)
+- [ARM64](#arm64-2)
+- [🚀 Quickstart](#-quickstart)
 - [🧑 Control your devices](#-control-your-devices)
 - [🤖 Let AI agents control your devices](#-let-ai-agents-control-your-devices)
-- [📚 Further Documentation](#-further-documentation)
-- [🤝 Contributing](#-contributing)
-- [📜 License](#-license)
+- [🔐 Sign up with AskUI](#-sign-up-with-askui)
+- [⚙️ Configure environment variables](#️-configure-environment-variables)
+- [💻 Example](#-example)
+- [🛠️ Extending Agents with Tool Store](#️-extending-agents-with-tool-store)
+- [📚 Further Documentation](#-further-documentation)
+- [🤝 Contributing](#-contributing)
+- [📜 License](#-license)
 
 ## 📖 Introduction
 
@@ -184,6 +195,41 @@ Run the script with `python <file path>`, e.g `python test.py`.
 
 If you see a lot of logs and the first paragraph of the introduction in the console, congratulations! You've successfully let AI agents control your device to automate a task! If you have any issues, please check the [documentation](https://docs.askui.com/01-tutorials/01-your-first-agent#common-issues-and-solutions) or join our [Discord](https://discord.gg/Gu35zMGxbx) for support.
 
+### 🛠️ Extending Agents with Tool Store
+
+The Tool Store provides optional tools to extend your agents' capabilities. Import tools from `askui.tools.store` and pass them to `agent.act()` or pass them to the agent constructor as `act_tools`.
+
+**Example passing tools to `agent.act()`:**
+```python
+from askui import VisionAgent
+from askui.tools.store.computer import ComputerSaveScreenshotTool
+from askui.tools.store.universal import PrintToConsoleTool
+
+with VisionAgent() as agent:
+    agent.act(
+        "Take a screenshot and save it as demo/demo.png, then print a status message",
+        tools=[
+            ComputerSaveScreenshotTool(base_dir="./screenshots"),
+            PrintToConsoleTool()
+        ]
+    )
+```
+
+**Example passing tools to the agent constructor:**
+```python
+from askui import VisionAgent
+from askui.tools.store.computer import ComputerSaveScreenshotTool
+from askui.tools.store.universal import PrintToConsoleTool
+
+with VisionAgent(act_tools=[
+    ComputerSaveScreenshotTool(base_dir="./screenshots"),
+    PrintToConsoleTool()
+]) as agent:
+    agent.act("Take a screenshot and save it as demo/demo.png, then print a status message")
+```
+
+Tools are organized by category: `universal/` (work with any agent), `computer/` (require `AgentOs`) works only with VisionAgent and `android/` (require `AndroidAgentOs`) works only with AndroidVisionAgent.
+
 ## 📚 Further Documentation
 
 Aside from our [official documentation](https://docs.askui.com), we also have some additional guides and examples under the [docs](docs) folder that you may find useful, for example:

src/askui/agent.py

Lines changed: 43 additions & 48 deletions
@@ -1,37 +1,39 @@
 import logging
-from typing import TYPE_CHECKING, Annotated, Literal, Optional
+from typing import Annotated, Literal, Optional
 
-from anthropic import Omit, omit
 from pydantic import ConfigDict, Field, validate_call
-from typing_extensions import override
 
 from askui.agent_base import AgentBase
 from askui.container import telemetry
 from askui.locators.locators import Locator
-from askui.models.shared.settings import (
-    COMPUTER_USE_20250124_BETA_FLAG,
-    COMPUTER_USE_20251124_BETA_FLAG,
-    ActSettings,
-    MessageSettings,
-)
+from askui.models.shared.settings import ActSettings, MessageSettings
 from askui.models.shared.tools import Tool
 from askui.prompts.system import COMPUTER_AGENT_SYSTEM_PROMPT
-from askui.tools.computer import Computer20250124Tool
+from askui.tools.computer import (
+    ComputerGetMousePositionTool,
+    ComputerKeyboardPressedTool,
+    ComputerKeyboardReleaseTool,
+    ComputerKeyboardTapTool,
+    ComputerListDisplaysTool,
+    ComputerMouseClickTool,
+    ComputerMouseHoldDownTool,
+    ComputerMouseReleaseTool,
+    ComputerMouseScrollTool,
+    ComputerMoveMouseTool,
+    ComputerRetrieveActiveDisplayTool,
+    ComputerScreenshotTool,
+    ComputerSetActiveDisplayTool,
+    ComputerTypeTool,
+)
 from askui.tools.exception_tool import ExceptionTool
-from askui.tools.list_displays_tool import ListDisplaysTool
-from askui.tools.retrieve_active_display_tool import RetrieveActiveDisplayTool
-from askui.tools.set_active_display_tool import SetActiveDisplayTool
 
 from .models import ModelComposition
 from .models.models import ModelChoice, ModelRegistry, Point
 from .reporting import CompositeReporter, Reporter
 from .retry import Retry
-from .tools import AgentToolbox, ModifierKey, PcKey
+from .tools import AgentToolbox, ComputerAgentOsFacade, ModifierKey, PcKey
 from .tools.askui import AskUiControllerClient
 
-if TYPE_CHECKING:
-    from anthropic.types import AnthropicBetaParam
-
 logger = logging.getLogger(__name__)
 
 
@@ -88,14 +90,35 @@ def __init__(
             models=models,
             tools=[
                 ExceptionTool(),
-                SetActiveDisplayTool(agent_os=self.tools.os),
-                RetrieveActiveDisplayTool(agent_os=self.tools.os),
-                ListDisplaysTool(agent_os=self.tools.os),
+                ComputerGetMousePositionTool(),
+                ComputerKeyboardPressedTool(),
+                ComputerKeyboardReleaseTool(),
+                ComputerKeyboardTapTool(),
+                ComputerMouseClickTool(),
+                ComputerMouseHoldDownTool(),
+                ComputerMouseReleaseTool(),
+                ComputerMouseScrollTool(),
+                ComputerMoveMouseTool(),
+                ComputerScreenshotTool(),
+                ComputerTypeTool(),
+                ComputerListDisplaysTool(),
+                ComputerRetrieveActiveDisplayTool(),
+                ComputerSetActiveDisplayTool(),
             ]
             + (act_tools or []),
             agent_os=self.tools.os,
             model_provider=model_provider,
         )
+        self.act_agent_os_facade: ComputerAgentOsFacade = ComputerAgentOsFacade(
+            self.tools.os
+        )
+        self.act_tool_collection.add_agent_os(self.act_agent_os_facade)
+        self.act_settings = ActSettings(
+            messages=MessageSettings(
+                system=COMPUTER_AGENT_SYSTEM_PROMPT,
+                thinking={"type": "enabled", "budget_tokens": 2048},
+            )
+        )
 
     @telemetry.record_call(exclude={"locator"})
     @validate_call(config=ConfigDict(arbitrary_types_allowed=True))
@@ -396,34 +419,6 @@ def mouse_down(
         logger.debug("VisionAgent received instruction to mouse_down '%s'", button)
         self.tools.os.mouse_down(button)
 
-    @override
-    def _get_default_settings_for_act(self, model: str) -> ActSettings:
-        computer_use_beta_flag: list[AnthropicBetaParam] | Omit
-        if "claude-opus-4-5-20251101" in model:
-            computer_use_beta_flag = [COMPUTER_USE_20251124_BETA_FLAG]
-        elif (
-            "claude-sonnet-4-5-20250929" in model
-            or "claude-haiku-4-5-20251001" in model
-            or "claude-opus-4-1-20250805" in model
-            or "claude-opus-4-20250514" in model
-            or "claude-sonnet-4-20250514" in model
-            or "claude-3-7-sonnet-20250219" in model
-        ):
-            computer_use_beta_flag = [COMPUTER_USE_20250124_BETA_FLAG]
-        else:
-            computer_use_beta_flag = omit
-        return ActSettings(
-            messages=MessageSettings(
-                system=COMPUTER_AGENT_SYSTEM_PROMPT,
-                betas=computer_use_beta_flag,
-                thinking={"type": "enabled", "budget_tokens": 2048},
-            ),
-        )
-
-    @override
-    def _get_default_tools_for_act(self, model: str) -> list[Tool]:
-        return self._tools + [Computer20250124Tool(agent_os=self.tools.os)]
-
     @telemetry.record_call()
     @validate_call
     def keyboard(
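
The commit message describes `ComputerAgentOsFacade` as handling coordinate scaling, but its implementation is not part of the visible diff. The sketch below shows one way such a facade could translate coordinates from a downscaled, model-facing image to the real screen. All names in it (`FakeOs`, `ScalingFacade`, `target_size`) are hypothetical and are not taken from the askui codebase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeOs:
    """Hypothetical stand-in for an agent OS with a fixed screen size."""

    width: int = 2560
    height: int = 1440
    last_move: Optional[tuple] = None

    def mouse_move(self, x: int, y: int) -> None:
        # Record the call instead of moving a real cursor.
        self.last_move = (x, y)


class ScalingFacade:
    """Hypothetical facade: callers use `target_size` coordinates,
    the wrapped OS receives real screen coordinates."""

    def __init__(self, wrapped: FakeOs, target_size: tuple = (1280, 720)) -> None:
        self._os = wrapped
        self._target_w, self._target_h = target_size

    def _to_screen(self, x: int, y: int) -> tuple:
        # Scale each axis by (real size / model-facing size).
        return (
            round(x * self._os.width / self._target_w),
            round(y * self._os.height / self._target_h),
        )

    def mouse_move(self, x: int, y: int) -> None:
        self._os.mouse_move(*self._to_screen(x, y))


fake_os = FakeOs()
facade = ScalingFacade(fake_os)
facade.mouse_move(640, 360)  # centre of the 1280x720 model-facing image
print(fake_os.last_move)  # (1280, 720)
```

Keeping the scaling in a wrapper like this would let the individual tools stay unaware of screen resolution, which may be the motivation for registering the facade rather than `self.tools.os` via `add_agent_os`.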

src/askui/agent_base.py

Lines changed: 13 additions & 20 deletions
@@ -37,10 +37,9 @@
     ModelChoice,
     ModelName,
     ModelRegistry,
-    Point,
-    PointList,
     TotalModelChoice,
 )
+from .models.types.geometry import Point, PointList
 from .models.types.response_schemas import ResponseSchema
 from .reporting import Reporter
 from .retry import ConfigurableRetry, Retry
@@ -106,6 +105,11 @@ def __init__(
             reporter=self._reporter, models=models or {}
         )
 
+        self.act_tool_collection = ToolCollection(tools=tools)
+
+        self.act_settings = ActSettings()
+        self.caching_settings = CachingSettings()
+
     def _init_model_router(
         self,
         reporter: Reporter,
@@ -300,11 +304,9 @@ def act(
             [MessageParam(role="user", content=goal)] if isinstance(goal, str) else goal
         )
         _model = self._get_model(model, "act")
-        _settings = settings or self._get_default_settings_for_act(_model)
+        _settings = settings or self.act_settings
 
-        _caching_settings: CachingSettings = (
-            caching_settings or self._get_default_caching_settings_for_act(_model)
-        )
+        _caching_settings: CachingSettings = caching_settings or self.caching_settings
 
         tools, on_message, cached_execution_tool = self._patch_act_with_cache(
             _caching_settings, _settings, tools, on_message
@@ -323,14 +325,14 @@ def act(
         )
 
     def _build_tools(
-        self, tools: list[Tool] | ToolCollection | None, model: str
+        self, tools: list[Tool] | ToolCollection | None, _model: str
     ) -> ToolCollection:
-        default_tools = self._get_default_tools_for_act(model)
+        tool_collection = self.act_tool_collection
         if isinstance(tools, list):
-            return ToolCollection(tools=default_tools + tools)
+            tool_collection.append_tool(*tools)
         if isinstance(tools, ToolCollection):
-            return ToolCollection(default_tools) + tools
-        return ToolCollection(tools=default_tools)
+            tool_collection += tools
+        return tool_collection
 
     def _patch_act_with_cache(
         self,
@@ -399,15 +401,6 @@ def _patch_act_with_cache(
 
         return tools, on_message, cached_execution_tool
 
-    def _get_default_settings_for_act(self, model: str) -> ActSettings:  # noqa: ARG002
-        return ActSettings()
-
-    def _get_default_caching_settings_for_act(self, model: str) -> CachingSettings:  # noqa: ARG002
-        return CachingSettings()
-
-    def _get_default_tools_for_act(self, model: str) -> list[Tool]:  # noqa: ARG002
-        return self._tools
-
     @overload
     def get(
         self,
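
One behavioural difference in the `_build_tools` change may be worth checking in review. The old version built a fresh `ToolCollection` on every call, while the new one extends `self.act_tool_collection`. Assuming `append_tool` mutates the collection in place (its implementation is not shown in this diff), tools passed to one `act()` call would stay registered for later calls. The sketch uses a hypothetical `MiniCollection` that stores tool names only; it is not the real `ToolCollection`.

```python
class MiniCollection:
    """Hypothetical stand-in for ToolCollection; stores tool names only."""

    def __init__(self, tools: list) -> None:
        self.tools = list(tools)

    def append_tool(self, *tools: str) -> None:
        # Assumption: appending mutates this instance in place.
        self.tools.extend(tools)


def build_tools_old(defaults: list, extra: list) -> MiniCollection:
    # Pre-commit shape: a new collection is created for each call.
    return MiniCollection(defaults + extra)


def build_tools_new(shared: MiniCollection, extra: list) -> MiniCollection:
    # Post-commit shape: the shared collection is extended and returned.
    shared.append_tool(*extra)
    return shared


defaults = ["screenshot", "click"]
shared = MiniCollection(defaults)

# First call passes an extra tool, second call passes none.
build_tools_old(defaults, ["save_screenshot"])
second_old = build_tools_old(defaults, [])

build_tools_new(shared, ["save_screenshot"])
second_new = build_tools_new(shared, [])

print(second_old.tools)  # ['screenshot', 'click']
print(second_new.tools)  # ['screenshot', 'click', 'save_screenshot']
```

Whether this accumulation happens in askui depends on how `ToolCollection.append_tool` and `+=` are implemented, for example whether they deduplicate by tool name.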

src/askui/android_agent.py

Lines changed: 22 additions & 25 deletions
@@ -2,7 +2,6 @@
 from typing import Annotated, overload
 
 from pydantic import ConfigDict, Field, validate_call
-from typing_extensions import override
 
 from askui.agent_base import AgentBase
 from askui.container import telemetry
@@ -64,7 +63,7 @@ class AndroidVisionAgent(AgentBase):
     ```
     """
 
-    @telemetry.record_call(exclude={"model_router", "reporters", "tools"})
+    @telemetry.record_call(exclude={"model_router", "reporters", "tools", "act_tools"})
     @validate_call(config=ConfigDict(arbitrary_types_allowed=True))
     def __init__(
         self,
@@ -85,25 +84,33 @@ def __init__(
             retry=retry,
             models=models,
             tools=[
-                AndroidScreenshotTool(self.act_agent_os_facade),
-                AndroidTapTool(self.act_agent_os_facade),
-                AndroidTypeTool(self.act_agent_os_facade),
-                AndroidDragAndDropTool(self.act_agent_os_facade),
-                AndroidKeyTapEventTool(self.act_agent_os_facade),
-                AndroidSwipeTool(self.act_agent_os_facade),
-                AndroidKeyCombinationTool(self.act_agent_os_facade),
-                AndroidShellTool(self.act_agent_os_facade),
-                AndroidSelectDeviceBySerialNumberTool(self.act_agent_os_facade),
-                AndroidSelectDisplayByUniqueIDTool(self.act_agent_os_facade),
-                AndroidGetConnectedDevicesSerialNumbersTool(self.act_agent_os_facade),
-                AndroidGetConnectedDisplaysInfosTool(self.act_agent_os_facade),
-                AndroidGetCurrentConnectedDeviceInfosTool(self.act_agent_os_facade),
+                AndroidScreenshotTool(),
+                AndroidTapTool(),
+                AndroidTypeTool(),
+                AndroidDragAndDropTool(),
+                AndroidKeyTapEventTool(),
+                AndroidSwipeTool(),
+                AndroidKeyCombinationTool(),
+                AndroidShellTool(),
+                AndroidSelectDeviceBySerialNumberTool(),
+                AndroidSelectDisplayByUniqueIDTool(),
+                AndroidGetConnectedDevicesSerialNumbersTool(),
+                AndroidGetConnectedDisplaysInfosTool(),
+                AndroidGetCurrentConnectedDeviceInfosTool(),
                 ExceptionTool(),
             ]
             + (act_tools or []),
             agent_os=self.os,
             model_provider=model_provider,
         )
+        self.act_tool_collection.add_agent_os(self.act_agent_os_facade)
+        self.act_settings = ActSettings(
+            messages=MessageSettings(
+                system=ANDROID_AGENT_SYSTEM_PROMPT,
+                thinking={"type": "disabled"},
+                temperature=0.0,
+            ),
+        )
 
     @overload
     def tap(
@@ -353,13 +360,3 @@ def set_device_by_serial_number(
             f"set_device_by_serial_number(device_sn='{device_sn}')",
         )
         self.os.set_device_by_serial_number(device_sn)
-
-    @override
-    def _get_default_settings_for_act(self, model: str) -> ActSettings:
-        return ActSettings(
-            messages=MessageSettings(
-                system=ANDROID_AGENT_SYSTEM_PROMPT,
-                thinking={"type": "disabled"},
-                temperature=0.0,
-            ),
-        )
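
In this file the Android tools are now constructed without arguments, and the agent OS is supplied afterwards through `add_agent_os`. The commit message says `ToolCollection` performs this injection "via required_tags". The following is a minimal sketch of how tag-based injection could work. `MiniTool`, `MiniToolCollection`, the `tags` attribute on the OS and the subset-matching rule are all assumptions made for illustration, not the askui implementation.

```python
from typing import Optional


class MiniAgentOs:
    """Hypothetical agent OS that advertises the tags it satisfies."""

    tags = {"android"}

    def tap(self, x: int, y: int) -> str:
        return f"tap({x}, {y})"


class MiniTool:
    """Hypothetical tool base; `required_tags` names the kind of OS it needs."""

    required_tags: set = set()

    def __init__(self, agent_os: Optional[MiniAgentOs] = None) -> None:
        self.agent_os = agent_os


class MiniTapTool(MiniTool):
    required_tags = {"android"}

    def __call__(self, x: int, y: int) -> str:
        if self.agent_os is None:
            raise RuntimeError("agent_os was never injected")
        return self.agent_os.tap(x, y)


class MiniPrintTool(MiniTool):
    """Universal tool: declares no tags, so nothing is injected."""

    def __call__(self, text: str) -> str:
        return text


class MiniToolCollection:
    def __init__(self, tools: list) -> None:
        self.tools = tools

    def add_agent_os(self, agent_os: MiniAgentOs) -> None:
        # Inject only into tools that declare tags, whose tags the OS
        # satisfies, and that do not already carry an agent OS.
        for tool in self.tools:
            if not tool.required_tags or tool.agent_os is not None:
                continue
            if tool.required_tags <= agent_os.tags:
                tool.agent_os = agent_os


tap_tool = MiniTapTool()  # constructed without an agent OS, as in the diff
print_tool = MiniPrintTool()
collection = MiniToolCollection([tap_tool, print_tool])
collection.add_agent_os(MiniAgentOs())
print(tap_tool(10, 20))  # tap(10, 20)
```

A design along these lines would also explain the category rule in the README: a tool whose tags the agent's OS does not satisfy would never receive one and could not run.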

src/askui/chat/__main__.py

Lines changed: 1 addition & 0 deletions
@@ -18,4 +18,5 @@
         reload=False,
         workers=1,
         log_config=None,
+        timeout_graceful_shutdown=5,
     )
