Commit 86a6639

Merge pull request #207 from askui/feat/advanced-prompting
feat: novel prompting paradigm with 6-fold system prompt
2 parents 483bc4a + e836d43 commit 86a6639

23 files changed

Lines changed: 1479 additions & 375 deletions

docs/prompting.md

Lines changed: 335 additions & 0 deletions
@@ -0,0 +1,335 @@
# System Prompts

System prompts define how the Vision Agent behaves when executing tasks through the `act()` command. A well-structured system prompt provides the agent with the necessary context, capabilities, and constraints to successfully interact with your application's UI across different devices and platforms.

## Overview

The Vision Agent uses system prompts to understand:
- What actions it can perform
- What device/platform it's operating on
- Specific information about your UI
- How to format execution reports
- Special rules or edge cases to handle

The default prompts work well for general use cases, but customizing them for your specific application can significantly improve reliability and performance.

## Prompt Structure

System prompts should consist of six distinct parts, each wrapped in XML tags:

| Part | Required | Purpose |
|------|----------|---------|
| System Capabilities | Yes | Defines what the agent can do and how it should behave |
| Device Information | Yes | Provides platform-specific context (desktop, mobile, web) |
| UI Information | No (but strongly recommended!) | Custom information about your specific UI |
| Report Format | Yes | Specifies how to format execution results |
| Cache Use | No | Specifies when and how the agent should use cache files |
| Additional Rules | No | Special handling for edge cases or known issues |

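
The six-part layout in the table can be pictured with a minimal sketch. The class `SixPartPrompt`, its `render()` method, and the tag names are illustrative assumptions, not the actual `ActSystemPrompt` implementation; the sketch only shows the idea of wrapping each non-empty part in its own XML tag, in the order of the table.

```python
from dataclasses import dataclass

# Order of the parts as listed in the table above.
PART_ORDER = (
    "system_capabilities",
    "device_information",
    "ui_information",
    "report_format",
    "cache_use",
    "additional_rules",
)


@dataclass
class SixPartPrompt:
    """Hypothetical stand-in for a six-part system prompt (not the AskUI class)."""

    system_capabilities: str
    device_information: str
    report_format: str
    ui_information: str = ""
    cache_use: str = ""
    additional_rules: str = ""

    def render(self) -> str:
        """Wrap every non-empty part in an XML tag named after its field."""
        blocks = []
        for name in PART_ORDER:
            text = getattr(self, name).strip()
            if text:  # optional parts that were left empty are skipped
                tag = name.upper()  # tag naming is an assumption
                blocks.append(f"<{tag}>\n{text}\n</{tag}>")
        return "\n\n".join(blocks)


example = SixPartPrompt(
    system_capabilities="You control a web browser.",
    device_information="Chromium on Linux.",
    report_format="Summarize the run in Markdown.",
    ui_information="Search is in the page header.",
)
rendered = example.render()
print(rendered)
```

Empty optional parts (here `cache_use` and `additional_rules`) simply do not appear in the rendered text.
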
### 1. System Capabilities

Defines the agent's core capabilities and operational guidelines.

**Default prompts available:**
- `COMPUTER_USE_CAPABILITIES` - For desktop applications
- `WEB_BROWSER_CAPABILITIES` - For web applications
- `ANDROID_CAPABILITIES` - For Android devices

**Important:** We recommend using the default AskUI capabilities unless you have specific requirements, as custom capabilities can lead to unexpected behavior.

### 2. Device Information

Provides platform-specific context to help the agent understand the environment.

**Default options:**
- `DESKTOP_DEVICE_INFORMATION` - Platform, architecture, internet access
- `WEB_AGENT_DEVICE_INFORMATION` - Browser environment details
- `ANDROID_DEVICE_INFORMATION` - ADB connection, device type

### 3. UI Information

**This is the most important part to customize for your application.**

Provide specific details about your UI that the agent needs to know:

- Location of key functions and features
- Non-standard interaction patterns
- Common navigation paths
- Areas where users typically encounter issues
- Actions that should NOT be performed

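
One way to keep this information organized is to hold it as data and render it into the bold-heading-plus-bullets layout used in the examples later in this document. The helper `format_ui_information` is hypothetical (it is not part of `askui`), and the sample notes are invented for illustration; the resulting string could be passed as the `ui_information` argument.

```python
def format_ui_information(sections: dict[str, list[str]]) -> str:
    """Render {heading: [notes]} as bold headings followed by bullet points."""
    blocks = []
    for heading, notes in sections.items():
        bullets = "\n".join(f"- {note}" for note in notes)
        blocks.append(f"**{heading}:**\n{bullets}")
    return "\n\n".join(blocks)


# Invented sample content; replace with facts about your own UI.
ui_information = format_ui_information(
    {
        "Navigation": ["Main menu opens from the hamburger icon (top-left)"],
        "DO NOT": ["Click 'Delete account'"],
    }
)
print(ui_information)
```
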
### 4. Report Format

Specifies how the agent should format its execution report.

**Default options:**
- `MD_REPORT_FORMAT` - Markdown formatted summary with observations
- `NO_REPORT_FORMAT` - No formal report required

### 5. Cache Use

This part is added automatically, depending on your caching settings. You do not need to set it yourself.

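
The change to `src/askui/agent_base.py` in this commit suggests the mechanism: if no system prompt is set, a default one is created, and the cache part is then filled in. Below is a minimal sketch of that logic using stand-ins. `PromptStub`, `attach_cache_use`, and the placeholder prompt text are assumptions for illustration, not the library's code.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder text; the real CACHE_USE_PROMPT lives in askui.prompts.caching.
CACHE_USE_PROMPT = "Prefer a cached trajectory when one matches the task."


@dataclass
class PromptStub:
    """Hypothetical stand-in for a structured system prompt."""

    system_capabilities: str = "default capabilities"
    cache_use: str = ""


def attach_cache_use(system: Optional[PromptStub]) -> PromptStub:
    """Fall back to a default prompt if none is set, then fill the cache part."""
    if system is None:
        system = PromptStub()
    system.cache_use = CACHE_USE_PROMPT
    return system


from_default = attach_cache_use(None)
custom = attach_cache_use(PromptStub(system_capabilities="custom"))
```

In this sketch the other parts of a custom prompt are left untouched; only `cache_use` is overwritten.
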
### 6. Additional Rules

Optional rules for handling specific edge cases or known issues with your application.

**Use cases:**
- Browser-specific workarounds (e.g., Firefox startup wizards)
- Special handling for specific UI states
- Recovery strategies for common failure scenarios

## Creating Custom Prompts

### Using Factory Functions (Recommended)

The simplest way to create custom prompts:

```python
from askui.prompts.act_prompts import create_web_agent_prompt

# Create prompt with custom UI information
prompt = create_web_agent_prompt(
    ui_information="""
    **Navigation:**
    - Main menu is accessible via hamburger icon in top-left corner
    - Search functionality is in the header on all pages

    **Login Flow:**
    - Username field must be filled before password field becomes active
    - "Remember me" checkbox should NOT be used in automated tests

    **Common Issues:**
    - Loading spinner may appear for 2-3 seconds after clicking "Submit"
    - Error messages appear as toast notifications in top-right corner
    """,
    additional_rules="""
    - Always wait for the loading spinner to disappear before proceeding
    - Never click "Save and Exit" without explicit user confirmation
    """
)

# Use in agent
from askui import WebVisionAgent
from askui.models.shared.settings import ActSettings, MessageSettings

with WebVisionAgent() as agent:
    agent.act(
        "Log in with username 'testuser' and password 'testpass123'",
        # CAUTION: this also overrides any other MessageSettings
        # that may have been provided earlier!
        settings=ActSettings(messages=MessageSettings(system=prompt))
    )
```

125+
Available factory functions:
126+
- `create_computer_agent_prompt()` - Desktop applications
127+
- `create_web_agent_prompt()` - Web applications
128+
- `create_android_agent_prompt()` - Android devices
129+
130+
### Using ActSystemPrompt Directly
131+
132+
For full control over all prompt components:
133+
134+
```python
135+
from askui.models.shared.prompts import ActSystemPrompt
136+
from askui.prompts.act_prompts import (
137+
WEB_BROWSER_CAPABILITIES,
138+
WEB_AGENT_DEVICE_INFORMATION,
139+
NO_REPORT_FORMAT,
140+
)
141+
142+
prompt = ActSystemPrompt(
143+
system_capabilities=WEB_BROWSER_CAPABILITIES,
144+
device_information=WEB_AGENT_DEVICE_INFORMATION,
145+
ui_information="Your custom UI information here",
146+
report_format=NO_REPORT_FORMAT,
147+
additional_rules="Your additional rules here"
148+
)
149+
```
150+
151+
### Power User Override (Not Recommended)
152+
153+
**Warning:** This feature is intended for power users only and can lead to unexpected behavior.
154+
155+
`ActSystemPrompt` includes a `prompt` field that completely overrides all structured prompt parts when set. This is useful only if you need full control over the exact prompt text:
156+
157+
```python
158+
from askui.models.shared.prompts import ActSystemPrompt
159+
from askui.models.shared.settings import ActSettings, MessageSettings
160+
161+
# Power user override - ignores all other prompt fields
162+
prompt = ActSystemPrompt(
163+
prompt="Your completely custom system prompt here",
164+
# All other fields will be ignored when prompt is set:
165+
system_capabilities="Ignored",
166+
device_information="Ignored",
167+
# ... etc
168+
)
169+
170+
with WebVisionAgent() as agent:
171+
agent.act(
172+
"Your task",
173+
settings=ActSettings(messages=MessageSettings(system=prompt))
174+
)
175+
```
176+
177+
**Important limitations:**
178+
- ⚠️ Using the `prompt` field will trigger a `UserWarning` on model creation
179+
- ⚠️ All structured prompt parts (capabilities, device info, etc.) are completely ignored
180+
- ✅ Other `MessageSettings` fields remain unchanged (betas, thinking, max_tokens, temperature, tool_choice)
181+
- ✅ Only the system prompt text itself is affected - all other settings remain at their configured values
182+
183+
**When to use this:**
184+
- You have extensive experience with prompt engineering
185+
- You need to experiment with completely different prompt structures
186+
- You're conducting research or debugging specific prompt behaviors
187+
188+
**When NOT to use this:**
189+
- For normal customization needs (use factory functions or structured fields instead)
190+
- When you want to maintain the tested structure of default prompts
191+
- In production environments where reliability is critical
192+
193+
### Modifying Default Prompts
194+
195+
You can extend the default prompts with your own content:
196+
197+
```python
198+
from askui.prompts.act_prompts import (
199+
create_computer_agent_prompt,
200+
BROWSER_SPECIFIC_RULES,
201+
)
202+
203+
# Add your own rules to the defaults
204+
custom_rules = f"""
205+
{BROWSER_SPECIFIC_RULES}
206+
207+
**Application-Specific Rules:**
208+
- Always verify the page title before proceeding with actions
209+
- Wait 1 second after navigation before taking screenshots
210+
- Ignore popup notifications that appear during test execution
211+
"""
212+
213+
prompt = create_computer_agent_prompt(
214+
ui_information="E-commerce checkout flow with 3-step process",
215+
additional_rules=custom_rules
216+
)
217+
```
218+
219+
## Best Practices
220+
221+
### Language and Clarity
222+
223+
- **Use consistent English**: Stick to clear English throughout your prompt. Mixed languages or non-English prompts will degrade performance.
224+
- **Be specific and detailed**: Provide as much relevant detail as possible. Over-specification is better than under-specification.
225+
- **Use structured format**: Organize information with bullet points and clear sections.
226+
- **Avoid contradictions**: Ensure rules don't conflict with each other.
227+
228+
### UI Information
229+
230+
- **Document navigation patterns**: Explain how users navigate through your application.
231+
- **Identify unique elements**: Point out non-standard UI components or interactions.
232+
- **Specify timing requirements**: Note any delays, loading states, or async operations.
233+
- **List forbidden actions**: Explicitly state what the agent should NOT do.
234+
235+
### Additional Rules
236+
237+
- **Target specific issues**: Use this section to address known failure scenarios.
238+
- **Provide context**: Explain when and why a rule applies.
239+
- **Include examples**: Show concrete examples of the situation you're addressing.
240+
- **Keep it current**: Update rules as your application changes.
241+
242+
### Testing and Iteration
243+
244+
1. **Start with defaults**: Use default prompts initially to establish a baseline.
245+
2. **Add UI information**: Customize with your application-specific details.
246+
3. **Monitor failures**: Track where the agent struggles or fails.
247+
4. **Refine rules**: Add additional rules to handle discovered edge cases.
248+
5. **Test changes**: Verify that prompt changes improve reliability.
249+
250+
## Available Constants
251+
252+
Import these constants from `askui.prompts.act_prompts`:
253+
254+
**System Capabilities:**
255+
- `GENERAL_CAPABILITIES`
256+
- `COMPUTER_USE_CAPABILITIES`
257+
- `ANDROID_CAPABILITIES`
258+
- `WEB_BROWSER_CAPABILITIES`
259+
260+
**Device Information:**
261+
- `DESKTOP_DEVICE_INFORMATION`
262+
- `ANDROID_DEVICE_INFORMATION`
263+
- `WEB_AGENT_DEVICE_INFORMATION`
264+
265+
**Report Formats:**
266+
- `MD_REPORT_FORMAT`
267+
- `NO_REPORT_FORMAT`
268+
269+
**Additional Rules:**
270+
- `BROWSER_SPECIFIC_RULES`
271+
- `BROWSER_INSTALL_RULES`
272+
- `ANDROID_RECOVERY_RULES`
273+
274+
## Example: Complete Custom Prompt
275+
276+
```python
277+
from askui import WebVisionAgent
278+
from askui.prompts.act_prompts import create_web_agent_prompt
279+
from askui.models.shared.settings import ActSettings, MessageSettings
280+
281+
# Create comprehensive custom prompt
282+
prompt = create_web_agent_prompt(
283+
ui_information="""
284+
**Application Overview:**
285+
- Multi-page e-commerce application with product catalog and checkout
286+
- Uses single-page navigation with URL updates
287+
288+
**Key Features:**
289+
- Product search in header (always visible)
290+
- Shopping cart icon shows item count
291+
- Checkout is 3-step process: Cart → Shipping → Payment
292+
293+
**Important Elements:**
294+
- "Add to Cart" buttons are blue with white text
295+
- Price displays always show currency symbol ($)
296+
- Out-of-stock items show "Notify Me" instead of "Add to Cart"
297+
298+
**Navigation:**
299+
- Home: Click logo in top-left
300+
- Categories: Dropdown menu under "Shop" in header
301+
- Cart: Click cart icon in top-right
302+
- Account: Click user icon in top-right
303+
304+
**Common Patterns:**
305+
- All forms require clicking "Next" or "Continue" to proceed
306+
- Error messages appear in red above form fields
307+
- Success messages appear as green banner at top of page
308+
309+
**Timing Considerations:**
310+
- Product images may take 1-2 seconds to load
311+
- Cart updates trigger 500ms animation
312+
- Checkout validation shows spinner for 1-3 seconds
313+
314+
**DO NOT:**
315+
- Click "Complete Purchase" without explicit user confirmation
316+
- Submit payment information
317+
- Delete items from saved lists
318+
""",
319+
additional_rules="""
320+
- Always verify cart contents before proceeding to checkout
321+
- Wait for page transitions to complete before taking next action
322+
- If "Out of Stock" message appears, report it and stop execution
323+
- Ignore promotional popups that may appear during browsing
324+
"""
325+
)
326+
327+
# Use the prompt
328+
with WebVisionAgent() as agent:
329+
agent.act(
330+
"Find a laptop under $1000 and add it to cart",
331+
# CAUTION: this will also override all other MessageSettings
332+
# eventually provided earlier!
333+
settings=ActSettings(messages=MessageSettings(system=prompt))
334+
)
335+
```

src/askui/agent.py

Lines changed: 4 additions & 2 deletions
@@ -8,7 +8,9 @@
 from askui.locators.locators import Locator
 from askui.models.shared.settings import ActSettings, MessageSettings
 from askui.models.shared.tools import Tool
-from askui.prompts.system import COMPUTER_AGENT_SYSTEM_PROMPT
+from askui.prompts.act_prompts import (
+    create_computer_agent_prompt,
+)
 from askui.tools.computer import (
     ComputerGetMousePositionTool,
     ComputerKeyboardPressedTool,
@@ -115,7 +117,7 @@ def __init__(
         self.act_tool_collection.add_agent_os(self.act_agent_os_facade)
         self.act_settings = ActSettings(
             messages=MessageSettings(
-                system=COMPUTER_AGENT_SYSTEM_PROMPT,
+                system=create_computer_agent_prompt(),
                 thinking={"type": "enabled", "budget_tokens": 2048},
             )
         )

src/askui/agent_base.py

Lines changed: 4 additions & 12 deletions
@@ -4,7 +4,6 @@
 from abc import ABC
 from typing import Annotated, Literal, Optional, Type, overload
 
-from anthropic.types.beta import BetaTextBlockParam
 from dotenv import load_dotenv
 from pydantic import ConfigDict, Field, field_validator, validate_call
 from pydantic_settings import BaseSettings, SettingsConfigDict
@@ -17,6 +16,7 @@
 from askui.models.shared.agent_on_message_cb import OnMessageCb
 from askui.models.shared.settings import ActSettings, CachingSettings
 from askui.models.shared.tools import Tool, ToolCollection
+from askui.prompts.act_prompts import create_default_prompt
 from askui.prompts.caching import CACHE_USE_PROMPT
 from askui.tools.agent_os import AgentOs
 from askui.tools.android.agent_os import AndroidAgentOs
@@ -368,17 +368,9 @@ def _patch_act_with_cache(
                     cached_execution_tool,
                 ]
             )
-            if isinstance(settings.messages.system, str):
-                settings.messages.system = (
-                    settings.messages.system + "\n" + CACHE_USE_PROMPT
-                )
-            elif isinstance(settings.messages.system, list):
-                # Append as a new text block
-                settings.messages.system = settings.messages.system + [
-                    BetaTextBlockParam(type="text", text=CACHE_USE_PROMPT)
-                ]
-            else: # Omit or None
-                settings.messages.system = CACHE_USE_PROMPT
+            if settings.messages.system is None:
+                settings.messages.system = create_default_prompt()
+            settings.messages.system.cache_use = CACHE_USE_PROMPT
 
             # Add caching tools to the tools list
             if isinstance(tools, list):
