| 1 | +# System Prompts |
| 2 | + |
| 3 | +System prompts define how the Vision Agent behaves when executing tasks through the `act()` command. A well-structured system prompt provides the agent with the necessary context, capabilities, and constraints to successfully interact with your application's UI across different devices and platforms. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +The Vision Agent uses system prompts to understand: |
| 8 | +- What actions it can perform |
| 9 | +- What device/platform it's operating on |
| 10 | +- Specific information about your UI |
| 11 | +- How to format execution reports |
| 12 | +- Special rules or edge cases to handle |
| 13 | + |
| 14 | +The default prompts work well for general use cases, but customizing them for your specific application can significantly improve reliability and performance. |
| 15 | + |
| 16 | +## Prompt Structure |
| 17 | + |
| 18 | +System prompts should consist of six distinct parts, each wrapped in XML tags: |
| 19 | + |
| 20 | +| Part | Required | Purpose | |
| 21 | +|------|----------|---------| |
| 22 | +| System Capabilities | Yes | Defines what the agent can do and how it should behave | |
| 23 | +| Device Information | Yes | Provides platform-specific context (desktop, mobile, web) | |
| 24 | +| UI Information | No (but strongly recommended!) | Custom information about your specific UI | |
| 25 | +| Report Format | Yes | Specifies how to format execution results | |
| 26 | +| Cache Use | No | Specifies when and how the agent should use cache files |
| 27 | +| Additional Rules | No | Special handling for edge cases or known issues | |
| 28 | + |
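As a rough illustration of this layout, the sketch below assembles the six parts into one string, wrapping each in an XML-style tag and skipping optional parts that are empty. The function `render_prompt`, the `PART_ORDER` list, and the tag names are hypothetical and chosen only for this example; the tags AskUI actually emits may differ.

```python
# Hypothetical part names, listed in the same order as the table above.
PART_ORDER = [
    "system_capabilities",
    "device_information",
    "ui_information",
    "report_format",
    "cache_use",
    "additional_rules",
]


def render_prompt(parts: dict[str, str]) -> str:
    """Wrap each non-empty part in an upper-case XML-style tag."""
    sections = []
    for name in PART_ORDER:
        text = parts.get(name, "").strip()
        if not text:
            continue  # optional parts that are missing or blank are skipped
        tag = name.upper()
        sections.append(f"<{tag}>\n{text}\n</{tag}>")
    return "\n\n".join(sections)


example = render_prompt(
    {
        "system_capabilities": "You can click, type, and scroll.",
        "device_information": "Web browser on a desktop machine.",
        "ui_information": "",  # left empty, so no tag is produced
        "report_format": "Summarize the result in Markdown.",
    }
)
print(example)
```

Keeping each part in its own tag makes it easy to see which section a given instruction came from when you review the final prompt text.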
| 29 | +### 1. System Capabilities |
| 30 | + |
| 31 | +Defines the agent's core capabilities and operational guidelines. |
| 32 | + |
| 33 | +**Default prompts available:** |
| 34 | +- `COMPUTER_USE_CAPABILITIES` - For desktop applications |
| 35 | +- `WEB_BROWSER_CAPABILITIES` - For web applications |
| 36 | +- `ANDROID_CAPABILITIES` - For Android devices |
| 37 | + |
| 38 | +**Important:** We recommend using the default AskUI capabilities unless you have specific requirements, as custom capabilities can lead to unexpected behavior. |
| 39 | + |
| 40 | +### 2. Device Information |
| 41 | + |
| 42 | +Provides platform-specific context to help the agent understand the environment. |
| 43 | + |
| 44 | +**Default options:** |
| 45 | +- `DESKTOP_DEVICE_INFORMATION` - Platform, architecture, internet access |
| 46 | +- `WEB_AGENT_DEVICE_INFORMATION` - Browser environment details |
| 47 | +- `ANDROID_DEVICE_INFORMATION` - ADB connection, device type |
| 48 | + |
| 49 | +### 3. UI Information |
| 50 | + |
| 51 | +**This is the most important part to customize for your application.** |
| 52 | + |
| 53 | +Provide specific details about your UI that the agent needs to know: |
| 54 | + |
| 55 | +- Location of key functions and features |
| 56 | +- Non-standard interaction patterns |
| 57 | +- Common navigation paths |
| 58 | +- Areas where users typically encounter issues |
| 59 | +- Actions that should NOT be performed |
| 60 | + |
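One way to keep these details organized is to hold them as structured data and render them into the bold-heading and bullet layout used by the examples in this document. This is a minimal sketch: `format_ui_information` is a hypothetical helper, not part of AskUI, and its result is an ordinary string that could be passed as `ui_information`.

```python
def format_ui_information(sections: dict[str, list[str]]) -> str:
    """Render {title: [items]} as bold headings followed by bullet lists."""
    blocks = []
    for title, items in sections.items():
        if not items:
            continue  # skip sections that have nothing to say
        bullets = "\n".join(f"- {item}" for item in items)
        blocks.append(f"**{title}:**\n{bullets}")
    return "\n\n".join(blocks)


ui_information = format_ui_information(
    {
        "Navigation": [
            "Main menu is accessible via hamburger icon in top-left corner",
            "Search functionality is in the header on all pages",
        ],
        "DO NOT": [
            'Use the "Remember me" checkbox in automated tests',
        ],
    }
)
print(ui_information)
```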
| 61 | +### 4. Report Format |
| 62 | + |
| 63 | +Specifies how the agent should format its execution report. |
| 64 | + |
| 65 | +**Default options:** |
| 66 | +- `MD_REPORT_FORMAT` - Markdown formatted summary with observations |
| 67 | +- `NO_REPORT_FORMAT` - No formal report required |
| 68 | + |
| 69 | +### 5. Cache Use |
| 70 | + |
| 71 | +This part is added automatically, depending on your caching settings. |
| 72 | + |
| 73 | +### 6. Additional Rules |
| 74 | + |
| 75 | +Optional rules for handling specific edge cases or known issues with your application. |
| 76 | + |
| 77 | +**Use cases:** |
| 78 | +- Browser-specific workarounds (e.g., Firefox startup wizards) |
| 79 | +- Special handling for specific UI states |
| 80 | +- Recovery strategies for common failure scenarios |
| 81 | + |
| 82 | +## Creating Custom Prompts |
| 83 | + |
| 84 | +### Using Factory Functions (Recommended) |
| 85 | + |
| 86 | +The simplest way to create custom prompts: |
| 87 | + |
| 88 | +```python |
| 89 | +from askui.prompts.act_prompts import create_web_agent_prompt |
| 90 | + |
| 91 | +# Create prompt with custom UI information |
| 92 | +prompt = create_web_agent_prompt( |
| 93 | + ui_information=""" |
| 94 | + **Navigation:** |
| 95 | + - Main menu is accessible via hamburger icon in top-left corner |
| 96 | + - Search functionality is in the header on all pages |
| 97 | + |
| 98 | + **Login Flow:** |
| 99 | + - Username field must be filled before password field becomes active |
| 100 | + - "Remember me" checkbox should NOT be used in automated tests |
| 101 | + |
| 102 | + **Common Issues:** |
| 103 | + - Loading spinner may appear for 2-3 seconds after clicking "Submit" |
| 104 | + - Error messages appear as toast notifications in top-right corner |
| 105 | + """, |
| 106 | + additional_rules=""" |
| 107 | + - Always wait for the loading spinner to disappear before proceeding |
| 108 | + - Never click "Save and Exit" without explicit user confirmation |
| 109 | + """ |
| 110 | +) |
| 111 | + |
| 112 | +# Use in agent |
| 113 | +from askui import WebVisionAgent |
| 114 | +from askui.models.shared.settings import ActSettings, MessageSettings |
| 115 | + |
| 116 | +with WebVisionAgent() as agent: |
| 117 | + agent.act( |
| 118 | + "Log in with username 'testuser' and password 'testpass123'", |
| 119 | + # CAUTION: this also overrides any other MessageSettings |
| 120 | + # that may have been provided earlier! |
| 121 | + settings=ActSettings(messages=MessageSettings(system=prompt)) |
| 122 | + ) |
| 123 | +``` |
| 124 | + |
| 125 | +Available factory functions: |
| 126 | +- `create_computer_agent_prompt()` - Desktop applications |
| 127 | +- `create_web_agent_prompt()` - Web applications |
| 128 | +- `create_android_agent_prompt()` - Android devices |
| 129 | + |
| 130 | +### Using ActSystemPrompt Directly |
| 131 | + |
| 132 | +For full control over all prompt components: |
| 133 | + |
| 134 | +```python |
| 135 | +from askui.models.shared.prompts import ActSystemPrompt |
| 136 | +from askui.prompts.act_prompts import ( |
| 137 | + WEB_BROWSER_CAPABILITIES, |
| 138 | + WEB_AGENT_DEVICE_INFORMATION, |
| 139 | + NO_REPORT_FORMAT, |
| 140 | +) |
| 141 | + |
| 142 | +prompt = ActSystemPrompt( |
| 143 | + system_capabilities=WEB_BROWSER_CAPABILITIES, |
| 144 | + device_information=WEB_AGENT_DEVICE_INFORMATION, |
| 145 | + ui_information="Your custom UI information here", |
| 146 | + report_format=NO_REPORT_FORMAT, |
| 147 | + additional_rules="Your additional rules here" |
| 148 | +) |
| 149 | +``` |
| 150 | + |
| 151 | +### Power User Override (Not Recommended) |
| 152 | + |
| 153 | +**Warning:** This feature is intended for power users only and can lead to unexpected behavior. |
| 154 | + |
| 155 | +`ActSystemPrompt` includes a `prompt` field that completely overrides all structured prompt parts when set. This is useful only if you need full control over the exact prompt text: |
| 156 | + |
| 157 | +```python |
| | +from askui import WebVisionAgent |
| 158 | +from askui.models.shared.prompts import ActSystemPrompt |
| 159 | +from askui.models.shared.settings import ActSettings, MessageSettings |
| 160 | + |
| 161 | +# Power user override - ignores all other prompt fields |
| 162 | +prompt = ActSystemPrompt( |
| 163 | + prompt="Your completely custom system prompt here", |
| 164 | + # All other fields will be ignored when prompt is set: |
| 165 | + system_capabilities="Ignored", |
| 166 | + device_information="Ignored", |
| 167 | + # ... etc |
| 168 | +) |
| 169 | + |
| 170 | +with WebVisionAgent() as agent: |
| 171 | + agent.act( |
| 172 | + "Your task", |
| 173 | + settings=ActSettings(messages=MessageSettings(system=prompt)) |
| 174 | + ) |
| 175 | +``` |
| 176 | + |
| 177 | +**Important limitations:** |
| 178 | +- ⚠️ Using the `prompt` field will trigger a `UserWarning` on model creation |
| 179 | +- ⚠️ All structured prompt parts (capabilities, device info, etc.) are completely ignored |
| 180 | +- ✅ Other `MessageSettings` fields remain unchanged (betas, thinking, max_tokens, temperature, tool_choice) |
| 181 | +- ✅ Only the system prompt text itself is affected - all other settings remain at their configured values |
| 182 | + |
| 183 | +**When to use this:** |
| 184 | +- You have extensive experience with prompt engineering |
| 185 | +- You need to experiment with completely different prompt structures |
| 186 | +- You're conducting research or debugging specific prompt behaviors |
| 187 | + |
| 188 | +**When NOT to use this:** |
| 189 | +- For normal customization needs (use factory functions or structured fields instead) |
| 190 | +- When you want to maintain the tested structure of default prompts |
| 191 | +- In production environments where reliability is critical |
| 192 | + |
| 193 | +### Modifying Default Prompts |
| 194 | + |
| 195 | +You can extend the default prompts with your own content: |
| 196 | + |
| 197 | +```python |
| 198 | +from askui.prompts.act_prompts import ( |
| 199 | + create_computer_agent_prompt, |
| 200 | + BROWSER_SPECIFIC_RULES, |
| 201 | +) |
| 202 | + |
| 203 | +# Add your own rules to the defaults |
| 204 | +custom_rules = f""" |
| 205 | +{BROWSER_SPECIFIC_RULES} |
| 206 | + |
| 207 | +**Application-Specific Rules:** |
| 208 | +- Always verify the page title before proceeding with actions |
| 209 | +- Wait 1 second after navigation before taking screenshots |
| 210 | +- Ignore popup notifications that appear during test execution |
| 211 | +""" |
| 212 | + |
| 213 | +prompt = create_computer_agent_prompt( |
| 214 | + ui_information="E-commerce checkout flow with 3-step process", |
| 215 | + additional_rules=custom_rules |
| 216 | +) |
| 217 | +``` |
| 218 | + |
| 219 | +## Best Practices |
| 220 | + |
| 221 | +### Language and Clarity |
| 222 | + |
| 223 | +- **Use consistent English**: Stick to clear English throughout your prompt. Mixed languages or non-English prompts will degrade performance. |
| 224 | +- **Be specific and detailed**: Provide as much relevant detail as possible. Over-specification is better than under-specification. |
| 225 | +- **Use structured format**: Organize information with bullet points and clear sections. |
| 226 | +- **Avoid contradictions**: Ensure rules don't conflict with each other. |
| 227 | + |
| 228 | +### UI Information |
| 229 | + |
| 230 | +- **Document navigation patterns**: Explain how users navigate through your application. |
| 231 | +- **Identify unique elements**: Point out non-standard UI components or interactions. |
| 232 | +- **Specify timing requirements**: Note any delays, loading states, or async operations. |
| 233 | +- **List forbidden actions**: Explicitly state what the agent should NOT do. |
| 234 | + |
| 235 | +### Additional Rules |
| 236 | + |
| 237 | +- **Target specific issues**: Use this section to address known failure scenarios. |
| 238 | +- **Provide context**: Explain when and why a rule applies. |
| 239 | +- **Include examples**: Show concrete examples of the situation you're addressing. |
| 240 | +- **Keep it current**: Update rules as your application changes. |
| 241 | + |
| 242 | +### Testing and Iteration |
| 243 | + |
| 244 | +1. **Start with defaults**: Use default prompts initially to establish a baseline. |
| 245 | +2. **Add UI information**: Customize with your application-specific details. |
| 246 | +3. **Monitor failures**: Track where the agent struggles or fails. |
| 247 | +4. **Refine rules**: Add additional rules to handle discovered edge cases. |
| 248 | +5. **Test changes**: Verify that prompt changes improve reliability. |
| 249 | + |
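To make step 5 measurable, one option is to repeat the same task several times per prompt variant and compare success rates. The sketch below is independent of AskUI: `success_rate` and `run_once` are hypothetical names, and `run_once` stands in for whatever callable runs your task once and returns `True` on success.

```python
from typing import Callable


def success_rate(run_once: Callable[[], bool], trials: int) -> float:
    """Run a task `trials` times and return the fraction of successful runs."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    passed = 0
    for _ in range(trials):
        try:
            if run_once():
                passed += 1
        except Exception:
            # A raised error counts as a failed run instead of aborting the comparison.
            pass
    return passed / trials


# Stand-in task for demonstration: succeeds on three of four runs.
outcomes = iter([True, True, False, True])
rate = success_rate(lambda: next(outcomes), trials=4)
print(f"Success rate: {rate:.0%}")
```

In practice, `run_once` would wrap a call to `agent.act(...)` with either the baseline or the customized prompt and return whether the run succeeded by your own criterion. Comparing the two rates on the same task gives a simple signal for whether a prompt change helped.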
| 250 | +## Available Constants |
| 251 | + |
| 252 | +Import these constants from `askui.prompts.act_prompts`: |
| 253 | + |
| 254 | +**System Capabilities:** |
| 255 | +- `GENERAL_CAPABILITIES` |
| 256 | +- `COMPUTER_USE_CAPABILITIES` |
| 257 | +- `ANDROID_CAPABILITIES` |
| 258 | +- `WEB_BROWSER_CAPABILITIES` |
| 259 | + |
| 260 | +**Device Information:** |
| 261 | +- `DESKTOP_DEVICE_INFORMATION` |
| 262 | +- `ANDROID_DEVICE_INFORMATION` |
| 263 | +- `WEB_AGENT_DEVICE_INFORMATION` |
| 264 | + |
| 265 | +**Report Formats:** |
| 266 | +- `MD_REPORT_FORMAT` |
| 267 | +- `NO_REPORT_FORMAT` |
| 268 | + |
| 269 | +**Additional Rules:** |
| 270 | +- `BROWSER_SPECIFIC_RULES` |
| 271 | +- `BROWSER_INSTALL_RULES` |
| 272 | +- `ANDROID_RECOVERY_RULES` |
| 273 | + |
| 274 | +## Example: Complete Custom Prompt |
| 275 | + |
| 276 | +```python |
| 277 | +from askui import WebVisionAgent |
| 278 | +from askui.prompts.act_prompts import create_web_agent_prompt |
| 279 | +from askui.models.shared.settings import ActSettings, MessageSettings |
| 280 | + |
| 281 | +# Create comprehensive custom prompt |
| 282 | +prompt = create_web_agent_prompt( |
| 283 | + ui_information=""" |
| 284 | + **Application Overview:** |
| 285 | + - Multi-page e-commerce application with product catalog and checkout |
| 286 | + - Uses single-page navigation with URL updates |
| 287 | + |
| 288 | + **Key Features:** |
| 289 | + - Product search in header (always visible) |
| 290 | + - Shopping cart icon shows item count |
| 291 | + - Checkout is 3-step process: Cart → Shipping → Payment |
| 292 | + |
| 293 | + **Important Elements:** |
| 294 | + - "Add to Cart" buttons are blue with white text |
| 295 | + - Price displays always show currency symbol ($) |
| 296 | + - Out-of-stock items show "Notify Me" instead of "Add to Cart" |
| 297 | + |
| 298 | + **Navigation:** |
| 299 | + - Home: Click logo in top-left |
| 300 | + - Categories: Dropdown menu under "Shop" in header |
| 301 | + - Cart: Click cart icon in top-right |
| 302 | + - Account: Click user icon in top-right |
| 303 | + |
| 304 | + **Common Patterns:** |
| 305 | + - All forms require clicking "Next" or "Continue" to proceed |
| 306 | + - Error messages appear in red above form fields |
| 307 | + - Success messages appear as green banner at top of page |
| 308 | + |
| 309 | + **Timing Considerations:** |
| 310 | + - Product images may take 1-2 seconds to load |
| 311 | + - Cart updates trigger 500ms animation |
| 312 | + - Checkout validation shows spinner for 1-3 seconds |
| 313 | +
|
| 314 | + **DO NOT:** |
| 315 | + - Click "Complete Purchase" without explicit user confirmation |
| 316 | + - Submit payment information |
| 317 | + - Delete items from saved lists |
| 318 | + """, |
| 319 | + additional_rules=""" |
| 320 | + - Always verify cart contents before proceeding to checkout |
| 321 | + - Wait for page transitions to complete before taking next action |
| 322 | + - If "Out of Stock" message appears, report it and stop execution |
| 323 | + - Ignore promotional popups that may appear during browsing |
| 324 | + """ |
| 325 | +) |
| 326 | + |
| 327 | +# Use the prompt |
| 328 | +with WebVisionAgent() as agent: |
| 329 | + agent.act( |
| 330 | + "Find a laptop under $1000 and add it to cart", |
| 331 | + # CAUTION: this also overrides any other MessageSettings |
| 332 | + # that may have been provided earlier! |
| 333 | + settings=ActSettings(messages=MessageSettings(system=prompt)) |
| 334 | + ) |
| 335 | +``` |