Migrate act() to conversation-based architecture with Speaker pattern and add caching v2 features. #236
Merged
Changes from 26 commits
65 commits
7f2770c
refactor: migrate act to conversation-based architecture and update c…
philipph-askui d79a5cd
feat: add caching_v2 features and fix otel dependency for tracing
philipph-askui 835860a
feat: change default of `is_cacheable` flag to False
philipph-askui 08a1a0e
fix: update prompts to state of caching_v02
philipph-askui ec8e82b
fix: format, typechecking, liniting issues
philipph-askui 16983b4
fix: sanitizes messages before sending to API as we need to remove pr…
philipph-askui d6932f2
removes old 'llm_provider` field in CacheWritingSettings
philipph-askui 297b5e3
fix: add default cache directory (.askui_cache) to gitignore
philipph-askui cc47181
chore: change logging outputs to INFO
philipph-askui 6e3a52f
fix: removes old cache_writer and makes code use the new cache_manager
philipph-askui 57b887d
fix: handles problems due to tools now having uuid suffixes
philipph-askui 35d6149
fix: adds missing cache parameter handling
philipph-askui 6f08bce
fix: update outdated tests
philipph-askui 76daa92
feat: add method to truncate content for html reports to prevent floo…
philipph-askui 16257dd
fix: migrate caching to conversation-based architecture and add missi…
philipph-askui a4c6449
fix: bug in visual validation during cached execution
philipph-askui ac2caf1
chore: change default value for `visual_validation_threshold` to 10
philipph-askui 586aee3
fix: add explicit conversion to int of mouse move coordinats, as the …
philipph-askui aa93ff7
chore: change log message from warning to info
philipph-askui f04f1c9
fix: remove unnecessary files
philipph-askui 908d55d
fix: duplicate clipping of coordinates
philipph-askui 7f4b95f
fix: multiple bugs and code quality issues
philipph-askui 0b9e13b
fix: change default value of `delay_time_between_actions` from 0.5 to…
philipph-askui b60d1bf
feat: add usage statistics of caching to html reporter
philipph-askui f99082f
fix: bug where cached executions were reported as success when they w…
philipph-askui fdece1d
fix: change name of caching strategies to match new pattern from cach…
philipph-askui 6d961f8
fix: coding quality issue
philipph-askui 1c266ae
chore: add pydantic model for VisualValidationMetadata
philipph-askui ee78cbf
chore: move conversation to models/shared
philipph-askui 4029647
chore: refactor control loop and delete legacy code (custom_agent and…
philipph-askui d135838
feat: add callback system comparable to pytorch lightning
philipph-askui dfc9068
fix: bug in html report that led to a crash for non-cached executions
philipph-askui 19ab09c
feat: add new speaker handoff pattern that is more scalable and gener…
philipph-askui 9e3672b
feat: add conversation_id to conversation
philipph-askui 908b5ce
feat: add on_speaker_switch callback
philipph-askui 1650ff7
chore: resolve joint callback method into methods that handle them in…
philipph-askui 6d17375
chore: refactor usage tracking to integrate via callback
philipph-askui f8b416a
chore: change name of caching strategy `both` to `auto`
philipph-askui 33c72bf
Merge branch 'main' into chore/act_conversation_with_caching
philipph-askui 0b1d45e
fix: add missing `_on_speaker_switch` callback to docs
philipph-askui efe0624
Merge branch 'chore/act_conversation_with_caching' of https://github.…
philipph-askui f3ca227
fix: make tracing span names consistent with function names
philipph-askui 7d63869
chore: rename `handle_result_status` to `_handle_continue_conversation`
philipph-askui 56d3793
chore: move speaker switch into a dedicated function `switch_speaker_…
philipph-askui 6d9b7f4
chore: remove unused `_has_tool_calls` from AgentSpeaker
philipph-askui 77239f0
Merge branch 'main' into chore/act_conversation_with_caching
philipph-askui 70293a4
fix: linting issue (Line too long)
philipph-askui c5728db
chore: remove unused local `speaker` variable
philipph-askui 3c8f50a
fix: run pdm install
philipph-askui 1c1c687
chore: make description and name public member variables of speakers …
philipph-askui 979ebba
chore: remove code quality (remove try-except, add isEnabledFor check…
philipph-askui ece5dfd
rename `_conclude_control_loop` to `_teardown_control_loop`
philipph-askui 7998ac0
chore: refactor `_sanitize_message_for_api` into new `from_message_pa…
philipph-askui 8d75ab6
fix: update docs to reflect latest changes
philipph-askui cb5f1a6
fix: remove try-except in CacheExecutor
philipph-askui 62b817c
chore: add inline comment in ContentBlock conversion of anthropic mes…
philipph-askui 7e04fb4
fix: exclude agent settings from telemetry as it cant be converted to…
philipph-askui 0a286d3
fix: bug in agent response status
philipph-askui 7d8c054
chore: set correct logger name
philipph-askui 698596c
chore: clean up logging in cache verification
philipph-askui b6998bf
fix: refines cache use prompt to prevent the model from using the wro…
philipph-askui 0b02f97
feat: adds hint to "Original" token values that this was the consumed…
philipph-askui b7d06af
chore: removes outdated `SIMPLIFICATION_CONCEPT.md`
philipph-askui 0e42685
fix: index of docs in overivew to align with filenames
philipph-askui c85141f
feat: change default model for vlm_providers to `claude-sonnet-4-6`
philipph-askui
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -169,5 +169,6 @@ reports/ | |
| /askui_chat.db-shm | ||
| /askui_chat.db-wal | ||
| .cache/ | ||
| .askui_cache/* | ||
|
|
||
| bom.json | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -8,12 +8,12 @@ The caching system works by recording all tool use actions (mouse movements, cli | |
|
|
||
| ## Caching Strategies | ||
|
|
||
| The caching mechanism supports four strategies, configured via the `caching_settings` parameter in the `act()` method: | ||
| The caching mechanism supports three strategies, configured via the `caching_settings` parameter in the `act()` method: | ||
|
|
||
| - **`"no"`** (default): No caching is used. The agent executes normally without recording or replaying actions. | ||
| - **`"write"`**: Records all agent actions to a cache file for future replay. | ||
| - **`"read"`**: Provides tools to the agent to list and execute previously cached trajectories. | ||
| - **`"both"`**: Combines read and write modes - the agent can use existing cached trajectories and will also record new ones. | ||
| - **`None`** (default): No caching is used. The agent executes normally without recording or replaying actions. | ||
| - **`"record"`**: Records all agent actions to a cache file for future replay. | ||
| - **`"execute"`**: Provides tools to the agent to list and execute previously cached trajectories. | ||
| - **`"both"`**: Combines execute and record modes - the agent can use existing cached trajectories and will also record new ones. | ||
|
Collaborator
I don't like both.
Contributor
Author
How about "auto", as we automatically infer whether to execute or record? |
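One possible reading of the "auto" suggestion is: replay when the named cache file already exists, record otherwise. The sketch below only illustrates that inference. `resolve_auto_strategy` is a hypothetical helper that does not appear in this PR, and the rule the library actually applies may differ.

```python
from pathlib import Path
from typing import Literal, Optional

Strategy = Literal["execute", "record"]


def resolve_auto_strategy(cache_dir: str, filename: Optional[str]) -> Strategy:
    """Hypothetical helper: replay if the cache file exists, else record."""
    # Assumption: "existence of the file" is the only signal used for the decision.
    if filename is not None and (Path(cache_dir) / filename).is_file():
        return "execute"
    return "record"
```

With a rule like this, the first run of a test would record a trajectory and later runs would replay it without the caller switching strategies by hand.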
||
|
|
||
| ## Configuration | ||
|
|
||
|
|
@@ -23,20 +23,20 @@ Caching is configured using the `CachingSettings` class: | |
| from askui.models.shared.settings import CachingSettings, CachedExecutionToolSettings | ||
|
|
||
| caching_settings = CachingSettings( | ||
| strategy="write", # One of: "read", "write", "both", "no" | ||
| strategy="record", # One of: "execute", "record", "both", or None | ||
| cache_dir=".cache", # Directory to store cache files | ||
| filename="my_test.json", # Filename for the cache file (optional for write mode) | ||
| filename="my_test.json", # Filename for the cache file (optional for record mode) | ||
| execute_cached_trajectory_tool_settings=CachedExecutionToolSettings( | ||
| delay_time_between_action=0.5 # Delay in seconds between each cached action | ||
| delay_time_between_actions=0.5 # Delay in seconds between each cached action | ||
| ) | ||
| ) | ||
| ``` | ||
|
|
||
| ### Parameters | ||
|
|
||
| - **`strategy`**: The caching strategy to use (`"read"`, `"write"`, `"both"`, or `"no"`). | ||
| - **`strategy`**: The caching strategy to use (`"execute"`, `"record"`, `"both"`, or `None`). | ||
| - **`cache_dir`**: Directory where cache files are stored. Defaults to `".cache"`. | ||
| - **`filename`**: Name of the cache file to write to or read from. If not specified in write mode, a timestamped filename will be generated automatically (format: `cached_trajectory_YYYYMMDDHHMMSSffffff.json`). | ||
| - **`filename`**: Name of the cache file to write to or read from. If not specified in record mode, a timestamped filename will be generated automatically (format: `cached_trajectory_YYYYMMDDHHMMSSffffff.json`). | ||
| - **`execute_cached_trajectory_tool_settings`**: Configuration for the trajectory execution tool (optional). See [Execution Settings](#execution-settings) below. | ||
|
|
||
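The documented default filename format `cached_trajectory_YYYYMMDDHHMMSSffffff.json` corresponds to a timestamp with microsecond precision. As a rough illustration, it could be produced with the standard library as shown below. `default_cache_filename` is a hypothetical name, and the library's own implementation is not shown in this diff.

```python
from datetime import datetime
from typing import Optional


def default_cache_filename(now: Optional[datetime] = None) -> str:
    """Hypothetical helper mirroring the documented filename format."""
    # %f is the zero-padded, six-digit microsecond field ("ffffff" in the docs).
    timestamp = (now or datetime.now()).strftime("%Y%m%d%H%M%S%f")
    return f"cached_trajectory_{timestamp}.json"
```

Because the timestamp sorts lexicographically, the most recent recording in a directory is also the last one when filenames are sorted.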
| ### Execution Settings | ||
|
|
@@ -47,21 +47,21 @@ The `CachedExecutionToolSettings` class allows you to configure how cached traje | |
| from askui.models.shared.settings import CachedExecutionToolSettings | ||
|
|
||
| execution_settings = CachedExecutionToolSettings( | ||
| delay_time_between_action=0.5 # Delay in seconds between each action (default: 0.5) | ||
| delay_time_between_actions=0.5 # Delay in seconds between each action (default: 0.5) | ||
| ) | ||
| ``` | ||
|
|
||
| #### Parameters | ||
|
|
||
| - **`delay_time_between_action`**: The time to wait (in seconds) between executing consecutive cached actions. This delay helps ensure UI elements can materialize before the next action is executed. Defaults to `0.5` seconds. | ||
| - **`delay_time_between_actions`**: The time to wait (in seconds) between executing consecutive cached actions. This delay helps ensure UI elements can materialize before the next action is executed. Defaults to `0.5` seconds. | ||
|
|
||
| You can adjust this value based on your application's responsiveness: | ||
| - For faster applications or quick interactions, you might use a smaller delay (e.g., `0.1` or `0.2` seconds) | ||
| - For slower applications or complex UI updates, you might need a longer delay (e.g., `1.0` or `2.0` seconds) | ||
|
|
||
| ## Usage Examples | ||
|
|
||
| ### Writing a Cache (Recording) | ||
| ### Recording a Cache | ||
|
|
||
| Record agent actions to a cache file for later replay: | ||
|
|
||
|
|
@@ -73,7 +73,7 @@ with ComputerAgent() as agent: | |
| agent.act( | ||
| goal="Fill out the login form with username 'admin' and password 'secret123'", | ||
| caching_settings=CachingSettings( | ||
| strategy="write", # you could also use "both" here | ||
| strategy="record", # you could also use "both" here | ||
| cache_dir=".cache", | ||
| filename="login_test.json" | ||
| ) | ||
|
|
@@ -82,7 +82,7 @@ with ComputerAgent() as agent: | |
|
|
||
| After execution, a cache file will be created at `.cache/login_test.json` containing all the tool use actions performed by the agent. | ||
|
|
||
| ### Reading from Cache (Replaying) | ||
| ### Executing from Cache (Replaying) | ||
|
|
||
| Provide the agent with access to previously recorded trajectories: | ||
|
|
||
|
|
@@ -94,13 +94,13 @@ with ComputerAgent() as agent: | |
| agent.act( | ||
| goal="Fill out the login form", | ||
| caching_settings=CachingSettings( | ||
| strategy="read", # you could also use "both" here | ||
| strategy="execute", # you could also use "both" here | ||
| cache_dir=".cache" | ||
| ) | ||
| ) | ||
| ``` | ||
|
|
||
| When using `strategy="read"`, the agent receives two additional tools: | ||
| When using `strategy="execute"`, the agent receives two additional tools: | ||
|
|
||
| 1. **`retrieve_available_trajectories_tool`**: Lists all available cache files in the cache directory | ||
| 2. **`execute_cached_executions_tool`**: Executes a specific cached trajectory | ||
|
|
@@ -109,7 +109,7 @@ The agent will automatically check if a relevant cached trajectory exists and us | |
|
|
||
| ### Referencing Cache Files in Goal Prompts | ||
|
|
||
| When using `strategy="read"` or `strategy="both"`, **you need to inform the agent about which cache files are available and when to use them**. This is done by including cache file information directly in your goal prompt. | ||
| When using `strategy="execute"` or `strategy="both"`, **you need to inform the agent about which cache files are available and when to use them**. This is done by including cache file information directly in your goal prompt. | ||
|
|
||
| #### Explicit Cache File References | ||
|
|
||
|
|
@@ -126,7 +126,7 @@ with ComputerAgent() as agent: | |
| If the cache file "open_website_in_chrome.json" is available, please use it | ||
| for this execution. It will open a new window in Chrome and navigate to the website.""", | ||
| caching_settings=CachingSettings( | ||
| strategy="read", | ||
| strategy="execute", | ||
| cache_dir=".cache" | ||
| ) | ||
| ) | ||
|
|
@@ -149,7 +149,7 @@ with ComputerAgent() as agent: | |
| Check if a cache file named "{test_id}.json" exists. If it does, use it to | ||
| replay the test actions, then verify the results.""", | ||
| caching_settings=CachingSettings( | ||
| strategy="read", | ||
| strategy="execute", | ||
| cache_dir="test_cache" | ||
| ) | ||
| ) | ||
|
|
@@ -171,7 +171,7 @@ with ComputerAgent() as agent: | |
| Choose the most recent one if multiple are available, as it likely contains | ||
| the most up-to-date interaction sequence.""", | ||
| caching_settings=CachingSettings( | ||
| strategy="read", | ||
| strategy="execute", | ||
| cache_dir=".cache" | ||
| ) | ||
| ) | ||
|
|
@@ -195,7 +195,7 @@ with ComputerAgent() as agent: | |
|
|
||
| After each cached execution, verify the step completed successfully before proceeding.""", | ||
| caching_settings=CachingSettings( | ||
| strategy="read", | ||
| strategy="execute", | ||
| cache_dir=".cache" | ||
| ) | ||
| ) | ||
|
|
@@ -219,10 +219,10 @@ with ComputerAgent() as agent: | |
| agent.act( | ||
| goal="Fill out the login form", | ||
| caching_settings=CachingSettings( | ||
| strategy="read", | ||
| strategy="execute", | ||
| cache_dir=".cache", | ||
| execute_cached_trajectory_tool_settings=CachedExecutionToolSettings( | ||
| delay_time_between_action=1.0 # Wait 1 second between each action | ||
| delay_time_between_actions=1.0 # Wait 1 second between each action | ||
| ) | ||
| ) | ||
| ) | ||
|
|
@@ -323,7 +323,7 @@ The delay between actions can be customized using `CachedExecutionToolSettings` | |
| ## Limitations | ||
|
|
||
| - **UI State Sensitivity**: Cached trajectories assume the UI is in the same state as when they were recorded. If the UI has changed, the replay may fail or produce incorrect results. | ||
| - **No on_message Callback**: When using `strategy="write"` or `strategy="both"`, you cannot provide a custom `on_message` callback, as the caching system uses this callback to record actions. | ||
| - **No on_message Callback**: When using `strategy="record"` or `strategy="both"`, you cannot provide a custom `on_message` callback, as the caching system uses this callback to record actions. | ||
| - **Verification Required**: After executing a cached trajectory, the agent should verify that the results are correct, as UI changes may cause partial failures. | ||
|
|
||
| ## Example: Complete Test Workflow | ||
|
|
@@ -340,7 +340,7 @@ with ComputerAgent() as agent: | |
| agent.act( | ||
| goal="Navigate to the login page and log in with username 'testuser' and password 'testpass123'", | ||
| caching_settings=CachingSettings( | ||
| strategy="write", | ||
| strategy="record", | ||
| cache_dir="test_cache", | ||
| filename="user_login.json" | ||
| ) | ||
|
|
@@ -356,10 +356,10 @@ with ComputerAgent() as agent: | |
| the login sequence. It contains the steps to navigate to the login page and | ||
| authenticate with the test credentials.""", | ||
| caching_settings=CachingSettings( | ||
| strategy="read", | ||
| strategy="execute", | ||
| cache_dir="test_cache", | ||
| execute_cached_trajectory_tool_settings=CachedExecutionToolSettings( | ||
| delay_time_between_action=1.0 | ||
| delay_time_between_actions=1.0 | ||
| ) | ||
| ) | ||
| ) | ||
|
|
||
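For callers updating existing code, the renames in this diff can be summarized as a small translation step. The sketch below works on plain dictionaries of keyword arguments rather than on the real `CachingSettings` classes. `STRATEGY_RENAMES` and `migrate_caching_kwargs` are hypothetical names introduced only for illustration. The commit list also records a later rename of `both` to `auto`, which this sketch deliberately does not apply.

```python
from typing import Any, Dict, Optional

# Old strategy value -> new strategy value, as shown in the diff above.
STRATEGY_RENAMES: Dict[str, Optional[str]] = {
    "no": None,
    "write": "record",
    "read": "execute",
    "both": "both",
}


def migrate_caching_kwargs(old: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: translate old keyword arguments to the new names."""
    new = dict(old)  # shallow copy so the caller's dict is left untouched
    if "strategy" in new:
        # Values that are not old names (e.g. already migrated) pass through.
        new["strategy"] = STRATEGY_RENAMES.get(new["strategy"], new["strategy"])
    tool = new.get("execute_cached_trajectory_tool_settings")
    if isinstance(tool, dict) and "delay_time_between_action" in tool:
        tool = dict(tool)
        # The parameter gained a trailing "s" in this PR.
        tool["delay_time_between_actions"] = tool.pop("delay_time_between_action")
        new["execute_cached_trajectory_tool_settings"] = tool
    return new
```

For example, `migrate_caching_kwargs({"strategy": "write", "cache_dir": ".cache"})` yields the same dictionary with `strategy` set to `"record"`.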