2 changes: 1 addition & 1 deletion .beads/issues.jsonl
@@ -13,5 +13,5 @@
{"id":"openadapt-evals-hvm","title":"VL model fix PR #18 ready to merge","notes":"2026-02-08: openadapt-ml PR #18 was already merged on 2026-01-29. VL model fix is done.","status":"closed","priority":0,"issue_type":"task","owner":"richard.abrich@gmail.com","created_at":"2026-01-29T16:17:03.491938-05:00","created_by":"Richard Abrich","updated_at":"2026-02-08T12:55:19.233249-05:00","closed_at":"2026-02-08T12:55:19.233249-05:00","close_reason":"PR #18 already merged 2026-01-29"}
{"id":"openadapt-evals-mx8","title":"Analyze evaluation results and publish findings","description":"After demo-conditioned evaluation completes, analyze results: success rates, failure modes, demo impact. Create data-driven roadmap for improvements.","status":"open","priority":1,"issue_type":"task","owner":"richard.abrich@gmail.com","created_at":"2026-02-14T12:23:06.328838-05:00","created_by":"Richard Abrich","updated_at":"2026-02-14T12:23:06.328838-05:00"}
{"id":"openadapt-evals-sz4","title":"RCA: Windows product key prompt recurring issue","status":"closed","priority":0,"issue_type":"task","owner":"richard.abrich@gmail.com","created_at":"2026-01-20T18:59:36.266286-05:00","created_by":"Richard Abrich","updated_at":"2026-01-20T20:32:06.493102-05:00","closed_at":"2026-01-20T20:32:06.493102-05:00","close_reason":"RCA complete - root cause is VERSION mismatch (CLI=11, Dockerfile=11e). Fix documented in RECURRING_ISSUES.md and WINDOWS_PRODUCT_KEY_RCA.md"}
-{"id":"openadapt-evals-vcb","title":"Run demo-conditioned WAA evaluation","description":"Once demos are recorded, run WAA evaluation with demo-conditioned agents (RetrievalAugmentedAgent with real demos). Target: measure improvement over zero-shot baseline. Requires real demos from recording task.","notes":"Feb 28: 6 design docs created (code health, marketing, CLI DX, testing, infra, docs). Marketing materials drafted and polished. Prioritization documented in STATUS.md. Tier 1 blockers identified: version fix, PyAutoGUI fail-safe recovery, socat systemd service, auto-open viewer. Next: implement Tier 1 items to unblock reliable eval runs.","status":"open","priority":0,"issue_type":"task","owner":"richard.abrich@gmail.com","created_at":"2026-02-14T12:23:04.624305-05:00","created_by":"Richard Abrich","updated_at":"2026-02-28T11:25:44.494548-05:00"}
+{"id":"openadapt-evals-vcb","title":"Run demo-conditioned WAA evaluation","description":"Once demos are recorded, run WAA evaluation with demo-conditioned agents (RetrievalAugmentedAgent with real demos). Target: measure improvement over zero-shot baseline. Requires real demos from recording task.","notes":"2026-03-01: GPU grant applications reviewed and rewritten (11 files). Writing done, blocked on eval results (DC signal on harder tasks). Detailed status tracked in openadapt-internal (private repo).","status":"open","priority":0,"issue_type":"task","owner":"richard.abrich@gmail.com","created_at":"2026-02-14T12:23:04.624305-05:00","created_by":"Richard Abrich","updated_at":"2026-03-01T23:35:11.042286-05:00"}
{"id":"openadapt-evals-wis","title":"Add pre-flight check to detect Windows install issues","status":"closed","priority":1,"issue_type":"task","owner":"richard.abrich@gmail.com","created_at":"2026-01-20T18:59:36.865052-05:00","created_by":"Richard Abrich","updated_at":"2026-01-20T20:32:06.757261-05:00","closed_at":"2026-01-20T20:32:06.757261-05:00","close_reason":"Duplicate of openadapt-evals-0dt"}
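The records in `.beads/issues.jsonl` are stored one JSON object per line. As a minimal sketch of how such a file might be read, the snippet below lists open issues ordered by priority. The helper name `open_issues` and the abbreviated sample records are illustrative assumptions, not part of the repository's tooling; only the field names (`id`, `status`, `priority`) come from the file above.

```python
import json


def open_issues(lines):
    """Return open issue records sorted by priority, then id (hypothetical helper)."""
    records = [json.loads(line) for line in lines if line.strip()]
    return sorted(
        (r for r in records if r.get("status") == "open"),
        key=lambda r: (r.get("priority", 0), r["id"]),
    )


# Abbreviated records that reuse field names from the file above.
sample = [
    '{"id":"openadapt-evals-mx8","status":"open","priority":1}',
    '{"id":"openadapt-evals-sz4","status":"closed","priority":0}',
    '{"id":"openadapt-evals-vcb","status":"open","priority":0}',
]

for issue in open_issues(sample):
    print(issue["id"], issue["priority"])
```

Reading line by line keeps the sketch tolerant of blank lines and avoids loading the file as a single JSON document, which it is not.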
2 changes: 1 addition & 1 deletion README.md
@@ -220,7 +220,7 @@ The Dockerfile also pre-downloads LibreOffice at build time with dynamic version

When a task config includes `related_apps`, the live adapter automatically prepends a `verify_apps` step before the task's setup config. The `--verify` flag on `record_waa_demos.py` provides a pre-flight check across all tasks before starting a recording session.

-![LibreOffice Calc installed on Windows 11 VM](https://github.com/OpenAdaptAI/openadapt-evals/releases/download/untagged-42f27f1e47214aae8358/waa_libreoffice.png)
+![LibreOffice Calc running inside Windows 11 QEMU VM via noVNC in Chrome](screenshots/waa_libreoffice_desktop.png)
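
The prepending behavior described above can be sketched roughly as follows. This is not the live adapter's code: the function name `with_verify_step`, the `config` key, the step layout (`type` plus `parameters`), and the sample app and file names are all assumptions made for illustration. Only `related_apps` and `verify_apps` are named in the README.

```python
def with_verify_step(task_config):
    """Return a copy of task_config with a verify_apps step first (hypothetical shape)."""
    apps = task_config.get("related_apps") or []
    setup = list(task_config.get("config", []))
    if apps:
        # Assumed step layout: {"type": ..., "parameters": {...}}.
        setup.insert(0, {"type": "verify_apps", "parameters": {"apps": apps}})
    return {**task_config, "config": setup}


# Hypothetical task config; the app name and file path are placeholders.
task = {
    "related_apps": ["libreoffice_calc"],
    "config": [{"type": "open", "parameters": {"path": "example.xlsx"}}],
}
prepared = with_verify_step(task)
print([step["type"] for step in prepared["config"]])
```

The sketch copies the setup list instead of mutating it, so the original task config stays unchanged if verification is skipped or retried.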

## CLI Reference

67 changes: 67 additions & 0 deletions demo_prompts/04d9aeaf-7bed-4024-bedb-e10e6f00eb7f-WOS.txt
@@ -0,0 +1,67 @@
DEMONSTRATION:
Task: In a new sheet with 4 headers "Year", "CA changes", "FA changes", and "OA changes", calculate the annual changes for the Current Assets, Fixed Assets, and Other Assets columns. Set the results as percentage type.

Step 1:
Action: Right-click on the "Sheet1" tab at the bottom and select "Insert Sheet" or "New Sheet"

Step 2:
Action: Click cell A1 and type "Year"

Step 3:
Action: Press Tab and type "CA changes"

Step 4:
Action: Press Tab and type "FA changes"

Step 5:
Action: Press Tab and type "OA changes"

Step 6:
Action: Click cell A2 and type "2015"

Step 7:
Action: Press Enter and type "2016"

Step 8:
Action: Press Enter and type "2017"

Step 9:
Action: Press Enter and type "2018"

Step 10:
Action: Press Enter and type "2019"

Step 11:
Action: Click cell B2 and type "=(Sheet1.B3-Sheet1.B2)/Sheet1.B2"

Step 12:
Action: Press Enter

Step 13:
Action: Click cell B2, then drag the fill handle down to B6

Step 14:
Action: Click cell C2 and type "=(Sheet1.C3-Sheet1.C2)/Sheet1.C2"

Step 15:
Action: Press Enter

Step 16:
Action: Click cell C2, then drag the fill handle down to C6

Step 17:
Action: Click cell D2 and type "=(Sheet1.D3-Sheet1.D2)/Sheet1.D2"

Step 18:
Action: Press Enter

Step 19:
Action: Click cell D2, then drag the fill handle down to D6

Step 20:
Action: Click and drag to select cells B2:D6

Step 21:
Action: Click the % button in the toolbar (or press Ctrl+Shift+5)

---
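
The formulas in the demonstration compute a relative year-over-year change, `(next - current) / current`. The sketch below works through the same arithmetic in Python. The asset figures are hypothetical, since the actual Sheet1 data is not shown here, and `annual_changes` is an illustrative helper name.

```python
def annual_changes(values):
    """Relative change from each year to the next: (next - current) / current."""
    return [(nxt - cur) / cur for cur, nxt in zip(values, values[1:])]


# Hypothetical Current Assets figures; the real Sheet1 data is not shown here.
current_assets = [100.0, 110.0, 99.0]

for change in annual_changes(current_assets):
    # The ".2%" format mirrors applying the percentage cell format.
    print(f"{change:.2%}")
```

Note that n yearly values yield n - 1 changes, so the last filled-down row only produces a meaningful result if the source sheet has a following row of data.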
130 changes: 130 additions & 0 deletions demo_prompts_vlm/04d9aeaf-7bed-4024-bedb-e10e6f00eb7f-WOS.txt
@@ -0,0 +1,130 @@
DEMONSTRATION:
Task: In a new sheet with 4 headers "Year", "CA changes", "FA changes", and "OA changes", calculate the annual changes for the Current Assets, Fixed Assets, and Other Assets columns. Set the results as percentage type.

Step 1:
Observation: The spreadsheet is open to "Sheet1," which contains financial data with columns for years, assets, and liabilities.
Intent: To create a new sheet for calculating and displaying annual changes in asset categories.
Action: Right-click on the "Sheet1" tab at the bottom and select "Insert Sheet" or "New Sheet".
Result: A new, blank sheet named "Sheet2" is added to the workbook and displayed.

Step 2:
Observation: The new sheet is blank, with cell A1 selected and ready for input.
Intent: To create a header row for organizing annual asset changes.
Action: Click cell A1 and type "Year"
Result: The text "Year" now appears in cell A1.

Step 3:
Observation: The new sheet contains only the header "Year" in cell A1, with the cursor positioned in cell B1.
Intent: To add the next header, "CA changes," as part of setting up the required columns for annual asset changes.
Action: Press Tab and type "CA changes"
Result: The header "CA changes" is entered in cell B1, and the sheet now displays two headers: "Year" and "CA changes."

Step 4:
Observation: The new sheet contains two headers, "Year" in cell A1 and "CA changes" in cell B1, with the cursor positioned in cell C1.
Intent: To continue setting up the required headers for calculating annual changes in asset categories.
Action: Press Tab and type "FA changes".
Result: The header "FA changes" is added to cell C1.

Step 5:
Observation: The new sheet contains three headers: "Year", "CA changes", and "FA changes" in the first row.
Intent: To complete the set of required headers by adding "OA changes" as the fourth column.
Action: Press Tab and type "OA changes".
Result: The "OA changes" header is added in cell D1, completing the four required headers in the first row.

Step 6:
Observation: The new sheet contains four headers: "Year", "CA changes", "FA changes", and "OA changes", with all cells below the headers empty.
Intent: To begin entering annual data by specifying the first year in the "Year" column.
Action: Click cell A2 and type "2015".
Result: The value "2015" appears in cell A2 under the "Year" header.

Step 7:
Observation: The new sheet contains headers "Year", "CA changes", "FA changes", and "OA changes" in row 1, with "2015" being entered in cell A2.
Intent: To continue entering the years for which asset changes will be calculated.
Action: Press Enter and type "2016".
Result: The value "2016" appears in cell A3, directly below "2015" in the "Year" column.

Step 8:
Observation: The new sheet contains the headers "Year", "CA changes", "FA changes", and "OA changes" in row 1, and "2015" and "2016" are entered in cells A2 and A3.
Intent: To continue listing the years for which annual changes will be calculated.
Action: Press Enter and type "2017".
Result: "2017" is entered into cell A4, directly below "2016" in the "Year" column.

Step 9:
Observation: The new sheet contains the headers "Year", "CA changes", "FA changes", and "OA changes", with the years 2015, 2016, and 2017 already listed in the "Year" column.
Intent: To continue populating the "Year" column with the next chronological year for annual change calculations.
Action: Press Enter and type "2018".
Result: The value "2018" appears in the next row of the "Year" column.

Step 10:
Observation: The new sheet contains four headers ("Year", "CA changes", "FA changes", "OA changes") and a column of years from 2015 to 2018 under "Year".
Intent: To add the next year in the sequence to the "Year" column in preparation for calculating annual changes.
Action: Press Enter and type "2019".
Result: The value "2019" appears in the next row under the "Year" column.

Step 11:
Observation: The new sheet contains headers "Year", "CA changes", "FA changes", and "OA changes", with years listed in column A, but no formulas or data in the other columns.
Intent: To calculate the annual percentage change for Current Assets using data from Sheet1.
Action: Click cell B2 and type "=(Sheet1.B3-Sheet1.B2)/Sheet1.B2".
Result: Cell B2 now contains a formula to compute the percentage change in Current Assets between two years from Sheet1.

Step 12:
Observation: The new sheet contains headers "Year", "CA changes", "FA changes", and "OA changes", and the first row under "Year" is filled with "2015", while the "CA changes" cell (B2) contains a formula referencing another sheet.
Intent: To confirm and apply the formula entered in cell B2 for calculating annual changes in Current Assets.
Action: Press Enter
Result: The formula in cell B2 is executed, displaying the calculated change for "CA changes" in the 2015 row.

Step 13:
Observation: The sheet displays headers "Year", "CA changes", "FA changes", and "OA changes", with cell B2 containing a formula and the rest of column B empty.
Intent: To copy the formula in B2 down the column so that annual changes for "CA changes" are calculated for each year.
Action: Click cell B2, then drag the fill handle down to B6.
Result: The formula from B2 is filled down to B6, calculating "CA changes" for all listed years.

Step 14:
Observation: The new sheet contains headers "Year", "CA changes", "FA changes", and "OA changes", and cell C2 is empty.
Intent: To calculate the annual change for Fixed Assets as a percentage using data from Sheet1.
Action: Click cell C2 and type "=(Sheet1.C3-Sheet1.C2)/Sheet1.C2"
Result: Cell C2 now contains the formula to compute the percentage change in Fixed Assets between two years from Sheet1.

Step 15:
Observation: The new sheet contains headers "Year", "CA changes", "FA changes", and "OA changes" in row 1, and a formula is being entered in cell C2.
Intent: To confirm the formula entered in cell C2 for calculating annual changes in Fixed Assets.
Action: Press Enter.
Result: The cell selection moves from the current cell (C2) to the next cell below (C3), confirming the entry in C2.

Step 16:
Observation: The sheet displays headers "Year", "CA changes", "FA changes", and "OA changes" with a formula present in cell C2, but the rest of column C is empty.
Intent: To apply the formula in C2 to the rest of the rows in the "FA changes" column for consistent annual change calculations.
Action: Click cell C2, then drag the fill handle down to C6.
Result: The formula from C2 is copied down to C6, populating the "FA changes" column with calculated values for each year.

Step 17:
Observation: The new sheet contains headers "Year", "CA changes", "FA changes", and "OA changes" in row 1, with column D still empty below its header.
Intent: To calculate the annual percentage change for the "OA changes" column using data from Sheet1.
Action: Click cell D2 and type "=(Sheet1.D3-Sheet1.D2)/Sheet1.D2"
Result: Cell D2 is populated with the formula to compute the percentage change for "OA changes" between two years.

Step 18:
Observation: The formula `=(Sheet1.D3-Sheet1.D2)/Sheet1.D2` is being entered in cell D2 to calculate the percentage change for "OA changes".
Intent: To compute the annual percentage change for Other Assets in the new sheet.
Action: Press Enter
Result: The formula is executed in cell D2, displaying the calculated percentage change for Other Assets.

Step 19:
Observation: The sheet displays headers "Year", "CA changes", "FA changes", and "OA changes" with a formula present only in cell D2.
Intent: To copy the formula in D2 down the "OA changes" column for all relevant rows.
Action: Click cell D2, then drag the fill handle down to D6.
Result: The formula from D2 is filled down through D6, calculating values for each row in the "OA changes" column.

Step 20:
Observation: The new sheet contains four headers ("Year", "CA changes", "FA changes", "OA changes") and calculated annual changes for each asset type in columns B, C, and D.
Intent: To select the range of calculated annual changes for formatting.
Action: Click and drag to select cells B2:D6.
Result: Cells B2:D6 are highlighted, indicating they are selected for the next operation.

Step 21:
Observation: The annual changes for "CA changes", "FA changes", and "OA changes" are displayed as decimal values in columns B, C, and D.
Intent: To format the annual change values as percentages for better readability.
Action: Click the % button in the toolbar (or press Ctrl+Shift+5).
Result: The values in columns B, C, and D are now displayed as percentages.

---
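
Both demo formats follow a regular layout: a `Task:` line followed by `Step N:` blocks, with either an `Action:` line alone or the four fields `Observation`, `Intent`, `Action`, and `Result`. A minimal parsing sketch is shown below. The function name `parse_demo` and the shortened sample text are assumptions for illustration, and this is not necessarily how the repository loads these files.

```python
import re

FIELDS = ("Observation", "Intent", "Action", "Result")


def parse_demo(text):
    """Parse a demo prompt into (task, steps); each step is a dict keyed by field name."""
    task = None
    steps = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Task:"):
            task = line[len("Task:"):].strip()
        elif re.fullmatch(r"Step \d+:", line):
            steps.append({})
        elif steps:
            # Split on the first colon only, so values such as "B2:D6" stay intact.
            key, sep, value = line.partition(":")
            if sep and key in FIELDS:
                steps[-1][key] = value.strip()
    return task, steps


# Shortened sample in the same layout as the files above.
sample = """DEMONSTRATION:
Task: Enter a header.

Step 1:
Observation: The new sheet is blank.
Intent: To create a header row.
Action: Click cell A1 and type "Year"
Result: The text "Year" now appears in cell A1.

Step 2:
Action: Click and drag to select cells B2:D6

---
"""

task, steps = parse_demo(sample)
print(task, len(steps), steps[1]["Action"])
```

Parsing into per-step dictionaries makes it straightforward to check annotations for consistency, for example that the cell named in a `Result` matches the cell named in the corresponding `Action`.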
14 changes: 13 additions & 1 deletion openadapt_evals/infrastructure/__init__.py
@@ -22,16 +22,25 @@

# Restart Windows inside QEMU
from openadapt_evals.infrastructure import QEMUResetManager
-mgr = QEMUResetManager(vm_ip="172.173.66.131")
+mgr = QEMUResetManager(vm_ip="10.0.0.1")
success, msg = mgr.restart_windows()

# Auto-detect VM IP
from openadapt_evals.infrastructure import resolve_vm_ip
ip = resolve_vm_ip() # pool registry → Azure CLI
```
"""

from openadapt_evals.infrastructure.azure_ops_tracker import AzureOpsTracker
from openadapt_evals.infrastructure.azure_vm import AzureVMManager
from openadapt_evals.infrastructure.pool import PoolManager, PoolRunResult
from openadapt_evals.infrastructure.qemu_reset import QEMUResetManager
from openadapt_evals.infrastructure.screen_stability import (
    compare_screenshots,
    wait_for_stable_screen,
)
from openadapt_evals.infrastructure.ssh_tunnel import SSHTunnelManager, get_tunnel_manager
from openadapt_evals.infrastructure.vm_ip import resolve_vm_ip
from openadapt_evals.infrastructure.vm_monitor import VMMonitor, VMConfig

__all__ = [
@@ -43,5 +52,8 @@
    "VMMonitor",
    "VMConfig",
    "SSHTunnelManager",
    "compare_screenshots",
    "get_tunnel_manager",
    "resolve_vm_ip",
    "wait_for_stable_screen",
]
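
The docstring comment on `resolve_vm_ip` describes a fallback order (pool registry, then Azure CLI). The general pattern can be sketched as below. This is an illustration of an ordered fallback chain only, not the repository's implementation: `resolve_first`, `from_pool_registry`, and `from_azure_cli` are hypothetical names, and the stand-in sources return fixed values.

```python
from typing import Callable, Iterable, Optional


def resolve_first(resolvers: Iterable[Callable[[], Optional[str]]]) -> Optional[str]:
    """Return the first non-empty result, trying each resolver in order."""
    for resolver in resolvers:
        try:
            ip = resolver()
        except Exception:
            # A failing source should not block the next one.
            continue
        if ip:
            return ip
    return None


# Stand-ins for the two sources named in the docstring; both are hypothetical.
def from_pool_registry() -> Optional[str]:
    return None  # pretend the registry has no entry


def from_azure_cli() -> Optional[str]:
    return "10.0.0.1"  # placeholder address, as in the docstring example


print(resolve_first([from_pool_registry, from_azure_cli]))
```

Ordering the cheaper local lookup first means the slower external call only runs when the local source has no answer.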
2 changes: 1 addition & 1 deletion openadapt_evals/infrastructure/qemu_reset.py
@@ -24,7 +24,7 @@

from openadapt_evals.infrastructure.qemu_reset import QEMUResetManager

-mgr = QEMUResetManager(vm_ip="172.173.66.131")
+mgr = QEMUResetManager(vm_ip="10.0.0.1")

# Full restart: send reset + wait for WAA server
success, message = mgr.restart_windows()
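
The "send reset + wait for WAA server" comment implies a poll-until-ready loop with a timeout. A generic sketch of that pattern follows. It does not reproduce `QEMUResetManager` internals: `wait_until`, its default timings, and the stand-in probe are assumptions for illustration, and a real probe would contact the server.

```python
import time
from typing import Callable


def wait_until(probe: Callable[[], bool], timeout: float = 300.0, interval: float = 5.0) -> bool:
    """Poll probe() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in probe that succeeds on the third call; a real probe would contact the server.
calls = {"n": 0}


def fake_probe() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until(fake_probe, timeout=1.0, interval=0.01))
```

Using `time.monotonic()` for the deadline keeps the timeout unaffected by system clock adjustments during a long restart.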