
Commit af69221

abrichr and claude committed
fix: pre-fetch task configs before QEMU reset to avoid stale socat
The evaluate server (localhost:5050) goes through a socat bridge that can become stale after container/VM restarts. Pre-fetching all task configs before the QEMU reset ensures human-readable instructions are cached in memory even if the bridge dies later. Falls back to live fetch with retry on cache miss.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 8c11f0e commit af69221
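
The approach described in the commit message can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the code in `scripts/record_waa_demos.py`: the names `prefetch_configs`, `get_config`, `instruction_for`, and the injected `fetch` callable are hypothetical, and `fetch` stands in for the `requests.get` call so the example runs without an evaluate server.

```python
import time
from typing import Callable, Optional

# Hypothetical fetcher type (an assumption for this sketch): returns the task
# config dict, or None for a non-OK response; may raise on connection errors
# such as a stale bridge.
Fetch = Callable[[str], Optional[dict]]


def prefetch_configs(task_ids: list[str], fetch: Fetch) -> dict[str, dict]:
    """Fetch every task config up front; leave a cache miss for any failure."""
    cache: dict[str, dict] = {}
    for task_id in task_ids:
        try:
            config = fetch(task_id)
        except Exception:
            continue  # get_config() will retry this task later
        if config:
            cache[task_id] = config
    return cache


def get_config(
    task_id: str,
    cache: dict[str, dict],
    fetch: Fetch,
    attempts: int = 3,
    delay: float = 2.0,
) -> dict:
    """Prefer the cached config; on a miss, try a live fetch with retries."""
    config = cache.get(task_id)
    if config:
        return config
    for attempt in range(attempts):
        try:
            config = fetch(task_id)
            if config:
                return config
        except Exception:
            pass  # treated as transient; retry below
        if attempt < attempts - 1:
            time.sleep(delay)
    return {}  # caller falls back to the task ID


def instruction_for(task_id: str, config: dict) -> str:
    """Fallback order used in the diff: 'instruction', then 'task', then the ID."""
    return config.get("instruction", config.get("task", task_id))
```

Passing the fetcher in as a parameter is a choice made only for this sketch; it keeps the cache-then-retry logic testable without a network connection.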

2 files changed

Lines changed: 52 additions & 28 deletions


.beads/issues.jsonl

Lines changed: 2 additions & 2 deletions
@@ -11,7 +11,7 @@
{"id":"openadapt-evals-dke","title":"SYSTEM: Create knowledge persistence workflow using Beads","description":"Every fix/approach must be logged as a Beads issue with:\n1. Problem description\n2. Attempted solution\n3. Result (worked/failed/partial)\n4. Root cause if known\n5. Files changed\n\nBefore any fix attempt, agent MUST:\n1. Run 'bd list --labels=fix,approach' to see prior attempts\n2. Review what was tried before\n3. Document new attempt BEFORE implementing\n\nAfter context compaction, first action:\n1. Run 'bd ready' for current tasks\n2. Run 'bd list --labels=recurring' for known recurring issues\n3. Check docs/RECURRING_ISSUES.md for patterns","status":"closed","priority":0,"issue_type":"task","owner":"richard.abrich@gmail.com","created_at":"2026-01-20T19:00:18.155796-05:00","created_by":"Richard Abrich","updated_at":"2026-02-23T16:21:13.18811-05:00","closed_at":"2026-02-14T12:22:52.357373-05:00"}
{"id":"openadapt-evals-gna","title":"Test simplified Dockerfile (Azure mode)","description":"Testing Dockerfile.simplified which uses vanilla WAA Azure mode: native OEM mechanism (C:\\oem), InstallFrom element for unattended install, VERSION=11e for no product key. Steps: 1) Delete current VM 2) Create fresh VM 3) Build simplified image 4) Test Windows installation via QEMU screenshots","notes":"2026-01-22: Confirmed the blocker is not just docker pull; even starting the existing 'winarena' container via az vm run-command timed out.\n\n- smoke-live tried to run docker start winarena via run-command and timed out (900s)\n- WAA server remained unreachable at http://172.171.112.41:5000\n- VM was deallocated after the attempt\n\nImplication: VM/docker state is unhealthy or container start is hanging (possibly due to incomplete image extraction / stuck daemon / disk pressure).\nNext: add/run a vm-debug command to capture docker/system logs and determine whether to rebuild VM/image, pin/mirror image (ACR), or adjust docker config.","status":"closed","priority":0,"issue_type":"task","owner":"richard.abrich@gmail.com","created_at":"2026-01-21T12:47:15.12243-05:00","created_by":"Richard Abrich","updated_at":"2026-02-23T16:21:13.188539-05:00","closed_at":"2026-02-08T13:23:34.84444-05:00","labels":["testing","waa"],"comments":[{"id":3,"issue_id":"openadapt-evals-gna","author":"Richard Abrich","text":"Session Recovery 2026-01-22 17:58: Previous agents killed during compaction. VM state: Docker/containerd unhealthy, disk /mnt only 32GB (need 47GB+ for vanilla WAA). Git-lfs failing. User feedback: 1) use beads, 2) larger disk, 3) clean up CLI, 4) vanilla WAA config.","created_at":"2026-01-22T18:05:45Z"},{"id":4,"issue_id":"openadapt-evals-gna","author":"Richard Abrich","text":"Launched 3 parallel agents: ae159fc (VM disk upgrade), aabad47 (CLI cleanup), aee4e8a (fix containerd). 
Check /private/tmp/claude/-Users-abrichr-oa-src-openadapt-ml/tasks/*.output for results.","created_at":"2026-01-22T18:06:18Z"},{"id":5,"issue_id":"openadapt-evals-gna","author":"Richard Abrich","text":"WORKFLOW DOCUMENTED: VM config changes = delete VM -\u003e update code -\u003e relaunch. Added to CLAUDE.md. Default VM size now D8ds_v5 (300GB). Launching fresh VM now.","created_at":"2026-01-22T18:09:12Z"},{"id":6,"issue_id":"openadapt-evals-gna","author":"Richard Abrich","text":"2026-01-22 18:20: VM resources cleaned up, launched agent a9be1f8 to add auto-cleanup to CLI, WAA setup retrying in background (b04fcbe). Workflow documented in CLAUDE.md and STATUS.md.","created_at":"2026-01-22T18:11:56Z"},{"id":7,"issue_id":"openadapt-evals-gna","author":"Richard Abrich","text":"2026-01-22 18:30: VM created with D8s_v3 fallback (D8ds_v5 quota 0), IP 20.120.37.97. Restored waa_deploy symlink. Docker image building. W\u0026B integration agent a21c3ef running.","created_at":"2026-01-22T18:25:29Z"},{"id":8,"issue_id":"openadapt-evals-gna","author":"Richard Abrich","text":"2026-01-22 19:05: WAA Docker image built successfully! Container running. Windows booting. VM: 20.120.37.97, VNC: http://20.120.37.97:8006","created_at":"2026-01-22T18:47:03Z"}]}
{"id":"openadapt-evals-hvm","title":"VL model fix PR #18 ready to merge","notes":"2026-02-08: openadapt-ml PR #18 was already merged on 2026-01-29. VL model fix is done.","status":"closed","priority":0,"issue_type":"task","owner":"richard.abrich@gmail.com","created_at":"2026-01-29T16:17:03.491938-05:00","created_by":"Richard Abrich","updated_at":"2026-02-08T12:55:19.233249-05:00","closed_at":"2026-02-08T12:55:19.233249-05:00","close_reason":"PR #18 already merged 2026-01-29"}
-{"id":"openadapt-evals-mx8","title":"Analyze evaluation results and publish findings","description":"After demo-conditioned evaluation completes, analyze results: success rates, failure modes, demo impact. Create data-driven roadmap for improvements.","status":"open","priority":1,"issue_type":"task","owner":"richard.abrich@gmail.com","created_at":"2026-02-14T12:23:06.328838-05:00","created_by":"Richard Abrich","updated_at":"2026-02-14T12:23:06.328838-05:00"}
+{"id":"openadapt-evals-mx8","title":"Analyze evaluation results and publish findings","description":"After demo-conditioned evaluation completes, analyze results: success rates, failure modes, demo impact. Create data-driven roadmap for improvements.","notes":"wright repo (OpenAdaptAI/wright) scaffolding underway. Herald + consilium repos transferred to OpenAdaptAI org. Wright will be the orchestration layer for eval pipeline.","status":"open","priority":1,"issue_type":"task","owner":"richard.abrich@gmail.com","created_at":"2026-02-14T12:23:06.328838-05:00","created_by":"Richard Abrich","updated_at":"2026-03-01T17:46:08.553556-05:00"}
{"id":"openadapt-evals-sz4","title":"RCA: Windows product key prompt recurring issue","status":"closed","priority":0,"issue_type":"task","owner":"richard.abrich@gmail.com","created_at":"2026-01-20T18:59:36.266286-05:00","created_by":"Richard Abrich","updated_at":"2026-01-20T20:32:06.493102-05:00","closed_at":"2026-01-20T20:32:06.493102-05:00","close_reason":"RCA complete - root cause is VERSION mismatch (CLI=11, Dockerfile=11e). Fix documented in RECURRING_ISSUES.md and WINDOWS_PRODUCT_KEY_RCA.md"}
-{"id":"openadapt-evals-vcb","title":"Run demo-conditioned WAA evaluation","description":"Once demos are recorded, run WAA evaluation with demo-conditioned agents (RetrievalAugmentedAgent with real demos). Target: measure improvement over zero-shot baseline. Requires real demos from recording task.","notes":"2026-03-01: GPU grant applications reviewed and rewritten (11 files). Writing done, blocked on eval results (DC signal on harder tasks). Detailed status tracked in openadapt-internal (private repo).","status":"open","priority":0,"issue_type":"task","owner":"richard.abrich@gmail.com","created_at":"2026-02-14T12:23:04.624305-05:00","created_by":"Richard Abrich","updated_at":"2026-03-01T13:57:25.582064-05:00"}
+{"id":"openadapt-evals-vcb","title":"Run demo-conditioned WAA evaluation","description":"Once demos are recorded, run WAA evaluation with demo-conditioned agents (RetrievalAugmentedAgent with real demos). Target: measure improvement over zero-shot baseline. Requires real demos from recording task.","notes":"wright repo created (OpenAdaptAI/wright), scaffolding in progress. Herald + consilium transferred to OpenAdaptAI org.","status":"open","priority":0,"issue_type":"task","owner":"richard.abrich@gmail.com","created_at":"2026-02-14T12:23:04.624305-05:00","created_by":"Richard Abrich","updated_at":"2026-03-01T17:45:50.958358-05:00"}
{"id":"openadapt-evals-wis","title":"Add pre-flight check to detect Windows install issues","status":"closed","priority":1,"issue_type":"task","owner":"richard.abrich@gmail.com","created_at":"2026-01-20T18:59:36.865052-05:00","created_by":"Richard Abrich","updated_at":"2026-01-20T20:32:06.757261-05:00","closed_at":"2026-01-20T20:32:06.757261-05:00","close_reason":"Duplicate of openadapt-evals-0dt"}

scripts/record_waa_demos.py

Lines changed: 50 additions & 26 deletions
@@ -1513,17 +1513,40 @@ def cmd_record_waa(
     if not connected:
         return

+    # Pre-fetch all task configs BEFORE QEMU reset. The evaluate server
+    # (localhost:5050) goes through a socat bridge that can become stale
+    # after container/VM restarts. Fetching early ensures we have the
+    # human-readable instructions cached even if the bridge dies later.
+    print("Pre-fetching task configs from evaluate server...")
+    task_configs_cache: dict[str, dict] = {}
+    fetch_failures = 0
+    for task_id in task_ids:
+        try:
+            resp = requests.get(f"{evaluate_url}/task/{task_id}", timeout=10)
+            if resp.ok:
+                task_configs_cache[task_id] = resp.json()
+        except Exception:
+            fetch_failures += 1
+    if task_configs_cache:
+        print(f" Cached {len(task_configs_cache)}/{len(task_ids)} task config(s).")
+    if fetch_failures:
+        print(f" WARNING: {fetch_failures} task config(s) failed to fetch from {evaluate_url}.")
+        print(f" Step generation will use task IDs instead of instructions for those tasks.")
+        print(f" (Is the evaluate server / socat proxy running?)")
+    if not task_configs_cache and len(task_ids) > 0:
+        print(f" ERROR: Could not fetch ANY task configs from {evaluate_url}.")
+        print(f" The evaluate server may be down. Check socat proxy on the VM.")
+        answer = input(" Continue anyway? [y/N] ").strip().lower()
+        if answer not in ("y", "yes"):
+            return
+    print()
+
     # Pre-flight: verify all required apps are installed
     if verify:
         print("Verifying required apps across all tasks...")
         all_apps: set[str] = set()
-        for task_id in task_ids:
-            try:
-                resp = requests.get(f"{evaluate_url}/task/{task_id}", timeout=10)
-                if resp.ok:
-                    all_apps.update(resp.json().get("related_apps", []))
-            except Exception:
-                pass
+        for tc in task_configs_cache.values():
+            all_apps.update(tc.get("related_apps", []))
         if all_apps:
             resp = requests.post(
                 f"{evaluate_url}/setup",
@@ -1579,26 +1602,27 @@ def cmd_record_waa(
         task_dir = output_dir / task_id
         task_dir.mkdir(parents=True, exist_ok=True)

-        # Load task config from evaluate server (retry on transient errors)
-        instruction = task_id # fallback
-        task_config = {}
-        for _attempt in range(3):
-            try:
-                task_resp = requests.get(
-                    f"{evaluate_url}/task/{task_id}", timeout=10
-                )
-                if task_resp.ok:
-                    task_config = task_resp.json()
-                    instruction = task_config.get(
-                        "instruction", task_config.get("task", task_id)
+        # Load task config — prefer pre-fetched cache, fall back to live fetch
+        task_config = task_configs_cache.get(task_id, {})
+        if not task_config:
+            # Try live fetch (cache miss or evaluate server was down earlier)
+            for _attempt in range(3):
+                try:
+                    task_resp = requests.get(
+                        f"{evaluate_url}/task/{task_id}", timeout=10
                     )
-                    break
-            except Exception as e:
-                if _attempt < 2:
-                    print(f" Warning: task config fetch failed ({e}), retrying...")
-                    time.sleep(2)
-                else:
-                    print(f" Warning: could not load task config after 3 attempts: {e}")
+                    if task_resp.ok:
+                        task_config = task_resp.json()
+                        break
+                except Exception as e:
+                    if _attempt < 2:
+                        print(f" Warning: task config fetch failed ({e}), retrying...")
+                        time.sleep(2)
+                    else:
+                        print(f" Warning: could not load task config after 3 attempts: {e}")
+        instruction = task_config.get(
+            "instruction", task_config.get("task", task_id)
+        )

         def _setup_task_env() -> None:
             """Run task setup config (download files, open apps, etc.)."""
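
In the first hunk of `scripts/record_waa_demos.py`, `fetch_failures` is incremented only in the `except` branch, so a non-OK HTTP response is neither cached nor counted, and the warning does not fire for it. One possible way to surface that case is to derive the missing tasks from the cache itself rather than from a counter. The sketch below is an assumption, not part of this commit: `summarize_prefetch` and its status strings are hypothetical.

```python
def summarize_prefetch(
    task_ids: list[str], cache: dict[str, dict]
) -> tuple[list[str], str]:
    """Return (missing task IDs, status) for a completed pre-fetch pass.

    A task counts as missing whenever it is absent from the cache, whether
    the request raised or the server answered with a non-OK status.
    """
    missing = [task_id for task_id in task_ids if task_id not in cache]
    if not task_ids:
        status = "empty"    # nothing was requested
    elif not missing:
        status = "ok"       # every config was cached
    elif len(missing) < len(task_ids):
        status = "partial"  # warn: some tasks will fall back to live fetch
    else:
        status = "none"     # evaluate server likely down; ask before continuing
    return missing, status
```

With this shape, the "partial" and "none" cases would map onto the WARNING and ERROR branches printed in the diff, and the missing list names the affected tasks.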
