Skip to content

Commit 748534b

Browse files
abrichrclaude
andauthored
feat: automate full VM lifecycle in correction flywheel script (#186)
Integrate all manual infrastructure steps so the flywheel runs end-to-end deterministically with a single command: python scripts/run_correction_flywheel.py \ --task-config example_tasks/clear-browsing-data-chrome.yaml \ --demo-dir ./demos --manage-vm --setup-tunnels New infrastructure functions (inline, matching azure_vm.py patterns): - start_vm / get_vm_ip / get_vm_state / wait_for_ssh / deallocate_vm - start_container (docker start or docker run with correct flags) - apply_iptables_fix (exempt port 5050 from DNAT, idempotent) - setup_tunnels (kill stale, create SSH tunnels for 5001/5050/8006) - setup_eval_proxy (socat bridge for evaluate server) - wait_for_waa (poll /probe through tunnel) Design decisions: - --manage-vm flag: opt-in VM start/deallocate lifecycle - --setup-tunnels flag: opt-in tunnel setup with port cleanup - --baseline-model / --guided-model: use different planner models for Phase 1 vs Phase 3 (e.g., gpt-4o-mini baseline to ensure failure) - VM deallocate in try/finally (always runs, even on error) - Phase errors are caught individually; report always generated with partial results - All operations are idempotent (safe to re-run) - --mock mode unchanged (no VM management needed) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent e531205 commit 748534b

1 file changed

Lines changed: 827 additions & 54 deletions

File tree

0 commit comments

Comments
 (0)