Commit a6af99c

abrichr and claude authored
Viewer consolidation, workflow segmentation, and VM monitoring (#9)
* feat: add openadapt-viewer dependency and adapter module (Phase 1)

  Phase 1 of viewer consolidation plan: Foundation

  Changes:
  - Add openadapt-viewer as local file dependency in pyproject.toml
  - Create openadapt_ml/training/viewer_components.py adapter module:
    * screenshot_with_predictions() - Screenshot with human/AI overlays
    * training_metrics() - Training stats metrics grid
    * playback_controls() - Playback UI controls
    * correctness_badge() - Pass/fail badge component
    * generate_comparison_summary() - Model comparison summary
  - Add tests/test_viewer_screenshots.py with component validation tests
  - Add openadapt_ml/training/viewer_migration_example.py validation example

  Design:
  - Zero breaking changes to existing viewer.py code
  - Adapter pattern wraps openadapt-viewer with ML-specific context
  - Functions accept openadapt-ml data structures
  - Can be incrementally adopted in future phases

  Next steps (Phase 2):
  - Gradually migrate viewer.py to use these adapters
  - Replace inline HTML generation with component calls

  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat: add workflow segmentation system with capture adapter

  Restores and enhances the workflow segmentation system from commit dd9a393, with new integration for the openadapt-capture format.

  ## What's Added

  ### Core Segmentation Pipeline (4 stages)

  1. **Stage 1 - Frame Description (VLM)**:
     - Converts screenshots + actions into semantic descriptions
     - Supports Gemini, Claude, and GPT-4o backends
     - Automatic caching for efficiency
     - File: openadapt_ml/segmentation/frame_describer.py
  2. **Stage 2 - Episode Extraction (LLM)**:
     - Identifies coherent workflow boundaries
     - Few-shot prompting for better quality
     - Confidence-based filtering
     - File: openadapt_ml/segmentation/segment_extractor.py
  3. **Stage 3 - Deduplication (Embeddings)**:
     - Finds similar workflows across recordings
     - Agglomerative clustering with cosine similarity
     - Supports OpenAI or local HuggingFace embeddings
     - File: openadapt_ml/segmentation/deduplicator.py
  4. **Stage 4 - Annotation (VLM Quality Control)**:
     - Auto-annotates episodes for training data quality
     - Detects failures, boundary issues, incompleteness
     - Human-in-the-loop review workflow
     - File: openadapt_ml/segmentation/annotator.py

  ### Integration Features

  - **CaptureAdapter**: Loads recordings from the openadapt-capture SQLite format
    - File: openadapt_ml/segmentation/adapters/capture_adapter.py
    - Automatically used when capture.db is detected
    - Converts events to segmentation format
  - **Unified Pipeline**: Run all stages with a single API
    - File: openadapt_ml/segmentation/pipeline.py
    - Automatic intermediate result caching
    - Resume support for interrupted runs
  - **CLI Interface**: Full command-line interface for all stages
    - File: openadapt_ml/segmentation/cli.py
    - Commands: describe, extract, deduplicate, annotate, review, export-gold
  - **Comprehensive Documentation**:
    - File: openadapt_ml/segmentation/README.md
    - 20+ code examples
    - Complete API reference
    - Integration guide
    - Cost estimates and performance benchmarks

  ## Use Cases

  1. **Training Data Curation**: Extract and filter high-quality demonstration episodes
  2. **Demo Retrieval**: Build searchable libraries for demo-conditioned prompting
  3. **Workflow Documentation**: Auto-generate step-by-step guides from recordings

  ## Data Schemas

  All schemas use Pydantic for type safety (openadapt_ml/segmentation/schemas.py):
  - ActionTranscript: Frame-by-frame semantic descriptions
  - Episode: Coherent workflow segment with boundaries
  - CanonicalEpisode: Deduplicated workflow definition
  - EpisodeAnnotation: Quality assessment for training data

  ## Example Usage

  ```python
  from openadapt_ml.segmentation import SegmentationPipeline, PipelineConfig

  config = PipelineConfig(
      vlm_model="gemini-2.0-flash",
      llm_model="gpt-4o",
      similarity_threshold=0.85,
  )

  pipeline = SegmentationPipeline(config)
  result = pipeline.run(
      recordings=["/path/to/recording1", "/path/to/recording2"],
      output_dir="workflow_library",
  )

  print(f"Found {result.unique_episodes} unique workflows")
  ```

  ## Next Steps

  See openadapt_ml/segmentation/README.md for:
  - P0: Integration tests with real openadapt-capture recordings
  - P0: Visualization generator for segment boundaries
  - P1: Improved prompt engineering and cost optimization
  - P2: Active learning and multi-modal features

  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Enhance vm monitor command with comprehensive VM usage visibility

  Features added:
  - Azure ML job tracking: shows recent jobs from the last 7 days with status
  - Cost tracking: real-time uptime, hourly rate, and cost estimation
  - VM activity detection: identifies what the VM is currently doing
  - Evaluation history: past benchmark runs and success rates (--details flag)
  - Enhanced UI: structured dashboard with clear sections and icons

  New utility functions in vm_monitor.py:
  - fetch_azure_ml_jobs(): fetch recent Azure ML jobs with filtering
  - calculate_vm_costs(): calculate VM costs with hourly/daily/weekly rates
  - get_vm_uptime_hours(): get VM uptime from Azure activity logs
  - detect_vm_activity(): detect current VM activity (idle, running, setup)
  - get_evaluation_history(): load past evaluation runs from the results dir

  CLI enhancements:
  - Added --details flag for extended information
  - Improved output formatting with sections and separators
  - Better error handling and status icons
  - Preserved existing SSH tunnel and dashboard functionality

  Documentation:
  - Updated CLAUDE.md with new features and usage examples
  - Added detailed docstrings to all new functions

  This consolidates VM monitoring into a single enhanced command rather than creating duplicate dashboards, following the viewer consolidation strategy.

  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Refactor segmentation pipeline to use screen.frame events

  Update CaptureAdapter to work with the actual openadapt-capture database format.

  Key changes:
  - Use screen.frame events instead of generic event types
  - Pair action events (mouse.down + mouse.up → single click)
  - Map frame events to screenshots via timestamp matching
  - Update event type filtering to match the openadapt-capture schema
  - Improve frame-to-action association logic

  This enables the segmentation pipeline to process real capture recordings from openadapt-capture instead of requiring simulated data.
  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add VM monitoring dashboard with comprehensive usage visibility

  Enhance the vm monitor command to provide complete VM usage tracking:
  - Real-time VM status (size, IP, power state)
  - Activity detection (idle, benchmark running, setup)
  - Cost tracking (uptime hours, hourly rate, total cost)
  - Azure ML jobs list (last 7 days with status)
  - Evaluation history (with --details flag)
  - Mock mode for testing without a VM (--mock flag)

  Add new API endpoints to the local.py dashboard server:
  - /api/benchmark/status - current job status with ETA
  - /api/benchmark/costs - cost breakdown (Azure VM, API, GPU)
  - /api/benchmark/metrics - performance metrics by domain
  - /api/benchmark/workers - worker status and utilization
  - /api/benchmark/runs - list all benchmark runs
  - /api/benchmark/tasks/{run}/{task} - task execution details

  Update README with a VM monitor section including screenshots and usage examples.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add segmentation testing documentation and test files

  Add a comprehensive test plan and results for the workflow segmentation pipeline:
  - Test plan with 8 stages, from environment setup to documentation
  - Test results documenting real capture processing outcomes
  - Test files for CaptureAdapter and the segmentation pipeline

  Add VM monitor screenshot generation scripts and documentation:
  - Scripts for automated dashboard screenshot generation
  - Implementation plan for the VM monitor screenshot feature
  - Analysis of screenshot capture approaches

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Document archived OpenAdapter repository

  - Archive OpenAdapter (an incomplete pre-refactor cloud deployment POC)
  - Document key takeaways and lessons learned
  - Reference the modern cloud infrastructure in openadapt-ml
  - Add guidelines for when to archive repositories

  OpenAdapter was an incomplete proof-of-concept from October 2024 with only 165 lines of code and no ecosystem usage. Cloud deployment is now production-ready in openadapt_ml/cloud/ and benchmarks/azure.py.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add search functionality to training viewer

  - Add a search bar to viewer controls with a Ctrl+F / Cmd+F keyboard shortcut
  - Implement advanced token-based search across step indices, action types, and text
  - Search filters the step list in real time with a result count display
  - Clear button and Escape key support for resetting search
  - Consistent UI styling with existing viewer components
  - Integrates with existing step list filtering

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: resolve ruff linting and formatting issues

* fix: resolve test failures from missing dependencies

  - Remove the non-existent openadapt_ml.shared_ui import from viewer.py
  - Skip the anthropic test when the anthropic package is not installed (optional dependency)
  - Skip the viewer_components test when openadapt-viewer is not installed (optional dependency)

  All tests now pass (334 passed, 6 skipped).

  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
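The viewer's token-based search described above is implemented in the viewer's own HTML/JavaScript; as a rough Python sketch of the matching idea only, the logic might look like this (the field names `index`, `action_type`, and `text` are assumptions for illustration):

```python
def matches_search(query: str, step_index: int, action_type: str, text: str) -> bool:
    """Token-based match: every whitespace-separated query token must appear
    somewhere in the step's searchable fields (index, action type, or text)."""
    haystack = f"{step_index} {action_type} {text}".lower()
    return all(token in haystack for token in query.lower().split())


def filter_steps(query: str, steps: list[dict]) -> list[dict]:
    """Return only the steps matching the query; an empty query matches all."""
    return [
        s for s in steps
        if matches_search(query, s["index"], s["action_type"], s.get("text", ""))
    ]
```

Because `all(...)` over an empty token list is true, clearing the search box restores the full step list, matching the viewer behavior described in the commit.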
1 parent aeed4bf commit a6af99c

33 files changed

Lines changed: 10559 additions & 88 deletions

CLAUDE.md

Lines changed: 40 additions & 0 deletions
@@ -1,5 +1,16 @@
 # Claude Context for openadapt-ml

+## Project Status & Priorities
+
+**IMPORTANT**: Before starting work, always check the project-wide status document:
+- **Location**: `/Users/abrichr/oa/src/STATUS.md`
+- **Purpose**: Tracks P0 priorities, active background tasks, blockers, and strategic decisions
+- **Action**: Read this file at the start of every session to understand current priorities
+
+This ensures continuity between Claude Code sessions and context compactions.
+
+---
+
 This file helps maintain context across sessions.

 ---
@@ -18,9 +29,32 @@ This file helps maintain context across sessions.
 uv run python -m openadapt_ml.benchmarks.cli vm monitor
 ```

+**ENHANCED FEATURES (as of Jan 2026):**
+The `vm monitor` command now provides comprehensive VM usage visibility:
+- **VM Status**: Real-time VM state, size, and IP
+- **Activity Detection**: What the VM is currently doing (idle, benchmark running, setup)
+- **Cost Tracking**: Current uptime, hourly rate, and total cost for session
+- **Azure ML Jobs**: Recent jobs from last 7 days with status
+- **Evaluation History**: Past benchmark runs and success rates (with --details flag)
+- **Dashboard & Tunnels**: Auto-starts web dashboard and SSH/VNC tunnels
+
+**Usage:**
+```bash
+# Basic monitoring
+uv run python -m openadapt_ml.benchmarks.cli vm monitor
+
+# With detailed information (costs per day/week, evaluation history)
+uv run python -m openadapt_ml.benchmarks.cli vm monitor --details
+
+# With auto-shutdown after 2 hours
+uv run python -m openadapt_ml.benchmarks.cli vm monitor --auto-shutdown-hours 2
+```
+
 **WHY THIS MATTERS:**
 - VNC is ONLY accessible via SSH tunnel at `localhost:8006` (NOT the public IP)
 - The dashboard auto-manages SSH tunnels
+- Shows real-time costs to prevent budget overruns
+- Tracks all Azure ML jobs for visibility into what's running
 - Without it, you cannot see what Windows is doing
 - The user WILL be frustrated if you keep forgetting this

@@ -120,6 +154,12 @@ openadapt-ml is a model-agnostic, domain-agnostic ML engine for GUI automation a
 - With demo: 100% correct first actions
 - See `docs/experiments/demo_conditioned_prompting_results.md`

+**✅ VALIDATED (Jan 17, 2026)**: Demo persistence fix is working
+- The P0 fix in `openadapt-evals` ensures demo is included at EVERY step, not just step 1
+- Mock test confirms: agent behavior changes from 6.8 avg steps (random) to 3.0 avg steps (focused)
+- See `openadapt-evals/CLAUDE.md` for full validation details
+- **Next step**: Run full WAA evaluation (154 tasks) to measure episode success improvement
+
 **Next step**: Build demo retrieval to automatically select relevant demos from a library.

 **Key insight**: OpenAdapt's value is **trajectory-conditioned disambiguation of UI affordances**, not "better reasoning".

README.md

Lines changed: 46 additions & 0 deletions
@@ -781,6 +781,52 @@ uv run python -m openadapt_ml.cloud.local serve --port 8080 --open

 *View benchmark evaluation results with task-level filtering, success/failure status, and run comparison. Shows Claude achieving 30% on mock evaluation tasks (simulated environment for testing the pipeline - real WAA evaluation requires Windows VMs).*

+### 13.4 VM Monitoring Dashboard
+
+For managing Azure VMs used in benchmark evaluations, the `vm monitor` command provides a comprehensive dashboard:
+
+```bash
+# Start VM monitoring dashboard (auto-opens browser)
+uv run python -m openadapt_ml.benchmarks.cli vm monitor
+
+# Show detailed information (evaluation history, daily/weekly costs)
+uv run python -m openadapt_ml.benchmarks.cli vm monitor --details
+```
+
+**VM Monitor Dashboard (Full View):**
+
+![VM Monitor Dashboard](docs/screenshots/vm_monitor_dashboard_full.png)
+
+*The VM monitor dashboard shows: (1) VM status (name, IP, size, state), (2) Current activity (idle/benchmark running), (3) Cost tracking (uptime, hourly rate, total cost), (4) Recent Azure ML jobs from last 7 days, and (6) Dashboard & access URLs.*
+
+**VM Monitor Dashboard (With --details Flag):**
+
+![VM Monitor Dashboard Details](docs/screenshots/vm_monitor_details.png)
+
+*The --details flag adds: (5) Evaluation history with success rates and agent types, plus extended cost information (daily/weekly projections).*
+
+**Features:**
+- **Real-time VM status** - Shows VM size, power state, and IP address
+- **Activity detection** - Identifies if VM is idle, running benchmarks, or in setup
+- **Cost tracking** - Displays uptime hours, hourly rate, and total cost for current session
+- **Azure ML jobs** - Lists recent jobs from last 7 days with status indicators
+- **Evaluation history** - Shows past benchmark runs with success rates (with --details flag)
+- **Dashboard & tunnels** - Auto-starts web dashboard and SSH/VNC tunnels for accessing Windows VM
+
+**Mock mode for testing:**
+```bash
+# Generate screenshots or test dashboard without a VM running
+uv run python -m openadapt_ml.benchmarks.cli vm monitor --mock
+```
+
+**Auto-shutdown option:**
+```bash
+# Automatically deallocate VM after 2 hours to prevent runaway costs
+uv run python -m openadapt_ml.benchmarks.cli vm monitor --auto-shutdown-hours 2
+```
+
+For complete VM management commands and Azure setup instructions, see [`CLAUDE.md`](CLAUDE.md) and [`docs/azure_waa_setup.md`](docs/azure_waa_setup.md).
+
 ---

 ## 14. Limitations & Notes

docs/REPOSITORY_HISTORY.md

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
+# Repository History
+
+Documentation of deprecated and archived OpenAdapt ecosystem projects.
+
+## Deprecated/Archived Projects
+
+### OpenAdapter (Archived January 2026)
+
+**Repository**: https://github.com/OpenAdaptAI/OpenAdapter (ARCHIVED)
+**Status**: Incomplete proof-of-concept from before OpenAdapt refactor
+
+**Why Archived**:
+- Incomplete proof-of-concept code (only 165 lines, missing imports)
+- Created October 2024, minimal activity (14 commits, only 1 contributor)
+- Cloud infrastructure now handled by `openadapt_ml/cloud/` module
+- No active development, zero ecosystem usage
+- Last substantial commit was February 2025 (marked as WIP)
+
+**Original Purpose**:
+Attempted to provide cloud deployment infrastructure for screenshot parsing and action models, specifically targeting AWS ECS/ECR deployment for OmniParser using CDKTF (Terraform via Python).
+
+**Key Takeaways & Lessons Learned**:
+- Cloud training support is critical for productivity
+- Multiple backends (Lambda Labs, Azure) enable flexibility and cost optimization
+- Infrastructure as Code (Terraform/CDK) is appropriate for cloud setup
+- State management (tracking deployment IPs, configs) is important for multi-region deployments
+- Single-provider solutions are fragile - always support multiple cloud backends
+
+**What Replaced It**:
+- `openadapt_ml/cloud/lambda_labs.py` - Lambda Labs GPU rental and management
+- `openadapt_ml/cloud/azure_inference.py` - Azure ML integration for inference
+- `openadapt_ml/benchmarks/azure.py` - Azure ML for automated WAA evaluation
+- `scripts/setup_azure.py` - Full Azure setup automation with resource management
+- Documentation: `docs/cloud_gpu_training.md`, `docs/azure_waa_setup.md`
+
+**Modern Approach**:
+The current openadapt-ml cloud infrastructure is production-ready and supports:
+- Multiple cloud providers (Lambda Labs, Azure ML, local)
+- Multiple model types (not just OmniParser)
+- Automatic cleanup and quota management
+- Tested deployment patterns with comprehensive documentation
+- Cost estimation and monitoring tools
+
+**References**:
+- Original incomplete code: https://github.com/OpenAdaptAI/OpenAdapter/tree/feat/omniparser
+- Cloud architecture docs: `docs/cloud_gpu_training.md`
+- Azure setup guide: `docs/azure_waa_setup.md`
+
+---
+
+## Notes on Repository Management
+
+**When to Archive**:
+- No active development for 3+ months
+- Incomplete/experimental code that won't be finished
+- Functionality superseded by other ecosystem components
+- Zero usage in production or by other repos
+- Single contributor with no current interest
+
+**Before Archiving**:
+1. Review code for valuable patterns or ideas
+2. Document key takeaways in this file
+3. Update references in other repositories
+4. Remove from GitHub organization profile README
+5. Add archive notice to repository description
+
+**Alternative to Archiving**:
+- Move code to `legacy/` branch in main repository
+- Keep as example/reference in documentation
+- Convert to gist or snippet if very small
