Commit 9a37bb4
Vanilla WAA bootstrap automation (#10)
* feat: add openadapt-viewer dependency and adapter module (Phase 1)
Phase 1 of viewer consolidation plan: Foundation
Changes:
- Add openadapt-viewer as local file dependency in pyproject.toml
- Create openadapt_ml/training/viewer_components.py adapter module
  * screenshot_with_predictions() - Screenshot with human/AI overlays
  * training_metrics() - Training stats metrics grid
  * playback_controls() - Playback UI controls
  * correctness_badge() - Pass/fail badge component
  * generate_comparison_summary() - Model comparison summary
- Add tests/test_viewer_screenshots.py with component validation tests
- Add openadapt_ml/training/viewer_migration_example.py validation example
Design:
- Zero breaking changes to existing viewer.py code
- Adapter pattern wraps openadapt-viewer with ML-specific context
- Functions accept openadapt-ml data structures
- Can be incrementally adopted in future phases
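The adapter idea in the Design notes can be sketched as follows. This is a minimal illustration, not the contents of viewer_components.py: `StepResult`, the `render_badge` stand-in for a generic viewer component, and the HTML markup are all hypothetical.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class StepResult:
    """Hypothetical ML-side record; the real openadapt-ml structures may differ."""
    step_index: int
    human_action: str
    predicted_action: str


def render_badge(label: str, css_class: str) -> str:
    """Stand-in for a generic viewer component (hypothetical, not the real API)."""
    return f'<span class="badge {escape(css_class)}">{escape(label)}</span>'


def correctness_badge(step: StepResult) -> str:
    """Adapter: translate ML-specific data into a generic component call."""
    is_correct = step.human_action == step.predicted_action
    return render_badge(
        "PASS" if is_correct else "FAIL",
        "pass" if is_correct else "fail",
    )
```

Because the adapter only adds a thin translation layer, existing viewer.py code would not need to change until a later phase opts in.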
Next steps (Phase 2):
- Gradually migrate viewer.py to use these adapters
- Replace inline HTML generation with component calls
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* feat: add workflow segmentation system with capture adapter
Restored and enhanced the workflow segmentation system from commit dd9a393
with new integration for openadapt-capture format.
## What's Added
### Core Segmentation Pipeline (4 stages):
1. **Stage 1 - Frame Description (VLM)**:
   - Converts screenshots + actions into semantic descriptions
   - Supports Gemini, Claude, GPT-4o backends
   - Automatic caching for efficiency
   - File: openadapt_ml/segmentation/frame_describer.py
2. **Stage 2 - Episode Extraction (LLM)**:
   - Identifies coherent workflow boundaries
   - Few-shot prompting for better quality
   - Confidence-based filtering
   - File: openadapt_ml/segmentation/segment_extractor.py
3. **Stage 3 - Deduplication (Embeddings)**:
   - Finds similar workflows across recordings
   - Agglomerative clustering with cosine similarity
   - Supports OpenAI or local HuggingFace embeddings
   - File: openadapt_ml/segmentation/deduplicator.py
4. **Stage 4 - Annotation (VLM Quality Control)**:
   - Auto-annotates episodes for training data quality
   - Detects failures, boundary issues, incompleteness
   - Human-in-the-loop review workflow
   - File: openadapt_ml/segmentation/annotator.py
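For Stage 3, agglomerative clustering over cosine similarity could look roughly like the sketch below. It is a pure-Python illustration, not the code in deduplicator.py: the function names, the average-linkage choice, and the plain-list embedding format are assumptions. The 0.85 default mirrors the `similarity_threshold` used in the usage example.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_embeddings(embeddings, threshold=0.85):
    """Average-linkage agglomerative clustering; returns lists of indices."""
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(c1, c2):
        # Mean pairwise similarity between the members of two clusters.
        sims = [cosine_similarity(embeddings[i], embeddings[j])
                for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        (i, j), best = max(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break  # No remaining pair is similar enough to merge.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Each resulting cluster would correspond to one deduplicated workflow; a production version would more likely rely on a vectorized library than on nested Python loops.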
### Integration Features:
- **CaptureAdapter**: Loads recordings from openadapt-capture SQLite format
  - File: openadapt_ml/segmentation/adapters/capture_adapter.py
  - Automatically used when capture.db is detected
  - Converts events to segmentation format
- **Unified Pipeline**: Run all stages with single API
  - File: openadapt_ml/segmentation/pipeline.py
  - Automatic intermediate result caching
  - Resume support for interrupted runs
- **CLI Interface**: Full command-line interface for all stages
  - File: openadapt_ml/segmentation/cli.py
  - Commands: describe, extract, deduplicate, annotate, review, export-gold
- **Comprehensive Documentation**:
  - File: openadapt_ml/segmentation/README.md
  - 20+ code examples
  - Complete API reference
  - Integration guide
  - Cost estimates and performance benchmarks
## Use Cases
1. **Training Data Curation**: Extract and filter high-quality demonstration episodes
2. **Demo Retrieval**: Build searchable libraries for demo-conditioned prompting
3. **Workflow Documentation**: Auto-generate step-by-step guides from recordings
## Data Schemas
All schemas use Pydantic for type safety (openadapt_ml/segmentation/schemas.py):
- ActionTranscript: Frame-by-frame semantic descriptions
- Episode: Coherent workflow segment with boundaries
- CanonicalEpisode: Deduplicated workflow definition
- EpisodeAnnotation: Quality assessment for training data
## Example Usage
```python
from openadapt_ml.segmentation import SegmentationPipeline, PipelineConfig

# Pick the VLM (frame description), the LLM (episode extraction),
# and the similarity threshold used for deduplication.
config = PipelineConfig(
    vlm_model="gemini-2.0-flash",
    llm_model="gpt-4o",
    similarity_threshold=0.85,
)

# Run all stages over the recordings and write results to output_dir.
pipeline = SegmentationPipeline(config)
result = pipeline.run(
    recordings=["/path/to/recording1", "/path/to/recording2"],
    output_dir="workflow_library",
)
print(f"Found {result.unique_episodes} unique workflows")
```
## Next Steps
See openadapt_ml/segmentation/README.md for:
- P0: Integration tests with real openadapt-capture recordings
- P0: Visualization generator for segment boundaries
- P1: Improved prompt engineering and cost optimization
- P2: Active learning and multi-modal features
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* Enhance vm monitor command with comprehensive VM usage visibility
Features added:
- Azure ML job tracking: Shows recent jobs from last 7 days with status
- Cost tracking: Real-time uptime, hourly rate, and cost estimation
- VM activity detection: Identifies what VM is currently doing
- Evaluation history: Past benchmark runs and success rates (--details flag)
- Enhanced UI: Structured dashboard with clear sections and icons
New utility functions in vm_monitor.py:
- fetch_azure_ml_jobs(): Fetch recent Azure ML jobs with filtering
- calculate_vm_costs(): Calculate VM costs with hourly/daily/weekly rates
- get_vm_uptime_hours(): Get VM uptime from Azure activity logs
- detect_vm_activity(): Detect current VM activity (idle, running, setup)
- get_evaluation_history(): Load past evaluation runs from results dir
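The cost estimate described for calculate_vm_costs() amounts to simple arithmetic on uptime and an hourly rate. The sketch below uses a different name, `estimate_vm_costs`, to make clear it is not the function in vm_monitor.py; the `VMCosts` fields and the rate passed in are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class VMCosts:
    """Hypothetical result container; field names are illustrative."""
    hourly_rate_usd: float
    daily_rate_usd: float
    weekly_rate_usd: float
    uptime_hours: float
    total_cost_usd: float


def estimate_vm_costs(uptime_hours: float, hourly_rate_usd: float) -> VMCosts:
    """Derive daily/weekly rates and the cost accrued so far."""
    return VMCosts(
        hourly_rate_usd=hourly_rate_usd,
        daily_rate_usd=hourly_rate_usd * 24,
        weekly_rate_usd=hourly_rate_usd * 24 * 7,
        uptime_hours=uptime_hours,
        total_cost_usd=round(uptime_hours * hourly_rate_usd, 2),
    )
```

The hourly rate would have to come from the actual VM size pricing; any value passed here is a placeholder.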
CLI enhancements:
- Added --details flag for extended information
- Improved output formatting with sections and separators
- Better error handling and status icons
- Preserved existing SSH tunnel and dashboard functionality
Documentation:
- Updated CLAUDE.md with new features and usage examples
- Added detailed docstrings to all new functions
This consolidates VM monitoring into a single enhanced command rather than
creating duplicate dashboards, following the viewer consolidation strategy.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* Refactor segmentation pipeline to use screen.frame events
Update CaptureAdapter to work with actual openadapt-capture database
format. Key changes:
- Use screen.frame events instead of generic event types
- Pair action events (mouse.down + mouse.up → single click)
- Map frame events to screenshots via timestamp matching
- Update event type filtering to match openadapt-capture schema
- Improve frame-to-action association logic
This enables the segmentation pipeline to process real capture recordings
from openadapt-capture instead of requiring simulated data.
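The pairing and timestamp-matching steps listed above might be sketched like this. The event dictionaries, their field names, and the "latest frame at or before the action" rule are assumptions for illustration; the actual openadapt-capture schema and the logic in CaptureAdapter may differ.

```python
from bisect import bisect_right


def pair_clicks(events):
    """Collapse each mouse.down followed by a mouse.up into one click action."""
    actions, pending_down = [], None
    for event in sorted(events, key=lambda e: e["timestamp"]):
        if event["type"] == "mouse.down":
            pending_down = event
        elif event["type"] == "mouse.up" and pending_down is not None:
            actions.append({
                "type": "click",
                "timestamp": pending_down["timestamp"],
                "x": pending_down["x"],
                "y": pending_down["y"],
            })
            pending_down = None
    return actions


def frame_for(action_timestamp, frame_timestamps):
    """Index of the latest frame at or before the action, or None.

    frame_timestamps must be sorted in ascending order.
    """
    idx = bisect_right(frame_timestamps, action_timestamp) - 1
    return idx if idx >= 0 else None
```

Choosing the frame captured just before the action means the screenshot shows what the user saw when deciding to click, which is usually what a description model needs.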
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Add VM monitoring dashboard with comprehensive usage visibility
Enhance vm monitor command to provide complete VM usage tracking:
- Real-time VM status (size, IP, power state)
- Activity detection (idle, benchmark running, setup)
- Cost tracking (uptime hours, hourly rate, total cost)
- Azure ML jobs list (last 7 days with status)
- Evaluation history (with --details flag)
- Mock mode for testing without VM (--mock flag)
Add new API endpoints to local.py dashboard server:
- /api/benchmark/status - current job status with ETA
- /api/benchmark/costs - cost breakdown (Azure VM, API, GPU)
- /api/benchmark/metrics - performance metrics by domain
- /api/benchmark/workers - worker status and utilization
- /api/benchmark/runs - list all benchmark runs
- /api/benchmark/tasks/{run}/{task} - task execution details
Update README with VM monitor section including screenshots and
usage examples.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Add segmentation testing documentation and test files
Add comprehensive test plan and results for workflow segmentation pipeline:
- Test plan with 8 stages from environment setup to documentation
- Test results documenting real capture processing outcomes
- Test files for CaptureAdapter and segmentation pipeline
Add VM monitor screenshot generation scripts and documentation:
- Scripts for automated dashboard screenshot generation
- Implementation plan for VM monitor screenshot feature
- Analysis of screenshot capture approaches
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Document archived OpenAdapter repository
- Archive OpenAdapter (incomplete pre-refactor cloud deployment POC)
- Document key takeaways and lessons learned
- Reference modern cloud infrastructure in openadapt-ml
- Add guidelines for when to archive repositories
OpenAdapter was an incomplete proof-of-concept from October 2024
with only 165 lines of code and no ecosystem usage. Cloud deployment
is now production-ready in openadapt_ml/cloud/ and benchmarks/azure.py.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Add search functionality to training viewer
- Add search bar to viewer controls with Ctrl+F / Cmd+F keyboard shortcut
- Implement advanced token-based search across step indices, action types, and text
- Search filters step list in real-time with result count display
- Clear button and Escape key support for resetting search
- Consistent UI styling with existing viewer components
- Integrates with existing step list filtering
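Token-based filtering of the step list can be illustrated as below. The viewer itself presumably implements this in browser-side code; this Python sketch only shows the matching rule, and the step dictionary keys (`index`, `action_type`, `text`) are assumed names.

```python
def matches(step, query):
    """True when every whitespace-separated token appears in the step's fields."""
    haystack = " ".join(
        str(step.get(key, "")) for key in ("index", "action_type", "text")
    ).lower()
    return all(token in haystack for token in query.lower().split())


def filter_steps(steps, query):
    """Return the steps that match; an empty query matches everything."""
    return [step for step in steps if matches(step, query)]
```

Since an empty query has no tokens, clearing the search box restores the full step list, and the length of the filtered list gives the result count.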
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: resolve ruff linting and formatting issues
* fix: resolve test failures from missing dependencies
- Remove non-existent openadapt_ml.shared_ui import from viewer.py
- Skip anthropic test when anthropic package not installed (optional dependency)
- Skip viewer_components test when openadapt-viewer not installed (optional dependency)
All tests now pass (334 passed, 6 skipped).
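Skipping a test when an optional dependency is absent can be done with the standard library alone, as in the sketch below. The helper and test class names are hypothetical, and the project's tests may instead use `pytest.importorskip`, which serves the same purpose.

```python
import importlib.util
import unittest


def is_installed(module_name: str) -> bool:
    """Report whether a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


class TestAnthropicBackend(unittest.TestCase):
    @unittest.skipUnless(is_installed("anthropic"), "anthropic not installed")
    def test_client_can_be_imported(self):
        import anthropic  # Imported lazily so test collection never fails.

        self.assertTrue(hasattr(anthropic, "Anthropic"))
```

Skipped tests are reported separately from passes, which is consistent with the "6 skipped" count above.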
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* feat(dashboard): add Azure ops dashboard with live VNC embed
- Add azure_ops_tracker.py for real-time status tracking via SSE
- Add azure_ops_viewer.py with live VNC iframe embed
- Add /api/azure-ops-status and /api/azure-ops-sse endpoints
- Add progress bar, cost tracking, elapsed time display
- Add copy logs button and auto-scroll controls
feat(cli): add new VM management commands
- Add vm start-windows command
- Add vm restart-windows command
- Add vm check-build command
- Add vm screenshot command for capturing dashboards
- Fix container restart to always use --cap-add NET_ADMIN
feat(infra): add screenshot capture infrastructure
- Add capture_screenshots.py script
- Configure BuildKit GC with 30GB limit
- Fix Dockerfile OEM path and networking
docs: add Azure dashboard spec and update CLI documentation
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: auto-cleanup before Docker builds, use VERSION=11 for unattended install
- Add automatic Docker build cache cleanup before waa-auto builds
- Fix all VERSION=11e → VERSION=11 for fully unattended Windows install
(Enterprise Evaluation shows edition picker dialog; Pro does not)
- Update CLAUDE.md documentation with disk space management solution
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(waa): use VERSION=11e consistently to prevent product key prompt
Root cause: CLI used VERSION=11 but Dockerfile uses VERSION=11e.
This caused XML patches (applied for 11e) to be ignored at runtime.
Enterprise Eval (11e) has built-in GVLK key - never prompts for product key.
Fixes: openadapt-evals-b3l
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* docs: fix inverted VERSION documentation in CLAUDE.md
VERSION=11e (Enterprise Eval) has built-in GVLK - never prompts.
VERSION=11 (Pro) may prompt for product key.
Previous documentation was backwards, causing confusion.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* feat: automate vanilla WAA bootstrap
* docs: clarify unattended WAA bootstrap
* fix(waa): don't replace dockurr/windows autounattend.xml
The previous approach copied windowsarena's autounattend.xml over
dockurr/windows's version, which broke the OOBE flow.
Changes:
- Remove COPY commands that replaced the base image's XML files
- Add conditional sed patch that only adds InstallFrom element if needed
- Reorder Dockerfile to install Python deps before running python3 commands
- Add clear comments explaining the OEM mechanism
This fixes Windows installation failures where the OOBE would hang
or show incorrect dialogs.
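The "patch only if needed" idea can be shown in Python, although the Dockerfile itself uses sed. This sketch is an assumption throughout: the function name is hypothetical, and the inserted element (selecting image index 1 via `/IMAGE/INDEX`) is one plausible form of an `InstallFrom` block, not necessarily the one the commit applies.

```python
# Assumed InstallFrom payload; the real patch may select a different image.
INSTALL_FROM = (
    '<InstallFrom><MetaData wcm:action="add">'
    "<Key>/IMAGE/INDEX</Key><Value>1</Value>"
    "</MetaData></InstallFrom>"
)


def ensure_install_from(xml_text: str) -> str:
    """Insert an InstallFrom element after <OSImage> unless one already exists."""
    if "<InstallFrom>" in xml_text or "<OSImage>" not in xml_text:
        return xml_text  # Already patched, or nothing to patch: leave untouched.
    return xml_text.replace("<OSImage>", "<OSImage>" + INSTALL_FROM, 1)
```

Making the patch idempotent means the base image's autounattend.xml is preserved whenever it already contains the element, rather than being overwritten.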
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor(cli): remove deprecated WAA handlers, add auto-cleanup
Major cleanup of benchmarks CLI:
Removed deprecated handlers (~1200 lines):
- setup-waa: Replaced by top-level 'waa' command
- run-waa: Replaced by top-level 'waa' command
- prepare-windows: Replaced by top-level 'waa' command
- waa-native: Replaced by scripts/waa_bootstrap_local.sh
Added features:
- cleanup_waa_resources(): Auto-cleanup leftover Azure resources
(NICs, VNETs, NSGs, PublicIPs, disks) before VM creation
- Updated default VM size to Standard_D8ds_v5 (300GB temp storage)
- Updated help text with temp storage sizes for each VM option
- Added deprecation notice to legacy viewer command
The cleanup function prevents "resource already exists" errors when
previous VM deletion was incomplete.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* docs: add CLI-first and VM workflow guidelines to CLAUDE.md
Added comprehensive guidelines for Claude Code sessions:
CLI-First Rule:
- Never use raw az/ssh commands that require permission
- Always use or extend the CLI for VM operations
- Example pattern for adding new CLI functionality
Standard VM Configuration Workflow:
- Delete VM, update code, recreate (vs. trying to resize)
- Current VM defaults (D8ds_v5, eastus, Ubuntu 22.04)
This reduces friction by documenting the pre-approved command
patterns and standard operating procedures.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* docs: fix markdown formatting in waa_vanilla_automation.md
- Close unclosed code block (lines 33-41)
- Remove hardcoded absolute path, use relative description
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(waa): unattended Windows installation now works
Key fixes to waa_deploy/Dockerfile:
- Don't replace dockurr/windows autounattend.xml, only patch with InstallFrom
element to prevent "Select the operating system" dialog
- Use sed instead of python3 for run.py patching (Python installed later)
- Fix entrypoint: use /run/entry.sh instead of non-existent /copy-oem.sh
This enables fully automated Windows 11 Enterprise Eval installation with
VERSION=11e, no manual intervention required. WAA server starts automatically
via FirstLogonCommands.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(waa): fix unattended installation and benchmark execution
Key fixes:
- Dockerfile: Don't replace autounattend.xml, only patch it with InstallFrom element
(preserves dockurr/windows OEM mechanism, prevents "Select OS" dialog)
- CLI: Run benchmark inside container with `docker exec -w /client`
- CLI: Use valid som_origin "oss" instead of invalid "omniparser"
- CLI: Fix VNC URLs to use localhost (SSH tunnel) instead of public IP
- CLI: Add SSH retry logic with exponential backoff
- CLI: Add ConnectTimeout to SSH options for faster failure detection
The WAA benchmark now runs successfully with the navi agent.
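SSH retry with exponential backoff, combined with a connection timeout, might look like the following. This is a generic sketch rather than the CLI's code: the function names, the defaults, and the 10-second `ConnectTimeout` value are assumptions.

```python
import subprocess
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0,
                       retry_on=(OSError,), sleep=time.sleep):
    """Call operation() until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * 2 ** attempt)


def run_ssh(host, command):
    """Run a remote command; ConnectTimeout makes unreachable hosts fail fast."""
    return subprocess.run(
        ["ssh", "-o", "ConnectTimeout=10", host, command],
        check=True, capture_output=True, text=True,
    )
```

A call such as `retry_with_backoff(lambda: run_ssh("user@host", "uptime"), retry_on=(subprocess.CalledProcessError,))` would wait 1, 2, 4, and 8 seconds between its five attempts.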
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(dashboard): use Azure CLI for live VM IP, fix activity detection
- Fetch VM IP from Azure CLI at runtime instead of stale registry file
- Fix activity detection to use localhost:5000 (Docker port forward)
- Change SSH tunnel to forward localhost:5001 -> VM:5000
- Update CLAUDE.md with comprehensive WAA workflow documentation
- Add API key auto-loading note (loaded from .env via config.py)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
1 parent a6af99c commit 9a37bb4
35 files changed
Lines changed: 9051 additions & 1866 deletions
File tree
- deprecated
- docs
- waa_deploy
- docs
- screenshots
- openadapt_ml
- benchmarks
- waa_deploy
- cloud
- scripts
- training
- scripts