
Commit 2d57d02

abrichr and claude authored
feat: remove evaluation infrastructure (moved to openadapt-evals) (#25)
* feat: remove evaluation infrastructure (moved to openadapt-evals)

  All evaluation infrastructure (~13,000 lines) has been migrated to
  openadapt-evals (PR #29). This PR removes the now-redundant code from
  openadapt-ml, making it a pure ML package.

  Deleted files:
  - benchmarks/cli.py (8,503 lines - VM/pool CLI)
  - benchmarks/azure_vm.py (AzureVMManager)
  - benchmarks/pool.py (PoolManager)
  - benchmarks/vm_monitor.py, azure_ops_tracker.py, resource_tracker.py
  - benchmarks/azure.py, viewer.py, pool_viewer.py, trace_export.py
  - benchmarks/waa_deploy/ (Docker agent deployment)
  - tests/test_quota_auto_detection.py, test_demo_persistence.py
  - tests/benchmarks/test_api_agent.py, test_waa.py

  Updated:
  - benchmarks/__init__.py: Only exports ML agents (PolicyAgent, etc.)
  - pyproject.toml: Removed azure-ai-ml, azureml-core, azure-mgmt-*
  - CLAUDE.md: Removed CLI/VM/pool docs, added migration guide

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update stale references to migrated benchmark modules

  Update all remaining references to deleted benchmark modules across
  source code, scripts, and tests:
  - cloud/local.py: azure_ops_tracker, session_tracker, CLI subprocess calls
  - scripts/: p0/p1 validation scripts, screenshot generators, quota checker
  - training/benchmark_viewer.py: HTML template CLI references
  - experiments/waa_demo/runner.py: docstring and print references
  - deprecated/waa_deploy/__init__.py: import path

  All now point to openadapt_evals equivalents.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent c3ea5ce commit 2d57d02

36 files changed

Lines changed: 99 additions & 18488 deletions

CLAUDE.md

Lines changed: 52 additions & 372 deletions
Large diffs are not rendered by default.

deprecated/waa_deploy/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -5,6 +5,6 @@
 - Dockerfile: Custom waa-auto Docker image
 """
 
-from openadapt_ml.benchmarks.waa_deploy.api_agent import ApiAgent
+from openadapt_evals.waa_deploy.api_agent import ApiAgent
 
 __all__ = ["ApiAgent"]
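
Note: the one-line change above is an import-prefix swap. A minimal sketch of applying the same swap to other source text follows, assuming the only mapping is the one visible in this hunk (`openadapt_ml.benchmarks.waa_deploy` to `openadapt_evals.waa_deploy`). The helper name `rewrite_imports` is hypothetical and is not part of either package; other migrated modules may not follow the same pattern.

```python
import re

# Old-to-new module prefix, taken from the single changed line in this hunk.
# Assumption: no other prefix mapping is covered here.
OLD_PREFIX = "openadapt_ml.benchmarks.waa_deploy"
NEW_PREFIX = "openadapt_evals.waa_deploy"

# Match the old prefix only as a complete dotted name, so a longer,
# unrelated name such as "openadapt_ml.benchmarks.waa_deploy_v2" is untouched.
_PATTERN = re.compile(r"(?<![\w.])" + re.escape(OLD_PREFIX) + r"(?!\w)")


def rewrite_imports(source: str) -> str:
    """Return `source` with the old module prefix replaced by the new one."""
    return _PATTERN.sub(NEW_PREFIX, source)


if __name__ == "__main__":
    old_line = "from openadapt_ml.benchmarks.waa_deploy.api_agent import ApiAgent"
    print(rewrite_imports(old_line))
```

The word-boundary lookarounds are a deliberate choice: a plain `str.replace` would also rewrite longer module names that merely start with the old prefix.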
Lines changed: 12 additions & 20 deletions
@@ -1,30 +1,27 @@
-"""Benchmark integration for openadapt-ml.
+"""ML-specific agents for benchmark evaluation.
 
-This module provides:
+This module provides agents that wrap openadapt-ml ML components
+(VLM adapters, policies, baselines) for benchmark evaluation.
 
-1. ML-specific agents for benchmark evaluation (PolicyAgent, APIBenchmarkAgent, etc.)
-2. Azure VM management with clean Python API (AzureVMManager)
-3. Pool management for parallel WAA evaluation (PoolManager)
-
-For benchmark infrastructure (adapters, runners, viewers), use openadapt-evals:
+For evaluation infrastructure (VM management, pool orchestration, CLI,
+adapters, runners, viewers), use openadapt-evals:
 ```python
 from openadapt_evals import (
     WAAMockAdapter,
     WAALiveAdapter,
     evaluate_agent_on_benchmark,
 )
+# VM/pool management CLI:
+# oa-vm pool-create --workers 4
+# oa-vm pool-run --tasks 10
 ```
 
-Library usage (programmatic, no CLI):
+ML agent usage:
 ```python
-from openadapt_ml.benchmarks import PoolManager, AzureVMManager
+from openadapt_ml.benchmarks import PolicyAgent, APIBenchmarkAgent
 
-vm = AzureVMManager(resource_group="my-rg")
-manager = PoolManager(vm_manager=vm)
-pool = manager.create(workers=4)
-manager.wait()
-result = manager.run(tasks=10)
-manager.cleanup(confirm=False)
+agent = APIBenchmarkAgent(provider="anthropic")
+agent = PolicyAgent(policy)
 ```
 """
 
@@ -33,14 +30,9 @@
     PolicyAgent,
     UnifiedBaselineAgent,
 )
-from openadapt_ml.benchmarks.azure_vm import AzureVMManager
-from openadapt_ml.benchmarks.pool import PoolManager, PoolRunResult
 
 __all__ = [
     "PolicyAgent",
     "APIBenchmarkAgent",
     "UnifiedBaselineAgent",
-    "AzureVMManager",
-    "PoolManager",
-    "PoolRunResult",
 ]
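
Note: this hunk drops three names from `__all__`. One way to guard against a stale export creeping back in is a static check on the module source, sketched below under stated assumptions. The helpers `exported_names` and `stale_exports` are hypothetical and not part of openadapt-ml; the check reads source text with the standard-library `ast` module and does not import the package.

```python
import ast

# Names removed from __all__ in this hunk.
REMOVED = {"AzureVMManager", "PoolManager", "PoolRunResult"}


def exported_names(source: str) -> set[str]:
    """Collect the string entries of a top-level `__all__ = [...]` assignment."""
    names: set[str] = set()
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign):
            continue
        targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
        if "__all__" in targets and isinstance(node.value, (ast.List, ast.Tuple)):
            for elt in node.value.elts:
                if isinstance(elt, ast.Constant) and isinstance(elt.value, str):
                    names.add(elt.value)
    return names


def stale_exports(source: str) -> set[str]:
    """Return any removed names that still appear in `__all__`."""
    return exported_names(source) & REMOVED


if __name__ == "__main__":
    # Source text mirroring the post-change __all__ shown in this hunk.
    new_source = (
        '__all__ = [\n'
        '    "PolicyAgent",\n'
        '    "APIBenchmarkAgent",\n'
        '    "UnifiedBaselineAgent",\n'
        ']\n'
    )
    print(stale_exports(new_source))
```

Parsing the source instead of importing the module keeps the check usable even when optional dependencies of the package are not installed.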
