
Commit fa1a9c4

abrichr and claude authored
fix: switch distillation collection to WAADirect for reliable task setup (#194)
Replace RLEnvironment + WAALiveAdapter with WAADirect in the distillation data collection script. The adapter layer fails on custom YAML task IDs and doesn't reset the environment properly. Key changes: - Load task configs from --task-dir (YAML/JSON files) via TaskConfig.from_dir() - Use WAADirect.setup_task(task_config.to_waa_config()) for environment reset - Use WAADirect.screenshot() and execute_action() instead of env.step() - Evaluate via evaluate_milestones_screenshot() on fresh post-episode screenshot - Fix Anthropic API call: always use max_tokens (not max_completion_tokens) - Add --eval-model flag for milestone VLM evaluation model - Add --task-dir as required arg (replaces server-side task discovery) Kept unchanged: TeacherAgent, PlannerTrajectoryLogger (keep_failed=True), CostTracker, resume support, graceful shutdown handling. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 7c402ca commit fa1a9c4
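The commit message names the project-internal calls that replace the adapter layer. A minimal sketch of how the new episode loop might fit together, with stub classes standing in for `WAADirect` and `TaskConfig` (their real signatures are not shown in this commit, so everything below is an assumption built only from the call names listed above):

```python
# Hypothetical sketch of the post-#194 collection flow. WAADirect and
# TaskConfig are project-internal; these stubs only mirror the call names
# the commit message lists, not the real implementations.

class TaskConfig:
    def __init__(self, task_id, raw):
        self.task_id = task_id
        self.raw = raw

    @staticmethod
    def from_dir(task_dir):
        # The real method parses YAML/JSON files in task_dir; stubbed here.
        return [TaskConfig("open-notepad", {"app": "notepad"})]

    def to_waa_config(self):
        return {"id": self.task_id, **self.raw}

class WAADirect:
    def setup_task(self, waa_config):
        # Resets the environment directly (the step the adapter layer botched).
        self.task = waa_config

    def screenshot(self):
        return b"<png bytes>"

    def execute_action(self, action):
        pass  # click/type/etc. against the live desktop

def collect_episode(env, task, teacher, max_steps=15):
    env.setup_task(task.to_waa_config())      # reliable reset via WAADirect
    trajectory = []
    for _ in range(max_steps):
        obs = env.screenshot()
        action = teacher(obs)
        trajectory.append((obs, action))
        if action == "DONE":
            break
        env.execute_action(action)
    # Milestones are then judged on a fresh post-episode screenshot.
    return trajectory, env.screenshot()

tasks = TaskConfig.from_dir("tasks/")
traj, final_shot = collect_episode(WAADirect(), tasks[0], lambda obs: "DONE")
```

The key design point the commit describes is bypassing the adapter: `setup_task()` is called with the task's own config, so custom YAML task IDs reset correctly.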

2 files changed: 185 additions & 204 deletions
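One of the listed fixes is provider-specific: Anthropic's Messages API requires `max_tokens`, while `max_completion_tokens` is an OpenAI-only parameter. A hedged sketch of how such a per-provider mapping might look (the helper name and structure are illustrative, not from the commit):

```python
def token_limit_kwargs(provider: str, limit: int) -> dict:
    """Build the token-limit kwarg for a chat API call.

    Anthropic's Messages API always takes max_tokens; newer OpenAI chat
    models take max_completion_tokens instead. Passing the wrong keyword
    raises an API error, which is the kind of bug the commit fixes.
    """
    if provider == "anthropic":
        return {"max_tokens": limit}
    return {"max_completion_tokens": limit}
```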

File tree

CLAUDE.md (11 additions & 4 deletions)
````diff
@@ -325,45 +325,52 @@ Two-step workflow: collect expert trajectories from a frontier teacher model, th
 
 ### Step 1: Collect Teacher Trajectories (`scripts/collect_distillation_data.py`)
 
-Runs a frontier model (GPT-5.4, Claude, etc.) as a unified desktop agent on WAA tasks, saving every successful trajectory as SFT training data. Failed episodes are automatically discarded.
+Runs a frontier model (GPT-5.4, Claude, etc.) as a unified desktop agent on WAA tasks, saving every trajectory as SFT training data. Uses WAADirect for reliable task setup instead of the adapter layer. Tasks are loaded from local YAML/JSON files via `--task-dir`.
 
 ```bash
 # Collect from GPT-5.4 (default teacher)
 python scripts/collect_distillation_data.py \
+    --task-dir tasks/ \
     --server-url http://localhost:5001
 
 # Collect from Claude with cost-limited testing
 python scripts/collect_distillation_data.py \
+    --task-dir tasks/ \
     --model claude-sonnet-4-6-20260210 \
     --provider anthropic \
     --max-tasks 5 \
     --server-url http://localhost:5001
 
-# Specific tasks
+# Specific tasks from task-dir
 python scripts/collect_distillation_data.py \
-    --tasks TASK_UUID_1,TASK_UUID_2 \
+    --task-dir tasks/ \
+    --tasks change-font-arial,open-notepad \
     --server-url http://localhost:5001
 
 # Resume previous collection
 python scripts/collect_distillation_data.py \
+    --task-dir tasks/ \
     --server-url http://localhost:5001 \
     --output-dir distillation_data/gpt54_run1 \
     --resume
 
 # Dry run (list tasks, estimate cost)
 python scripts/collect_distillation_data.py \
+    --task-dir tasks/ \
     --dry-run --server-url http://localhost:5001
 ```
 
 | Flag | Default | Description |
 |------|---------|-------------|
+| `--task-dir` | (required) | Directory of task YAML/JSON configs |
 | `--model` | `gpt-5.4` | Teacher model API ID |
 | `--provider` | `openai` | `openai` or `anthropic` |
-| `--tasks` | all from server | Comma-separated task IDs |
+| `--tasks` | all from task-dir | Comma-separated task IDs to filter |
 | `--max-tasks` | unlimited | Limit tasks (for cost control) |
 | `--server-url` | `http://localhost:5001` | WAA server URL |
 | `--output-dir` | `distillation_data/` | Output directory |
 | `--max-steps` | 15 | Steps per episode |
+| `--eval-model` | `gpt-4.1-mini` | VLM for milestone evaluation |
 | `--resume` | off | Skip tasks with existing data |
 | `--dry-run` | off | List tasks without running |
````

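For reference, a task file that `--task-dir` might load could look like the following. The schema is hypothetical: only the task ID `change-font-arial`, the YAML/JSON format, and the existence of VLM-judged milestones come from the material above.

```yaml
# tasks/change-font-arial.yaml (hypothetical schema, for illustration only)
id: change-font-arial
instruction: Change the document font to Arial.
milestones:
  - The font dropdown shows "Arial"
```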
0 commit comments