
fix: prefer multilevel demo files over plain .txt in eval scripts #103

Merged
abrichr merged 1 commit into main from fix/prefer-multilevel-demo on Mar 4, 2026

@abrichr (Member) commented Mar 4, 2026

Summary

  • When both {task_id}_multilevel.txt and {task_id}.txt exist, all demo lookup paths now prefer the multilevel (Option D) format
  • Falls back to plain .txt, then .json for backwards compatibility
  • Applies the same lookup order in all four places that resolve demo files: run_dc_eval.py, run_eval_pipeline.py, cli.py (_suite_find_demo), comparison_viewer.py
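The lookup order described above can be sketched as a single helper. This is a minimal sketch, not the code merged in this PR: `find_demo_file` and `DEMO_SUFFIXES` are hypothetical names, and it assumes all demo files for a task sit directly in one directory and are named `{task_id}_multilevel.txt`, `{task_id}.txt`, or `{task_id}.json`.

```python
from pathlib import Path
from typing import Optional

# Hypothetical: candidate suffixes in priority order, taken from the PR
# summary. Multilevel (Option D) text first, then plain text, then JSON.
DEMO_SUFFIXES = ("_multilevel.txt", ".txt", ".json")


def find_demo_file(demo_dir: Path, task_id: str) -> Optional[Path]:
    """Return the highest-priority demo file for task_id, or None.

    Hypothetical helper for illustration; the real lookups in
    run_dc_eval.py, cli.py, etc. may be structured differently.
    """
    for suffix in DEMO_SUFFIXES:
        candidate = demo_dir / f"{task_id}{suffix}"
        # The first existing file wins, so earlier suffixes take precedence.
        if candidate.is_file():
            return candidate
    return None
```

Keeping the candidates in one ordered tuple makes the precedence explicit in a single place, and each fallback step is just the next entry in the tuple.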

Context

The rigid plain-text demo format caused DC agents to abandon tasks when the current UI state didn't match the demo's step descriptions. The multilevel format (a PLAN, per-step {Think, Action, Expect} blocks, and an "adapt if needed" instruction) avoids this.
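To make the three levels concrete, here is a hypothetical rendering of a multilevel demo. The PR does not show the actual file layout, so the `PLAN:`/`STEP n:` headings, the `DemoStep` dataclass, `render_multilevel_demo`, and the wording of the closing instruction are all assumptions; only the overall shape (a plan, per-step Think/Action/Expect, and an "adapt if needed" instruction) comes from the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DemoStep:
    # Hypothetical per-step structure: why, what, and the expected result.
    think: str
    action: str
    expect: str


def render_multilevel_demo(plan: List[str], steps: List[DemoStep]) -> str:
    """Render a demo as plain text (hypothetical layout, not the real format)."""
    lines = ["PLAN:"]
    lines += [f"{i}. {goal}" for i, goal in enumerate(plan, start=1)]
    for i, step in enumerate(steps, start=1):
        lines.append(f"STEP {i}:")
        lines.append(f"  Think: {step.think}")
        lines.append(f"  Action: {step.action}")
        lines.append(f"  Expect: {step.expect}")
    # Assumed wording of the instruction that lets the agent deviate
    # from the demo instead of abandoning the task.
    lines.append(
        "If the UI does not match what is expected, adapt if needed "
        "rather than abandoning the task."
    )
    return "\n".join(lines)
```

The Expect field gives the agent something to compare the live UI against, and the closing instruction tells it what to do when that comparison fails.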

Test plan

  • With both {task_id}.txt and {task_id}_multilevel.txt present, the multilevel file is selected
  • With only {task_id}.txt present, lookup falls back to it
  • With only {task_id}.json present, lookup falls back to it

🤖 Generated with Claude Code

When both {task_id}_multilevel.txt and {task_id}.txt exist in the demo
directory, all demo file lookup paths now prefer the multilevel (Option D)
format. Falls back to plain .txt, then .json for backwards compatibility.

Files changed:
- scripts/run_dc_eval.py
- scripts/run_eval_pipeline.py
- openadapt_evals/benchmarks/cli.py (_suite_find_demo)
- openadapt_evals/benchmarks/comparison_viewer.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@abrichr merged commit eb9bc3e into main on Mar 4, 2026
1 check passed