
feat: migrate annotation pipeline from openadapt-ml to openadapt-evals#64

Merged
abrichr merged 3 commits into main from feat/migrate-annotation-pipeline on Mar 2, 2026

@abrichr commented Mar 2, 2026

Summary

  • New openadapt_evals/vlm.py: Shared VLM call module that consolidates three separate implementations into a single vlm_call() function supporting the consilium council, OpenAI, and Anthropic providers. It also provides extract_json() for robust parsing of LLM output and an image_bytes_from_path() helper.
  • New openadapt_evals/annotation.py: Migrated annotation data classes (AnnotatedStep, AnnotatedDemo), prompts (ANNOTATION_SYSTEM_PROMPT, ANNOTATION_STEP_PROMPT), and utilities (parse_annotation_response, validate_annotations, format_annotated_demo) from openadapt_ml.experiments.demo_prompt.annotate.
  • Updated scripts/record_waa_demos.py: cmd_annotate_waa() now imports from openadapt_evals instead of openadapt_ml, removing its dependencies on PIL and the provider abstraction.
  • Updated scripts/refine_demo.py: Replaced local _vlm_call(), _extract_json(), _encode_png_b64(), _image_content_block() with shared module imports. Refactored message builders to return (prompt, images) tuples.
  • Updated scripts/convert_recording_to_demo.py: Replaced local _vlm_call(), _vlm_call_openai(), _vlm_call_anthropic(), _encode_image() with shared module imports.
  • 16 new tests in tests/test_annotation.py covering data classes, JSON roundtrip, parsing, formatting, and validation.
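To illustrate the consolidation pattern described above (one entry point dispatching to several providers, fed by message builders that return (prompt, images) tuples), here is a minimal, offline sketch. It is not the actual vlm_call() from openadapt_evals/vlm.py: the keyword parameters, the provider registry, the "echo" stand-in backend, and build_step_message() are all hypothetical. Only the 0.1 default temperature is taken from this PR's commit notes.

```python
from typing import Callable, Optional, Sequence

# Hypothetical backend signature: (prompt, images, system, temperature) -> text.
ProviderFn = Callable[[str, Sequence[bytes], Optional[str], float], str]

_PROVIDERS: dict[str, ProviderFn] = {}


def register_provider(name: str) -> Callable[[ProviderFn], ProviderFn]:
    """Decorator that adds a backend to the dispatch table."""
    def wrap(fn: ProviderFn) -> ProviderFn:
        _PROVIDERS[name] = fn
        return fn
    return wrap


@register_provider("echo")
def _echo(prompt, images, system, temperature):
    # Offline stand-in; real backends would wrap consilium, OpenAI, or Anthropic.
    return f"[{system or 'no-system'}|{len(images)} img|t={temperature}] {prompt}"


def vlm_call(
    prompt: str,
    images: Sequence[bytes] = (),
    *,
    system: Optional[str] = None,
    provider: str = "echo",
    temperature: float = 0.1,
) -> str:
    """Single entry point: look up the provider and forward the request."""
    try:
        backend = _PROVIDERS[provider]
    except KeyError:
        raise ValueError(
            f"unknown provider {provider!r}; known: {sorted(_PROVIDERS)}"
        ) from None
    return backend(prompt, list(images), system, temperature)


def build_step_message(step_index: int, png: bytes) -> tuple[str, list[bytes]]:
    """Hypothetical message builder returning a (prompt, images) tuple."""
    return f"Describe step {step_index}.", [png]
```

Because builders return plain (prompt, images) tuples instead of provider-specific content blocks, the same builder can feed any registered backend without change.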

Closes #59

Test plan

  • uv run pytest tests/test_annotation.py -v — 16 new tests pass
  • uv run pytest tests/test_vlm_call.py -v — 10 existing tests pass
  • uv run pytest tests/ -v — 569/576 pass (7 pre-existing failures unrelated to this change)
  • Import verification: from openadapt_evals.annotation import AnnotatedDemo works
  • Import verification: from openadapt_evals.vlm import vlm_call works
  • Manual: uv run python scripts/convert_recording_to_demo.py --recordings waa_recordings --output /tmp/test_demos --mode text (text mode, no VLM needed)
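The JSON roundtrip tests mentioned above follow a common dataclass pattern, sketched below. The classes ExampleStep and ExampleDemo and their fields are hypothetical placeholders; the real AnnotatedStep and AnnotatedDemo in openadapt_evals/annotation.py may have a different schema and different serialization methods.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ExampleStep:
    # Hypothetical fields; the real AnnotatedStep schema may differ.
    step_index: int
    action: str
    description: str


@dataclass
class ExampleDemo:
    task: str
    steps: list[ExampleStep] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "ExampleDemo":
        data = json.loads(raw)
        # asdict() turns nested dataclasses into dicts, so rebuild the steps.
        steps = [ExampleStep(**s) for s in data.get("steps", [])]
        return cls(task=data["task"], steps=steps)


def test_json_roundtrip() -> None:
    demo = ExampleDemo(
        task="Open Notepad",
        steps=[ExampleStep(0, "CLICK(Start)", "Open the Start menu")],
    )
    # Serializing and parsing back should yield an equal object.
    assert ExampleDemo.from_json(demo.to_json()) == demo
```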

🤖 Generated with Claude Code

abrichr and others added 3 commits March 2, 2026 02:10
Move annotation data classes, prompts, and utilities into
openadapt_evals.annotation and consolidate three separate VLM call
implementations into a shared openadapt_evals.vlm module.

- New openadapt_evals/vlm.py: unified vlm_call() supporting consilium
  council, OpenAI, and Anthropic; extract_json() for LLM output parsing;
  image_bytes_from_path() helper
- New openadapt_evals/annotation.py: AnnotatedStep/AnnotatedDemo data
  classes, ANNOTATION_SYSTEM_PROMPT/ANNOTATION_STEP_PROMPT constants,
  parse_annotation_response(), validate_annotations(),
  format_annotated_demo()
- Updated scripts/record_waa_demos.py cmd_annotate_waa() to import from
  openadapt_evals instead of openadapt_ml
- Updated scripts/refine_demo.py to use shared vlm_call/extract_json,
  refactored message builders to prompt+images interface
- Updated scripts/convert_recording_to_demo.py to use shared vlm_call
- 16 new tests in tests/test_annotation.py, all existing tests pass

Closes #59

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
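The commit above mentions extract_json() for parsing LLM output. A minimal sketch of one common best-effort strategy follows; it is an illustration, not the module's actual implementation. It tries the whole string, then a fenced code block, then the widest brace- or bracket-delimited span, and returns None if nothing parses.

```python
import json
import re
from typing import Any, Optional


def extract_json(text: str) -> Optional[Any]:
    """Best-effort extraction of a JSON value from free-form LLM output."""
    candidates = [text.strip()]
    # Prefer the contents of a fenced code block if the model emitted one.
    fence = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    if fence:
        candidates.append(fence.group(1).strip())
    # Fall back to the widest {...} or [...] span in the text.
    for open_ch, close_ch in (("{", "}"), ("[", "]")):
        start, end = text.find(open_ch), text.rfind(close_ch)
        if start != -1 and end > start:
            candidates.append(text[start : end + 1])
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    return None
```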
…ding_to_demo

- Remove unused `import os` from openadapt_evals/vlm.py
- Move `resolved_model` computation before the for-loop in convert_vlm()
  so it's computed once instead of redundantly inside each step's try block

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
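The resolved_model change above is a loop-invariant hoist. The sketch below shows the shape of it; convert_steps(), resolve_model(), DEFAULT_MODELS, and the describe callback are hypothetical stand-ins for convert_vlm() and its helpers, and the model names are placeholders rather than real model IDs.

```python
from typing import Callable, Optional

# Hypothetical per-provider defaults; placeholders, not real model IDs.
DEFAULT_MODELS = {"openai": "openai-default", "anthropic": "anthropic-default"}


def resolve_model(provider: str, model: Optional[str]) -> str:
    """Return the explicit model if given, else the provider default."""
    return model or DEFAULT_MODELS[provider]


def convert_steps(
    steps: list[str],
    provider: str,
    model: Optional[str],
    describe: Callable[[str, str], str],
) -> list[str]:
    # Depends only on provider/model, so compute it once before the loop.
    resolved_model = resolve_model(provider, model)
    results = []
    for step in steps:
        try:
            results.append(describe(step, resolved_model))
        except RuntimeError as exc:
            # Record per-step failures without aborting the remaining steps.
            results.append(f"error: {exc}")
    return results
```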
- vlm.py: add timeout=120s to OpenAI/Anthropic SDK clients to prevent
  indefinite hangs (old code had explicit timeouts via requests)
- vlm.py: pass system prompt separately to consilium council_query()
  instead of concatenating into user prompt
- refine_demo.py: explicitly pass temperature=1.0 to vlm_call() in
  holistic and per-step review to match old behavior (vlm_call defaults
  to 0.1 which would be an unintended behavioral change)
- refine_demo.py: remove dead api_key parameter from run_holistic_review,
  run_per_step_review, refine_recording, and main() — vlm_call() reads
  API keys from environment via the SDK

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
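Two of the fixes above concern call-site behavior: keeping the system prompt as a separate argument rather than concatenating it into the user prompt, and pinning temperature=1.0 because the shared helper defaults to 0.1. The sketch below uses a recording stub in place of the real vlm_call() (its keyword names are assumptions), and holistic_review() and REVIEW_TEMPERATURE are hypothetical stand-ins for run_holistic_review().

```python
from typing import Optional

CALLS: list[dict] = []


def vlm_call(prompt: str, *, system: Optional[str] = None, temperature: float = 0.1) -> str:
    """Stub standing in for the shared helper; records its arguments."""
    CALLS.append({"prompt": prompt, "system": system, "temperature": temperature})
    return "ok"


REVIEW_TEMPERATURE = 1.0  # Hypothetical constant matching the pre-refactor behavior.


def holistic_review(summary: str) -> str:
    # Keep the system prompt separate instead of prepending it to the user
    # prompt, and pin temperature so the shared default (0.1) does not
    # silently change sampling behavior for this caller.
    return vlm_call(
        f"Review this demo:\n{summary}",
        system="You are a careful reviewer.",
        temperature=REVIEW_TEMPERATURE,
    )
```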
@abrichr merged commit 7896051 into main on Mar 2, 2026
1 check passed

Linked issue: refactor: migrate annotation pipeline from openadapt-ml into openadapt-evals