feat(review): store full AI review data in metadata#3161
MarkusNeusinger merged 1 commit into main
Extended metadata YAML files to store complete AI review data:

- image_description: AI's visual description of the generated plot
- criteria_checklist: Detailed per-criterion scoring breakdown
- verdict: APPROVED/REJECTED status

Changes:

- Update impl-review.yml to extract and save extended review data
- Update impl-generate.yml and impl-repair.yml to use extended data for regeneration
- Add database fields for review_image_description, review_criteria_checklist, review_verdict
- Create Alembic migration for new columns
- Update sync_to_postgres.py to sync new fields
- Add backfill script for existing PRs (1417 metadata files updated)
- Update CLAUDE.md and repository.md documentation

The extended review data helps AI understand exactly which criteria failed during regeneration, enabling more targeted fixes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Pull request overview
This PR extends metadata YAML files to store complete AI review data (image descriptions, detailed criteria checklists, and verdicts) to enable more targeted code regeneration when quality checks fail.
Key changes:
- Adds three new fields to metadata files: image_description, criteria_checklist, and verdict
- Modifies workflows to extract and utilize extended review data
- Adds database schema support for the new fields
- Includes a backfill script that successfully updated 1417 existing metadata files
Reviewed changes
Copilot reviewed 123 out of 1429 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| plots/bar-grouped/metadata/altair.yaml | Added extended review data with detailed criteria breakdown |
| plots/bar-feature-importance/metadata/*.yaml | Added complete AI review metadata across multiple libraries |
| plots/bar-error/metadata/*.yaml | Added detailed quality criteria and verdict data |
| plots/bar-diverging/metadata/*.yaml | Populated extended review fields for diverging bar charts |
| plots/bar-categorical/metadata/*.yaml | Added structured review data for categorical plots |
| plots/bar-basic/metadata/*.yaml | Extended metadata with review criteria and verdicts |
| plots/bar-3d/metadata/plotly.yaml | Added 3D plot review metadata |

        max: 3
        passed: false
        comment: No random seed, but data is deterministic (hardcoded), so this is
          actually fine. Full points.
      - id: CQ-02
        name: Reproducibility
        score: 3
        max: 3
        passed: true
        comment: Data is hardcoded/deterministic.

Duplicate CQ-02 entries exist with conflicting scores (0 vs 3) and passed states (false vs true). The first entry should be removed as it contradicts the second one.
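A check like the one below could catch this class of problem before a backfill is committed. It is only a sketch: it assumes the checklist has already been loaded into a flat list of dicts with an `id` key (the real metadata layout may nest items differently), and `duplicate_criterion_ids` is a hypothetical helper, not something from this PR.

```python
from collections import Counter


def duplicate_criterion_ids(checklist):
    """Return the criterion ids that appear more than once, sorted.

    Assumption: `checklist` is a flat list of dicts, each with an "id" key.
    """
    counts = Counter(item["id"] for item in checklist)
    return sorted(cid for cid, n in counts.items() if n > 1)


# Sample data mirroring the conflicting CQ-02 entries quoted above.
checklist = [
    {"id": "CQ-02", "name": "Reproducibility", "score": 0, "max": 3, "passed": False},
    {"id": "CQ-02", "name": "Reproducibility", "score": 3, "max": 3, "passed": True},
]
print(duplicate_criterion_ids(checklist))
```

Reporting the ids rather than silently dropping one entry leaves the decision about which score is correct to a reviewer.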

    strengths: []
    weaknesses: []
    improvements: []
    verdict: APPROVED

This file only adds the verdict field without the other extended review data (image_description and criteria_checklist). This creates inconsistency with other metadata files that were backfilled with all three fields.
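One way to find such partially backfilled files is to check each review section for the three fields this PR introduces. The sketch below assumes the review section has been parsed into a plain dict; `REQUIRED_REVIEW_FIELDS` and `missing_review_fields` are hypothetical names used for illustration.

```python
# The three extended review fields named in the PR description.
REQUIRED_REVIEW_FIELDS = ("image_description", "criteria_checklist", "verdict")


def missing_review_fields(review):
    """Return the required fields that are absent or empty in `review`.

    Assumption: `review` is the parsed review section as a dict.
    """
    return [field for field in REQUIRED_REVIEW_FIELDS if not review.get(field)]


# Sample data mirroring the snippet quoted above (verdict only).
review = {"strengths": [], "weaknesses": [], "improvements": [], "verdict": "APPROVED"}
print(missing_review_fields(review))
```

Treating empty values as missing is a design choice here: an empty checklist gives the regeneration step nothing to act on.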
Comments suppressed due to low confidence (1)
plots/bar-diverging/metadata/pygal.yaml:1
- Inconsistent passed status for CQ-05. The criterion shows passed: true and score: 1/1, but the comment indicates saving both plot.png and plot.html, which based on other similar entries in this PR, typically results in passed: false due to generating extra output files.
library: pygal

        score: 0
        max: 3
        passed: false
        comment: No random seed, but data is deterministic (hardcoded), so this is
          actually fine. Full points.
      - id: CQ-02
        name: Reproducibility

Duplicate CQ-02 (Reproducibility) criterion entries with conflicting scores (0 vs 3). The first entry at lines 165-170 has score 0 with passed: false, while the second at lines 171-175 has score 3 with passed: true. One of these duplicate entries should be removed.

    strengths: []
    weaknesses: []
    improvements: []
    verdict: APPROVED

This file only contains a verdict field added to the review section, missing all other extended review data fields (image_description, criteria_checklist, strengths, weaknesses) that are present in other metadata files. This inconsistency could cause issues with downstream components expecting the full review structure.
Summary
Closes #2845
This PR extends the metadata YAML files to store complete AI review data, enabling more targeted fixes during regeneration.
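As a rough illustration of how the stored checklist could drive a targeted fix, the sketch below turns failed criteria into short hints for a regeneration prompt. It assumes a flat list of items with the `id`, `name`, `score`, `max`, `passed`, and `comment` keys seen in the metadata snippets; the function names are hypothetical and this is not the workflow's actual implementation.

```python
def failed_criteria(checklist):
    """Return items that did not pass, largest score gap first."""
    failed = [item for item in checklist if not item.get("passed", False)]
    # score - max is most negative for the biggest shortfall.
    return sorted(failed, key=lambda item: item["score"] - item["max"])


def regeneration_hints(checklist):
    """Format each failed criterion as a one-line hint."""
    return [
        f"{item['id']} {item['name']} ({item['score']}/{item['max']}): {item['comment']}"
        for item in failed_criteria(checklist)
    ]


# Made-up sample data for illustration only.
sample = [
    {"id": "CQ-02", "name": "Reproducibility", "score": 3, "max": 3,
     "passed": True, "comment": "Data is hardcoded/deterministic."},
    {"id": "X-01", "name": "Example criterion", "score": 1, "max": 4,
     "passed": False, "comment": "Example failure note."},
]
for hint in regeneration_hints(sample):
    print(hint)
```

Passing only the failed items keeps the prompt short and points the model at the criteria that actually need work.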
New Fields in metadata/*.yaml
Changes
Backfill Results
Frontend Changes (separate feature)
Also includes some Plausible analytics improvements from the user.
Test Plan
uv run pytest to ensure no regressions

🤖 Generated with Claude Code