feat(review): store full AI review data in metadata #3161

Merged
MarkusNeusinger merged 1 commit into main from feature/2845-full-review-metadata
Jan 2, 2026

Conversation

@MarkusNeusinger
Owner

Summary

Closes #2845

This PR extends the metadata YAML files to store complete AI review data, enabling more targeted fixes during regeneration.

New Fields in metadata/*.yaml

review:
  image_description: |
    The plot shows a scatter plot with 100 data points...
  criteria_checklist:
    visual_quality:
      score: 36
      max: 40
      items:
        - id: VQ-01
          name: Text Legibility
          score: 10
          max: 10
          passed: true
          comment: "All text readable at full size"
    spec_compliance: ...
    data_quality: ...
    code_quality: ...
    library_features: ...
  verdict: APPROVED
  strengths: [...]
  weaknesses: [...]
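
As a rough illustration of how a consumer might read this structure, the sketch below walks `criteria_checklist` and collects every item marked `passed: false`. It assumes the YAML has already been parsed into a plain dict (for example with PyYAML's `yaml.safe_load`). The function name `failed_criteria`, the `VQ-02` entry, and the sample data are hypothetical and not part of this PR.

```python
def failed_criteria(review: dict) -> list[dict]:
    """Return checklist items marked passed: false, tagged with their category."""
    failed = []
    for category, block in review.get("criteria_checklist", {}).items():
        # Skip categories that carry no structured data.
        if not isinstance(block, dict):
            continue
        for item in block.get("items", []):
            if not item.get("passed", False):
                failed.append({"category": category, **item})
    return failed


# Hypothetical sample mirroring the structure shown above.
sample_review = {
    "criteria_checklist": {
        "visual_quality": {
            "items": [
                {"id": "VQ-01", "name": "Text Legibility", "score": 10,
                 "max": 10, "passed": True,
                 "comment": "All text readable at full size"},
                # Illustrative failing entry, invented for this sketch.
                {"id": "VQ-02", "name": "Hypothetical criterion", "score": 0,
                 "max": 2, "passed": False, "comment": "Illustrative failure"},
            ],
        }
    },
    "verdict": "APPROVED",
}

failed = failed_criteria(sample_review)
print([item["id"] for item in failed])
```

A list like this could be handed to the regeneration step so that only the failing criteria are addressed; how the workflows actually consume the data is not shown in this description.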

Changes

  • impl-review.yml: Extract and save extended review data (image_description, criteria_checklist, verdict)
  • impl-generate.yml: Use extended data for regeneration - AI can see exactly which criteria failed
  • impl-repair.yml: Pass full review data to Claude for targeted fixes
  • Database: Add 3 new fields (review_image_description, review_criteria_checklist, review_verdict)
  • Migration: Alembic migration for new columns
  • sync_to_postgres.py: Sync extended review data to database
  • backfill_review_metadata.py: Backfill script for existing PRs
  • Documentation: Updated CLAUDE.md and repository.md
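
A minimal sketch of how the YAML `review` block might map onto the three new database fields named above. The column names come from this PR; the helper `review_to_columns` is hypothetical, and storing the checklist as serialized JSON is an assumption, since the actual column types are not stated here.

```python
import json


def review_to_columns(review: dict) -> dict:
    """Flatten the extended review block into the three new column names."""
    checklist = review.get("criteria_checklist")
    return {
        "review_image_description": review.get("image_description"),
        # Assumption: the checklist is stored as serialized JSON; the real
        # column type is not given in the PR description.
        "review_criteria_checklist": (
            json.dumps(checklist) if checklist is not None else None
        ),
        "review_verdict": review.get("verdict"),
    }


row = review_to_columns(
    {"image_description": "A scatter plot...", "verdict": "APPROVED"}
)
print(row)
```

Returning `None` for absent fields keeps files without extended data representable as rows with empty columns.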

Backfill Results

  • 1417 metadata files updated with extended review data
  • 59 files skipped (no extended format in PR comments)
  • 0 errors
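
The updated/skipped split above could come from logic along these lines. This is only a sketch under stated assumptions: `apply_backfill` and `EXTENDED_KEYS` are hypothetical names, and the parsing of PR comments (done by the real backfill_review_metadata.py) is replaced here by an already-parsed dict or `None`.

```python
from collections import Counter
from typing import Optional

# Field names taken from the PR description.
EXTENDED_KEYS = ("image_description", "criteria_checklist", "verdict")


def apply_backfill(metadata: dict, extended: Optional[dict]) -> str:
    """Merge extended review fields into metadata; return 'updated' or 'skipped'."""
    if not extended:
        # No extended-format data was found for this file, so leave it untouched.
        return "skipped"
    review = metadata.setdefault("review", {})
    for key in EXTENDED_KEYS:
        if key in extended:
            review[key] = extended[key]
    return "updated"


# Hypothetical inputs: (existing metadata, extended data parsed from a PR comment or None).
cases = [
    ({"review": {"strengths": []}},
     {"verdict": "APPROVED", "image_description": "..."}),
    ({"review": {"strengths": []}}, None),
]
tally = Counter(apply_backfill(meta, ext) for meta, ext in cases)
print(dict(tally))
```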

Frontend Changes (separate feature)

Also includes some Plausible analytics improvements from the user.

Test Plan

  • Run backfill script on existing PRs
  • Verify YAML structure is correct
  • Run uv run pytest to ensure no regressions
  • Run Alembic migration on staging database
  • Verify sync_to_postgres.py works with new fields

🤖 Generated with Claude Code

Extended metadata YAML files to store complete AI review data:
- image_description: AI's visual description of the generated plot
- criteria_checklist: Detailed per-criterion scoring breakdown
- verdict: APPROVED/REJECTED status

Changes:
- Update impl-review.yml to extract and save extended review data
- Update impl-generate.yml and impl-repair.yml to use extended data for regeneration
- Add database fields for review_image_description, review_criteria_checklist, review_verdict
- Create Alembic migration for new columns
- Update sync_to_postgres.py to sync new fields
- Add backfill script for existing PRs (1417 metadata files updated)
- Update CLAUDE.md and repository.md documentation

The extended review data helps AI understand exactly which criteria failed during
regeneration, enabling more targeted fixes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings January 1, 2026 23:55
Contributor

Copilot AI left a comment

Pull request overview

This PR extends metadata YAML files to store complete AI review data (image descriptions, detailed criteria checklists, and verdicts) to enable more targeted code regeneration when quality checks fail.

Key changes:

  • Adds three new fields to metadata files: image_description, criteria_checklist, and verdict
  • Modifies workflows to extract and utilize extended review data
  • Adds database schema support for the new fields
  • Includes a backfill script that successfully updated 1417 existing metadata files

Reviewed changes

Copilot reviewed 123 out of 1429 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| plots/bar-grouped/metadata/altair.yaml | Added extended review data with detailed criteria breakdown |
| plots/bar-feature-importance/metadata/*.yaml | Added complete AI review metadata across multiple libraries |
| plots/bar-error/metadata/*.yaml | Added detailed quality criteria and verdict data |
| plots/bar-diverging/metadata/*.yaml | Populated extended review fields for diverging bar charts |
| plots/bar-categorical/metadata/*.yaml | Added structured review data for categorical plots |
| plots/bar-basic/metadata/*.yaml | Extended metadata with review criteria and verdicts |
| plots/bar-3d/metadata/plotly.yaml | Added 3D plot review metadata |

Comment on lines +166 to +175
  max: 3
  passed: false
  comment: No random seed, but data is deterministic (hardcoded), so this is
    actually fine. Full points.
- id: CQ-02
  name: Reproducibility
  score: 3
  max: 3
  passed: true
  comment: Data is hardcoded/deterministic.

Copilot AI Jan 1, 2026

Duplicate CQ-02 entries exist with conflicting scores (0 vs 3) and passed states (false vs true). The first entry should be removed as it contradicts the second one.
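
A check like the following could flag this kind of conflict automatically. It is a sketch, not part of the PR: `conflicting_duplicates` is a hypothetical helper, the two `CQ-02` entries use the values described in the comment above, and the `CQ-01` entry is invented filler.

```python
from collections import defaultdict


def conflicting_duplicates(items: list[dict]) -> dict[str, list[dict]]:
    """Map each criterion id that appears more than once with differing
    score or passed values to all of its entries."""
    by_id = defaultdict(list)
    for item in items:
        by_id[item["id"]].append(item)
    conflicts = {}
    for criterion_id, entries in by_id.items():
        distinct = {(e.get("score"), e.get("passed")) for e in entries}
        if len(entries) > 1 and len(distinct) > 1:
            conflicts[criterion_id] = entries
    return conflicts


items = [
    # Hypothetical filler entry, not taken from the PR.
    {"id": "CQ-01", "score": 2, "max": 2, "passed": True},
    # The two conflicting entries as described in the review comment.
    {"id": "CQ-02", "score": 0, "max": 3, "passed": False},
    {"id": "CQ-02", "score": 3, "max": 3, "passed": True},
]
print(list(conflicting_duplicates(items)))
```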

strengths: []
weaknesses: []
improvements: []
verdict: APPROVED

Copilot AI Jan 1, 2026

This file only adds the verdict field without the other extended review data (image_description and criteria_checklist). This creates inconsistency with other metadata files that were backfilled with all three fields.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 123 out of 1429 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

plots/bar-diverging/metadata/pygal.yaml:1

  • Inconsistent passed status for CQ-05. The criterion shows passed: true and score: 1/1, but the comment indicates that both plot.png and plot.html are saved. Based on similar entries in this PR, generating extra output files typically results in passed: false.
library: pygal

Comment on lines +165 to +171
  score: 0
  max: 3
  passed: false
  comment: No random seed, but data is deterministic (hardcoded), so this is
    actually fine. Full points.
- id: CQ-02
  name: Reproducibility

Copilot AI Jan 1, 2026

Duplicate CQ-02 (Reproducibility) criterion entries with conflicting scores (0 vs 3). The first entry, at lines 165-170, has score 0 with passed: false, while the second, at lines 171-175, has score 3 with passed: true. One of these duplicate entries should be removed.

Suggested change
  score: 0
  max: 3
  passed: false
  comment: No random seed, but data is deterministic (hardcoded), so this is
    actually fine. Full points.
- id: CQ-02
  name: Reproducibility

strengths: []
weaknesses: []
improvements: []
verdict: APPROVED

Copilot AI Jan 1, 2026

This file only contains a verdict field added to the review section, missing all other extended review data fields (image_description, criteria_checklist, strengths, weaknesses) that are present in other metadata files. This inconsistency could cause issues with downstream components expecting the full review structure.
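
One defensive option for downstream components is to check which review fields are present before relying on them. The sketch below assumes the field list named in this comment; `missing_review_fields` and `EXPECTED_REVIEW_FIELDS` are hypothetical names, and the sample dict mirrors the snippet quoted above.

```python
EXPECTED_REVIEW_FIELDS = (
    "image_description",
    "criteria_checklist",
    "verdict",
    "strengths",
    "weaknesses",
)


def missing_review_fields(metadata: dict) -> list[str]:
    """List expected review keys that are absent from a metadata mapping."""
    review = metadata.get("review") or {}
    return [name for name in EXPECTED_REVIEW_FIELDS if name not in review]


# Mirrors the quoted snippet: list fields and verdict present, the rest absent.
partial = {
    "review": {
        "strengths": [],
        "weaknesses": [],
        "improvements": [],
        "verdict": "APPROVED",
    }
}
print(missing_review_fields(partial))
```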

@MarkusNeusinger MarkusNeusinger merged commit af3e677 into main Jan 2, 2026
8 of 10 checks passed
@MarkusNeusinger MarkusNeusinger deleted the feature/2845-full-review-metadata branch January 2, 2026 20:20

Development

Successfully merging this pull request may close these issues.

Feature: Store full AI review data in metadata (checklist, image description)

2 participants