Commit c5e7fc9

feat(review): store full AI review data in metadata (#2845)
Extended metadata YAML files to store complete AI review data:
- image_description: AI's visual description of the generated plot
- criteria_checklist: Detailed per-criterion scoring breakdown
- verdict: APPROVED/REJECTED status

Changes:
- Update impl-review.yml to extract and save extended review data
- Update impl-generate.yml and impl-repair.yml to use extended data for regeneration
- Add database fields for review_image_description, review_criteria_checklist, review_verdict
- Create Alembic migration for new columns
- Update sync_to_postgres.py to sync new fields
- Add backfill script for existing PRs (1417 metadata files updated)
- Update CLAUDE.md and repository.md documentation

The extended review data helps AI understand exactly which criteria failed during regeneration, enabling more targeted fixes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent a83d66b commit c5e7fc9

1,429 files changed (+239781 -76 lines changed)

.github/workflows/impl-generate.yml

Lines changed: 14 additions & 0 deletions
@@ -271,6 +271,13 @@ jobs:
 1. Read `plots/${{ steps.inputs.outputs.specification_id }}/metadata/${{ steps.inputs.outputs.library }}.yaml`
    - Look at `review.strengths` (keep these aspects!)
    - Look at `review.weaknesses` (fix these problems - decide HOW yourself)
+   - Look at `review.image_description` (understand what was generated visually)
+   - Look at `review.criteria_checklist` (see exactly which criteria failed)
+   - Focus on categories with low scores (e.g., visual_quality.score < visual_quality.max)
+   - Check items with `passed: false` - these need fixing
+     - VQ-XX items for visual issues
+     - SC-XX items for spec compliance
+     - CQ-XX items for code quality
 2. Read `plots/${{ steps.inputs.outputs.specification_id }}/implementations/${{ steps.inputs.outputs.library }}.py`
    - Understand what was done before
    - Keep what worked, fix what didn't
@@ -346,6 +353,13 @@ jobs:
 1. Read `plots/${{ steps.inputs.outputs.specification_id }}/metadata/${{ steps.inputs.outputs.library }}.yaml`
    - Look at `review.strengths` (keep these aspects!)
    - Look at `review.weaknesses` (fix these problems - decide HOW yourself)
+   - Look at `review.image_description` (understand what was generated visually)
+   - Look at `review.criteria_checklist` (see exactly which criteria failed)
+   - Focus on categories with low scores (e.g., visual_quality.score < visual_quality.max)
+   - Check items with `passed: false` - these need fixing
+     - VQ-XX items for visual issues
+     - SC-XX items for spec compliance
+     - CQ-XX items for code quality
 2. Read `plots/${{ steps.inputs.outputs.specification_id }}/implementations/${{ steps.inputs.outputs.library }}.py`
    - Understand what was done before
    - Keep what worked, fix what didn't
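The regeneration prompt asks the model to focus on categories where `score < max` and on items with `passed: false`. As a rough illustration of what that selection amounts to, here is a minimal Python sketch that walks a `criteria_checklist` dict shaped like the example in CLAUDE.md. The function name `find_weak_spots` and the failing item `VQ-03` are hypothetical; in the workflow this reading is left to the model, not to code.

```python
from typing import Any


def find_weak_spots(checklist: dict[str, Any]) -> tuple[list[str], list[dict[str, Any]]]:
    """Return (categories scoring below their max, items with passed: false).

    Assumes each category is a dict with "score", "max" and an optional
    "items" list, as in the criteria_checklist example. Placeholder
    entries such as "..." are skipped.
    """
    low_categories: list[str] = []
    failed_items: list[dict[str, Any]] = []
    for name, category in checklist.items():
        if not isinstance(category, dict):
            continue
        if category.get("score", 0) < category.get("max", 0):
            low_categories.append(name)
        for item in category.get("items") or []:
            # Only real criterion entries count; "..." placeholders are ignored
            if isinstance(item, dict) and item.get("passed") is False:
                failed_items.append(item)
    return low_categories, failed_items


if __name__ == "__main__":
    # Hypothetical checklist: VQ-03 is a made-up failing criterion
    example = {
        "visual_quality": {"score": 36, "max": 40, "items": [
            {"id": "VQ-01", "score": 10, "max": 10, "passed": True},
            {"id": "VQ-03", "score": 2, "max": 6, "passed": False},
        ]},
        "code_quality": {"score": 10, "max": 10, "items": ["..."]},
    }
    low, failed = find_weak_spots(example)
    print("Low categories:", low)
    print("Failed items:", [item["id"] for item in failed])
```

A category can sit below its max even when every listed item passed (for example if items are abbreviated), which is why the sketch reports the two signals separately.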

.github/workflows/impl-repair.yml

Lines changed: 14 additions & 0 deletions
@@ -127,6 +127,13 @@ jobs:
 2. `plots/${{ inputs.specification_id }}/metadata/${{ inputs.library }}.yaml` - Look at:
    - `review.strengths` (keep these aspects!)
    - `review.weaknesses` (fix these problems - decide HOW yourself)
+   - `review.image_description` (understand what was generated visually)
+   - `review.criteria_checklist` (see exactly which criteria failed)
+   - Look for items with `passed: false` - these need fixing
+   - Focus on categories with low scores (e.g., visual_quality.score < visual_quality.max)
+     - VQ-XX items for visual issues
+     - SC-XX items for spec compliance
+     - CQ-XX items for code quality

 ### Step 2: Read reference files
 1. `prompts/library/${{ inputs.library }}.md` - Library-specific rules
@@ -192,6 +199,13 @@ jobs:
 2. `plots/${{ inputs.specification_id }}/metadata/${{ inputs.library }}.yaml` - Look at:
    - `review.strengths` (keep these aspects!)
    - `review.weaknesses` (fix these problems - decide HOW yourself)
+   - `review.image_description` (understand what was generated visually)
+   - `review.criteria_checklist` (see exactly which criteria failed)
+   - Look for items with `passed: false` - these need fixing
+   - Focus on categories with low scores (e.g., visual_quality.score < visual_quality.max)
+     - VQ-XX items for visual issues
+     - SC-XX items for spec compliance
+     - CQ-XX items for code quality

 ### Step 2: Read reference files
 1. `prompts/library/${{ inputs.library }}.md` - Library-specific rules
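The repair prompt sorts failing criteria by ID prefix (VQ-XX visual issues, SC-XX spec compliance, CQ-XX code quality). A minimal sketch of that grouping, assuming IDs follow the `PREFIX-NN` pattern seen in the checklist example; `group_failed_by_prefix` is a hypothetical helper, not part of the workflows, and prefixes for the other categories are not documented here, so they fall back to "other".

```python
from collections import defaultdict
from typing import Any

# Prefix meanings as listed in the repair prompt
PREFIX_LABELS = {
    "VQ": "visual issues",
    "SC": "spec compliance",
    "CQ": "code quality",
}


def group_failed_by_prefix(checklist: dict[str, Any]) -> dict[str, list[str]]:
    """Map each ID prefix (e.g. "VQ") to the IDs of items with passed: false."""
    groups: dict[str, list[str]] = defaultdict(list)
    for category in checklist.values():
        if not isinstance(category, dict):
            continue
        for item in category.get("items") or []:
            if isinstance(item, dict) and item.get("passed") is False:
                # "VQ-04" -> "VQ"; assumes a hyphen separates prefix and number
                prefix = item["id"].split("-", 1)[0]
                groups[prefix].append(item["id"])
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical checklist: VQ-04 and SC-02 are made-up failing items
    example = {
        "visual_quality": {"score": 34, "max": 40, "items": [
            {"id": "VQ-01", "passed": True},
            {"id": "VQ-04", "passed": False},
        ]},
        "spec_compliance": {"score": 20, "max": 25, "items": [
            {"id": "SC-02", "passed": False},
        ]},
    }
    for prefix, ids in group_failed_by_prefix(example).items():
        print(f"{prefix} ({PREFIX_LABELS.get(prefix, 'other')}): {', '.join(ids)}")
```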

.github/workflows/impl-review.yml

Lines changed: 67 additions & 15 deletions
@@ -206,12 +206,39 @@ jobs:
 # Save structured feedback as JSON (one array per file)
 echo '["Strength 1", "Strength 2"]' > review_strengths.json
 echo '["Weakness 1"]' > review_weaknesses.json
+
+# Save verdict
+echo "APPROVED" > review_verdict.txt  # or "REJECTED"
+
+# Save image description (multi-line text)
+cat > review_image_description.txt << 'EOF'
+The plot shows a scatter plot with blue markers...
+[Your full image description here]
+EOF
+
+# Save criteria checklist as structured JSON
+cat > review_checklist.json << 'EOF'
+{
+  "visual_quality": {
+    "score": 36,
+    "max": 40,
+    "items": [
+      {"id": "VQ-01", "name": "Text Legibility", "score": 10, "max": 10, "passed": true, "comment": "All text readable"},
+      {"id": "VQ-02", "name": "No Overlap", "score": 8, "max": 8, "passed": true, "comment": "No overlapping elements"}
+    ]
+  },
+  "spec_compliance": {"score": 23, "max": 25, "items": [...]},
+  "data_quality": {"score": 18, "max": 20, "items": [...]},
+  "code_quality": {"score": 10, "max": 10, "items": [...]},
+  "library_features": {"score": 5, "max": 5, "items": [...]}
+}
+EOF
 ```

 8. **DO NOT add ai-approved or ai-rejected labels** - the workflow will add them after updating metadata.

 **IMPORTANT**: Your review MUST include the "Image Description" section. A review without an image description will be considered invalid.
-**IMPORTANT**: The Strengths/Weaknesses sections are saved to the metadata for future regeneration. Be specific!
+**IMPORTANT**: All review data (strengths, weaknesses, image_description, criteria_checklist) is saved to metadata for future regeneration. Be specific!


 - name: Extract quality score
   id: score
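Because the metadata step later loads `review_checklist.json` and `review_verdict.txt` as-is, a cheap structural check could catch malformed review output before it is committed. The sketch below is an assumption, not part of this commit: `check_review_output` is a hypothetical name, and it only verifies that the verdict is `APPROVED` or `REJECTED`, that each category has integer `score`/`max` with `score <= max`, and that `items` is a list. Note that the abbreviated template above (`"items": [...]`) is not itself valid JSON, so a checklist copied verbatim would fail to parse and, given the bare `except:` in the update script, would silently be dropped.

```python
import json
from typing import Any

ALLOWED_VERDICTS = {"APPROVED", "REJECTED"}


def check_review_output(checklist_json: str, verdict_text: str) -> list[str]:
    """Return a list of problems found; an empty list means the files look sane."""
    problems: list[str] = []
    verdict = verdict_text.strip()
    if verdict not in ALLOWED_VERDICTS:
        problems.append(f"unexpected verdict: {verdict!r}")
    try:
        checklist: Any = json.loads(checklist_json)
    except json.JSONDecodeError as exc:
        return problems + [f"checklist is not valid JSON: {exc}"]
    if not isinstance(checklist, dict):
        return problems + ["checklist must be a JSON object"]
    for name, category in checklist.items():
        if not isinstance(category, dict):
            problems.append(f"{name}: expected an object")
            continue
        score, maximum = category.get("score"), category.get("max")
        if not isinstance(score, int) or not isinstance(maximum, int):
            problems.append(f"{name}: score and max must be integers")
        elif score > maximum:
            problems.append(f"{name}: score {score} exceeds max {maximum}")
        if not isinstance(category.get("items", []), list):
            problems.append(f"{name}: items must be a list")
    return problems
```

Such a check would run on the two files written above, for example `check_review_output(Path("review_checklist.json").read_text(), Path("review_verdict.txt").read_text())`.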
@@ -266,21 +293,8 @@ jobs:
 git fetch origin "$BRANCH"
 git checkout -B "$BRANCH" "origin/$BRANCH"

-# Read review feedback from JSON files (created by Claude)
-STRENGTHS="[]"
-WEAKNESSES="[]"
-
-if [ -f "review_strengths.json" ]; then
-  STRENGTHS=$(cat review_strengths.json)
-fi
-if [ -f "review_weaknesses.json" ]; then
-  WEAKNESSES=$(cat review_weaknesses.json)
-fi
-
 # Update metadata file with quality score, timestamp, and review feedback
 if [ -f "$METADATA_FILE" ]; then
-  # Update all metadata using Python for proper YAML handling
-  # Pass JSON via files to avoid shell escaping issues with quotes
   TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")

   # Write Python script to temp file to avoid YAML/shell escaping issues
@@ -294,8 +308,12 @@ jobs:
 score = int(sys.argv[2])
 timestamp = sys.argv[3]

+# Read existing review data files
 strengths = []
 weaknesses = []
+image_description = None
+criteria_checklist = None
+verdict = None

 if Path('review_strengths.json').exists():
     try:
@@ -311,6 +329,28 @@ jobs:
     except:
         pass

+if Path('review_image_description.txt').exists():
+    try:
+        with open('review_image_description.txt') as f:
+            image_description = f.read().strip()
+    except:
+        pass
+
+if Path('review_checklist.json').exists():
+    try:
+        with open('review_checklist.json') as f:
+            criteria_checklist = json.load(f)
+    except:
+        pass
+
+if Path('review_verdict.txt').exists():
+    try:
+        with open('review_verdict.txt') as f:
+            verdict = f.read().strip()
+    except:
+        pass
+
+# Load existing metadata
 with open(metadata_file, 'r') as f:
     data = yaml.safe_load(f)

@@ -320,12 +360,24 @@ jobs:
 if 'review' not in data:
     data['review'] = {}

+# Update review section with all fields
 data['review']['strengths'] = strengths
 data['review']['weaknesses'] = weaknesses

+# Add extended review data (issue #2845)
+if image_description:
+    data['review']['image_description'] = image_description
+if criteria_checklist:
+    data['review']['criteria_checklist'] = criteria_checklist
+if verdict:
+    data['review']['verdict'] = verdict
+
 def str_representer(dumper, data):
     if isinstance(data, str) and data.endswith('Z') and 'T' in data:
         return dumper.represent_scalar('tag:yaml.org,2002:str', data, style="'")
+    # Use literal block style for multi-line strings (image_description)
+    if isinstance(data, str) and '\n' in data:
+        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
     return dumper.represent_scalar('tag:yaml.org,2002:str', data)

 yaml.add_representer(str, str_representer)
@@ -335,7 +387,7 @@ jobs:
 EOF

   python3 /tmp/update_metadata.py "$METADATA_FILE" "$SCORE" "$TIMESTAMP"
-  echo "::notice::Updated metadata with quality score ${SCORE} and review feedback"
+  echo "::notice::Updated metadata with quality score ${SCORE} and extended review data"
 fi

 # Update implementation header with quality score
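The new Python step follows a simple pattern: each review file is optional, a missing or unreadable file leaves the value as `None`, and only truthy values are written into `data['review']`, so a previously stored `image_description`, `criteria_checklist` or `verdict` is kept rather than overwritten with a blank. A stdlib-only sketch of that pattern follows, with the three file names taken from the workflow. `load_optional` and `merge_extended_review` are hypothetical helpers, and the sketch deliberately catches `OSError`/`ValueError` instead of the bare `except:` used in the workflow, so unrelated errors are not swallowed.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Optional


def load_optional(path: Path, parse: Callable[[str], Any]) -> Optional[Any]:
    """Parse a file if it exists; return None if it is missing or unparseable."""
    if not path.exists():
        return None
    try:
        return parse(path.read_text())
    except (OSError, ValueError):
        # ValueError also covers json.JSONDecodeError and UnicodeDecodeError
        return None


def merge_extended_review(data: dict, workdir: Path) -> dict:
    """Copy optional review files into data['review'], skipping empty values."""
    review = data.setdefault("review", {})
    extended = {
        "image_description": load_optional(workdir / "review_image_description.txt", str.strip),
        "criteria_checklist": load_optional(workdir / "review_checklist.json", json.loads),
        "verdict": load_optional(workdir / "review_verdict.txt", str.strip),
    }
    for key, value in extended.items():
        if value:  # same truthiness test as the workflow's `if image_description:`
            review[key] = value
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        (workdir / "review_verdict.txt").write_text("APPROVED\n")
        (workdir / "review_checklist.json").write_text("{not json")
        metadata = {"review": {"criteria_checklist": {"visual_quality": {"score": 30, "max": 40}}}}
        print(merge_extended_review(metadata, workdir)["review"])
```

In the demo, the malformed checklist file is ignored and the checklist already in the metadata survives, while the verdict is updated.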

CLAUDE.md

Lines changed: 29 additions & 0 deletions
@@ -308,6 +308,34 @@ quality_score: 92

 # Review feedback (used for regeneration)
 review:
+  # AI's visual description of the generated plot
+  image_description: |
+    The plot shows a scatter plot with 100 data points displaying
+    a positive correlation. Points are rendered in blue with 70%
+    opacity. Axes are clearly labeled and a subtle grid is visible.
+
+  # Detailed scoring breakdown by category
+  criteria_checklist:
+    visual_quality:
+      score: 36
+      max: 40
+      items:
+        - id: VQ-01
+          name: Text Legibility
+          score: 10
+          max: 10
+          passed: true
+          comment: "All text readable at full size"
+    spec_compliance:
+      score: 23
+      max: 25
+      items: [...]
+    # ... data_quality, code_quality, library_features
+
+  # Final verdict
+  verdict: APPROVED
+
+  # Summary feedback
   strengths:
     - "Clean code structure"
     - "Good use of alpha for overlapping points"
@@ -329,6 +357,7 @@ Quality: 92/100 | Created: 2025-01-10
 - Spec-level tracking in `specification.yaml`: `created`, `updated`, `issue`, `suggested`, `tags`
 - Per-library metadata in separate files (no merge conflicts!)
 - **Review feedback** stored in metadata for regeneration (AI reads previous feedback to improve)
+- **Extended review data**: `image_description`, `criteria_checklist`, and `verdict` for targeted fixes
 - Contributors credited via `suggested` field
 - Tags are at spec level (same for all libraries)
 - Per-library metadata updated automatically by `impl-review.yml` (quality score, review feedback)
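The example numbers line up: the category scores in the impl-review.yml template (36 + 23 + 18 + 10 + 5) add up to 92 and their maxima (40 + 25 + 20 + 10 + 5) to 100, matching `quality_score: 92` in the CLAUDE.md example. Assuming the quality score is meant to equal the sum of category scores (the commit does not state this), a consistency check could look like the sketch below; `checklist_total` is a hypothetical helper.

```python
from typing import Any


def checklist_total(checklist: dict[str, Any]) -> tuple[int, int]:
    """Sum score and max over all categories of a criteria_checklist."""
    total_score = sum(cat["score"] for cat in checklist.values())
    total_max = sum(cat["max"] for cat in checklist.values())
    return total_score, total_max


# Category totals from the impl-review.yml template (items omitted)
template = {
    "visual_quality": {"score": 36, "max": 40},
    "spec_compliance": {"score": 23, "max": 25},
    "data_quality": {"score": 18, "max": 20},
    "code_quality": {"score": 10, "max": 10},
    "library_features": {"score": 5, "max": 5},
}

score, maximum = checklist_total(template)
quality_score = 92  # value shown in the CLAUDE.md example
if score != quality_score:
    print(f"checklist total {score}/{maximum} disagrees with quality_score {quality_score}")
```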

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+"""add_extended_review_fields
+
+Add extended review data fields to impls table for issue #2845:
+- review_image_description: AI's visual description of the plot
+- review_criteria_checklist: Detailed per-criterion scoring breakdown
+- review_verdict: "APPROVED" or "REJECTED"
+
+Revision ID: 6345896e2e90
+Revises: d0c76553a5cc
+Create Date: 2026-01-01
+
+"""
+
+from typing import Sequence, Union
+
+import sqlalchemy as sa
+from sqlalchemy.dialects import postgresql
+
+from alembic import op
+
+
+# revision identifiers, used by Alembic.
+revision: str = "6345896e2e90"
+down_revision: Union[str, None] = "d0c76553a5cc"
+branch_labels: Union[str, Sequence[str], None] = None
+depends_on: Union[str, Sequence[str], None] = None
+
+
+def upgrade() -> None:
+    """Add extended review data columns to impls table."""
+    # Add review_image_description (text field for AI's visual description)
+    op.add_column("impls", sa.Column("review_image_description", sa.Text(), nullable=True))
+
+    # Add review_criteria_checklist (JSONB for detailed scoring breakdown)
+    op.add_column("impls", sa.Column("review_criteria_checklist", postgresql.JSONB(), nullable=True))
+
+    # Add review_verdict (short string: "APPROVED" or "REJECTED")
+    op.add_column("impls", sa.Column("review_verdict", sa.String(20), nullable=True))
+
+
+def downgrade() -> None:
+    """Remove extended review data columns from impls table."""
+    op.drop_column("impls", "review_verdict")
+    op.drop_column("impls", "review_criteria_checklist")
+    op.drop_column("impls", "review_image_description")
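The migration adds three nullable columns, and the commit message says `sync_to_postgres.py` was updated to fill them, but that diff is not shown here. A hedged sketch of the mapping such a sync step needs, from the YAML `review` keys to the new `impls` columns: `review_to_columns` is hypothetical, and the 20-character check simply mirrors the `sa.String(20)` length of `review_verdict`.

```python
from typing import Any, Optional

# YAML review key -> impls column added by the migration
REVIEW_COLUMN_MAP = {
    "image_description": "review_image_description",    # sa.Text()
    "criteria_checklist": "review_criteria_checklist",  # postgresql.JSONB()
    "verdict": "review_verdict",                         # sa.String(20)
}

VERDICT_MAX_LEN = 20


def review_to_columns(review: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Build column values for an impls row; absent keys become None (NULL)."""
    review = review or {}
    row = {column: review.get(key) for key, column in REVIEW_COLUMN_MAP.items()}
    verdict = row["review_verdict"]
    if verdict is not None and len(verdict) > VERDICT_MAX_LEN:
        raise ValueError(f"verdict longer than {VERDICT_MAX_LEN} chars: {verdict!r}")
    return row
```

Metadata files that predate this change simply lack the new keys, so they map to NULL, which the `nullable=True` columns allow.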

app/src/components/FilterBar.tsx

Lines changed: 27 additions & 15 deletions
@@ -139,8 +139,7 @@ export function FilterBar({
   const handleValueSelect = useCallback(
     (category: FilterCategory, value: string) => {
       onAddFilter(category, value);
-      onTrackEvent('filter_add', { category, value });
-      // Track search if query was used
+      // Track search if query was used (filter changes tracked via pageview)
       if (searchQuery.trim()) {
         onTrackEvent('search', { query: searchQuery.trim(), category });
       }
@@ -169,46 +168,62 @@ export function FilterBar({
   const handleRemoveValue = useCallback(
     (value: string) => {
       if (activeGroupIndex !== null) {
-        const group = activeFilters[activeGroupIndex];
         onRemoveFilter(activeGroupIndex, value);
-        onTrackEvent('filter_remove', { category: group?.category || '', value });
       }
       setChipMenuAnchor(null);
       setActiveGroupIndex(null);
     },
-    [activeGroupIndex, activeFilters, onRemoveFilter, onTrackEvent]
+    [activeGroupIndex, onRemoveFilter]
   );

   // Remove entire group
   const handleRemoveGroup = useCallback(() => {
     if (activeGroupIndex !== null) {
-      const group = activeFilters[activeGroupIndex];
       onRemoveGroup(activeGroupIndex);
-      onTrackEvent('filter_remove_group', { category: group?.category || '' });
     }
     setChipMenuAnchor(null);
     setActiveGroupIndex(null);
-  }, [activeGroupIndex, activeFilters, onRemoveGroup, onTrackEvent]);
+  }, [activeGroupIndex, onRemoveGroup]);

   // Add value to existing group (OR)
   const handleAddValueToExistingGroup = useCallback(
     (value: string) => {
       if (activeGroupIndex !== null) {
-        const group = activeFilters[activeGroupIndex];
         onAddValueToGroup(activeGroupIndex, value);
-        onTrackEvent('filter_add_or', { category: group?.category || '', value });
       }
       setChipMenuAnchor(null);
       setActiveGroupIndex(null);
     },
-    [activeGroupIndex, activeFilters, onAddValueToGroup, onTrackEvent]
+    [activeGroupIndex, onAddValueToGroup]
   );

   // Memoize search results to avoid recalculating on every render
   const searchResults = useMemo(
     () => getSearchResults(filterCounts, activeFilters, searchQuery, selectedCategory),
     [filterCounts, activeFilters, searchQuery, selectedCategory]
   );
+
+  // Track searches with no results (debounced, to discover missing specs)
+  const lastTrackedQueryRef = useRef<string>('');
+  useEffect(() => {
+    const query = searchQuery.trim();
+    // Only track if: query >= 2 chars, no results, not already tracked this query
+    if (query.length >= 2 && searchResults.length === 0 && query !== lastTrackedQueryRef.current) {
+      const timer = setTimeout(() => {
+        onTrackEvent('search_no_results', { query });
+        lastTrackedQueryRef.current = query;
+      }, 500);
+      return () => clearTimeout(timer);
+    }
+  }, [searchQuery, searchResults.length, onTrackEvent]);
+
+  // Reset tracked query when dropdown closes
+  useEffect(() => {
+    if (!dropdownAnchor) {
+      lastTrackedQueryRef.current = '';
+    }
+  }, [dropdownAnchor]);
+
   // Only open if anchor is valid and in document
   const isDropdownOpen = Boolean(dropdownAnchor) && document.body.contains(dropdownAnchor);
   const hasQuery = searchQuery.trim().length > 0;
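The new `search_no_results` effect fires only if a qualifying query (at least 2 characters after trimming, zero results, different from the last tracked query) stays unchanged for 500 ms. Any change to the query or the result count re-runs the effect, and React runs the previous cleanup first, which clears the pending timer. To make that timing rule concrete without a browser, here is a Python simulation over timestamped input events. It is a model of the React logic, not code from the app: `simulate_no_result_tracking` is hypothetical, each event is assumed to change the query or result count, an event landing exactly at 500 ms is treated as cancelling the timer, and the dropdown-close reset is left out.

```python
from typing import Optional

DEBOUNCE_MS = 500
MIN_QUERY_LEN = 2


def simulate_no_result_tracking(events: list[tuple[int, str, int]]) -> list[str]:
    """Return the queries that would be tracked as search_no_results.

    events: (time_ms, raw_query, result_count), sorted by time. Each event
    re-runs the effect; a pending timer is cancelled if the next event
    arrives before (or exactly when) it would fire.
    """
    tracked: list[str] = []
    last_tracked = ""  # plays the role of lastTrackedQueryRef.current
    for i, (time_ms, raw_query, result_count) in enumerate(events):
        query = raw_query.strip()
        qualifies = (
            len(query) >= MIN_QUERY_LEN
            and result_count == 0
            and query != last_tracked
        )
        if not qualifies:
            continue
        next_time: Optional[int] = events[i + 1][0] if i + 1 < len(events) else None
        # The timer survives only if nothing changes before it fires
        if next_time is None or next_time > time_ms + DEBOUNCE_MS:
            tracked.append(query)
            last_tracked = query
    return tracked


if __name__ == "__main__":
    # Fast typing: only the final query survives the debounce
    keystrokes = [(0, "sc", 0), (120, "sca", 0), (240, "scat", 0)]
    print(simulate_no_result_tracking(keystrokes))
```

The ref-based deduplication means that pausing on the same query twice (for example after adding trailing spaces) does not produce a second event until the dropdown closes and the ref is reset.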
@@ -349,10 +364,7 @@ export function FilterBar({
             key={`${group.category}-${index}`}
             label={displayLabel}
             onClick={(e) => handleChipClick(e, index)}
-            onDelete={() => {
-              onRemoveGroup(index);
-              onTrackEvent('filter_remove_group', { category: group.category });
-            }}
+            onDelete={() => onRemoveGroup(index)}
             deleteIcon={<CloseIcon sx={{ fontSize: '1rem !important' }} />}
             sx={{
               fontFamily: '"MonoLisa", "MonoLisa Fallback", monospace',
