pyplots uses a **hybrid automation strategy** combining GitHub Actions (for code

All workflows are in `.github/workflows/`

### 1. `spec-to-code.yml` + `claude.yml` - Code Generation

**Trigger**: GitHub Issue labeled `approved`

**Purpose**: Convert approved spec into implementation code using Claude Code

**How it works**:

1. **spec-to-code.yml** (trigger workflow):
   - Extracts spec ID from issue title (format: `scatter-basic-001`)
   - Posts `@claude` comment with detailed generation instructions
   - Links to spec file, generation rules, and quality criteria

2. **claude.yml** (execution workflow):
   - Triggers on `@claude` comments (via `anthropics/claude-code-action@v1`)
   - Claude Code reads spec and generation rules
   - Generates implementations for matplotlib and seaborn
   - Self-reviews code against quality criteria (max 3 attempts)
   - Creates pull request with implementations
   - **Visible live** at https://claude.ai/code (if OAuth token configured)

**Steps**:
```yaml
# spec-to-code.yml
on:
  issues:
    types: [labeled]

jobs:
  trigger-claude-code:
    if: github.event.label.name == 'approved'
    runs-on: ubuntu-latest
    steps:
      - name: Extract spec ID from issue
        run: |
          # Parse spec ID from title (e.g., "scatter-basic-001")
          SPEC_ID=$(echo "$ISSUE_TITLE" | grep -oiP '^[a-z]+-[a-z]+-\d{3,4}')

      - name: Trigger Claude Code with @claude comment
        run: |
          # Post @claude comment with:
          # - Spec file path: specs/{spec-id}.md
          # - Generation rules: rules/generation/v1.0.0-draft/
          # - Target paths for matplotlib and seaborn
          # - Self-review requirements
```

```yaml
# claude.yml
on:
  issue_comment:
    types: [created]

jobs:
  claude:
    if: contains(github.event.comment.body, '@claude')
    runs-on: ubuntu-latest
    steps:
      - uses: anthropics/claude-code-action@v1
        with:
          claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
          # Claude Code reads instructions from @claude comment,
          # generates code, commits, creates PR autonomously
```
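The instruction comment itself is just formatted text. A hypothetical sketch of composing it (the helper name, comment wording, and the assumption that the plot type is the first token of the spec ID are all illustrative, not the repo's actual code):

```python
def build_claude_comment(spec_id: str, libraries=("matplotlib", "seaborn")) -> str:
    """Compose the @claude instruction comment for a spec (illustrative)."""
    plot_type = spec_id.split("-")[0]  # assumption: plot type is the first token
    lines = [
        "@claude Please implement this spec.",
        f"- Spec file: specs/{spec_id}.md",
        "- Generation rules: rules/generation/v1.0.0-draft/",
        "- Target implementations:",
    ]
    lines += [f"  - plots/{lib}/{plot_type}/{spec_id}/default.py" for lib in libraries]
    lines.append("- Self-review against the spec's quality criteria (max 3 attempts)")
    return "\n".join(lines)
```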

**Outputs**:
- Implementation files: `plots/{library}/{type}/{spec-id}/default.py`
- Pull request with title: `feat: implement {spec-id}`
- PR linked to original issue
- Live progress visible in Claude Code Web

**Required Secrets**:
- `CLAUDE_CODE_OAUTH_TOKEN`: OAuth token from https://claude.ai/code/settings

---

### 3. `quality-check.yml` + `claude.yml` - Quality Evaluation

**Trigger**: Preview images uploaded to GCS (triggered by `test-and-preview.yml`)

**Purpose**: Evaluate generated plots using Claude Code with vision capabilities

**How it works**:

1. **test-and-preview.yml** uploads preview images to GCS
2. **quality-check.yml** downloads images and posts `@claude` comment
3. **claude.yml** triggers on `@claude` comment
4. Claude Code evaluates images against spec criteria using vision
5. Claude Code posts quality report and updates labels

**Steps**:
```yaml
# quality-check.yml
on:
  workflow_run:
    workflows: ["Test and Preview"]
    types: [completed]

jobs:
  trigger-quality-check:
    runs-on: ubuntu-latest
    steps:
      - name: Download preview images from GCS
        run: |
          # Download preview PNGs to preview_images/ directory
          # ($GCS_PATH names the object prefix; $PATH would collide
          #  with the shell's search path)
          gsutil -m cp -r "gs://$BUCKET/$GCS_PATH/*" preview_images/

      - name: Trigger Claude Code with @claude comment
        run: |
          # Post @claude comment on PR with:
          # - Preview image locations
          # - Spec file reference
          # - Quality criteria from spec
          # - Scoring guidelines (0-100, ≥85 to pass)
```

```yaml
# claude.yml
on:
  issue_comment:
    types: [created]

jobs:
  claude:
    if: contains(github.event.comment.body, '@claude')
    runs-on: ubuntu-latest
    steps:
      - uses: anthropics/claude-code-action@v1
        with:
          claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
          # Claude Code:
          # 1. Reads spec file
          # 2. Views preview images with vision
          # 3. Evaluates against quality criteria
          # 4. Posts quality report as comment
          # 5. Adds label: quality-approved or quality-check-failed
```

**Evaluation Process**:

1. For each preview image:
   - Parse filename to extract spec_id, library, variant
   - Read corresponding spec file
   - View image using Claude's vision capabilities
   - Check against all quality criteria in spec
   - Score 0-100 (≥85 to pass)

2. Generate quality report with:
   - Overall verdict (PASS/FAIL)
   - Score for each implementation
   - Specific feedback per quality criterion
   - Strengths and improvements needed

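Assuming a preview naming scheme like `{spec-id}_{library}_{variant}.png` (the actual convention is set by `test-and-preview.yml`; the helper below is illustrative), step 1's filename parsing might look like:

```python
from pathlib import Path

def parse_preview_filename(path: str) -> dict:
    """Split a preview filename into its components (assumed scheme).

    Assumes spec IDs contain no underscores, so splitting from the right
    on the last two underscores isolates library and variant.
    """
    stem = Path(path).stem  # e.g. "scatter-basic-001_matplotlib_default"
    spec_id, library, variant = stem.rsplit("_", 2)
    return {"spec_id": spec_id, "library": library, "variant": variant}
```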
**Quality Gate**:
- Claude vision-based evaluation
- Pass threshold: score ≥ 85 for all implementations
- Objective, measurable criteria from spec

**Outputs**:
- Quality report as PR comment
- Label: `quality-approved` or `quality-check-failed`
- Live evaluation visible in Claude Code Web
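The quality gate reduces to a simple predicate over per-implementation scores; a minimal sketch (helper name hypothetical):

```python
# Every implementation must meet the pass threshold described above.
PASS_THRESHOLD = 85

def gate_label(scores: dict[str, int]) -> str:
    """Map per-implementation scores to the label Claude Code applies."""
    passed = all(score >= PASS_THRESHOLD for score in scores.values())
    return "quality-approved" if passed else "quality-check-failed"
```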

---
