Commit fe8ff5f

Merge pull request #6 from MarkusNeusinger/claude/create-test-issue-merge-01D6cfGXen5SRMqh8Xgf7iUn
Create test issue and merge via GitHub
2 parents 7ca4f26 + 00b237b commit fe8ff5f

2 files changed: +114 -107 lines changed

.github/workflows/claude.yml

Lines changed: 1 addition & 4 deletions
@@ -5,8 +5,6 @@ on:
     types: [created]
   pull_request_review_comment:
     types: [created]
-  issues:
-    types: [opened, assigned]
   pull_request_review:
     types: [submitted]

@@ -15,8 +13,7 @@ jobs:
     if: |
       (github.event_name == 'issue_comment' && contains(github.event.comment.body, '@claude')) ||
       (github.event_name == 'pull_request_review_comment' && contains(github.event.comment.body, '@claude')) ||
-      (github.event_name == 'pull_request_review' && contains(github.event.review.body, '@claude')) ||
-      (github.event_name == 'issues' && (contains(github.event.issue.body, '@claude') || contains(github.event.issue.title, '@claude')))
+      (github.event_name == 'pull_request_review' && contains(github.event.review.body, '@claude'))
     runs-on: ubuntu-latest
     permissions:
       contents: write # Allow commits
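
Reviewer note: the net effect of this change on the job-level `if:` condition can be modelled in a few lines of Python. This is only an illustrative sketch, not code from the repository: `should_run_claude` and the `payload` dictionary are hypothetical names, with keys chosen to mirror `github.event_name` and the `github.event.*` fields referenced in the condition. It assumes GitHub's `contains()` is case-insensitive and treats a missing body as an empty string.

```python
from typing import Any, Mapping

MENTION = "@claude"


def _mentions(text: Any) -> bool:
    # Assumption: mirrors contains(), which ignores case and casts null to "".
    return MENTION in str(text or "").lower()


def should_run_claude(event_name: str, payload: Mapping[str, Any]) -> bool:
    """Hypothetical model of the `if:` condition after this commit."""
    if event_name in ("issue_comment", "pull_request_review_comment"):
        return _mentions(payload.get("comment", {}).get("body"))
    if event_name == "pull_request_review":
        return _mentions(payload.get("review", {}).get("body"))
    # The `issues` branch was removed, so an issue title or body that
    # mentions @claude no longer starts the job.
    return False
```

Under this model, an `issues` event never starts the job, while comment and review events still do when their body mentions `@claude`.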

docs/architecture/automation-workflows.md

Lines changed: 113 additions & 103 deletions
@@ -44,54 +44,77 @@ pyplots uses a **hybrid automation strategy** combining GitHub Actions (for code

 All workflows are in `.github/workflows/`

-### 1. `spec-to-code.yml` - Code Generation
+### 1. `spec-to-code.yml` + `claude.yml` - Code Generation

 **Trigger**: GitHub Issue labeled `approved`

-**Purpose**: Convert approved spec into implementation code
+**Purpose**: Convert approved spec into implementation code using Claude Code
+
+**How it works**:
+
+1. **spec-to-code.yml** (trigger workflow):
+   - Extracts spec ID from issue title (format: `scatter-basic-001`)
+   - Posts `@claude` comment with detailed generation instructions
+   - Links to spec file, generation rules, and quality criteria
+
+2. **claude.yml** (execution workflow):
+   - Triggers on `@claude` comments (via `anthropics/claude-code-action@v1`)
+   - Claude Code reads spec and generation rules
+   - Generates implementations for matplotlib and seaborn
+   - Self-reviews code against quality criteria (max 3 attempts)
+   - Creates pull request with implementations
+   - **Visible live** at https://claude.ai/code (if OAuth token configured)

 **Steps**:
 ```yaml
+# spec-to-code.yml
 on:
   issues:
     types: [labeled]

 jobs:
-  generate-code:
+  trigger-claude-code:
     if: github.event.label.name == 'approved'
-    runs-on: ubuntu-latest
     steps:
-      - name: Checkout repository
-        uses: actions/checkout@v4
-
-      - name: Extract spec from issue
+      - name: Extract spec ID from issue
         run: |
-          # Parse issue body (Markdown)
-          # Create specs/{spec-id}.md
+          # Parse spec ID from title (e.g., "scatter-basic-001")
+          SPEC_ID=$(echo "$ISSUE_TITLE" | grep -oiP '^[a-z]+-[a-z]+-\d{3,4}')

-      - name: Generate code with Claude
-        env:
-          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+      - name: Trigger Claude Code with @claude comment
         run: |
-          # Call automation/generators/claude_generator.py
-          # Generate plots/{library}/{type}/{spec-id}/default.py
+          # Post @claude comment with:
+          # - Spec file path: specs/{spec-id}.md
+          # - Generation rules: rules/generation/v1.0.0-draft/
+          # - Target paths for matplotlib and seaborn
+          # - Self-review requirements
+```

-      - name: Create pull request
-        uses: peter-evans/create-pull-request@v5
-        with:
-          title: "feat: implement ${{ env.SPEC_ID }}"
-          body: |
-            Auto-generated from issue #${{ github.event.issue.number }}
+```yaml
+# claude.yml
+on:
+  issue_comment:
+    types: [created]

-            Implements: `${{ env.SPEC_ID }}`
-          branch: "auto/${{ env.SPEC_ID }}"
-          labels: code-generated
+jobs:
+  claude:
+    if: contains(github.event.comment.body, '@claude')
+    steps:
+      - uses: anthropics/claude-code-action@v1
+        with:
+          claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
+          # Claude Code reads instructions from @claude comment
+          # Generates code, commits, creates PR autonomously
 ```

 **Outputs**:
-- Spec file: `specs/{spec-id}.md`
 - Implementation files: `plots/{library}/{type}/{spec-id}/default.py`
-- Pull request linked to original issue
+- Pull request with title: `feat: implement {spec-id}`
+- PR linked to original issue
+- Live progress visible in Claude Code Web
+
+**Required Secrets**:
+- `CLAUDE_CODE_OAUTH_TOKEN`: OAuth token from https://claude.ai/code/settings

 ---
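
Reviewer note: the spec ID extraction added in `spec-to-code.yml` relies on the pattern `^[a-z]+-[a-z]+-\d{3,4}` with case-insensitive matching. The sketch below reproduces that match in Python so its behaviour is easier to reason about. The function name `extract_spec_id` and the sample titles are hypothetical; only the pattern itself comes from the diff. Like `grep -o`, it returns the matched prefix rather than the whole title.

```python
import re
from typing import Optional

# Same pattern as the grep -oiP call in the workflow; re.ASCII keeps \d and
# the case folding limited to ASCII, which is an assumption about intent.
SPEC_ID_RE = re.compile(r"^[a-z]+-[a-z]+-\d{3,4}", re.IGNORECASE | re.ASCII)


def extract_spec_id(issue_title: str) -> Optional[str]:
    """Return the spec ID at the start of the title, or None if absent."""
    match = SPEC_ID_RE.match(issue_title)
    return match.group(0) if match else None
```

Two properties follow from the pattern as written: the ID must sit at the very start of the title, and it must have exactly two alphabetic segments before the number, so a title starting with a single-segment ID such as the made-up `heatmap-001` would not match.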

@@ -167,102 +190,89 @@ jobs:

 ---

-### 3. `quality-check.yml` - Multi-LLM Quality Evaluation
+### 3. `quality-check.yml` + `claude.yml` - Quality Evaluation
+
+**Trigger**: Preview images uploaded to GCS (triggered by `test-and-preview.yml`)
+
+**Purpose**: Evaluate generated plots using Claude Code with vision capabilities

-**Trigger**: Preview images uploaded to GCS (via workflow dispatch or comment command)
+**How it works**:

-**Purpose**: Multi-LLM consensus evaluation of generated plots
+1. **test-and-preview.yml** uploads preview images to GCS
+2. **quality-check.yml** downloads images and posts `@claude` comment
+3. **claude.yml** triggers on `@claude` comment
+4. Claude Code evaluates images against spec criteria using vision
+5. Claude Code posts quality report and updates labels

 **Steps**:
 ```yaml
+# quality-check.yml
 on:
-  workflow_dispatch:
-    inputs:
-      pr_number:
-        required: true
-      spec_id:
-        required: true
+  workflow_run:
+    workflows: ["Test and Preview"]
+    types: [completed]

 jobs:
-  quality-check:
-    runs-on: ubuntu-latest
+  trigger-quality-check:
     steps:
-      - name: Download previews from GCS
-        run: |
-          # Download all preview PNGs for this spec
-
-      - name: Load spec
-        run: |
-          # Read specs/{spec-id}.md
-
-      - name: Claude evaluation
-        env:
-          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
-        run: |
-          # Call automation/generators/quality_checker.py
-          # Returns score + feedback
-
-      - name: Gemini evaluation (critical decisions only)
-        if: env.IS_NEW_PLOT_TYPE == 'true'
-        env:
-          GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
+      - name: Download preview images from GCS
         run: |
-          # Vertex AI call
+          # Download preview PNGs to preview_images/ directory
+          gsutil -m cp -r "gs://$BUCKET/$PATH/*" preview_images/

-      - name: GPT evaluation (critical decisions only)
-        if: env.IS_NEW_PLOT_TYPE == 'true'
-        env:
-          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+      - name: Trigger Claude Code with @claude comment
         run: |
-          # OpenAI API call
+          # Post @claude comment on PR with:
+          # - Preview image locations
+          # - Spec file reference
+          # - Quality criteria from spec
+          # - Scoring guidelines (0-100, ≥85 to pass)
+```

-      - name: Calculate consensus
-        run: |
-          # Median score across all LLMs
-          # Pass if >= 85
+```yaml
+# claude.yml
+on:
+  issue_comment:
+    types: [created]

-      - name: Post results to issue
-        uses: peter-evans/create-or-update-comment@v3
+jobs:
+  claude:
+    if: contains(github.event.comment.body, '@claude')
+    steps:
+      - uses: anthropics/claude-code-action@v1
         with:
-          issue-number: ${{ inputs.pr_number }}
-          body: |
-            ## 🤖 Quality Check Results
-
-            **Claude:** ${{ env.CLAUDE_SCORE }}/100
-            ${{ env.CLAUDE_FEEDBACK }}
-
-            **Gemini:** ${{ env.GEMINI_SCORE }}/100 (if applicable)
-            **GPT-4:** ${{ env.GPT_SCORE }}/100 (if applicable)
-
-            **Consensus:** ${{ env.CONSENSUS }} (Median: ${{ env.MEDIAN_SCORE }})
-
-            ---
+          claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
+          # Claude Code:
+          # 1. Reads spec file
+          # 2. Views preview images with vision
+          # 3. Evaluates against quality criteria
+          # 4. Posts quality report as comment
+          # 5. Adds label: quality-approved or quality-check-failed
+```

-            ${{ env.DETAILED_FEEDBACK }}
+**Evaluation Process**:
+1. For each preview image:
+   - Parse filename to extract spec_id, library, variant
+   - Read corresponding spec file
+   - View image using Claude's vision capabilities
+   - Check against all quality criteria in spec
+   - Score 0-100 (≥85 to pass)

-      - name: Update labels
-        run: |
-          if [ "${{ env.CONSENSUS }}" == "APPROVED" ]; then
-            gh issue edit ${{ inputs.pr_number }} --add-label quality-approved
-          else
-            gh issue edit ${{ inputs.pr_number }} --add-label quality-failed-attempt-${{ env.ATTEMPT_NUMBER }}
-          fi
-
-      - name: Trigger regeneration if needed
-        if: env.CONSENSUS == 'REJECTED' && env.ATTEMPT_NUMBER < 3
-        run: |
-          # Trigger spec-to-code.yml again with feedback
-```
+2. Generate quality report with:
+   - Overall verdict (PASS/FAIL)
+   - Score for each implementation
+   - Specific feedback per quality criterion
+   - Strengths and improvements needed

 **Quality Gate**:
-- Routine plots: Claude only (fast, cost-effective)
-- Critical plots: Multi-LLM consensus (≥2 of 3 must approve)
-- Pass threshold: Median score ≥ 85
-
-**Feedback Loop**:
-- Attempt 1 fails → Regenerate with feedback
-- Attempt 2 fails → Regenerate with feedback
-- Attempt 3 fails → Mark as `quality-failed`, requires human review
+- Claude vision-based evaluation
+- Pass threshold: Score ≥ 85 for all implementations
+- Objective, measurable criteria from spec
+
+**Outputs**:
+- Quality report as PR comment
+- Label: `quality-approved` or `quality-check-failed`
+- Live evaluation visible in Claude Code Web

 ---
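
Reviewer note: the new quality gate replaces the median-of-three consensus with a single rule, "Score ≥ 85 for all implementations", that maps to one of two labels. A minimal Python sketch of that rule follows. The function name `quality_label` and the shape of the `scores` mapping are hypothetical, and the handling of an empty mapping and of out-of-range scores is an assumption, since the docs do not say what happens in those cases.

```python
from typing import Mapping

PASS_THRESHOLD = 85  # "Pass threshold: Score >= 85 for all implementations"


def quality_label(scores: Mapping[str, int]) -> str:
    """Map per-implementation scores (0-100) to the PR label in the docs."""
    if not scores:
        # Assumption: an evaluation with no scored implementation is an error.
        raise ValueError("no implementations were scored")
    for name, score in scores.items():
        if not 0 <= score <= 100:
            raise ValueError(f"score for {name} is outside 0-100: {score}")
    # Every implementation must reach the threshold; one failure fails the PR.
    passed = all(score >= PASS_THRESHOLD for score in scores.values())
    return "quality-approved" if passed else "quality-check-failed"
```

For example, `quality_label({"matplotlib": 90, "seaborn": 84})` yields `quality-check-failed`, because every implementation has to reach the threshold on its own. The removed workflow used the median across evaluators and retried up to three times; neither behaviour appears in the new description.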
