Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 1 addition & 4 deletions .github/workflows/claude.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,6 @@ on:
types: [created]
pull_request_review_comment:
types: [created]
issues:
types: [opened, assigned]
pull_request_review:
types: [submitted]

Expand All @@ -15,8 +13,7 @@ jobs:
if: |
(github.event_name == 'issue_comment' && contains(github.event.comment.body, '@claude')) ||
(github.event_name == 'pull_request_review_comment' && contains(github.event.comment.body, '@claude')) ||
(github.event_name == 'pull_request_review' && contains(github.event.review.body, '@claude')) ||
(github.event_name == 'issues' && (contains(github.event.issue.body, '@claude') || contains(github.event.issue.title, '@claude')))
(github.event_name == 'pull_request_review' && contains(github.event.review.body, '@claude'))
runs-on: ubuntu-latest
permissions:
contents: write # Allow commits
Expand Down
216 changes: 113 additions & 103 deletions docs/architecture/automation-workflows.md
Original file line number Diff line number Diff line change
Expand Up @@ -44,54 +44,77 @@ pyplots uses a **hybrid automation strategy** combining GitHub Actions (for code

All workflows are in `.github/workflows/`

### 1. `spec-to-code.yml` - Code Generation
### 1. `spec-to-code.yml` + `claude.yml` - Code Generation

**Trigger**: GitHub Issue labeled `approved`

**Purpose**: Convert approved spec into implementation code
**Purpose**: Convert approved spec into implementation code using Claude Code

**How it works**:

1. **spec-to-code.yml** (trigger workflow):
- Extracts spec ID from issue title (format: `scatter-basic-001`)
- Posts `@claude` comment with detailed generation instructions
- Links to spec file, generation rules, and quality criteria

2. **claude.yml** (execution workflow):
- Triggers on `@claude` comments (via `anthropics/claude-code-action@v1`)
- Claude Code reads spec and generation rules
- Generates implementations for matplotlib and seaborn
- Self-reviews code against quality criteria (max 3 attempts)
- Creates pull request with implementations
- **Visible live** at https://claude.ai/code (if OAuth token configured)

**Steps**:
```yaml
# spec-to-code.yml
on:
issues:
types: [labeled]

jobs:
generate-code:
trigger-claude-code:
if: github.event.label.name == 'approved'
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v4

- name: Extract spec from issue
- name: Extract spec ID from issue
run: |
# Parse issue body (Markdown)
# Create specs/{spec-id}.md
# Parse spec ID from title (e.g., "scatter-basic-001")
SPEC_ID=$(echo "$ISSUE_TITLE" | grep -oiP '^[a-z]+-[a-z]+-\d{3,4}')

- name: Generate code with Claude
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
- name: Trigger Claude Code with @claude comment
run: |
# Call automation/generators/claude_generator.py
# Generate plots/{library}/{type}/{spec-id}/default.py
# Post @claude comment with:
# - Spec file path: specs/{spec-id}.md
# - Generation rules: rules/generation/v1.0.0-draft/
# - Target paths for matplotlib and seaborn
# - Self-review requirements
```

- name: Create pull request
uses: peter-evans/create-pull-request@v5
with:
title: "feat: implement ${{ env.SPEC_ID }}"
body: |
Auto-generated from issue #${{ github.event.issue.number }}
```yaml
# claude.yml
on:
issue_comment:
types: [created]

Implements: `${{ env.SPEC_ID }}`
branch: "auto/${{ env.SPEC_ID }}"
labels: code-generated
jobs:
claude:
if: contains(github.event.comment.body, '@claude')
steps:
- uses: anthropics/claude-code-action@v1
with:
claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
# Claude Code reads instructions from @claude comment
# Generates code, commits, creates PR autonomously
```

**Outputs**:
- Spec file: `specs/{spec-id}.md`
- Implementation files: `plots/{library}/{type}/{spec-id}/default.py`
- Pull request linked to original issue
- Pull request with title: `feat: implement {spec-id}`
- PR linked to original issue
- Live progress visible in Claude Code Web

**Required Secrets**:
- `CLAUDE_CODE_OAUTH_TOKEN`: OAuth token from https://claude.ai/code/settings

---

Expand Down Expand Up @@ -167,102 +190,89 @@ jobs:

---

### 3. `quality-check.yml` - Multi-LLM Quality Evaluation
### 3. `quality-check.yml` + `claude.yml` - Quality Evaluation

**Trigger**: Preview images uploaded to GCS (triggered by `test-and-preview.yml`)

**Purpose**: Evaluate generated plots using Claude Code with vision capabilities

**Trigger**: Preview images uploaded to GCS (via workflow dispatch or comment command)
**How it works**:

**Purpose**: Multi-LLM consensus evaluation of generated plots
1. **test-and-preview.yml** uploads preview images to GCS
2. **quality-check.yml** downloads images and posts `@claude` comment
3. **claude.yml** triggers on `@claude` comment
4. Claude Code evaluates images against spec criteria using vision
5. Claude Code posts quality report and updates labels

**Steps**:
```yaml
# quality-check.yml
on:
workflow_dispatch:
inputs:
pr_number:
required: true
spec_id:
required: true
workflow_run:
workflows: ["Test and Preview"]
Copy link

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workflow name reference is incorrect. The documentation references "Test and Preview" but the actual workflow in .github/workflows/test-and-preview.yml is named "Test and Generate Previews". This should be updated to match the actual workflow name to ensure the workflow_run trigger works correctly.

Suggested change
workflows: ["Test and Preview"]
workflows: ["Test and Generate Previews"]

Copilot uses AI. Check for mistakes.
types: [completed]

jobs:
quality-check:
runs-on: ubuntu-latest
trigger-quality-check:
steps:
- name: Download previews from GCS
run: |
# Download all preview PNGs for this spec

- name: Load spec
run: |
# Read specs/{spec-id}.md

- name: Claude evaluation
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
run: |
# Call automation/generators/quality_checker.py
# Returns score + feedback

- name: Gemini evaluation (critical decisions only)
if: env.IS_NEW_PLOT_TYPE == 'true'
env:
GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
- name: Download preview images from GCS
run: |
# Vertex AI call
# Download preview PNGs to preview_images/ directory
gsutil -m cp -r "gs://$BUCKET/$PATH/*" preview_images/

- name: GPT evaluation (critical decisions only)
if: env.IS_NEW_PLOT_TYPE == 'true'
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
- name: Trigger Claude Code with @claude comment
run: |
# OpenAI API call
# Post @claude comment on PR with:
# - Preview image locations
# - Spec file reference
# - Quality criteria from spec
# - Scoring guidelines (0-100, ≥85 to pass)
```

- name: Calculate consensus
run: |
# Median score across all LLMs
# Pass if >= 85
```yaml
# claude.yml
on:
issue_comment:
types: [created]

- name: Post results to issue
uses: peter-evans/create-or-update-comment@v3
jobs:
claude:
if: contains(github.event.comment.body, '@claude')
steps:
- uses: anthropics/claude-code-action@v1
with:
issue-number: ${{ inputs.pr_number }}
body: |
## 🤖 Quality Check Results

**Claude:** ${{ env.CLAUDE_SCORE }}/100
${{ env.CLAUDE_FEEDBACK }}

**Gemini:** ${{ env.GEMINI_SCORE }}/100 (if applicable)
**GPT-4:** ${{ env.GPT_SCORE }}/100 (if applicable)

**Consensus:** ${{ env.CONSENSUS }} (Median: ${{ env.MEDIAN_SCORE }})

---
claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
# Claude Code:
# 1. Reads spec file
# 2. Views preview images with vision
# 3. Evaluates against quality criteria
# 4. Posts quality report as comment
# 5. Adds label: quality-approved or quality-check-failed
```

${{ env.DETAILED_FEEDBACK }}
**Evaluation Process**:
1. For each preview image:
- Parse filename to extract spec_id, library, variant
- Read corresponding spec file
- View image using Claude's vision capabilities
- Check against all quality criteria in spec
- Score 0-100 (≥85 to pass)

- name: Update labels
run: |
if [ "${{ env.CONSENSUS }}" == "APPROVED" ]; then
gh issue edit ${{ inputs.pr_number }} --add-label quality-approved
else
gh issue edit ${{ inputs.pr_number }} --add-label quality-failed-attempt-${{ env.ATTEMPT_NUMBER }}
fi

- name: Trigger regeneration if needed
if: env.CONSENSUS == 'REJECTED' && env.ATTEMPT_NUMBER < 3
run: |
# Trigger spec-to-code.yml again with feedback
```
2. Generate quality report with:
- Overall verdict (PASS/FAIL)
- Score for each implementation
- Specific feedback per quality criterion
- Strengths and improvements needed

**Quality Gate**:
- Routine plots: Claude only (fast, cost-effective)
- Critical plots: Multi-LLM consensus (≥2 of 3 must approve)
- Pass threshold: Median score ≥ 85

**Feedback Loop**:
- Attempt 1 fails → Regenerate with feedback
- Attempt 2 fails → Regenerate with feedback
- Attempt 3 fails → Mark as `quality-failed`, requires human review
- Claude vision-based evaluation
- Pass threshold: Score ≥ 85 for all implementations
- Objective, measurable criteria from spec

**Outputs**:
- Quality report as PR comment
- Label: `quality-approved` or `quality-check-failed`
- Live evaluation visible in Claude Code Web

---

Expand Down