
fix(workflows): update quality threshold to 90 and add CLI retry logic #1089

Merged
MarkusNeusinger merged 1 commit into main from
fix/issue-967-quality-workflow-findings
Dec 16, 2025

Conversation

@MarkusNeusinger
Owner

Fixes #967

Summary

  • Update quality threshold documentation from 85 to 90
  • Add CLI retry logic for intermittent failures
  • Change issue lifecycle: spec-ready issues stay open until all implementations complete

Changes

Documentation (Issue Finding #4)

Updates quality threshold from 85 to 90 in documentation to match actual workflow configuration in impl-review.yml.

Files:

  • CLAUDE.md
  • README.md
  • docs/workflow.md
  • docs/concepts/claude-skill-plot-generation.md
  • prompts/quality-evaluator.md

CLI Retry Logic (Issue Finding #3)

Adds retry mechanism for Claude CLI steps to handle intermittent "Executable not found in $PATH" errors. Each Claude step now has continue-on-error: true and a retry step that runs if the first attempt fails.
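The pattern can be sketched as the following workflow fragment (step names and ids here are illustrative, not the exact steps from the changed files; the real retry steps are quoted in the review below):

```yaml
# First attempt: don't fail the job if the Claude CLI is flaky
- name: Run Claude
  id: claude
  continue-on-error: true
  uses: anthropics/claude-code-action@v1
  with:
    prompt: ...

# Second attempt: runs only when the first step actually failed
- name: Retry Claude (on failure)
  if: steps.claude.outcome == 'failure'
  id: claude_retry
  uses: anthropics/claude-code-action@v1
  with:
    prompt: ...
```

Note that `steps.<id>.outcome` reflects the step's result before `continue-on-error` is applied, which is what makes the conditional retry work.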

Files:

  • .github/workflows/spec-create.yml
  • .github/workflows/spec-update.yml
  • .github/workflows/impl-generate.yml
  • .github/workflows/impl-repair.yml
  • .github/workflows/util-claude.yml

Issue Lifecycle (Bonus)

  • spec-ready issues now stay open until all 9 library implementations are merged
  • Changed Closes #... to Related to #... in spec PR body
  • Added auto-close logic in impl-merge.yml when all impl:{library}:done labels are present

Files:

  • .github/workflows/spec-create.yml
  • .github/workflows/impl-merge.yml

Test Plan

  • Verify documentation shows correct 90 threshold
  • Trigger a workflow and verify retry works on CLI failure
  • Create a test spec and verify issue stays open after merge
  • Verify issue closes when all 9 libraries have impl:{library}:done

Documentation:
- Update quality threshold from 85 to 90 across all docs
- Files: CLAUDE.md, README.md, docs/workflow.md,
  docs/concepts/claude-skill-plot-generation.md, prompts/quality-evaluator.md

CLI Retry Logic:
- Add automatic retry for Claude CLI steps on failure
- Helps handle intermittent "Executable not found in $PATH" errors
- Files: spec-create.yml, spec-update.yml, impl-generate.yml,
  impl-repair.yml, util-claude.yml

Issue Lifecycle:
- spec-ready issues now stay open until all 9 libraries are done
- Changed "Closes #..." to "Related to #..." in spec PR body
- Added auto-close when all impl:{library}:done labels present

Fixes #967
Copilot AI review requested due to automatic review settings December 16, 2025 20:35
@MarkusNeusinger MarkusNeusinger merged commit 9ddcf06 into main Dec 16, 2025
10 checks passed
@MarkusNeusinger MarkusNeusinger deleted the fix/issue-967-quality-workflow-findings branch December 16, 2025 20:39
Contributor

Copilot AI left a comment


Pull request overview

This PR addresses issue #967 by updating quality threshold documentation from 85 to 90 (matching the actual implementation), adding CLI retry logic to handle intermittent Claude failures, and changing the issue lifecycle so spec-ready issues remain open until all library implementations are complete.

Key Changes:

  • Documentation updated across 5 files to reflect the correct quality threshold of 90
  • Retry mechanism added to 5 workflow files to handle Claude CLI "Executable not found in $PATH" errors
  • Issue lifecycle modified: spec PRs now use "Related to" instead of "Closes", and issues auto-close when all 9 libraries complete

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| prompts/quality-evaluator.md | Updates score thresholds: approve ≥90 (was ≥85), request_changes 80-89 (was 75-84), reject <80 (was <75) |
| docs/workflow.md | Updates all references to the quality threshold from 85 to 90 in documentation and diagrams |
| docs/concepts/claude-skill-plot-generation.md | Updates the pass_threshold parameter and loop condition from 85 to 90 |
| README.md | Updates two instances of the quality score requirement from ≥85 to ≥90 |
| CLAUDE.md | Updates label descriptions and the final note about the quality threshold from 85 to 90 |
| .github/workflows/util-claude.yml | Adds continue-on-error and a retry step to handle Claude CLI failures |
| .github/workflows/spec-update.yml | Adds continue-on-error to the Claude step and a full retry step with a duplicate prompt |
| .github/workflows/spec-create.yml | Adds continue-on-error to the Claude step, retry logic, and changes "Closes" to "Related to" in the PR body |
| .github/workflows/impl-repair.yml | Adds continue-on-error to the Claude step and a full retry step with a duplicate prompt |
| .github/workflows/impl-generate.yml | Adds continue-on-error to the Claude step and a full retry step with a duplicate prompt |
| .github/workflows/impl-merge.yml | Adds auto-close logic that closes issues when all 9 library implementations have impl:{lib}:done labels |
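The new evaluator thresholds can be summarized with a small helper (a hypothetical illustration, not code from the repository):

```python
def verdict(score: int) -> str:
    """Map a quality score to a review verdict.

    Thresholds per this PR: approve >= 90, request_changes 80-89, reject < 80.
    """
    if score >= 90:
        return "approve"
    if score >= 80:
        return "request_changes"
    return "reject"
```

A score of exactly 89 now yields request_changes, where it previously would have been approved under the old ≥85 threshold.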
Comments suppressed due to low confidence (4)

.github/workflows/impl-repair.yml:225

  • The retry step duplicates the entire prompt from the first attempt (lines 119-170). This creates maintenance burden - if the prompt needs to be updated, it must be changed in two places. Consider using YAML anchors or extracting the prompt to a separate file/variable that can be reused in both steps.
      - name: Retry Claude (on failure)
        if: steps.claude.outcome == 'failure'
        id: claude_retry
        timeout-minutes: 45
        uses: anthropics/claude-code-action@v1
        with:
          claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
          claude_args: "--model opus"
          prompt: |
            ## Task: Repair ${{ inputs.library }} Implementation for ${{ inputs.specification_id }}

            This is **repair attempt ${{ inputs.attempt }}/3**. The previous implementation was rejected.

            ### Step 1: Read the AI review feedback
            Read `/tmp/ai_feedback.md` to understand what needs to be fixed.

            ### Step 2: Read reference files
            1. `prompts/library/${{ inputs.library }}.md` - Library-specific rules
            2. `plots/${{ inputs.specification_id }}/specification.md` - The specification
            3. `prompts/quality-criteria.md` - Quality requirements

            ### Step 3: Read current implementation
            `plots/${{ inputs.specification_id }}/implementations/${{ inputs.library }}.py`

            ### Step 4: Fix the issues
            Based on the AI feedback, fix:
            - Visual quality issues
            - Code quality issues
            - Spec compliance issues

            ### Step 5: Test the fix
            ```bash
            source .venv/bin/activate
            cd plots/${{ inputs.specification_id }}/implementations
            MPLBACKEND=Agg python ${{ inputs.library }}.py
            ```

            ### Step 6: Visual self-check
            View `plot.png` and verify fixes are correct.

            ### Step 7: Format the code
            ```bash
            source .venv/bin/activate
            ruff format plots/${{ inputs.specification_id }}/implementations/${{ inputs.library }}.py
            ruff check --fix plots/${{ inputs.specification_id }}/implementations/${{ inputs.library }}.py
            ```

            ### Step 8: Commit and push
            ```bash
            git config user.name "github-actions[bot]"
            git config user.email "github-actions[bot]@users.noreply.github.com"
            git add plots/${{ inputs.specification_id }}/implementations/${{ inputs.library }}.py
            git commit -m "fix(${{ inputs.library }}): address review feedback for ${{ inputs.specification_id }}

.github/workflows/spec-create.yml:256

  • The retry step duplicates the entire prompt from the first attempt (lines 76-161). This creates maintenance burden - if the prompt needs to be updated, it must be changed in two places. Consider using YAML anchors or extracting the prompt to a separate file/variable that can be reused in both steps.
      - name: Retry Claude (on failure)
        if: steps.check.outputs.should_run == 'true' && steps.process.outcome == 'failure'
        id: process_retry
        timeout-minutes: 30
        uses: anthropics/claude-code-action@v1
        with:
          claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
          claude_args: "--model opus"
          prompt: |
            ## Task: Create New Specification

            You are creating a new plot specification.

            ### Issue Details
            - **Title:** ${{ github.event.issue.title }}
            - **Number:** #${{ github.event.issue.number }}
            - **Author:** ${{ github.event.issue.user.login }}
            - **Body:**
            ```
            ${{ github.event.issue.body }}
            ```

            ---

            ## Instructions

            1. **Read the rules:** `prompts/spec-id-generator.md`

            2. **Check for duplicates:**
               - List all existing specs: `ls plots/`
               - Read existing specification files if titles seem similar
               - If duplicate found: Post comment explaining which spec matches, then STOP

            3. **Generate specification-id:**
               - Format: `{type}-{variant}` or `{type}-{variant}-{modifier}`
               - Examples: `scatter-basic`, `bar-grouped-horizontal`, `heatmap-correlation`
               - All lowercase, hyphens only

            4. **Create specification branch:**
               ```bash
               git checkout -b "specification/{specification-id}"
               ```

            5. **Post analysis comment:**
               Post a SHORT comment (max 3-4 sentences) to the issue using `gh issue comment`:
               - Is this a valid/useful plot type?
               - Does it already exist? (check `ls plots/`)
               - Any concerns?

            6. **Create specification files:**
               - Read template: `prompts/templates/specification.md`
               - Read metadata template: `prompts/templates/specification.yaml`
               - Create directory: `plots/{specification-id}/`
               - Create: `plots/{specification-id}/specification.md` (follow template structure)
               - Create: `plots/{specification-id}/specification.yaml` with:
                 - `specification_id`: the generated id
                 - `title`: a proper title
                 - `created`: Use `$(date -u +"%Y-%m-%dT%H:%M:%SZ")` for current timestamp
                 - `issue`: ${{ github.event.issue.number }}
                 - `suggested`: ${{ github.event.issue.user.login }}
                 - `tags`: appropriate tags for this plot type
               - Create empty folder: `plots/{specification-id}/implementations/`
               - Create empty folder: `plots/{specification-id}/metadata/`

            7. **Commit and push:**
               ```bash
               git config user.name "github-actions[bot]"
               git config user.email "github-actions[bot]@users.noreply.github.com"
               git add plots/{specification-id}/
               git commit -m "spec: add {specification-id} specification

               Created from issue #${{ github.event.issue.number }}"
               git push -u origin "specification/{specification-id}"
               ```

            8. **Update issue title:**
               ```bash
               gh issue edit ${{ github.event.issue.number }} --title "[{specification-id}] {original title}"
               ```

            9. **Output for workflow:**
               After completing, print these lines exactly:
               ```
               SPECIFICATION_ID={specification-id}
               BRANCH=specification/{specification-id}
               ```

            ---

            ## Important Rules
            - Do NOT create a PR (the workflow does that)
            - Do NOT add labels
            - Do NOT close the issue
            - STOP after pushing the branch

.github/workflows/spec-update.yml:203

  • The retry step duplicates the entire prompt from the first attempt (lines 79-136). This creates maintenance burden - if the prompt needs to be updated, it must be changed in two places. Consider using YAML anchors or extracting the prompt to a separate file/variable that can be reused in both steps.
      - name: Retry Claude (on failure)
        if: steps.process.outcome == 'failure'
        id: process_retry
        timeout-minutes: 30
        uses: anthropics/claude-code-action@v1
        with:
          claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
          claude_args: "--model opus"
          prompt: |
            ## Task: Update Existing Specification

            You are updating an existing plot specification.

            ### Issue Details
            - **Title:** ${{ github.event.issue.title }}
            - **Number:** #${{ github.event.issue.number }}
            - **Specification ID:** ${{ steps.extract.outputs.specification_id }}
            - **Body:**
            ```
            ${{ github.event.issue.body }}
            ```

            ---

            ## Instructions

            1. **Read current specification:**
               - `plots/${{ steps.extract.outputs.specification_id }}/specification.md`
               - `plots/${{ steps.extract.outputs.specification_id }}/specification.yaml`

            2. **Post analysis comment:**
               Post a SHORT comment (max 3-4 sentences) to the issue using `gh issue comment`:
               - Is this a valid/useful change?
               - What will be modified?
               - Any concerns?

            3. **Create update branch:**
               ```bash
               git checkout -b "specification/${{ steps.extract.outputs.specification_id }}-update"
               ```

            4. **Apply updates:**
               - Modify `plots/${{ steps.extract.outputs.specification_id }}/specification.md` as needed
               - Update `plots/${{ steps.extract.outputs.specification_id }}/specification.yaml`:
                 - Add entry to `history` array with:
                   - `date`: current timestamp (ISO 8601 format)
                   - `issue`: ${{ github.event.issue.number }}
                   - `changes`: brief description of what changed

            5. **Commit and push:**
               ```bash
               git config user.name "github-actions[bot]"
               git config user.email "github-actions[bot]@users.noreply.github.com"
               git add plots/${{ steps.extract.outputs.specification_id }}/
               git commit -m "spec: update ${{ steps.extract.outputs.specification_id }}

               Updated from issue #${{ github.event.issue.number }}"
               git push -u origin "specification/${{ steps.extract.outputs.specification_id }}-update"
               ```

            ---

            ## Important Rules
            - Do NOT create a PR (the workflow does that)
            - Do NOT add labels
            - STOP after pushing the branch

.github/workflows/impl-generate.yml:320

  • The retry step duplicates the entire prompt from the first attempt (lines 205-259). This creates maintenance burden - if the prompt needs to be updated, it must be changed in two places. Consider using YAML anchors or extracting the prompt to a separate file/variable that can be reused in both steps.
      - name: Retry Claude (on failure)
        if: steps.claude.outcome == 'failure'
        id: claude_retry
        timeout-minutes: 60
        uses: anthropics/claude-code-action@v1
        with:
          claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
          claude_args: "--model opus"
          prompt: |
            ## Task: Generate ${{ steps.inputs.outputs.library }} Implementation

            You are generating the **${{ steps.inputs.outputs.library }}** implementation for **${{ steps.inputs.outputs.specification_id }}**.

            ### Step 1: Read required files
            1. `prompts/plot-generator.md` - Base generation rules
            2. `prompts/default-style-guide.md` - Visual style requirements
            3. `prompts/quality-criteria.md` - Quality requirements
            4. `prompts/library/${{ steps.inputs.outputs.library }}.md` - Library-specific rules
            5. `plots/${{ steps.inputs.outputs.specification_id }}/specification.md` - The specification

            ### Step 2: Generate implementation
            Create: `plots/${{ steps.inputs.outputs.specification_id }}/implementations/${{ steps.inputs.outputs.library }}.py`

            The script MUST:
            - Save as `plot.png` in the current directory
            - For interactive libraries (plotly, bokeh, altair, highcharts, pygal, letsplot): also save `plot.html`

            ### Step 3: Test and fix (up to 3 attempts)
            Run the implementation:
            ```bash
            source .venv/bin/activate
            cd plots/${{ steps.inputs.outputs.specification_id }}/implementations
            MPLBACKEND=Agg python ${{ steps.inputs.outputs.library }}.py
            ```

            If it fails, fix and try again (max 3 attempts).

            ### Step 4: Visual self-check
            Look at the generated `plot.png`:
            - Does it match the specification?
            - Are axes labeled correctly?
            - Is the visualization clear?

            ### Step 5: Format the code
            ```bash
            source .venv/bin/activate
            ruff format plots/${{ steps.inputs.outputs.specification_id }}/implementations/${{ steps.inputs.outputs.library }}.py
            ruff check --fix plots/${{ steps.inputs.outputs.specification_id }}/implementations/${{ steps.inputs.outputs.library }}.py
            ```

            ### Step 6: Commit
            ```bash
            git config user.name "github-actions[bot]"
            git config user.email "github-actions[bot]@users.noreply.github.com"
            git add plots/${{ steps.inputs.outputs.specification_id }}/implementations/${{ steps.inputs.outputs.library }}.py
            git commit -m "feat(${{ steps.inputs.outputs.library }}): implement ${{ steps.inputs.outputs.specification_id }}"
            git push -u origin implementation/${{ steps.inputs.outputs.specification_id }}/${{ steps.inputs.outputs.library }}
            ```

            ### Report result

Comment on lines +258 to +281
# Close issue if all 9 libraries are done
if [ "$DONE_COUNT" -eq 9 ]; then
  gh issue comment "$ISSUE" --body "## :tada: All Implementations Complete!

All 9 library implementations for \`${SPEC_ID}\` have been successfully merged.

| Library | Status |
|---------|--------|
| matplotlib | :white_check_mark: |
| seaborn | :white_check_mark: |
| plotly | :white_check_mark: |
| bokeh | :white_check_mark: |
| altair | :white_check_mark: |
| plotnine | :white_check_mark: |
| pygal | :white_check_mark: |
| highcharts | :white_check_mark: |
| letsplot | :white_check_mark: |

---
:robot: *[impl-merge](https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }})*"

  gh issue close "$ISSUE"
  echo "::notice::Closed issue #$ISSUE - all implementations complete"
fi

Copilot AI Dec 16, 2025


The auto-close logic only checks for impl:{lib}:done labels and requires all 9 libraries to have this label. However, if any library implementation fails 3 times, it gets marked with impl:{lib}:failed instead (see impl-review.yml line 349). This means issues will remain open indefinitely if even a single library cannot implement the spec. Consider also counting impl:{lib}:failed labels and closing the issue when all 9 libraries have either :done or :failed status.
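The suggested fix could be sketched as follows (a hypothetical helper under the reviewer's assumption that a library is "resolved" once it carries either the `:done` or `:failed` label; `CURRENT_LABELS` and the `gh` invocation are illustrative, not the actual workflow code):

```shell
LIBS="matplotlib seaborn plotly bokeh altair plotnine pygal highcharts letsplot"

# Count libraries whose implementation is resolved (done OR failed),
# given a space-separated list of the issue's current labels.
resolved_count() {
  labels=" $1 "
  count=0
  for lib in $LIBS; do
    case "$labels" in
      *" impl:${lib}:done "*|*" impl:${lib}:failed "*) count=$((count + 1)) ;;
    esac
  done
  echo "$count"
}

# Sketch of the adjusted auto-close condition:
# if [ "$(resolved_count "$CURRENT_LABELS")" -eq 9 ]; then
#   gh issue close "$ISSUE"
# fi
```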



Development

Successfully merging this pull request may close these issues.

[Findings] 5 New Specs Test: Quality-Workflow Evaluation

2 participants