feat(daily-regen): autonomous spec polish + cross-library similarity audit (#5714)

MarkusNeusinger · claude · web-flow · commit 92fc47caa0ac · 2026-05-05T19:45:08.000+02:00
## Summary

Adds two autonomous pre-flight steps to `daily-regen.yml` that run before `bulk-generate` fans out, lifting two quality vectors that today only exist in the local `/regen` and `/update` skills into the cloud cadence.

Everything runs through `claude-code-action` on Claude Max OAuth (**no extra API costs**), with the same pinned SHA as `impl-generate` / `impl-review` / `impl-repair`.

## What it does

For each spec the `pick` job selects:

1. **Skip-gate** (Bash, no LLM): `gh pr list --search "plots/<spec>/ in:files is:open"`. If any PR is open touching the spec, the polish step is skipped to avoid racing humans or stacking auto-polish PRs. The similarity audit still runs (it's read-only).
2. **Spec polish** (claude-code-action, `--model haiku`): audits the spec across the five `update.md` §2 dimensions (wording, missing sections, tag completeness, tag quality, tag accuracy). If anything needs work, opens an `auto-polish/<spec>/<timestamp>` branch + PR with label `auto-polish`. **Never pushes to main directly. Never auto-merges.** PR awaits human review. If nothing needs work, prints `NOOP` and stops. `continue-on-error: true` so a transient action failure does not block the main pipeline.
3. **Cross-library similarity audit** (claude-code-action, `--model haiku`): reads the 9 `review.image_description` blobs from `plots/<spec>/metadata/python/*.yaml` and clusters libraries that converged on the same data scenario / example domain / visual variant *beyond what the spec dictated*. Optionally drills into impl `.py` files for ambiguous clusters via the Read tool. Emits `/tmp/change-requests.json` keyed by library. Project-mandated constants (Okabe-Ito palette positions 1–7, plot size and aspect ratio, theme chrome) are explicitly excluded as cluster signals. `continue-on-error: true`; if the audit fails, the collect step falls back to empty change_requests.
4. **Dispatch bulk-generate with hints**: passes `change_requests` JSON to bulk-generate, which jq-extracts the per-library hint and forwards it as the new `change_request` input to `impl-generate`. The hint is staged to `/tmp/anyplot-change-request.txt`, where the updated `impl-generate-claude.md` picks it up and treats it as a hard requirement (mirroring `regen.md` §2c verbatim: hard requirement, no sibling reads, preserve `review.strengths`, override "no changes for sake of changes").

## Why

- **Spec drift:** specs are currently written once at creation and never revisited. Tag vocab evolves, sections go missing, wording grows vague. Polish-on-cycle keeps them sharp without manual maintenance, and at ~10 cycles/day across 300+ specs, each spec gets touched once a month, so drift risk is low.
- **Silent convergence:** without a similarity check, 9 libs can independently land on the same scenario / domain / variant, producing nine copies of the same chart in different engines, which is exactly the opposite of the catalog's purpose. The hint-injection breaks the cluster cleanly (one library per cluster, alphabetically later).

## Model routing

- The existing `daily-regen` `model` input (default `haiku`, choices `haiku`/`sonnet`/`opus`) is **unchanged** and still flows to bulk-generate → impl-generate / review / repair.
- The two new pre-flight LLM steps **hardcode** `--model haiku`; they're narrow, cheap audits.

## dry_run semantics

`dry_run=true` runs the read-only and decision-only steps so operators can preview what the cycle will do without committing anything:

- Runs: pick, skip-gate, similarity audit (read-only), collect change_requests
- Skipped: spec polish (would open a real PR, a side effect), dispatch bulk-generate (would fan out 9 impl-generate jobs)

To preview spec polish in isolation, run a real (non-dry-run) cycle against a single spec: `gh workflow run daily-regen.yml -f specification_id=<spec> -f model=haiku`. Polish opens a PR; merge or close it manually.

## Backwards compatibility

Both new inputs (`change_request` on impl-generate, `change_requests` on bulk-generate) default to empty (`""` and `'{}'`). Existing manual triggers without these inputs behave byte-identically to today.

## Risks + rollback

- Auto-polish PRs accumulate if humans never review them. Skip-gate prevents duplicates per spec; the spec just doesn't get polished further until reviewed. Acceptable: human stays in control.
- Spec polish prompt has hard rules: no changes to `id` / `issue` / `created`, no semantic changes (data shape, plot type, requirements). Reviewable in commit diffs; revert if anything slips.
- Rollback: revert `daily-regen.yml` first. Downstream `change_request[s]` inputs default to empty; remaining changes are no-ops without daily-regen wiring.

## Test plan

- [ ] CI parses all four workflow YAMLs cleanly
- [ ] Manual `gh workflow run daily-regen.yml --ref feat/daily-regen-pre-flight -f specification_id=<spec> -f model=haiku -f dry_run=true`: confirms `pick` + `preflight-dispatch` (skip-gate, similarity, collect) run, polish + bulk-generate dispatch are skipped (dry_run)
- [ ] Manual `gh workflow run daily-regen.yml --ref feat/daily-regen-pre-flight -f specification_id=<spec> -f model=haiku` (no dry_run): confirms the full chain: polish either NOOPs or opens a PR, similarity emits change_requests, bulk-generate fires, an impl-generate run with non-empty hint shows `::notice::Change request staged: …`
- [ ] On the auto-polish PR (if produced): `id` / `issue` / `created` unchanged; only wording / sections / tags polished; `updated` bumped

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
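
The gating described under "dry_run semantics" and the skip-gate can be summarised as a small decision function. This is an illustrative sketch only: `planned_steps` and the step labels are hypothetical names, not identifiers from the workflow, and the real gating lives in the `if:` expressions of `daily-regen.yml`.

```python
def planned_steps(dry_run: bool, open_pr_count: int) -> list[str]:
    """Return the pre-flight steps that would run for one spec.

    Hypothetical helper mirroring the step-level gates in the PR text;
    the step labels are illustrative, not real job or step ids.
    """
    steps = ["skip-gate"]

    # Spec polish opens a PR (side effect), so it needs both gates to pass:
    # no open PR touching the spec, and not a dry run.
    if open_pr_count == 0 and not dry_run:
        steps.append("spec-polish")

    # The similarity audit is read-only, and collect only reads its output,
    # so both run regardless of dry_run or the skip-gate result.
    steps += ["similarity-audit", "collect-change-requests"]

    # Dispatch fans out impl-generate jobs, so it is skipped on dry runs.
    if not dry_run:
        steps.append("dispatch-bulk-generate")

    return steps
```

Under these assumptions, a dry run never reaches the polish or dispatch steps, while an open PR on the spec only suppresses polish.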
diff --git a/.github/workflows/bulk-generate.yml b/.github/workflows/bulk-generate.yml
@@ -45,6 +45,11 @@ on:
           - haiku
           - sonnet
           - opus
+      change_requests:
+        description: "JSON object {library: one-sentence-hint} from daily-regen similarity audit. Empty = no clusters."
+        required: false
+        type: string
+        default: '{}'
 
 env:
   ALL_LIBRARIES: "matplotlib seaborn plotly bokeh altair plotnine pygal highcharts letsplot"
@@ -178,13 +183,22 @@ jobs:
           MATRIX: ${{ needs.build-matrix.outputs.matrix }}
           PACE_SECONDS: ${{ inputs.pace_seconds || '120' }}
           MODEL: ${{ inputs.model || 'sonnet' }}
+          CHANGE_REQUESTS: ${{ inputs.change_requests || '{}' }}
         run: |
           set -u
 
           pace="${PACE_SECONDS}"
           pairs=$(echo "$MATRIX" | jq -r '.include[] | "\(.specification_id) \(.library)"')
           total=$(echo "$pairs" | wc -l | tr -d ' ')
-          echo "::notice::Dispatching $total item(s) with ${pace}s pacing between each (model=${MODEL})"
+
+          # Validate change_requests is a JSON object early — bad JSON would
+          # silently produce empty hints later and we'd never know.
+          if ! echo "$CHANGE_REQUESTS" | jq -e 'type == "object"' >/dev/null 2>&1; then
+            echo "::warning::change_requests input is not a valid JSON object; ignoring (got: ${CHANGE_REQUESTS})"
+            CHANGE_REQUESTS='{}'
+          fi
+          flagged_count=$(echo "$CHANGE_REQUESTS" | jq 'length')
+          echo "::notice::Dispatching $total item(s) with ${pace}s pacing between each (model=${MODEL}, change_requests for ${flagged_count} libs)"
 
           i=0
           failed=0
@@ -199,12 +213,19 @@ jobs:
               [ "$ISSUE" = "null" ] && ISSUE=""
             fi
 
+            # Per-library divergence hint (empty if not flagged).
+            HINT=$(echo "$CHANGE_REQUESTS" | jq -r --arg lib "$LIBRARY" '.[$lib] // ""')
+
             # Best-effort pending label so the issue shows the in-flight lib.
             if [ -n "$ISSUE" ]; then
               gh issue edit "$ISSUE" --add-label "impl:${LIBRARY}:pending" 2>/dev/null || true
             fi
 
-            echo "::notice::[$i/$total] $(date -u +%H:%M:%SZ)  dispatching impl-generate for ${SPEC_ID}/${LIBRARY} (issue: ${ISSUE:-none})"
+            if [ -n "$HINT" ]; then
+              echo "::notice::[$i/$total] $(date -u +%H:%M:%SZ)  dispatching impl-generate for ${SPEC_ID}/${LIBRARY} (issue: ${ISSUE:-none}, change_request: ${HINT})"
+            else
+              echo "::notice::[$i/$total] $(date -u +%H:%M:%SZ)  dispatching impl-generate for ${SPEC_ID}/${LIBRARY} (issue: ${ISSUE:-none})"
+            fi
 
             # Retry dispatch up to 3× with linear backoff.
             dispatched=0
@@ -214,12 +235,14 @@ jobs:
                   -f specification_id="${SPEC_ID}" \
                   -f library="${LIBRARY}" \
                   -f issue_number="${ISSUE}" \
-                  -f model="${MODEL}" && dispatched=1 && break
+                  -f model="${MODEL}" \
+                  -f change_request="${HINT}" && dispatched=1 && break
               else
                 gh workflow run impl-generate.yml --repo "${{ github.repository }}" \
                   -f specification_id="${SPEC_ID}" \
                   -f library="${LIBRARY}" \
-                  -f model="${MODEL}" && dispatched=1 && break
+                  -f model="${MODEL}" \
+                  -f change_request="${HINT}" && dispatched=1 && break
               fi
               echo "::warning::Dispatch attempt $attempt failed for ${SPEC_ID}/${LIBRARY}, retrying in 10s"
               sleep 10
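
For readers less familiar with jq, the validation and per-library lookup in the `bulk-generate.yml` hunk above can be approximated in Python. This is a sketch of the same idea, not code from the repository: `parse_change_requests` and `hint_for` are hypothetical names, and the behaviour is modelled on `jq -e 'type == "object"'` and `jq -r '.[$lib] // ""'`.

```python
import json


def parse_change_requests(raw: str) -> dict:
    """Return the parsed JSON object, or {} if raw is not a JSON object.

    Mirrors the workflow's fallback: anything that is not a valid JSON
    object is ignored rather than failing the dispatch.
    """
    try:
        parsed = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return {}
    return parsed if isinstance(parsed, dict) else {}


def hint_for(change_requests: dict, library: str) -> str:
    """Return the hint for one library, or "" if the library is not flagged."""
    value = change_requests.get(library)
    # jq's `//` operator falls back on null and false, not only missing keys.
    if value is None or value is False:
        return ""
    # `jq -r` prints strings raw and other values as JSON text.
    return value if isinstance(value, str) else json.dumps(value)
```

An empty hint corresponds to the unflagged case, where `impl-generate` receives `change_request=""` and skips the staging step.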
diff --git a/.github/workflows/daily-regen.yml b/.github/workflows/daily-regen.yml
@@ -61,6 +61,7 @@ jobs:
     runs-on: ubuntu-latest
     outputs:
       specs: ${{ steps.pick.outputs.specs }}
+      specs_json: ${{ steps.pick.outputs.specs_json }}
       count: ${{ steps.pick.outputs.count }}
     steps:
       - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6
@@ -81,6 +82,7 @@ jobs:
           SPEC_OVERRIDE: ${{ inputs.specification_id }}
         run: |
           python3 <<'PY'
+          import json
           import os
           import sys
           from datetime import datetime, timedelta, timezone
@@ -106,6 +108,7 @@ jobs:
               github_output = os.environ["GITHUB_OUTPUT"]
               with open(github_output, "a", encoding="utf-8") as f:
                   f.write(f"specs={OVERRIDE}\n")
+                  f.write(f"specs_json={json.dumps(picks)}\n")
                   f.write(f"count=1\n")
               sys.exit(0)
 
@@ -158,29 +161,137 @@ jobs:
           github_output = os.environ["GITHUB_OUTPUT"]
           with open(github_output, "a", encoding="utf-8") as f:
               f.write(f"specs={' '.join(picks)}\n")
+              f.write(f"specs_json={json.dumps(picks)}\n")
               f.write(f"count={len(picks)}\n")
           PY
 
-  dispatch:
+  # ============================================================================
+  # Pre-flight: per spec, run autonomous spec polish + cross-library similarity
+  # audit, then dispatch bulk-generate with the resulting change_requests.
+  #
+  # Each matrix entry is one spec from the pick job. We do polish + audit +
+  # dispatch in the same job so we don't have to aggregate matrix outputs back
+  # into a separate dispatch job (which is awkward in GitHub Actions).
+  #
+  # The two pre-flight LLM steps are HARDCODED to Haiku regardless of
+  # `inputs.model` — they're narrow, cheap audits. The user-selected model is
+  # passed through to bulk-generate (and from there to impl-generate / review /
+  # repair) unchanged.
+  # ============================================================================
+  preflight-dispatch:
     needs: pick
-    if: ${{ needs.pick.outputs.count != '0' && !inputs.dry_run }}
+    if: ${{ needs.pick.outputs.count != '0' }}
     runs-on: ubuntu-latest
     permissions:
-      actions: write
+      contents: write       # spec polish: branch + commit
+      pull-requests: write  # spec polish: open PR + add label
+      actions: write        # dispatch bulk-generate
+      id-token: write
+    strategy:
+      matrix:
+        spec_id: ${{ fromJson(needs.pick.outputs.specs_json) }}
+      fail-fast: false
+      max-parallel: 1   # serialize so polish PRs and dispatches don't race
+    # Note on dry_run: the JOB always runs when there's a spec to process, so
+    # operators can exercise skip-gate + similarity-audit + collect on demand.
+    # Side-effect steps (polish, dispatch) are individually gated on
+    # `!inputs.dry_run` below.
     steps:
-      - name: Trigger bulk-generate for each picked spec
+      - name: Checkout repository
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6
+        with:
+          fetch-depth: 0
+
+      - name: Skip-gate — open PRs touching this spec?
+        id: gate
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          SPEC_ID: ${{ matrix.spec_id }}
+        run: |
+          # If any PR is open that touches plots/{spec}/, skip the polish step
+          # to avoid racing against human edits or stacking auto-polish PRs.
+          # Similarity audit still runs — it's read-only.
+          OPEN=$(gh pr list \
+            --repo "${{ github.repository }}" \
+            --search "plots/${SPEC_ID}/ in:files is:open" \
+            --json number --jq 'length' 2>/dev/null || echo 0)
+          if [ "${OPEN:-0}" -gt 0 ]; then
+            echo "::notice::Open PR(s) touch plots/${SPEC_ID}/ — skipping spec polish"
+            echo "skip_polish=1" >> "$GITHUB_OUTPUT"
+          else
+            echo "skip_polish=0" >> "$GITHUB_OUTPUT"
+          fi
+
+      - name: Spec polish (autonomous, opens PR — no auto-merge)
+        if: ${{ steps.gate.outputs.skip_polish == '0' && !inputs.dry_run }}
+        # Optional quality pass: a transient action failure here must not
+        # block the main regeneration pipeline. Skip cleanly and continue.
+        continue-on-error: true
+        timeout-minutes: 15
+        uses: anthropics/claude-code-action@2cc1ac1331eac7a6a96d716dd204dd2888d0fcd2  # v1
+        with:
+          claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
+          claude_args: '--model haiku'
+          allowed_bots: '*'
+          prompt: |
+            Read `prompts/workflow-prompts/spec-polish-claude.md` and follow those instructions.
+
+            Variables for this run:
+            - SPEC_ID: ${{ matrix.spec_id }}
+
+      - name: Cross-library similarity audit
+        # Read-only audit; if it fails, fall back to empty change_requests
+        # rather than aborting the dispatch.
+        continue-on-error: true
+        timeout-minutes: 15
+        uses: anthropics/claude-code-action@2cc1ac1331eac7a6a96d716dd204dd2888d0fcd2  # v1
+        with:
+          claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
+          claude_args: '--model haiku'
+          allowed_bots: '*'
+          prompt: |
+            Read `prompts/workflow-prompts/impl-similarity-claude.md` and follow those instructions.
+
+            Variables for this run:
+            - SPEC_ID: ${{ matrix.spec_id }}
+
+      - name: Collect change_requests
+        id: collect
+        run: |
+          # Default to empty object if the audit never wrote a file (e.g.
+          # fewer than 2 metadata files exist).
+          if [ -f /tmp/change-requests.json ]; then
+            CR=$(cat /tmp/change-requests.json)
+            # Validate it's a JSON object; fall back to empty otherwise.
+            if ! echo "$CR" | jq -e 'type == "object"' >/dev/null 2>&1; then
+              echo "::warning::/tmp/change-requests.json is not a valid JSON object; using {} (got: ${CR})"
+              CR='{}'
+            fi
+          else
+            CR='{}'
+          fi
+          # Compact + escape newlines so it survives as a single GitHub Actions output line.
+          CR_COMPACT=$(echo "$CR" | jq -c '.')
+          echo "change_requests=${CR_COMPACT}" >> "$GITHUB_OUTPUT"
+          flagged=$(echo "$CR_COMPACT" | jq 'length')
+          echo "::notice::change_requests for ${{ matrix.spec_id }}: ${flagged} lib(s) flagged — ${CR_COMPACT}"
+
+      - name: Dispatch bulk-generate with change_requests
+        if: ${{ !inputs.dry_run }}
         env:
           GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-          SPECS: ${{ needs.pick.outputs.specs }}
+          SPEC_ID: ${{ matrix.spec_id }}
           MODEL: ${{ inputs.model || 'haiku' }}
+          CHANGE_REQUESTS: ${{ steps.collect.outputs.change_requests }}
         run: |
-          for spec in $SPECS; do
-            echo "::notice::Dispatching bulk-generate for $spec (all 9 libs, model=$MODEL)"
-            gh workflow run bulk-generate.yml \
-              --repo "${{ github.repository }}" \
-              -f specification_id="$spec" \
-              -f library=all \
-              -f model="$MODEL"
-            # Small pause between dispatches so GitHub's webhook processing has a moment.
-            sleep 5
-          done
+          echo "::notice::Dispatching bulk-generate for ${SPEC_ID} (all 9 libs, model=${MODEL})"
+          gh workflow run bulk-generate.yml \
+            --repo "${{ github.repository }}" \
+            -f specification_id="${SPEC_ID}" \
+            -f library=all \
+            -f model="${MODEL}" \
+            -f change_requests="${CHANGE_REQUESTS}"
+          # Small pause so GitHub's webhook processing has a moment before
+          # the next matrix entry's dispatch (matrix is serialized via
+          # max-parallel: 1, so this is between specs).
+          sleep 5
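
The "Collect change_requests" step in the `daily-regen.yml` hunk above has three fallbacks to `{}`: missing file, unparseable JSON, and JSON that is not an object. A minimal Python sketch of that logic follows; `collect_change_requests` is a hypothetical name, and the compact serialisation stands in for `jq -c '.'`.

```python
import json
from pathlib import Path


def collect_change_requests(path: Path) -> str:
    """Return change_requests as single-line compact JSON, defaulting to '{}'.

    Hypothetical helper; the workflow does this in shell with jq.
    """
    # The audit may never have written a file (for example, it failed or
    # found too few metadata files to compare).
    if not path.is_file():
        return "{}"

    try:
        parsed = json.loads(path.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, UnicodeDecodeError):
        return "{}"

    # Only a JSON object keyed by library is accepted.
    if not isinstance(parsed, dict):
        return "{}"

    # Compact separators keep the value on one line, which is what a
    # `name=value` entry in $GITHUB_OUTPUT requires.
    return json.dumps(parsed, separators=(",", ":"), ensure_ascii=False)
```

Because newlines inside JSON strings are escaped on serialisation, the returned value stays on a single line even if the audit pretty-printed its file.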
diff --git a/.github/workflows/impl-generate.yml b/.github/workflows/impl-generate.yml
@@ -42,6 +42,11 @@ on:
           - haiku
           - sonnet
           - opus
+      change_request:
+        description: "One-sentence cross-library divergence hint from daily-regen pre-flight similarity audit (empty = none)"
+        required: false
+        type: string
+        default: ''
 
 # Global concurrency: max 3 concurrent implementation workflows
 concurrency:
@@ -318,6 +323,16 @@ jobs:
           mkdir -p "plots/${SPEC_ID}/metadata/${LANGUAGE}"
           echo "::notice::Ensured implementation + metadata directories exist for language '${LANGUAGE}'"
 
+      - name: Stage change_request hint (cross-library divergence)
+        if: ${{ inputs.change_request != '' }}
+        env:
+          CHANGE_REQUEST: ${{ inputs.change_request }}
+        run: |
+          # Written to a file so the prompt template stays variable-free; impl-generate-claude.md
+          # checks for the file's existence and reads it if present.
+          printf '%s\n' "$CHANGE_REQUEST" > /tmp/anyplot-change-request.txt
+          echo "::notice::Change request staged: ${CHANGE_REQUEST}"
+
       - name: Run Claude Code to generate implementation
         id: claude
         continue-on-error: true
diff --git a/prompts/workflow-prompts/impl-generate-claude.md b/prompts/workflow-prompts/impl-generate-claude.md
@@ -52,6 +52,36 @@ and your own idiomatic API. The shared anchors are only the spec, the library
 prompt, and the base style guide. See `prompts/plot-generator.md` →
 "Library Independence" for the full rule.
 
+### Change Request — cross-library divergence hint
+
+If the file `/tmp/anyplot-change-request.txt` exists, read it. Its content is a
+**hard requirement** of this regen: the cross-library similarity audit (in
+`daily-regen` pre-flight) flagged this library as too close to a sibling on a
+dimension the spec didn't dictate, and produced a one-sentence direction hint
+to break the convergence.
+
+When a change_request is present:
+
+- **Apply it.** This is the only cross-library context permitted in this run;
+  treat it as binding.
+- **Do NOT open sibling-library files** even to "verify" the request. The hint
+  contains everything you need; the Library Independence rule above still
+  binds.
+- The "no changes for the sake of changes" exception (default regen mindset
+  prefers incremental improvement) does **NOT** apply when a change_request is
+  present — you must implement the requested change.
+- **Preserve `review.strengths`** while applying the new direction. Override
+  "Respect the spec variant" (below) only insofar as the change_request
+  explicitly permits — the spec-variant rule still binds the rest of the
+  implementation.
+- The hint is short by design (~1 sentence). It will name the sibling and the
+  shared signal, then suggest 2–3 alternative directions along that dimension.
+  Pick one of the suggested alternatives, or another that fits the same
+  dimension; do not invent a tangential change.
+
+If `/tmp/anyplot-change-request.txt` does not exist, ignore this section
+entirely — there is nothing to apply.
+
 ### Feasibility Check (Static Libraries Only)
 
 If LIBRARY is **matplotlib**, **seaborn**, or **plotnine**, AND the specification mentions interactive features (hover, zoom, click, brush, animation, streaming):
diff --git a/prompts/workflow-prompts/impl-similarity-claude.md b/prompts/workflow-prompts/impl-similarity-claude.md
diff --git a/prompts/workflow-prompts/spec-polish-claude.md b/prompts/workflow-prompts/spec-polish-claude.md