
Commit 92fc47c

feat(daily-regen): autonomous spec polish + cross-library similarity audit (#5714)
## Summary

Adds two autonomous pre-flight steps to `daily-regen.yml` that run before `bulk-generate` fans out, lifting two quality vectors that today only exist in the local `/regen` and `/update` skills into the cloud cadence.

Everything runs through `claude-code-action` on Claude Max OAuth — **no extra API costs** — same pinned SHA as `impl-generate` / `impl-review` / `impl-repair`.

## What it does

For each spec the `pick` job selects:

1. **Skip-gate** (Bash, no LLM) — `gh pr list --search "plots/<spec>/ in:files is:open"`. If any PR is open touching the spec, the polish step is skipped to avoid racing humans or stacking auto-polish PRs. The similarity audit still runs (it's read-only).
2. **Spec polish** (claude-code-action, `--model haiku`) — audits the spec across the five `update.md` §2 dimensions (wording, missing sections, tag completeness, tag quality, tag accuracy). If anything needs work, opens an `auto-polish/<spec>/<timestamp>` branch + PR with label `auto-polish`. **Never pushes to main directly. Never auto-merges.** PR awaits human review. If nothing needs work, prints `NOOP` and stops. `continue-on-error: true` so a transient action failure does not block the main pipeline.
3. **Cross-library similarity audit** (claude-code-action, `--model haiku`) — reads the 9 `review.image_description` blobs from `plots/<spec>/metadata/python/*.yaml` and clusters libraries that converged on the same data scenario / example domain / visual variant *beyond what the spec dictated*. Optionally drills into impl `.py` files for ambiguous clusters via the Read tool. Emits `/tmp/change-requests.json` keyed by library. Project-mandated constants (Okabe-Ito palette positions 1–7, plot size and aspect ratio, theme chrome) are explicitly excluded as cluster signals. `continue-on-error: true`; if the audit fails, the collect step falls back to empty change_requests.
4. **Dispatch bulk-generate with hints** — passes `change_requests` JSON to bulk-generate, which jq-extracts the per-library hint and forwards it as the new `change_request` input to `impl-generate`. The hint is staged to `/tmp/anyplot-change-request.txt`, where the updated `impl-generate-claude.md` picks it up and treats it as a hard requirement (mirroring `regen.md` §2c verbatim — hard requirement, no sibling reads, preserve `review.strengths`, override "no changes for sake of changes").

## Why

- **Spec drift:** specs are currently written once at creation and never revisited. Tag vocab evolves, sections go missing, wording grows vague. Polish-on-cycle keeps them sharp without manual maintenance, and at ~10 cycles/day across 300+ specs, each spec gets touched once a month — drift risk is low.
- **Silent convergence:** without a similarity check, 9 libs can independently land on the same scenario / domain / variant, producing nine copies of the same chart in different engines — exactly the opposite of the catalog's purpose. The hint-injection breaks the cluster cleanly (one library per cluster, alphabetically later).

## Model routing

- The existing `daily-regen` `model` input (default `haiku`, choices `haiku`/`sonnet`/`opus`) is **unchanged** and still flows to bulk-generate → impl-generate / review / repair.
- The two new pre-flight LLM steps **hardcode** `--model haiku` — they're narrow, cheap audits.

## dry_run semantics

`dry_run=true` runs the read-only and decision-only steps so operators can preview what the cycle will do without committing anything:

- Runs: pick, skip-gate, similarity audit (read-only), collect change_requests
- Skipped: spec polish (would open a real PR — side effect), dispatch bulk-generate (would fan out 9 impl-generate jobs)

To preview spec polish in isolation, run a real (non-dry-run) cycle against a single spec: `gh workflow run daily-regen.yml -f specification_id=<spec> -f model=haiku`. Polish opens a PR; merge or close it manually.

## Backwards compatibility

Both new inputs (`change_request` on impl-generate, `change_requests` on bulk-generate) default to empty (`""` and `'{}'`). Existing manual triggers without these inputs behave byte-identically to today.

## Risks + rollback

- Auto-polish PRs accumulate if humans never review them. Skip-gate prevents duplicates per spec; the spec just doesn't get polished further until reviewed. Acceptable: human stays in control.
- Spec polish prompt has hard rules: no changes to `id` / `issue` / `created`, no semantic changes (data shape, plot type, requirements). Reviewable in commit diffs; revert if anything slips.
- Rollback: revert `daily-regen.yml` first. Downstream `change_request[s]` inputs default to empty; remaining changes are no-ops without daily-regen wiring.

## Test plan

- [ ] CI parses all four workflow YAMLs cleanly
- [ ] Manual `gh workflow run daily-regen.yml --ref feat/daily-regen-pre-flight -f specification_id=<spec> -f model=haiku -f dry_run=true` — confirms `pick` + `preflight-dispatch` (skip-gate, similarity, collect) run, polish + bulk-generate dispatch are skipped (dry_run)
- [ ] Manual `gh workflow run daily-regen.yml --ref feat/daily-regen-pre-flight -f specification_id=<spec> -f model=haiku` (no dry_run) — confirms the full chain: polish either NOOPs or opens a PR, similarity emits change_requests, bulk-generate fires, an impl-generate run with non-empty hint shows `::notice::Change request staged: …`
- [ ] On the auto-polish PR (if produced): `id` / `issue` / `created` unchanged; only wording / sections / tags polished; `updated` bumped

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
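
The dry_run semantics described in the commit message can be read as a small decision table. The following is a minimal Python sketch of that table, not code from the commit: the function name `steps_that_run` and the step labels are hypothetical, and only the gating conditions are taken from the description.

```python
from typing import List


def steps_that_run(dry_run: bool, open_prs: int) -> List[str]:
    """Return the pre-flight steps that execute for one spec.

    Hypothetical helper: the step labels are chosen for this sketch only.
    """
    steps = ["pick", "skip-gate"]
    # Polish opens a real PR, so it needs both gates: no open PR and not a dry run.
    if open_prs == 0 and not dry_run:
        steps.append("spec-polish")
    # The similarity audit is read-only, so it runs in every case.
    steps += ["similarity-audit", "collect-change-requests"]
    # Dispatch fans out impl-generate jobs, so dry runs stop before it.
    if not dry_run:
        steps.append("dispatch-bulk-generate")
    return steps


preview = steps_that_run(dry_run=True, open_prs=0)
```

Under these assumptions a dry run never reaches the two side-effect steps, and an open PR suppresses only the polish step.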
1 parent 50410a5 commit 92fc47c

6 files changed

Lines changed: 394 additions & 19 deletions


.github/workflows/bulk-generate.yml

Lines changed: 27 additions & 4 deletions
@@ -45,6 +45,11 @@ on:
           - haiku
           - sonnet
           - opus
+      change_requests:
+        description: "JSON object {library: one-sentence-hint} from daily-regen similarity audit. Empty = no clusters."
+        required: false
+        type: string
+        default: '{}'

 env:
   ALL_LIBRARIES: "matplotlib seaborn plotly bokeh altair plotnine pygal highcharts letsplot"
@@ -178,13 +183,22 @@ jobs:
           MATRIX: ${{ needs.build-matrix.outputs.matrix }}
           PACE_SECONDS: ${{ inputs.pace_seconds || '120' }}
           MODEL: ${{ inputs.model || 'sonnet' }}
+          CHANGE_REQUESTS: ${{ inputs.change_requests || '{}' }}
         run: |
           set -u

           pace="${PACE_SECONDS}"
           pairs=$(echo "$MATRIX" | jq -r '.include[] | "\(.specification_id) \(.library)"')
           total=$(echo "$pairs" | wc -l | tr -d ' ')
-          echo "::notice::Dispatching $total item(s) with ${pace}s pacing between each (model=${MODEL})"
+
+          # Validate change_requests is a JSON object early — bad JSON would
+          # silently produce empty hints later and we'd never know.
+          if ! echo "$CHANGE_REQUESTS" | jq -e 'type == "object"' >/dev/null 2>&1; then
+            echo "::warning::change_requests input is not a valid JSON object; ignoring (got: ${CHANGE_REQUESTS})"
+            CHANGE_REQUESTS='{}'
+          fi
+          flagged_count=$(echo "$CHANGE_REQUESTS" | jq 'length')
+          echo "::notice::Dispatching $total item(s) with ${pace}s pacing between each (model=${MODEL}, change_requests for ${flagged_count} libs)"

           i=0
           failed=0
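
The early validation in this hunk (reject anything that is not a JSON object, fall back to `{}`, then count the flagged libraries) can be sketched in Python as follows. This is an illustrative equivalent of the `jq -e 'type == "object"'` check, not part of the workflow; the helper name `parse_change_requests` and the sample hint text are assumptions.

```python
import json
from typing import Dict, Tuple


def parse_change_requests(raw: str) -> Tuple[Dict[str, str], bool]:
    """Parse the change_requests input, falling back to {} on bad input.

    Returns (mapping, was_valid). Anything that is not a JSON object
    (invalid JSON, an array, a bare string) is rejected.
    """
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        return {}, False
    if not isinstance(value, dict):
        return {}, False
    return value, True


# Hypothetical hint text, used only to exercise the helper.
requests, ok = parse_change_requests('{"seaborn": "Use a different example domain."}')
flagged_count = len(requests)  # plays the role of `jq 'length'` in the workflow
```

Returning the validity flag alongside the mapping lets a caller emit a warning, as the workflow does, without aborting the dispatch loop.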
@@ -199,12 +213,19 @@ jobs:
               [ "$ISSUE" = "null" ] && ISSUE=""
             fi

+            # Per-library divergence hint (empty if not flagged).
+            HINT=$(echo "$CHANGE_REQUESTS" | jq -r --arg lib "$LIBRARY" '.[$lib] // ""')
+
             # Best-effort pending label so the issue shows the in-flight lib.
             if [ -n "$ISSUE" ]; then
               gh issue edit "$ISSUE" --add-label "impl:${LIBRARY}:pending" 2>/dev/null || true
             fi

-            echo "::notice::[$i/$total] $(date -u +%H:%M:%SZ) dispatching impl-generate for ${SPEC_ID}/${LIBRARY} (issue: ${ISSUE:-none})"
+            if [ -n "$HINT" ]; then
+              echo "::notice::[$i/$total] $(date -u +%H:%M:%SZ) dispatching impl-generate for ${SPEC_ID}/${LIBRARY} (issue: ${ISSUE:-none}, change_request: ${HINT})"
+            else
+              echo "::notice::[$i/$total] $(date -u +%H:%M:%SZ) dispatching impl-generate for ${SPEC_ID}/${LIBRARY} (issue: ${ISSUE:-none})"
+            fi

             # Retry dispatch up to 3× with linear backoff.
             dispatched=0
@@ -214,12 +235,14 @@ jobs:
                   -f specification_id="${SPEC_ID}" \
                   -f library="${LIBRARY}" \
                   -f issue_number="${ISSUE}" \
-                  -f model="${MODEL}" && dispatched=1 && break
+                  -f model="${MODEL}" \
+                  -f change_request="${HINT}" && dispatched=1 && break
               else
                 gh workflow run impl-generate.yml --repo "${{ github.repository }}" \
                   -f specification_id="${SPEC_ID}" \
                   -f library="${LIBRARY}" \
-                  -f model="${MODEL}" && dispatched=1 && break
+                  -f model="${MODEL}" \
+                  -f change_request="${HINT}" && dispatched=1 && break
               fi
               echo "::warning::Dispatch attempt $attempt failed for ${SPEC_ID}/${LIBRARY}, retrying in 10s"
               sleep 10
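
The per-library lookup and the extra `-f change_request=...` argument added in these hunks could be expressed like this in Python. It is a sketch under stated assumptions: `hint_for` and `dispatch_args` are hypothetical names, the spec id `example-spec` is a placeholder, and the command is only assembled, never executed.

```python
from typing import Dict, List, Optional


def hint_for(change_requests: Dict[str, Optional[str]], library: str) -> str:
    """Return the library's hint, or "" when it was not flagged.

    For string values this matches `jq -r '.[$lib] // ""'`: a missing key
    and a null value both yield the empty string.
    """
    value = change_requests.get(library)
    return "" if value is None else value


def dispatch_args(spec_id: str, library: str, model: str, hint: str) -> List[str]:
    """Build the argument list for one impl-generate dispatch (not executed)."""
    return [
        "gh", "workflow", "run", "impl-generate.yml",
        "-f", f"specification_id={spec_id}",
        "-f", f"library={library}",
        "-f", f"model={model}",
        # Always passed; an empty value means "no hint" downstream.
        "-f", f"change_request={hint}",
    ]


# Placeholder inputs for illustration only.
cr = {"plotly": "Try a different example domain."}
args = dispatch_args("example-spec", "bokeh", "haiku", hint_for(cr, "bokeh"))
```

Passing the argument unconditionally keeps the two dispatch branches symmetric, which is consistent with `change_request` defaulting to an empty string on the receiving workflow.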

.github/workflows/daily-regen.yml

Lines changed: 126 additions & 15 deletions
@@ -61,6 +61,7 @@ jobs:
     runs-on: ubuntu-latest
     outputs:
       specs: ${{ steps.pick.outputs.specs }}
+      specs_json: ${{ steps.pick.outputs.specs_json }}
       count: ${{ steps.pick.outputs.count }}
     steps:
       - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
@@ -81,6 +82,7 @@ jobs:
           SPEC_OVERRIDE: ${{ inputs.specification_id }}
         run: |
           python3 <<'PY'
+          import json
           import os
           import sys
           from datetime import datetime, timedelta, timezone
@@ -106,6 +108,7 @@ jobs:
               github_output = os.environ["GITHUB_OUTPUT"]
               with open(github_output, "a", encoding="utf-8") as f:
                   f.write(f"specs={OVERRIDE}\n")
+                  f.write(f"specs_json={json.dumps(picks)}\n")
                   f.write(f"count=1\n")
               sys.exit(0)

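
The `pick` script now writes a JSON-encoded list next to the space-separated one, so the matrix can consume it through `fromJson`. A minimal, self-contained Python sketch of that output step is shown here; `write_pick_outputs` is a hypothetical helper, the spec ids are placeholders, and a temporary file stands in for `$GITHUB_OUTPUT` so the example runs outside Actions.

```python
import json
import os
import tempfile
from typing import List


def write_pick_outputs(picks: List[str], output_path: str) -> None:
    """Append specs, specs_json and count as key=value lines."""
    with open(output_path, "a", encoding="utf-8") as f:
        f.write(f"specs={' '.join(picks)}\n")
        # json.dumps stays on one line, which the key=value format requires.
        f.write(f"specs_json={json.dumps(picks)}\n")
        f.write(f"count={len(picks)}\n")


# Temporary file used in place of $GITHUB_OUTPUT (assumption for this sketch).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    out_path = tmp.name
write_pick_outputs(["spec-a", "spec-b"], out_path)
with open(out_path, encoding="utf-8") as f:
    lines = f.read().splitlines()
os.remove(out_path)
```

Keeping both `specs` and `specs_json` means existing consumers of the space-separated form keep working while the matrix reads the JSON form.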
@@ -158,29 +161,137 @@ jobs:
           github_output = os.environ["GITHUB_OUTPUT"]
           with open(github_output, "a", encoding="utf-8") as f:
               f.write(f"specs={' '.join(picks)}\n")
+              f.write(f"specs_json={json.dumps(picks)}\n")
               f.write(f"count={len(picks)}\n")
           PY

-  dispatch:
+  # ============================================================================
+  # Pre-flight: per spec, run autonomous spec polish + cross-library similarity
+  # audit, then dispatch bulk-generate with the resulting change_requests.
+  #
+  # Each matrix entry is one spec from the pick job. We do polish + audit +
+  # dispatch in the same job so we don't have to aggregate matrix outputs back
+  # into a separate dispatch job (which is awkward in GitHub Actions).
+  #
+  # The two pre-flight LLM steps are HARDCODED to Haiku regardless of
+  # `inputs.model` — they're narrow, cheap audits. The user-selected model is
+  # passed through to bulk-generate (and from there to impl-generate / review /
+  # repair) unchanged.
+  # ============================================================================
+  preflight-dispatch:
     needs: pick
-    if: ${{ needs.pick.outputs.count != '0' && !inputs.dry_run }}
+    if: ${{ needs.pick.outputs.count != '0' }}
     runs-on: ubuntu-latest
     permissions:
-      actions: write
+      contents: write        # spec polish: branch + commit
+      pull-requests: write   # spec polish: open PR + add label
+      actions: write         # dispatch bulk-generate
+      id-token: write
+    strategy:
+      matrix:
+        spec_id: ${{ fromJson(needs.pick.outputs.specs_json) }}
+      fail-fast: false
+      max-parallel: 1  # serialize so polish PRs and dispatches don't race
+    # Note on dry_run: the JOB always runs when there's a spec to process, so
+    # operators can exercise skip-gate + similarity-audit + collect on demand.
+    # Side-effect steps (polish, dispatch) are individually gated on
+    # `!inputs.dry_run` below.
     steps:
-      - name: Trigger bulk-generate for each picked spec
+      - name: Checkout repository
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
+        with:
+          fetch-depth: 0
+
+      - name: Skip-gate — open PRs touching this spec?
+        id: gate
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          SPEC_ID: ${{ matrix.spec_id }}
+        run: |
+          # If any PR is open that touches plots/{spec}/, skip the polish step
+          # to avoid racing against human edits or stacking auto-polish PRs.
+          # Similarity audit still runs — it's read-only.
+          OPEN=$(gh pr list \
+            --repo "${{ github.repository }}" \
+            --search "plots/${SPEC_ID}/ in:files is:open" \
+            --json number --jq 'length' 2>/dev/null || echo 0)
+          if [ "${OPEN:-0}" -gt 0 ]; then
+            echo "::notice::Open PR(s) touch plots/${SPEC_ID}/ — skipping spec polish"
+            echo "skip_polish=1" >> "$GITHUB_OUTPUT"
+          else
+            echo "skip_polish=0" >> "$GITHUB_OUTPUT"
+          fi
+
+      - name: Spec polish (autonomous, opens PR — no auto-merge)
+        if: ${{ steps.gate.outputs.skip_polish == '0' && !inputs.dry_run }}
+        # Optional quality pass: a transient action failure here must not
+        # block the main regeneration pipeline. Skip cleanly and continue.
+        continue-on-error: true
+        timeout-minutes: 15
+        uses: anthropics/claude-code-action@2cc1ac1331eac7a6a96d716dd204dd2888d0fcd2 # v1
+        with:
+          claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
+          claude_args: '--model haiku'
+          allowed_bots: '*'
+          prompt: |
+            Read `prompts/workflow-prompts/spec-polish-claude.md` and follow those instructions.
+
+            Variables for this run:
+            - SPEC_ID: ${{ matrix.spec_id }}
+
+      - name: Cross-library similarity audit
+        # Read-only audit; if it fails, fall back to empty change_requests
+        # rather than aborting the dispatch.
+        continue-on-error: true
+        timeout-minutes: 15
+        uses: anthropics/claude-code-action@2cc1ac1331eac7a6a96d716dd204dd2888d0fcd2 # v1
+        with:
+          claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
+          claude_args: '--model haiku'
+          allowed_bots: '*'
+          prompt: |
+            Read `prompts/workflow-prompts/impl-similarity-claude.md` and follow those instructions.
+
+            Variables for this run:
+            - SPEC_ID: ${{ matrix.spec_id }}
+
+      - name: Collect change_requests
+        id: collect
+        run: |
+          # Default to empty object if the audit never wrote a file (e.g.
+          # fewer than 2 metadata files exist).
+          if [ -f /tmp/change-requests.json ]; then
+            CR=$(cat /tmp/change-requests.json)
+            # Validate it's a JSON object; fall back to empty otherwise.
+            if ! echo "$CR" | jq -e 'type == "object"' >/dev/null 2>&1; then
+              echo "::warning::/tmp/change-requests.json is not a valid JSON object; using {} (got: ${CR})"
+              CR='{}'
+            fi
+          else
+            CR='{}'
+          fi
+          # Compact + escape newlines so it survives as a single GitHub Actions output line.
+          CR_COMPACT=$(echo "$CR" | jq -c '.')
+          echo "change_requests=${CR_COMPACT}" >> "$GITHUB_OUTPUT"
+          flagged=$(echo "$CR_COMPACT" | jq 'length')
+          echo "::notice::change_requests for ${{ matrix.spec_id }}: ${flagged} lib(s) flagged — ${CR_COMPACT}"
+
+      - name: Dispatch bulk-generate with change_requests
+        if: ${{ !inputs.dry_run }}
         env:
           GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-          SPECS: ${{ needs.pick.outputs.specs }}
+          SPEC_ID: ${{ matrix.spec_id }}
           MODEL: ${{ inputs.model || 'haiku' }}
+          CHANGE_REQUESTS: ${{ steps.collect.outputs.change_requests }}
         run: |
-          for spec in $SPECS; do
-            echo "::notice::Dispatching bulk-generate for $spec (all 9 libs, model=$MODEL)"
-            gh workflow run bulk-generate.yml \
-              --repo "${{ github.repository }}" \
-              -f specification_id="$spec" \
-              -f library=all \
-              -f model="$MODEL"
-            # Small pause between dispatches so GitHub's webhook processing has a moment.
-            sleep 5
-          done
+          echo "::notice::Dispatching bulk-generate for ${SPEC_ID} (all 9 libs, model=${MODEL})"
+          gh workflow run bulk-generate.yml \
+            --repo "${{ github.repository }}" \
+            -f specification_id="${SPEC_ID}" \
+            -f library=all \
+            -f model="${MODEL}" \
+            -f change_requests="${CHANGE_REQUESTS}"
+          # Small pause so GitHub's webhook processing has a moment before
+          # the next matrix entry's dispatch (matrix is serialized via
+          # max-parallel: 1, so this is between specs).
+          sleep 5
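
The "Collect change_requests" step above reduces to three rules: a missing file means `{}`, a non-object means `{}`, and a valid object is compacted to one line. A rough Python equivalent, assuming a hypothetical helper `collect_change_requests` and a temporary directory in place of `/tmp/change-requests.json`, might look like this:

```python
import json
import os
import tempfile


def collect_change_requests(path: str) -> str:
    """Return change_requests as compact single-line JSON, or "{}"."""
    if not os.path.isfile(path):
        return "{}"  # the audit never wrote a file
    try:
        with open(path, encoding="utf-8") as f:
            value = json.load(f)
    except (OSError, ValueError):  # unreadable file or invalid JSON
        return "{}"
    if not isinstance(value, dict):
        return "{}"
    # Tight separators drop all optional whitespace, similar to `jq -c`.
    return json.dumps(value, separators=(",", ":"), ensure_ascii=False)


with tempfile.TemporaryDirectory() as workdir:
    audit_file = os.path.join(workdir, "change-requests.json")
    missing = collect_change_requests(audit_file)
    with open(audit_file, "w", encoding="utf-8") as f:
        # Placeholder hint text, pretty-printed across several lines.
        f.write('{\n  "plotly": "Pick a different example domain."\n}')
    compact = collect_change_requests(audit_file)
```

Compacting matters because a plain `key=value` line in `$GITHUB_OUTPUT` cannot span several lines; newlines inside string values are escaped by the JSON encoder, so the result stays on a single line.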

.github/workflows/impl-generate.yml

Lines changed: 15 additions & 0 deletions
@@ -42,6 +42,11 @@ on:
           - haiku
           - sonnet
           - opus
+      change_request:
+        description: "One-sentence cross-library divergence hint from daily-regen pre-flight similarity audit (empty = none)"
+        required: false
+        type: string
+        default: ''

 # Global concurrency: max 3 concurrent implementation workflows
 concurrency:
@@ -318,6 +323,16 @@ jobs:
           mkdir -p "plots/${SPEC_ID}/metadata/${LANGUAGE}"
           echo "::notice::Ensured implementation + metadata directories exist for language '${LANGUAGE}'"

+      - name: Stage change_request hint (cross-library divergence)
+        if: ${{ inputs.change_request != '' }}
+        env:
+          CHANGE_REQUEST: ${{ inputs.change_request }}
+        run: |
+          # Written to a file so the prompt template stays variable-free; impl-generate-claude.md
+          # checks for the file's existence and reads it if present.
+          printf '%s\n' "$CHANGE_REQUEST" > /tmp/anyplot-change-request.txt
+          echo "::notice::Change request staged: ${CHANGE_REQUEST}"
+
       - name: Run Claude Code to generate implementation
         id: claude
         continue-on-error: true
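
The staging step uses the file's existence as the signal: the file is written only for a non-empty hint, and the prompt treats a missing file as "nothing to apply". The contract can be sketched in Python as below. Both helper names are hypothetical, the hint text is a placeholder, and a temporary directory replaces `/tmp/anyplot-change-request.txt` so the example leaves no files behind.

```python
import os
import tempfile
from typing import Optional


def stage_change_request(hint: str, path: str) -> bool:
    """Write the hint to `path` only when it is non-empty; report if staged."""
    if hint == "":
        return False  # corresponds to `if: inputs.change_request != ''`
    with open(path, "w", encoding="utf-8") as f:
        f.write(hint + "\n")  # same trailing newline as printf '%s\n'
    return True


def read_change_request(path: str) -> Optional[str]:
    """Return the staged hint, or None when no file exists."""
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as f:
        return f.read().strip()


with tempfile.TemporaryDirectory() as workdir:
    hint_path = os.path.join(workdir, "anyplot-change-request.txt")
    staged_empty = stage_change_request("", hint_path)
    before = read_change_request(hint_path)
    staged = stage_change_request("Use a different example domain.", hint_path)
    after = read_change_request(hint_path)
```

Signalling through the file's presence keeps the prompt template free of variables, which is the reason given in the step's own comment.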

prompts/workflow-prompts/impl-generate-claude.md

Lines changed: 30 additions & 0 deletions
@@ -52,6 +52,36 @@ and your own idiomatic API. The shared anchors are only the spec, the library
 prompt, and the base style guide. See `prompts/plot-generator.md`
 "Library Independence" for the full rule.

+### Change Request — cross-library divergence hint
+
+If the file `/tmp/anyplot-change-request.txt` exists, read it. Its content is a
+**hard requirement** of this regen: the cross-library similarity audit (in
+`daily-regen` pre-flight) flagged this library as too close to a sibling on a
+dimension the spec didn't dictate, and produced a one-sentence direction hint
+to break the convergence.
+
+When a change_request is present:
+
+- **Apply it.** This is the only cross-library context permitted in this run;
+  treat it as binding.
+- **Do NOT open sibling-library files** even to "verify" the request. The hint
+  contains everything you need; the Library Independence rule above still
+  binds.
+- The "no changes for the sake of changes" exception (default regen mindset
+  prefers incremental improvement) does **NOT** apply when a change_request is
+  present — you must implement the requested change.
+- **Preserve `review.strengths`** while applying the new direction. Override
+  "Respect the spec variant" (below) only insofar as the change_request
+  explicitly permits — the spec-variant rule still binds the rest of the
+  implementation.
+- The hint is short by design (~1 sentence). It will name the sibling and the
+  shared signal, then suggest 2–3 alternative directions along that dimension.
+  Pick one of the suggested alternatives, or another that fits the same
+  dimension; do not invent a tangential change.
+
+If `/tmp/anyplot-change-request.txt` does not exist, ignore this section
+entirely — there is nothing to apply.
+
 ### Feasibility Check (Static Libraries Only)

 If LIBRARY is **matplotlib**, **seaborn**, or **plotnine**, AND the specification mentions interactive features (hover, zoom, click, brush, animation, streaming):
