
Add failed-ingest recovery and pre-reuse changelog validation [skip-sweep]#1821

Merged
Oseltamivir merged 13 commits into main from workflow/add-recover-ingest-command on Jun 18, 2026

@Oseltamivir Oseltamivir commented Jun 18, 2026

Collaborator

Summary

  • add /recover-failed-ingest <failed-run-or-job-url> [source-run-id] as a thin command over tested recovery helpers
  • run check-changelog before reuse authorization or setup; validate sweep-label exclusivity, duplicate YAML keys, full schema, byte-exact append-only history, PR links, separators, and generated sweep configuration
  • allow XXX only on PR additions; canonicalize it in the supported merge helper and reject it on push to main
  • stop link-only correction PRs after the CPU gate via has-additions=false
  • preserve current main changelog bytes during conflict resolution, always push a fresh synchronization SHA, wait for check-changelog on that SHA, and verify the PR head again before admin merge
  • validate reusable fixed-sequence, agentic point/raw/aggregate, and eval raw/aggregate identities exactly, including duplicate and unexpected rows and expired artifacts
  • harden recovery workflow validation for exact confirmation gating and workflow/job-level read-only permissions
  • correct the two stale XXX links and the incorrect repository link in perf-changelog.yaml
  • make no repository-rules changes
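The byte-exact append-only rule above can be illustrated with a minimal sketch. The names `appended_bytes` and `has_additions` are hypothetical and this is not the code in validate_perf_changelog.py. It assumes a valid change keeps the base file as an exact byte prefix of the head file, and it deliberately ignores the link-only correction case.

```python
def appended_bytes(base: bytes, head: bytes) -> bytes:
    """Return the bytes that head appends to base.

    Hypothetical helper: assumes "append-only" means the base file is an
    exact byte prefix of the head file. Raises if history was altered.
    """
    if not head.startswith(base):
        raise ValueError("changelog history was modified; only appends are allowed")
    return head[len(base):]


def has_additions(base: bytes, head: bytes) -> bool:
    """True when head adds non-whitespace content after the base bytes."""
    return bool(appended_bytes(base, head).strip())
```

Under these assumptions, a change that adds nothing after the base would report no additions, which is the kind of signal a has-additions=false output could carry to downstream jobs.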

Validation


Note

Medium Risk
Changes merge-time ingest gating and artifact validation for official benchmark database publishing; mistakes could block merges or allow incorrect reuse, though extensive tests and fail-closed checks mitigate that.

Overview
Adds a CPU-only check-changelog job ahead of reuse authorization and sweep setup in run-sweep.yml, replacing the old trailing-newline check and moving sweep-label conflict detection into that job. Validation enforces append-only history, schema/duplicate-key rules, PR link policy (XXX only on PR additions), generated sweep config, and gates downstream work with has-additions so link-only fixes stop before GPU setup.
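The PR link policy described above (XXX only on PR additions) might look roughly like the sketch below. `REPO_PREFIX`, the placeholder shape, and `link_allowed` are all assumptions made for illustration; the real canonical prefix and placeholder format are not shown in this thread.

```python
import re

# Hypothetical repository prefix; the real canonical prefix is not shown here.
REPO_PREFIX = "https://github.com/example-org/example-repo"

# A canonical link ends in a positive PR number.
CANONICAL = re.compile(re.escape(REPO_PREFIX) + r"/pull/[1-9][0-9]*")

# Assumption: the placeholder is a link whose PR number is the literal XXX.
PLACEHOLDER = REPO_PREFIX + "/pull/XXX"


def link_allowed(pr_link: str, event: str, is_addition: bool) -> bool:
    """Apply the link policy to one changelog entry.

    Assumption: `event` is "pull_request" or "push", and `is_addition`
    says whether the entry is newly added by the change under review.
    """
    if CANONICAL.fullmatch(pr_link):
        return True
    # The placeholder is tolerated only on entries that a PR is adding.
    return pr_link == PLACEHOLDER and event == "pull_request" and is_addition
```

With this shape, the same placeholder is accepted on a PR addition and rejected on a push, matching the rule that XXX must be canonicalized before it reaches main.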

Introduces validate_perf_changelog.py, prepare_perf_changelog_merge.py, and recover_failed_ingest.py, plus a /recover-failed-ingest command and a test-changelog-gate workflow with broad pytest coverage (including parsing the real `if` conditions in run-sweep.yml). merge_with_reuse.sh now uses the merge helpers, always pushes a sync SHA, waits for check-changelog, and re-checks the PR head before admin merge.

Reuse validation tightens: find_reusable_sweep_run.py accepts agentic point artifacts and ignores expired artifacts; validate_reusable_sweep_artifacts.py requires exact fixed-sequence, agentic (point/raw/aggregate), and eval identities and rejects duplicate and unexpected rows. The merge-time artifact filter's allowlist changes from agentic_aggregated to agentic_*.
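As a rough illustration of exact identity matching, the sketch below reports missing, unexpected, and duplicate rows. `identity_problems` and the tuple shape of an identity are hypothetical; validate_reusable_sweep_artifacts.py may structure this differently.

```python
from collections import Counter


def identity_problems(expected, actual):
    """Compare expected and actual row identities exactly.

    Hypothetical helper: identities are hashable, sortable tuples such as
    (config-name, concurrency). Returns a list of problem strings, empty
    when the actual rows match the expected identities one-to-one.
    """
    want = set(expected)
    got = Counter(actual)
    got_ids = set(got)
    problems = []
    for ident in sorted(want - got_ids):
        problems.append(f"missing: {ident}")
    for ident in sorted(got_ids - want):
        problems.append(f"unexpected: {ident}")
    for ident in sorted(want & got_ids):
        if got[ident] > 1:
            problems.append(f"duplicate x{got[ident]}: {ident}")
    return problems
```

A fail-closed caller would treat any non-empty result as a reason to refuse reuse rather than trying to repair the artifact set.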

perf-changelog.yaml gets three canonical PR link corrections. Docs (README, KLAUD_DEBUG.md) describe the new gate and merge path.

Reviewed by Cursor Bugbot for commit 5b9ce1e.

Comment thread .claude/commands/recover-failed-ingest.md Outdated
Comment thread .claude/commands/recover-failed-ingest.md Outdated
Comment thread .claude/commands/recover-failed-ingest.md Outdated
@Oseltamivir Oseltamivir force-pushed the workflow/add-recover-ingest-command branch 2 times, most recently from d736775 to 2a17676 Compare June 18, 2026 02:18
@Oseltamivir

Collaborator Author

perf-changelog.yaml audit results:

No historical changelog lines were edited: the file is append-only, and non-whitespace deletions are rejected by process_changelog.py. The command now reports these conditions and applies strict formatting/link checks to entries introduced by the target PR.

Comment thread .claude/commands/recover-failed-ingest.md Outdated
@Oseltamivir Oseltamivir force-pushed the workflow/add-recover-ingest-command branch from 2a17676 to ae01bdb Compare June 18, 2026 03:12
@Oseltamivir Oseltamivir changed the title Add failed-ingest recovery command Add failed-ingest recovery command [skip-sweep] Jun 18, 2026
@Oseltamivir

Collaborator Author

Fixed the three audited changelog links in commit ae01bdb3:

All three PRs are merged. The full file still parses and validates as 495 ChangelogEntry records, every pr-link is now a canonical InferenceX PR URL, and no surrounding whitespace or entry ordering changed. The PR title and commit retain [skip-sweep] because these are historical metadata replacements.

Comment thread .claude/commands/recover-failed-ingest.md
Comment thread .claude/commands/recover-failed-ingest.md Outdated
@Oseltamivir Oseltamivir changed the title Add failed-ingest recovery command [skip-sweep] Add failed-ingest recovery and pre-reuse changelog validation [skip-sweep] Jun 18, 2026
@Oseltamivir

Collaborator Author

Added commit 3d432bb7 to close the reuse validation gap.

The new order is check-changelog -> reuse-sweep-gate -> setup/GPU jobs. check-changelog always evaluates the final PR changelog diff first. It validates the complete YAML/schema, append-only ordering, changed-line whitespace, PR-link rules, and runs the same process_changelog.py generation used by setup for new entries.

utils/merge_with_reuse.sh now waits up to 15 minutes for check-changelog to pass on the exact pushed conflict-resolution SHA before invoking the admin squash merge. A failed, cancelled, or missing check stops the merge.
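The wait-then-merge behavior can be sketched in Python for illustration, although the real logic lives in a shell script. `wait_for_check` and its `fetch_conclusion` callable are hypothetical; how the script actually queries the check status for the pushed SHA is not shown here.

```python
import time


def wait_for_check(fetch_conclusion, timeout_s=15 * 60, poll_s=15,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll until the check on one SHA concludes; fail closed otherwise.

    `fetch_conclusion` is a hypothetical callable that returns None while
    the check is missing or still running, and otherwise a conclusion
    string such as "success", "failure", or "cancelled".
    """
    deadline = clock() + timeout_s
    while True:
        conclusion = fetch_conclusion()
        if conclusion == "success":
            return True
        if conclusion is not None:
            return False  # failed or cancelled: stop the merge
        if clock() >= deadline:
            return False  # still missing after the timeout: stop the merge
        sleep(poll_s)
```

Only a `True` result would allow the admin squash merge to proceed; every other outcome, including a check that never appears, stops it.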

This reproduces and rejects both prior failure classes: PR #1767 malformed YAML and PR #1798 historical deletion/invalid base state.

Comment thread utils/merge_with_reuse.sh
Comment thread utils/validate_perf_changelog.py Outdated
Comment thread utils/validate_perf_changelog.py Outdated
@Oseltamivir

Collaborator Author

Added push-triggered changelog gate coverage in commit a77ab9dc.

@Oseltamivir

Collaborator Author

Fail-closed behavior was tested with a temporary intentional failure.

The final workflow content is identical to commit a77ab9dc; only the intentional failure and restoration commits remain in history.

@Oseltamivir

Collaborator Author

Final verification for e0a422b8da5493cc54c054cd2d3d619dff0fa909:

…s [skip-sweep]

Add utils/test_run_sweep_gating.py: parse the real check-changelog ->
reuse-sweep-gate -> setup `if` conditions from run-sweep.yml and evaluate
them with a minimal GitHub Actions expression engine, so coverage cannot
drift from production.

Covers every gating case: a curated 18-case truth table plus a 3,464-case
exhaustive cross-product (action x draft x label-config x label-name x
validator outcome x has-additions x reuse x skip-sweep) cross-checked against
an independent reference spec. The engine is grounded against the real outcome
of run-sweep run 27737489942. Wire the test into test-changelog-gate.yml (the
prior simulation jobs are kept as a real-Actions smoke test).
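The cross-check idea (evaluate a condition over a full cross-product and compare it with an independently written reference) can be sketched as follows. The gating rule here is a simplified, hypothetical stand-in with four boolean inputs; it is not the real run-sweep.yml condition and does not reproduce the case counts above.

```python
from itertools import product


def setup_runs(validator_ok, has_additions, reuse, skip_sweep):
    """Stand-in for the evaluated workflow condition (hypothetical rule)."""
    return validator_ok and has_additions and not (reuse or skip_sweep)


def reference_spec(validator_ok, has_additions, reuse, skip_sweep):
    """Independent restatement of the same rule as a list of blockers."""
    blockers = [not validator_ok, not has_additions, reuse, skip_sweep]
    return not any(blockers)


def mismatches():
    """Return every input combination where the two predicates disagree."""
    cases = product([False, True], repeat=4)
    return [c for c in cases if setup_runs(*c) != reference_spec(*c)]
```

An empty `mismatches()` result means the two formulations agree on all 16 combinations. Writing the reference in a different style from the condition under test is what makes the comparison meaningful.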
Comment thread utils/validate_reusable_sweep_artifacts.py
The gating DAG (check-changelog -> reuse-sweep-gate -> setup) is now covered
exhaustively by utils/test_run_sweep_gating.py against the real run-sweep.yml
conditions, so the hand-copied simulation jobs (test-reuse-sweep-gate,
test-setup, test-metadata-only, test-metadata-setup, verify-ordering) and the
stub has-additions output are redundant. Reduce test-changelog-gate.yml to a
lean pytest workflow matching test-matrix-logic.yml / test-process-result.yml,
keeping the unit tests, historical push-validation pins, and live-PR check.

Opening or reopening a PR that already carries a sweep label triggered a full
benchmark sweep via run-sweep.yml. Restrict pull_request triggers to
ready_for_review, synchronize, labeled, and unlabeled so a sweep starts only on
an explicit push or (re)label. Update the gating tests to assert opened/reopened
are excluded and drop those cases from the exhaustive cross-product.
…_tests [skip-sweep]

Relocate the four test files added by this PR (test_validate_perf_changelog,
test_prepare_perf_changelog_merge, test_recover_failed_ingest,
test_run_sweep_gating) into utils/changelog_gate_tests/. A conftest.py prepends
utils/ to sys.path so the top-level module imports still resolve under pytest's
prepend import mode, and the repo-root lookups move from parents[1] to
parents[2]. Source modules stay in utils/, so no production invocation paths
change; only test-changelog-gate.yml's pytest targets and path filter are
updated. The two pre-existing reuse tests remain colocated in utils/.

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit 0452e5e.

Comment thread utils/prepare_perf_changelog_merge.py
Add a regression test for resolve_conflict_bytes with more than one appended
PR contribution, asserting entries are separated by exactly one blank line with
a single trailing newline. This was previously untested; the behavior is
correct (each block ends in a newline, so the single-newline join yields one
blank line, and resolve_conflict_bytes self-validates via validate_raw_change).
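The separator arithmetic in that message can be checked with a small sketch. `join_blocks` is a hypothetical stand-in for the joining step, not the body of resolve_conflict_bytes, and it assumes each block ends in exactly one newline byte.

```python
def join_blocks(blocks):
    """Join changelog blocks with a single newline byte.

    Hypothetical stand-in: every block must end in exactly one newline, so
    the single-newline separator leaves one blank line between entries and
    the result keeps a single trailing newline.
    """
    for block in blocks:
        if not block.endswith(b"\n") or block.endswith(b"\n\n"):
            raise ValueError("each block must end in exactly one newline")
    return b"\n".join(blocks)
```

For example, joining `b"- a: 1\n"` and `b"- b: 2\n"` gives `b"- a: 1\n\n- b: 2\n"`: the first block's own newline plus the separator form the blank line.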
@Oseltamivir Oseltamivir merged commit 862fcad into main Jun 18, 2026
31 checks passed
@Oseltamivir Oseltamivir deleted the workflow/add-recover-ingest-command branch June 18, 2026 08:10