Add failed-ingest recovery and pre-reuse changelog validation [skip-sweep] #1821
Branch updated from d736775 to 2a17676.
No historical changelog lines were edited: the file is append-only, and non-whitespace deletions are rejected by
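A minimal sketch of that rule, assuming a line-based comparison. The summary describes the PR's validator as byte-exact, so this is a simplification, and the function name append_only_violations is hypothetical. It tolerates dropped blank lines but reports history lines that were removed or rewritten, and new lines placed anywhere except after the end of the existing history:

```python
import difflib


def append_only_violations(base: str, head: str) -> list:
    """Compare two versions of an append-only file, line by line.

    Returns human-readable violations: non-blank base lines that were deleted
    or rewritten, and non-blank lines inserted before the end of the base
    history. Dropping blank or whitespace-only lines is tolerated.
    """
    base_lines = base.splitlines()
    head_lines = head.splitlines()
    matcher = difflib.SequenceMatcher(a=base_lines, b=head_lines, autojunk=False)
    violations = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            # Existing history lines were removed or rewritten.
            violations += [f"removed: {line}" for line in base_lines[i1:i2] if line.strip()]
        if tag == "insert" and i1 < len(base_lines):
            # New lines landed in the middle of the existing history.
            violations += [f"inserted mid-history: {line}" for line in head_lines[j1:j2] if line.strip()]
    return violations
```

Appending a new entry yields an empty list, while editing or removing an older entry yields one "removed: …" message per affected line.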
Branch updated from 2a17676 to ae01bdb.
Fixed the three audited changelog links in commit
All three PRs are merged. The full file still parses and validates as 495
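The link policy summarized for this PR (canonical repository, and the XXX placeholder only on PR additions, rejected on push to main) can be sketched as one check. The repository name, the regular expression, and the pr_link_problem helper are hypothetical stand-ins, not the code in validate_perf_changelog.py:

```python
import re
from typing import Optional

# Hypothetical canonical repository; the real owner/repo is not named here.
CANONICAL_REPO = "example-org/example-repo"
PR_LINK = re.compile(r"https://github\.com/([^/\s]+/[^/\s]+)/pull/(\d+|XXX)")


def pr_link_problem(link: str, *, event: str, is_new_entry: bool,
                    repo: str = CANONICAL_REPO) -> Optional[str]:
    """Return the reason a changelog PR link is rejected, or None if it is fine."""
    match = PR_LINK.fullmatch(link)
    if match is None:
        return "not a GitHub pull-request link"
    if match.group(1) != repo:
        return f"wrong repository {match.group(1)!r}"
    if match.group(2) == "XXX":
        # The placeholder is only acceptable while the entry is a PR addition;
        # the merge helper is expected to canonicalize it before main sees it.
        if event != "pull_request" or not is_new_entry:
            return "XXX placeholder only allowed on PR additions"
    return None
```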
Added commit … The new order is …
This reproduces and rejects both prior failure classes: malformed YAML (PR #1767) and historical deletion / invalid base state (PR #1798).
Added push-triggered changelog gate coverage in commit
Fail-closed behavior was tested with a temporary intentional failure.
The final workflow content is identical to commit
Final verification for
…s [skip-sweep] Add utils/test_run_sweep_gating.py: parse the real check-changelog -> reuse-sweep-gate -> setup `if` conditions from run-sweep.yml and evaluate them with a minimal GitHub Actions expression engine, so coverage cannot drift from production. Covers every gating case: a curated 18-case truth table plus a 3,464-case exhaustive cross-product (action x draft x label-config x label-name x validator outcome x has-additions x reuse x skip-sweep) cross-checked against an independent reference spec. The engine is grounded against the real outcome of run-sweep run 27737489942. Wire the test into test-changelog-gate.yml (the prior simulation jobs are kept as a real-Actions smoke test).
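To make the approach concrete, here is a minimal sketch of such an engine, assuming only a small subset of the GitHub Actions expression language: single-quoted strings, true/false/null, dotted context paths, !, ==, !=, &&, || and parentheses. It is not the PR's engine. Function calls such as contains() or success() and mixed-type numeric coercion are omitted, and the context and condition used with it are illustrative rather than copied from run-sweep.yml:

```python
import re

# Tokens: single-quoted strings ('' escapes a quote), operators, and dotted
# context paths such as needs.check-changelog.outputs.has-additions.
_TOKEN = re.compile(
    r"\s*(?:(?P<str>'(?:[^']|'')*')"
    r"|(?P<op>&&|\|\||==|!=|!|\(|\))"
    r"|(?P<ident>[A-Za-z_][\w-]*(?:\.[A-Za-z_][\w-]*)*))"
)


def _tokenize(expr):
    tokens, pos = [], 0
    while pos < len(expr):
        m = _TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at {pos}: {expr[pos:]!r}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def _truthy(value):
    # Simplified Actions truthiness: null, false, 0 and '' are falsy.
    return value not in (None, False, 0, "")


def _equal(left, right):
    # Strings compare case-insensitively, as in Actions; other types use ==.
    if isinstance(left, str) and isinstance(right, str):
        return left.lower() == right.lower()
    return left == right


class _Parser:
    """Recursive descent; precedence low to high: ||, &&, ==/!=, !, primary."""

    def __init__(self, tokens, context):
        self.tokens, self.pos, self.context = tokens, 0, context

    def _peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def _take(self):
        token = self._peek()
        self.pos += 1
        return token

    def parse(self):
        value = self._or()
        if self.pos != len(self.tokens):
            raise ValueError(f"unexpected trailing tokens: {self.tokens[self.pos:]}")
        return value

    def _or(self):
        left = self._and()
        while self._peek() == ("op", "||"):
            self._take()
            right = self._and()
            left = left if _truthy(left) else right  # returns an operand
        return left

    def _and(self):
        left = self._eq()
        while self._peek() == ("op", "&&"):
            self._take()
            right = self._eq()
            left = right if _truthy(left) else left  # returns an operand
        return left

    def _eq(self):
        left = self._not()
        while self._peek() in (("op", "=="), ("op", "!=")):
            op = self._take()[1]
            same = _equal(left, self._not())
            left = same if op == "==" else not same
        return left

    def _not(self):
        if self._peek() == ("op", "!"):
            self._take()
            return not _truthy(self._not())
        return self._primary()

    def _primary(self):
        kind, text = self._take()
        if kind == "str":
            return text[1:-1].replace("''", "'")
        if (kind, text) == ("op", "("):
            value = self._or()
            if self._take() != ("op", ")"):
                raise ValueError("expected ')'")
            return value
        if kind == "ident":
            if text in ("true", "false"):
                return text == "true"
            if text == "null":
                return None
            value = self.context
            for part in text.split("."):
                # Missing properties evaluate to null instead of raising.
                value = value.get(part) if isinstance(value, dict) else None
            return value
        raise ValueError(f"unexpected token {text!r}")


def evaluate(condition, context):
    """Evaluate an `if:` condition, with or without the ${{ }} wrapper."""
    expr = condition.strip()
    if expr.startswith("${{") and expr.endswith("}}"):
        expr = expr[3:-2].strip()
    return _Parser(_tokenize(expr), context).parse()
```

Because && and || return one of their operands rather than a boolean, a condition like needs.x.result || 'fallback' can evaluate to a string; a gating test would then apply the same truthiness rule the runner uses before deciding whether a job runs.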
The gating DAG (check-changelog -> reuse-sweep-gate -> setup) is now covered exhaustively by utils/test_run_sweep_gating.py against the real run-sweep.yml conditions, so the hand-copied simulation jobs (test-reuse-sweep-gate, test-setup, test-metadata-only, test-metadata-setup, verify-ordering) and the stub has-additions output are redundant. Reduce test-changelog-gate.yml to a lean pytest workflow matching test-matrix-logic.yml / test-process-result.yml, keeping the unit tests, historical push-validation pins, and live-PR check.
Opening or reopening a PR that already carries a sweep label triggered a full benchmark sweep via run-sweep.yml. Restrict pull_request triggers to ready_for_review, synchronize, labeled, and unlabeled so a sweep starts only on an explicit push or (re)label. Update the gating tests to assert opened/reopened are excluded and drop those cases from the exhaustive cross-product.
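A sketch of how the narrowed trigger list and the reduced cross-product fit together, assuming simplified gate semantics. The dimension values, the rule that drafts never sweep, and all helper names are hypothetical, not the PR's test code:

```python
from itertools import product

# Mirrors the narrowed `on.pull_request.types` list; opened/reopened are absent.
PR_TRIGGER_TYPES = ("ready_for_review", "synchronize", "labeled", "unlabeled")


def starts_workflow(action):
    """A pull_request event only starts run-sweep.yml if its action is listed."""
    return action in PR_TRIGGER_TYPES


def setup_runs(draft, has_sweep_label, changelog_result, has_additions):
    """Independent reference spec for the setup gate (simplified, hypothetical)."""
    if draft or not has_sweep_label:  # assumed: drafts and unlabeled PRs never sweep
        return False
    # Fail closed: only an explicit success that reports additions proceeds.
    return changelog_result == "success" and has_additions == "true"


def exhaustive_cases():
    """Cross-product over triggering actions only; opened/reopened are dropped."""
    return list(product(
        PR_TRIGGER_TYPES,
        (False, True),                                   # draft
        (False, True),                                   # sweep label present
        ("success", "failure", "skipped", "cancelled"),  # check-changelog result
        ("true", "false", ""),                           # has-additions output
    ))
```

In the PR's test the reference spec is cross-checked against the engine's evaluation of the real if conditions for every case; here setup_runs only plays the reference-spec role.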
…_tests [skip-sweep] Relocate the four test files added by this PR (test_validate_perf_changelog, test_prepare_perf_changelog_merge, test_recover_failed_ingest, test_run_sweep_gating) into utils/changelog_gate_tests/. A conftest.py prepends utils/ to sys.path so the top-level module imports still resolve under pytest's prepend import mode, and the repo-root lookups move from parents[1] to parents[2]. Source modules stay in utils/, so no production invocation paths change; only test-changelog-gate.yml's pytest targets and path filter are updated. The two pre-existing reuse tests remain colocated in utils/.
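Assuming utils/changelog_gate_tests/ has no __init__.py, pytest's default prepend import mode puts that test directory itself on sys.path, so top-level modules in utils/ would not be importable without help. A sketch of the conftest logic, with hypothetical helper names:

```python
import sys
from pathlib import Path


def gate_test_layout(conftest_path):
    """Map <root>/utils/changelog_gate_tests/conftest.py to (utils_dir, repo_root).

    parents[0] is changelog_gate_tests/, parents[1] is utils/, and parents[2]
    is the repository root (it was parents[1] when tests sat directly in utils/).
    """
    path = Path(conftest_path)
    return path.parents[1], path.parents[2]


def prepend_utils(conftest_path):
    """Put utils/ first on sys.path so `import validate_perf_changelog` resolves."""
    utils_dir, _ = gate_test_layout(conftest_path)
    entry = str(utils_dir)
    if entry not in sys.path:
        sys.path.insert(0, entry)
    return entry
```

In a real conftest.py this would be called once at import time as prepend_utils(Path(__file__).resolve()); the membership check keeps repeated collection from stacking duplicate entries.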
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit 0452e5e.
Add a regression test for resolve_conflict_bytes with more than one appended PR contribution, asserting entries are separated by exactly one blank line with a single trailing newline. This was previously untested; the behavior is correct (each block ends in a newline, so the single-newline join yields one blank line, and resolve_conflict_bytes self-validates via validate_raw_change).
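The separator argument can be checked in isolation. A minimal sketch, assuming each appended contribution is a bytes block; join_contributions and the entry format are hypothetical, and resolve_conflict_bytes itself is not reproduced here:

```python
def join_contributions(blocks):
    """Join appended changelog blocks with exactly one blank line between them.

    Every block must end in exactly one newline; joining with a single newline
    then yields one blank line between blocks and a single trailing newline.
    """
    for block in blocks:
        if not block.endswith(b"\n") or block.endswith(b"\n\n"):
            raise ValueError(f"block must end in exactly one newline: {block!r}")
    return b"\n".join(blocks)
```

For example, three one-entry blocks join to b"- pr: 1\n\n- pr: 2\n\n- pr: 3\n".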

Summary
- … /recover-failed-ingest <failed-run-or-job-url> [source-run-id] as a thin command over tested recovery helpers
- … check-changelog before reuse authorization or setup; validate sweep-label exclusivity, duplicate YAML keys, full schema, byte-exact append-only history, PR links, separators, and generated sweep configuration
- … XXX only on PR additions; canonicalize it in the supported merge helper and reject it on push to main
- … has-additions=false
- … main changelog bytes during conflict resolution, always push a fresh synchronization SHA, wait for check-changelog on that SHA, and verify the PR head again before admin merge
- … XXX links and the incorrect repository link in perf-changelog.yaml
Validation
- 73 focused validator/reuse/recovery tests passed
- 156 matrix-generation/schema tests passed
- utils/ suite: 290 passed; one pre-existing isolated failure remains in test_processor_loads_traces_jsonl_for_theoretical_cache, outside this diff
- ruff, actionlint, shellcheck, bash -n, py_compile, and git diff --check passed
- … 20 fixed-sequence rows and 2 eval jobs
- 27718128616 validates as 4 exact point/raw/aggregate identities
- 27735927631 proved fail-closed ordering; restored runs passed
Note
Medium Risk
Changes merge-time ingest gating and artifact validation for official benchmark database publishing; mistakes could block merges or allow incorrect reuse, though extensive tests and fail-closed checks mitigate that.
Overview
Adds a CPU-only check-changelog job ahead of reuse authorization and sweep setup in run-sweep.yml, replacing the old trailing-newline check and moving sweep-label conflict detection into that job. Validation enforces append-only history, schema/duplicate-key rules, the PR link policy (XXX only on PR additions), and the generated sweep config; the job also gates downstream work on has-additions so link-only fixes stop before GPU setup.
Introduces validate_perf_changelog.py, prepare_perf_changelog_merge.py, and recover_failed_ingest.py, plus a /recover-failed-ingest command and a test-changelog-gate workflow with broad pytest coverage (including parsing the real run-sweep.yml if conditions). merge_with_reuse.sh now uses the merge helpers, always pushes a sync SHA, waits for check-changelog, and re-checks the PR head before admin merge.
Reuse validation tightens: find_reusable_sweep_run.py accepts agentic point artifacts and ignores expired artifacts; validate_reusable_sweep_artifacts.py requires exact fixed-sequence, agentic (point/raw/aggregate), and eval identities and rejects duplicate and unexpected rows. The merge-time artifact-filter allowlist entry changes from agentic_aggregated to agentic_*. perf-changelog.yaml gets three canonical PR link corrections. Docs (README, KLAUD_DEBUG.md) describe the new gate and merge path.
Reviewed by Cursor Bugbot for commit 5b9ce1e.
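The exact-identity requirement for reusable artifacts, and the wildcard allowlist entry for merge-time filtering, can be sketched with the standard library. This assumes identities are hashable tuples such as (kind, name); identity_problems, allowed_for_merge, MERGE_ALLOWLIST, and the artifact names are hypothetical, not the code in validate_reusable_sweep_artifacts.py or the merge script:

```python
from collections import Counter
from fnmatch import fnmatchcase


def identity_problems(expected, actual):
    """Report how `actual` artifact identities deviate from `expected` exactly."""
    want, got = Counter(expected), Counter(actual)
    problems = [f"missing: {i}" for i in sorted(want.keys() - got.keys())]
    problems += [f"unexpected: {i}" for i in sorted(got.keys() - want.keys())]
    problems += [f"duplicate x{n}: {i}" for i, n in sorted(got.items()) if n > 1]
    return problems


# Hypothetical merge-time allowlist after the change from agentic_aggregated
# to agentic_*; any other entries the real allowlist has are omitted.
MERGE_ALLOWLIST = ("agentic_*",)


def allowed_for_merge(artifact_name, allowlist=MERGE_ALLOWLIST):
    """True if the artifact name matches any allowlist pattern (case-sensitive)."""
    return any(fnmatchcase(artifact_name, pattern) for pattern in allowlist)
```

An empty problem list means the reused run produced exactly the expected set; anything else is a reason to refuse reuse rather than publish partial or duplicated rows.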