
[codex] Recover PR 1798 ingest without rerunning sweep #1819

Merged
Oseltamivir merged 1 commit into main from codex/recover-pr-1798-ingest on Jun 18, 2026

Oseltamivir (Collaborator) commented Jun 18, 2026

What changed

Root cause

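The push-to-main run failed during setup because PR #1798 also corrected the malformed changelog indentation introduced by PR #1767. The changelog processor rejects any deletion of non-whitespace content, so the indentation fix was treated as a forbidden deletion and the artifact-reuse and ingest steps never ran.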
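As an illustration of why an indentation-only fix trips this rule, here is a minimal Python sketch. It assumes the processor scans a git-style unified diff of the changelog and flags every removed (`-`) line whose content is not pure whitespace; the function name `forbidden_deletions` is hypothetical, and the real logic in `process_changelog.py` may differ. Because a line-based diff records a re-indented line as one deletion plus one addition, the old line still counts as a non-whitespace deletion.

```python
from typing import List


def forbidden_deletions(unified_diff: str) -> List[str]:
    """Return removed lines that carry non-whitespace content.

    Hypothetical sketch: assumes a git-style unified diff where hunks
    start with '@@' and each file section starts with 'diff --git'.
    """
    bad: List[str] = []
    in_hunk = False
    for line in unified_diff.splitlines():
        if line.startswith("diff --git"):
            # New file section: the '---'/'+++' headers follow, not content.
            in_hunk = False
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and line.startswith("-") and line[1:].strip():
            # A removed line with real text, e.g. the old indentation of an entry.
            bad.append(line[1:])
    return bad


# A pure re-indent still shows up as a deleted non-whitespace line.
diff = "\n".join([
    "diff --git a/perf-changelog.yaml b/perf-changelog.yaml",
    "--- a/perf-changelog.yaml",
    "+++ b/perf-changelog.yaml",
    "@@ -1,2 +1,2 @@",
    " - pr: 1767",
    "-   note: misindented",
    "+  note: misindented",
])
print(forbidden_deletions(diff))
```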

Safety

  • the commit message and the merge message both include [skip-sweep]
  • workflow requires an explicit recover-pr-1798 confirmation input
  • failed target run/job, source run attempt, PR commit, workflow path, conclusion, and required artifacts are validated
  • the reconstructed matrix is validated against the downloaded artifacts before dispatch
  • no benchmark matrix or GPU runner is referenced
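The matrix-versus-artifact check can be pictured as a multiset comparison. This sketch assumes each reconstructed matrix entry and each downloaded result carry a comparable identifier string; the identifiers and the helper names `compare_matrix` and `ok_to_dispatch` are hypothetical. Dispatch proceeds only when nothing is missing and nothing is unexpected, and using `Counter` rather than `set` also catches a row that appears twice.

```python
from collections import Counter
from typing import Dict, Iterable, List


def compare_matrix(matrix_ids: Iterable[str], artifact_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Diff reconstructed matrix entries against downloaded artifact entries.

    Hypothetical sketch: identifiers are whatever key both sides share.
    """
    expected, found = Counter(matrix_ids), Counter(artifact_ids)
    return {
        # In the matrix but absent (or under-represented) in the artifacts.
        "missing": sorted((expected - found).elements()),
        # In the artifacts but not (or over-represented) in the matrix.
        "unexpected": sorted((found - expected).elements()),
    }


def ok_to_dispatch(matrix_ids: Iterable[str], artifact_ids: Iterable[str]) -> bool:
    """Only allow dispatch when both sides match exactly, duplicates included."""
    diff = compare_matrix(matrix_ids, artifact_ids)
    return not diff["missing"] and not diff["unexpected"]


rows = [f"bmk-{i}" for i in range(20)]  # e.g. 20 benchmark rows
print(ok_to_dispatch(rows, list(reversed(rows))))
print(compare_matrix(rows, rows[:-1]))
```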

Validation

  • actionlint .github/workflows/recover-pr-1798-ingest.yml
  • target/source guard script passed locally
  • reusable artifact validator passed: 20 benchmark rows and 2 multinode eval jobs

Note

Medium Risk
Touches production ingest via repository dispatch and uses a one-off changelog/git reconstruction; mitigated by hard-coded run/SHA checks, artifact validation, and no GPU matrix.

Overview
Adds a manually dispatched GitHub Actions workflow (recover-pr-1798-ingest.yml) to finish database ingest for PR #1798 after the post-merge run-sweep push failed in setup, without rerunning benchmarks on GPU.

The workflow mirrors the existing PR #1767 recovery pattern but pins specific failed push run/job IDs and a successful PR sweep run (attempt 2), and requires typing recover-pr-1798 to run. It validates those runs via the GitHub API (event, workflow path, SHAs, conclusions, non-expired results_bmk / eval_results_all / run-stats artifacts).
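A sketch of the source-run guard, assuming the workflow run and artifact objects have the shape returned by the GitHub REST API (`run_attempt`, `head_sha`, `conclusion`, `event`, `path` on the run; `name` and `expired` on each artifact). The run ID, attempt, short SHA, confirmation string, and artifact names come from this PR; the expected `event` and workflow `path` are placeholders, since the PR does not state them, and `validate_source` is a hypothetical name. Only the short SHA `ffe21af3` is known, so the SHA check is a prefix match.

```python
from typing import Any, Dict, List

CONFIRMATION = "recover-pr-1798"
SOURCE_RUN: Dict[str, Any] = {
    "id": 27622347964,
    "run_attempt": 2,
    "conclusion": "success",
    "event": "pull_request",                   # assumption: not stated in the PR
    "path": ".github/workflows/run-sweep.yml",  # hypothetical workflow path
}
SHA_PREFIX = "ffe21af3"
REQUIRED_ARTIFACTS = {"results_bmk", "eval_results_all", "run-stats"}


def validate_source(confirm: str, run: Dict[str, Any], artifacts: List[Dict[str, Any]]) -> List[str]:
    """Return a list of guard failures; an empty list means it is safe to proceed."""
    errors: List[str] = []
    if confirm != CONFIRMATION:
        errors.append("confirmation input mismatch")
    # Every pinned field must match exactly.
    for key, want in SOURCE_RUN.items():
        if run.get(key) != want:
            errors.append(f"{key}: expected {want!r}, got {run.get(key)!r}")
    if not str(run.get("head_sha", "")).startswith(SHA_PREFIX):
        errors.append("head_sha mismatch")
    # Expired artifacts can no longer be downloaded, so they count as missing.
    live = {a["name"] for a in artifacts if not a.get("expired", False)}
    for name in sorted(REQUIRED_ARTIFACTS - live):
        errors.append(f"missing or expired artifact: {name}")
    return errors
```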

It reconstructs the sweep config by building a synthetic tree from the merge commit: a perl edit on perf-changelog.yaml undoes the unrelated fix to PR #1767's indentation so that process_changelog.py does not treat it as a forbidden deletion. The workflow then runs the validator and uploads the artifacts plus changelog metadata. Finally, it dispatches an ingest-results event to InferenceX-app with the recovery run ID.
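The synthetic-tree step can be approximated as follows. The PR uses a perl edit; this Python sketch swaps in a plain string replacement, and the helper names `synthesize` and `has_no_deletions` and the sample YAML are hypothetical. It restores the pre-fix text of one block in the merge-commit changelog, then uses `difflib.SequenceMatcher` to confirm that, relative to the parent changelog, the result contains only unchanged and inserted lines.

```python
import difflib


def synthesize(merge_text: str, corrected_block: str, original_block: str) -> str:
    """Undo one known fix in the merge-commit changelog (stand-in for the perl edit)."""
    # Refuse to guess if the fixed block is absent or ambiguous.
    if merge_text.count(corrected_block) != 1:
        raise ValueError("corrected block must appear exactly once")
    return merge_text.replace(corrected_block, original_block, 1)


def has_no_deletions(base_text: str, new_text: str) -> bool:
    """True if new_text only keeps or inserts lines relative to base_text."""
    sm = difflib.SequenceMatcher(a=base_text.splitlines(), b=new_text.splitlines(), autojunk=False)
    return all(tag in ("equal", "insert") for tag, *_ in sm.get_opcodes())


base = "- pr: 1767\n   note: bad\n"                             # parent: malformed indent
merge = "- pr: 1767\n  note: bad\n- pr: 1798\n  note: new\n"    # merge: fixed indent + new entry
synthetic = synthesize(merge, "  note: bad\n", "   note: bad\n")
print(has_no_deletions(base, merge), has_no_deletions(base, synthetic))
```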

Reviewed by Cursor Bugbot for commit 9f0a805.

Oseltamivir (Collaborator, Author) commented:

Recovery validation is complete:

  • failed target: run 27712344914, setup job 81976315082
  • reusable source: run 27622347964, attempt 2, commit ffe21af3
  • reconstructed PR #1798 ([NV] Add GLM-5 NVFP4 GB300 disagg-non-mtp TRT-LLM benchmarks via Dynamo) matrix: 20 benchmark rows and 2 multinode eval jobs
  • actionlint and the repository artifact validator both pass
  • recovery workflow is CPU-only and does not reference benchmark/GPU runners

After this lands on main, I will dispatch it with recover-pr-1798 and verify the downstream InferenceX-app ingest.
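Both dispatch steps are ordinary GitHub REST calls: `workflow_dispatch` (`POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches`) starts the recovery workflow, and `repository_dispatch` (`POST /repos/{owner}/{repo}/dispatches`) is how the workflow notifies InferenceX-app. This sketch only builds the requests with the standard library. The owner, the input name `confirm`, and the `client_payload` key `run_id` are assumptions; the PR names only the workflow file, the confirmation string, the `ingest-results` event, and the target repository.

```python
import json
import urllib.request
from typing import Any, Dict

API = "https://api.github.com"


def _post(url: str, token: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) an authenticated GitHub REST POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def workflow_dispatch(owner: str, repo: str, token: str, confirm: str) -> urllib.request.Request:
    # The workflow file name can stand in for the numeric workflow ID.
    # The input name "confirm" is hypothetical.
    url = f"{API}/repos/{owner}/{repo}/actions/workflows/recover-pr-1798-ingest.yml/dispatches"
    return _post(url, token, {"ref": "main", "inputs": {"confirm": confirm}})


def ingest_dispatch(owner: str, token: str, recovery_run_id: int) -> urllib.request.Request:
    # The client_payload key "run_id" is hypothetical.
    url = f"{API}/repos/{owner}/InferenceX-app/dispatches"
    payload = {"event_type": "ingest-results", "client_payload": {"run_id": str(recovery_run_id)}}
    return _post(url, token, payload)
```

Sending either request is `urllib.request.urlopen(req)`, using a token that is allowed to trigger dispatches in the target repository.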

Oseltamivir merged commit 529a500 into main on Jun 18, 2026 (5 checks passed).
Oseltamivir deleted the codex/recover-pr-1798-ingest branch on June 18, 2026 at 02:02.