
[CI] ob1-gate.yml workflow: 0 successful runs in 140 attempts since 2026-03-30 — silent dispatch failure (jobs never start) #284

@MicScalise

TL;DR: .github/workflows/ob1-gate.yml has had 0 successful runs in 140 attempts since 2026-03-30 22:28 (the run immediately following commit c08c3560 "[infra] Recreate OB1 gate workflow"). Every run terminates with conclusion=failure and jobs.length==0 — i.e. GitHub never dispatches the review job. The failure mode is silent (no logs, no annotations), so PRs sit in "automated review pending" indefinitely. The same workflow file passes every public spec validator (actionlint, yamllint, Python yaml strict mode), so the rejection appears to be on GitHub's runtime parser side.

I am filing this because three of my recent PRs (#280, #281, #283) are stuck in this state through no fault of the diffs themselves, and because the auto-welcome bot promises "The automated review will run shortly". That promise can't be kept while the workflow is stuck.


Symptom

Every run shows the workflow name rendered as the file path rather than the declared name (OB1 PR Gate):

"name": ".github/workflows/ob1-gate.yml",
"badge_url": "https://github.com/NateBJones-Projects/OB1/workflows/.github/workflows/ob1-gate.yml/badge.svg"

When GitHub successfully parses a workflow, it uses the name: field; when it can't, it falls back to the path. Working workflows in this repo (OB1 PR Follow-Ups, Auto-Label PRs, Welcome New Contributors) all show their proper name: and dispatch jobs normally.
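The path-as-name fallback can be checked mechanically. Below is a minimal sketch, assuming the workflow list has already been fetched (for example with gh api repos/NateBJones-Projects/OB1/actions/workflows) and parsed into dictionaries that carry the REST API's name and path fields. The function name and the sample data are illustrative, not output captured from this repo.

```python
def path_as_name_workflows(workflows):
    """Return the workflows whose displayed name is just their file path."""
    return [
        wf for wf in workflows
        if wf.get("path") and wf.get("name") == wf.get("path")
    ]


# Illustrative sample shaped like the `workflows` array of the list response.
sample = [
    {"name": "OB1 PR Follow-Ups", "path": ".github/workflows/ob1-pr-followups.yml"},
    {"name": ".github/workflows/ob1-gate.yml", "path": ".github/workflows/ob1-gate.yml"},
]

suspect_paths = [wf["path"] for wf in path_as_name_workflows(sample)]
print(suspect_paths)
```

Any workflow this flags is a candidate for the same parse or registration problem, whether or not its file still exists on the default branch.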

The maintainer appears to have noticed and added a workaround in ob1-pr-followups.yml:

on:
  workflow_run:
    workflows:
      - "OB1 PR Gate"
      - ".github/workflows/ob1-gate.yml"   # ← fallback for the path-as-name case

So the workaround keeps pr-followups triggering, but the underlying gate workflow itself never runs.

Evidence I gathered

Run statistics (via gh api repos/NateBJones-Projects/OB1/actions/workflows/254053846/runs):

All 140 runs share the same failure shape:

{
  "conclusion": "failure",
  "status": "completed",
  "run_attempt": 1,
  "jobs": []
}

And gh run view <id> --log-failed reports log not found for every run — confirming no job container ever started.
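To tally this shape across many runs, something like the following could be used. It is a sketch under assumptions: each run is a dictionary with the status and conclusion fields from the runs endpoint, paired with the total_count reported by that run's jobs endpoint. The helper names and the sample pairs are hypothetical.

```python
def is_silent_dispatch_failure(run, job_count):
    """True when a run completed as a failure without any job being created."""
    return (
        run.get("status") == "completed"
        and run.get("conclusion") == "failure"
        and job_count == 0
    )


def summarize(runs_with_job_counts):
    """Count successes and silent dispatch failures over (run, job_count) pairs."""
    summary = {"total": 0, "success": 0, "silent_failure": 0}
    for run, job_count in runs_with_job_counts:
        summary["total"] += 1
        if run.get("conclusion") == "success":
            summary["success"] += 1
        elif is_silent_dispatch_failure(run, job_count):
            summary["silent_failure"] += 1
    return summary


# Illustrative data only: two silent failures and one ordinary failed job.
sample_runs = [
    ({"status": "completed", "conclusion": "failure"}, 0),
    ({"status": "completed", "conclusion": "failure"}, 0),
    ({"status": "completed", "conclusion": "failure"}, 1),
]
result = summarize(sample_runs)
print(result)
```

Separating "failed with zero jobs" from "failed after a job ran" keeps ordinary review failures from being counted as dispatch failures.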

What I ruled out

| Hypothesis | Result |
| --- | --- |
| YAML syntax error | python3 -c "import yaml; yaml.safe_load(open('ob1-gate.yml'))" succeeds |
| GitHub Actions linter rejection | actionlint reports only 2 SECURITY warnings (untrusted pull_request.title in scripts at lines 52 & 627), no errors |
| Duplicate keys | Custom Python strict loader: none |
| File too large | 30,000 bytes, well under GitHub's 1 MB workflow limit |
| Encoding issue | UTF-8, no BOM, LF line endings |
| Unsupported Actions features | No workflow_call, matrix, outputs, strategy, concurrency, reusable workflow refs |
| Stale at HEAD vs run sha | File at the run's head_sha is identical to file at HEAD |
| Same workflow/permissions worked elsewhere | Other workflows on this repo with pull_request triggers (Auto-Label, Welcome) dispatch normally on the exact same PRs |
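The size and encoding rows can be reproduced with the standard library alone. This is a minimal sketch, not the exact script used for the results above: the function name file_hygiene is hypothetical, and it takes the raw bytes of the downloaded ob1-gate.yml.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def file_hygiene(data: bytes) -> dict:
    """Report size, UTF-8 validity, BOM presence and CRLF line endings."""
    try:
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {
        "size_bytes": len(data),
        "valid_utf8": valid_utf8,
        "has_bom": data.startswith(UTF8_BOM),
        "has_crlf": b"\r\n" in data,
    }


# Illustrative input; in practice pass open("ob1-gate.yml", "rb").read().
clean = file_hygiene(b"name: OB1 PR Gate\non:\n  pull_request:\n")
dirty = file_hygiene(UTF8_BOM + b"name: x\r\n")
print(clean)
print(dirty)
```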

Hypothesis

GitHub's runtime parser is rejecting the workflow at the dispatch boundary (after creating the workflow_run record but before instantiating any job), with no surfaced error. Possible causes I cannot test from outside:

  1. Workflow registration cache poisoning. Workflow id 254053846 may have stale state from a prior bad commit. ob1-review.yml shows the same path-as-name pattern, and its file no longer exists in main (gh api .../contents/ob1-review.yml returns 404). Both broken workflows show the same symptom, which would be consistent with stale registration state.
  2. An undocumented strict-mode interaction with bash heredocs (<<EOF for $GITHUB_OUTPUT) or ${{ }} expressions interpolated into multi-line run: blocks.
  3. A GitHub Actions runtime change, made after the workflow was first authored, that retroactively invalidated something the file uses.

Suggested fixes (in increasing order of effort)

  1. Force re-registration: rename ob1-gate.yml to ob1-gate-v2.yml (same content). If runs against the new path dispatch jobs, the issue is registration-cache poisoning under workflow id 254053846 — same root cause likely affects the orphaned ob1-review.yml registration.
  2. Bisect rewrite: copy ob1-pr-followups.yml's working skeleton (proven to dispatch on this repo), then incrementally re-add the gate's review logic. The first commit that breaks it identifies the offending pattern.
  3. GitHub Support ticket: with this evidence (0/140 runs, identical failure shape, file passes all public validators, registration shows path-as-name) the support team can see the runtime parser's actual rejection reason from inside GitHub.
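The bisect rewrite in option 2 follows a simple loop, sketched below under assumptions. The callback dispatches is hypothetical: in practice it means pushing the candidate file and checking whether the resulting run creates any jobs. The chunk boundaries are whatever logical pieces of the gate's review logic are chosen for re-adding.

```python
def first_breaking_chunk(skeleton, chunks, dispatches):
    """Re-add chunks one at a time and return the index of the first one
    that stops the workflow from dispatching, or None if none does.

    `dispatches(text)` is a hypothetical callback reporting whether a
    workflow with this content starts its jobs.
    """
    current = skeleton
    for index, chunk in enumerate(chunks):
        candidate = current + chunk
        if not dispatches(candidate):
            return index
        current = candidate
    return None


# Stand-in predicate for illustration: pretend the marker "BAD" breaks dispatch.
def fake_dispatches(text):
    return "BAD" not in text


pieces = ["step-a\n", "step-b\n", "step-BAD\n", "step-d\n"]
culprit = first_breaking_chunk("skeleton\n", pieces, fake_dispatches)
print(culprit)
```

Adding one chunk per commit is slower than a binary split, but each failing commit then points at exactly one chunk.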

Reproducer

# verify the failure pattern
gh api "repos/NateBJones-Projects/OB1/actions/workflows/254053846/runs?per_page=1" \
  | jq '.workflow_runs[0] | {conclusion, status, jobs_url, html_url}'

gh api "repos/NateBJones-Projects/OB1/actions/workflows/254053846/runs?per_page=1" \
  | jq -r '.workflow_runs[0].id' \
  | xargs -I{} gh api "repos/NateBJones-Projects/OB1/actions/runs/{}/jobs" \
  | jq '{total_count}'
# expect: {"total_count": 0}

# verify all public validators accept the file
curl -sH "Accept: application/vnd.github.raw" \
  "https://api.github.com/repos/NateBJones-Projects/OB1/contents/.github/workflows/ob1-gate.yml" \
  -o ob1-gate.yml
python3 -c "import yaml; yaml.safe_load(open('ob1-gate.yml'))"  # OK
# install actionlint, then:
actionlint ob1-gate.yml  # only security warnings, no errors

Related

Happy to submit a fix-it PR if the maintainer would like — option 1 (rename) is the lowest-risk first attempt and takes ~5 minutes.
