Commit b2003ba

Dumbris and claude authored
feat(roadmap): add gen-roadmap --check-github ground-truth validation (#800)
The 2026-07-02 audit found roadmap.yaml chronically drifts from reality: tasks stayed `todo`/`in_review` while their PRs merged, an epic claimed `in_review` with no PR anywhere, and spec: links can point at nonexistent dirs producing false progress badges. `--check` only validates ROADMAP.md freshness against roadmap.yaml; nothing validated roadmap.yaml itself.

Add a `--check-github` mode that cross-checks roadmap.yaml against ground truth:

- PR status: MERGED but not done → ERROR; CLOSED-unmerged but in_progress/in_review → ERROR; OPEN but done → ERROR; OPEN but todo → WARN; dangling ref → ERROR. Handles "#786" and full /pull/ URLs and lists; caches per PR.
- spec: links must resolve to a real specs/ dir (ERROR); a spec shared by two distinct epics → WARN (badge double-count).
- status sanity: in_review with no pr: → WARN (any item); in_progress with no pr: and no children → WARN (leaf only, umbrella epics delegate PRs); done epic with a non-done child → WARN.
- exit 0 (no errors) / 1 (any error, or --strict with warnings) / 2 (gh missing or unauthenticated; offline spec/status checks still run).

`--check` is untouched. Wire a non-blocking (continue-on-error) advisory step into .github/workflows/roadmap.yml. Docs updated in the generator template and roadmap.yaml header; ROADMAP.md regenerated.

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
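The PR-status rules in the commit message amount to a lookup from (live PR state, roadmap status) to a severity. The following is a minimal sketch of that table, assuming a hypothetical helper named `pr_status_severity`; it is not part of the commit, which applies these rules inline in `check_pr_status`.

```python
from typing import Optional


# Hypothetical helper for illustration only; the commit itself implements
# these rules inline in check_pr_status().
def pr_status_severity(pr_state: str, status: str) -> Optional[str]:
    """Return "ERROR", "WARN", or None for one PR state / roadmap status pair."""
    if pr_state == "MERGED":
        # A merged PR means the item should already be marked done.
        return "ERROR" if status != "done" else None
    if pr_state == "CLOSED":
        # Closed without merging: the item cannot still be in flight.
        return "ERROR" if status in ("in_progress", "in_review") else None
    if pr_state == "OPEN":
        if status == "done":
            return "ERROR"
        if status == "todo":
            return "WARN"
    return None


if __name__ == "__main__":
    for state, status in [("MERGED", "in_review"), ("OPEN", "todo"), ("OPEN", "in_progress")]:
        print(state, status, pr_status_severity(state, status))
```

Pairs the rules do not mention, such as an OPEN PR on an `in_progress` item, produce no finding in this sketch.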
1 parent 12cd2bb commit b2003ba

4 files changed

Lines changed: 272 additions & 5 deletions

.github/workflows/roadmap.yml

Lines changed: 10 additions & 0 deletions
@@ -24,6 +24,7 @@ on:
 
 permissions:
   contents: read
+  pull-requests: read
 
 jobs:
   roadmap-check:
@@ -39,3 +40,12 @@ jobs:
         run: pip install pyyaml
       - name: Verify ROADMAP.md is up to date
         run: python3 scripts/gen-roadmap.py --check
+      # Non-blocking: report roadmap.yaml drift vs live GitHub PR state, dangling
+      # spec: links, and status sanity. `gh` is preinstalled on GitHub runners;
+      # GH_TOKEN authenticates it. continue-on-error so drift is surfaced in the
+      # log without failing the build (roadmap.yaml is hand-maintained).
+      - name: Cross-check roadmap.yaml vs GitHub (advisory)
+        continue-on-error: true
+        env:
+          GH_TOKEN: ${{ github.token }}
+        run: python3 scripts/gen-roadmap.py --check-github

ROADMAP.md

Lines changed: 4 additions & 1 deletion
@@ -13,7 +13,10 @@ The roadmap models cross-spec **epics → tasks** with a dependency DAG, executi
 ```bash
 python3 scripts/gen-roadmap.py # writes ROADMAP.md
 scripts/gen-roadmap # convenience wrapper (same thing)
-python3 scripts/gen-roadmap.py --check # CI canary: fail if stale
+python3 scripts/gen-roadmap.py --check # CI canary: fail if ROADMAP.md is stale
+python3 scripts/gen-roadmap.py --check-github # cross-check statuses vs live GitHub PR state,
+ # spec links, and status sanity (add --strict
+ # to fail on warnings; needs an authenticated gh)
 ```
 
 ## roadmap.yaml schema (short form)

roadmap.yaml

Lines changed: 5 additions & 0 deletions
@@ -12,6 +12,11 @@
 # python3 scripts/gen-roadmap.py # writes ROADMAP.md
 # # or: scripts/gen-roadmap # same thing (wrapper)
 #
+# Validate this file against ground truth (does not write ROADMAP.md):
+# python3 scripts/gen-roadmap.py --check-github # PR state vs status,
+# # dangling spec: links, status sanity. Add --strict to fail on
+# # warnings. Needs an authenticated `gh`; exits 2 if gh is missing.
+#
 # ── Schema ──────────────────────────────────────────────────────────────────
 # version: schema version (int).
 # epics: list of epic objects. Each epic:
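The exit codes documented for `--check-github` have a precedence that is easy to miss: in this commit's `check_github`, an unavailable `gh` returns 2 even when the offline checks also reported errors. A minimal sketch of that precedence, assuming a hypothetical function named `exit_code` (the commit spreads this logic across `print_report` and `check_github`):

```python
# Hypothetical summary of the exit-code rules; the name and signature are
# illustrative assumptions, not part of the commit.
def exit_code(errors: int, warnings: int, strict: bool = False, gh_ok: bool = True) -> int:
    """Map finding counts to the documented 0 / 1 / 2 exit codes."""
    if not gh_ok:
        # gh missing or unauthenticated: the PR cross-check was skipped, so the
        # run is reported as incomplete regardless of the offline findings.
        return 2
    if errors or (strict and warnings):
        return 1
    return 0


if __name__ == "__main__":
    print(exit_code(errors=0, warnings=3))               # warnings alone pass
    print(exit_code(errors=0, warnings=3, strict=True))  # --strict promotes them
    print(exit_code(errors=5, warnings=0, gh_ok=False))  # gh unavailable wins
```

A CI job that wants to block on drift would treat both 1 and 2 as failures; the advisory workflow step in this commit ignores the exit code via `continue-on-error`.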

scripts/gen-roadmap.py

Lines changed: 253 additions & 4 deletions
@@ -17,19 +17,28 @@
 is fully generated and safe to overwrite on every run.
 
 Usage:
-    python3 scripts/gen-roadmap.py [--check]
+    python3 scripts/gen-roadmap.py [--check | --check-github [--strict]]
 
-    --check   Exit non-zero if ROADMAP.md is out of date (does not write).
-              Useful as a CI canary.
+    --check          Exit non-zero if ROADMAP.md is out of date (does not write).
+                     Useful as a CI canary.
+    --check-github   Cross-check roadmap.yaml against ground truth (does not write):
+                     live GitHub PR state (via `gh`), spec: links resolving to
+                     real specs/ dirs, and status sanity. Reports ERROR/WARN and
+                     exits 1 on any ERROR, 0 otherwise; 2 if `gh` is unavailable.
+    --strict         With --check-github, promote warnings to errors for the exit
+                     code (report is unchanged).
 
 Pure stdlib + PyYAML (already used by scripts/check-settings-parity.py).
 Idempotent: running twice with no source change produces identical output.
 """
 from __future__ import annotations
 
 import argparse
+import json
 import os
 import re
+import shutil
+import subprocess
 import sys
 
 try:
@@ -43,6 +52,11 @@
 ROADMAP_MD = os.path.join(REPO_ROOT, "ROADMAP.md")
 SPECS_DIR = os.path.join(REPO_ROOT, "specs")
 
+# GitHub repo `gh` queries target for --check-github.
+REPO_SLUG = "smart-mcp-proxy/mcpproxy-go"
+# A PR ref inside a pr: field, either "#786" or ".../pull/786".
+PR_NUM_RE = re.compile(r"(?:#|/pull/)(\d+)")
+
 # A checkbox line: "- [ ] ...", "- [x] ...", "- [X] ..." (matches specs/README.md).
 CHECKBOX_RE = re.compile(r"^- \[([ xX])\]")
 
@@ -244,7 +258,10 @@ def render(data: dict) -> str:
     out.append("```bash")
     out.append("python3 scripts/gen-roadmap.py # writes ROADMAP.md")
     out.append("scripts/gen-roadmap # convenience wrapper (same thing)")
-    out.append("python3 scripts/gen-roadmap.py --check # CI canary: fail if stale")
+    out.append("python3 scripts/gen-roadmap.py --check # CI canary: fail if ROADMAP.md is stale")
+    out.append("python3 scripts/gen-roadmap.py --check-github # cross-check statuses vs live GitHub PR state,")
+    out.append(" # spec links, and status sanity (add --strict")
+    out.append(" # to fail on warnings; needs an authenticated gh)")
     out.append("```")
     out.append("")
     out.append("## roadmap.yaml schema (short form)")
@@ -281,15 +298,247 @@ def render(data: dict) -> str:
     return "\n".join(out)
 
 
+# ── GitHub / ground-truth cross-check (--check-github) ──────────────────────
+class Finding:
+    """One report line: an ERROR or WARN against a roadmap item."""
+    __slots__ = ("level", "ref", "reason")
+
+    def __init__(self, level: str, ref: str, reason: str):
+        self.level = level  # "ERROR" | "WARN"
+        self.ref = ref
+        self.reason = reason
+
+
+def iter_items(data: dict):
+    """Yield metadata for every epic and task, in file order.
+
+    Each dict: item (raw), id, kind ('epic'|'task'), epic_id (owning epic),
+    has_children (bool). A task's owning epic id lets us attribute a task's
+    spec: link back to its epic for double-count detection.
+    """
+    for epic in data.get("epics", []):
+        children = epic.get("tasks") or []
+        yield {"item": epic, "id": epic["id"], "kind": "epic",
+               "epic_id": epic["id"], "has_children": bool(children)}
+        for t in children:
+            yield {"item": t, "id": t["id"], "kind": "task",
+                   "epic_id": epic["id"], "has_children": False}
+
+
+def ref_label(meta: dict) -> str:
+    if meta["kind"] == "epic":
+        return f"{meta['id']} (epic)"
+    return f"{meta['id']} (task · epic {meta['epic_id']})"
+
+
+def parse_pr_refs(pr) -> list[int]:
+    """Extract PR numbers from a pr: field ("#786", full URL, or a list)."""
+    if not pr:
+        return []
+    refs = pr if isinstance(pr, list) else [pr]
+    nums: list[int] = []
+    for r in refs:
+        for m in PR_NUM_RE.finditer(str(r)):
+            n = int(m.group(1))
+            if n not in nums:
+                nums.append(n)
+    return nums
+
+
+def gh_available() -> tuple[bool, str]:
+    """(ok, reason). ok=False means skip the live PR cross-check (exit 2)."""
+    if not shutil.which("gh"):
+        return False, "`gh` CLI not found on PATH"
+    try:
+        r = subprocess.run(["gh", "auth", "status"],
+                           capture_output=True, text=True)
+    except OSError as e:  # pragma: no cover
+        return False, f"could not execute `gh`: {e}"
+    if r.returncode != 0:
+        return False, "`gh` is not authenticated (`gh auth status` failed)"
+    return True, ""
+
+
+def gh_pr_state(number: int, repo: str, cache: dict) -> dict:
+    """Return {'state','mergedAt'} for a PR, or {'error': msg}. Cached per number."""
+    if number in cache:
+        return cache[number]
+    r = subprocess.run(
+        ["gh", "pr", "view", str(number), "--repo", repo,
+         "--json", "state,mergedAt"],
+        capture_output=True, text=True)
+    if r.returncode != 0:
+        cache[number] = {"error": (r.stderr.strip().splitlines() or ["not found"])[-1]}
+    else:
+        try:
+            cache[number] = json.loads(r.stdout)
+        except json.JSONDecodeError:
+            cache[number] = {"error": "unparseable `gh` JSON output"}
+    return cache[number]
+
+
+def check_pr_status(items: list[dict], repo: str, cache: dict) -> list[Finding]:
+    """Cross-check every pr: link against live GitHub state.
+
+    MERGED but not done → ERROR; CLOSED (unmerged) but in_progress/in_review →
+    ERROR; OPEN but done → ERROR; OPEN but todo → WARN; unresolvable ref → ERROR.
+    """
+    out: list[Finding] = []
+    for meta in items:
+        status = meta["item"].get("status", "todo")
+        for num in parse_pr_refs(meta["item"].get("pr")):
+            st = gh_pr_state(num, repo, cache)
+            if "error" in st:
+                out.append(Finding("ERROR", ref_label(meta),
+                                   f"PR #{num} could not be resolved on GitHub "
+                                   f"({st['error']}) — dangling pr: link."))
+                continue
+            state = st.get("state")  # OPEN | CLOSED | MERGED
+            if state == "MERGED":
+                if status != "done":
+                    out.append(Finding("ERROR", ref_label(meta),
+                                       f"PR #{num} is MERGED but status is '{status}' "
+                                       f"(expected 'done')."))
+            elif state == "CLOSED":
+                if status in ("in_progress", "in_review"):
+                    out.append(Finding("ERROR", ref_label(meta),
+                                       f"PR #{num} is CLOSED (unmerged) but status is "
+                                       f"'{status}'."))
+            elif state == "OPEN":
+                if status == "done":
+                    out.append(Finding("ERROR", ref_label(meta),
+                                       f"PR #{num} is OPEN but status is 'done'."))
+                elif status == "todo":
+                    out.append(Finding("WARN", ref_label(meta),
+                                       f"PR #{num} is OPEN (work started) but status is "
+                                       f"still 'todo'."))
+    return out
+
+
+def check_spec_links(items: list[dict]) -> list[Finding]:
+    """Every spec: must resolve to a real specs/<NNN> dir (ERROR if not).
+    A spec shared by two different epics double-counts its badge (WARN)."""
+    out: list[Finding] = []
+    spec_to_epics: dict[str, set] = {}
+    for meta in items:
+        spec = meta["item"].get("spec")
+        if not spec:
+            continue
+        if not os.path.isdir(os.path.join(REPO_ROOT, spec)):
+            out.append(Finding("ERROR", ref_label(meta),
+                               f"spec: '{spec}' does not resolve to a directory under specs/."))
+        # Attribute to the owning epic so an epic sharing a spec with its OWN
+        # child task is not flagged — only genuinely distinct epics are.
+        spec_to_epics.setdefault(spec, set()).add(meta["epic_id"])
+    for spec, epics in sorted(spec_to_epics.items()):
+        if len(epics) > 1:
+            out.append(Finding("WARN", f"spec {spec}",
+                               f"shared by {len(epics)} distinct epics "
+                               f"({', '.join(sorted(epics))}) — the Epics-table progress "
+                               f"badge double-counts this spec."))
+    return out
+
+
+def check_status_sanity(items: list[dict]) -> list[Finding]:
+    """Reviews/in-flight work should have PR evidence; done epics should have
+    all children done.
+
+    in_review with no pr: → WARN for any item (an in-review claim with no PR
+    anywhere is exactly the drift this audit found). in_progress with no pr: →
+    WARN only for leaf items, since an umbrella epic legitimately delegates its
+    PRs to child tasks.
+    """
+    out: list[Finding] = []
+    for meta in items:
+        item = meta["item"]
+        status = item.get("status", "todo")
+        has_pr = bool(parse_pr_refs(item.get("pr")))
+        if not has_pr:
+            if status == "in_review":
+                out.append(Finding("WARN", ref_label(meta),
+                                   "status 'in_review' but no pr: link — an in-review item "
+                                   "should link its PR as evidence."))
+            elif status == "in_progress" and not meta["has_children"]:
+                out.append(Finding("WARN", ref_label(meta),
+                                   "status 'in_progress' but no pr: link and no child tasks "
+                                   "— nothing links the in-flight work."))
+        if meta["kind"] == "epic" and status == "done":
+            for t in item.get("tasks") or []:
+                if t.get("status") != "done":
+                    out.append(Finding("WARN", ref_label(meta),
+                                       f"epic is 'done' but child task '{t['id']}' is "
+                                       f"'{t.get('status', 'todo')}'."))
+    return out
+
+
+def print_report(findings: list[Finding], strict: bool) -> int:
+    errors = [f for f in findings if f.level == "ERROR"]
+    warnings = [f for f in findings if f.level == "WARN"]
+
+    print(f"roadmap.yaml ground-truth cross-check (repo {REPO_SLUG})\n")
+
+    def emit(group: list[Finding], head: str):
+        print(f"{head} ({len(group)}):")
+        if not group:
+            print(" none")
+        for f in group:
+            print(f" [{f.level:<5}] {f.ref}")
+            print(f" {f.reason}")
+        print()
+
+    emit(errors, "ERRORS")
+    emit(warnings, "WARNINGS")
+
+    print(f"Summary: {len(errors)} error(s), {len(warnings)} warning(s).")
+    if strict and warnings:
+        print("(--strict: warnings count as errors for the exit code.)")
+    if errors or (strict and warnings):
+        return 1
+    return 0
+
+
+def check_github(data: dict, strict: bool) -> int:
+    items = list(iter_items(data))
+    findings: list[Finding] = []
+
+    ok, reason = gh_available()
+    if ok:
+        cache: dict = {}
+        findings += check_pr_status(items, REPO_SLUG, cache)
+
+    # spec + status checks are offline and always run.
+    findings += check_spec_links(items)
+    findings += check_status_sanity(items)
+
+    if not ok:
+        print_report(findings, strict)
+        sys.stderr.write(
+            f"\nerror: PR cross-check skipped — {reason}. "
+            "Install and authenticate `gh` (`gh auth login`) to validate PR "
+            "state; offline spec/status checks above still ran.\n")
+        return 2
+
+    return print_report(findings, strict)
+
+
 def main() -> int:
     ap = argparse.ArgumentParser(description="Generate ROADMAP.md from roadmap.yaml")
     ap.add_argument("--check", action="store_true",
                     help="exit non-zero if ROADMAP.md is stale (do not write)")
+    ap.add_argument("--check-github", action="store_true",
+                    help="cross-check roadmap.yaml vs live GitHub PR state, "
+                         "spec links, and status sanity (does not write)")
+    ap.add_argument("--strict", action="store_true",
+                    help="with --check-github, promote warnings to errors "
+                         "for the exit code")
     args = ap.parse_args()
 
     with open(ROADMAP_YAML, encoding="utf-8") as fh:
         data = yaml.safe_load(fh)
 
+    if args.check_github:
+        return check_github(data, args.strict)
+
     rendered = render(data)
 
     if args.check:
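To make the accepted `pr:` formats concrete, here is a small standalone sketch that repeats the `PR_NUM_RE` pattern and the `parse_pr_refs` logic from this diff and runs them on sample values. The sample refs, including PR number 790, are made up for illustration. One consequence of the pattern that may be worth knowing: a bare number without `#` or `/pull/` is not recognized.

```python
import re
from typing import List

# Same pattern as the diff: a PR ref is "#786" or ".../pull/786".
PR_NUM_RE = re.compile(r"(?:#|/pull/)(\d+)")


def parse_pr_refs(pr) -> List[int]:
    """Extract unique PR numbers, in first-seen order, from a pr: field."""
    if not pr:
        return []
    refs = pr if isinstance(pr, list) else [pr]
    nums: List[int] = []
    for r in refs:
        for m in PR_NUM_RE.finditer(str(r)):
            n = int(m.group(1))
            if n not in nums:
                nums.append(n)
    return nums


if __name__ == "__main__":
    # Sample values are illustrative, not taken from roadmap.yaml.
    print(parse_pr_refs("#786"))  # [786]
    print(parse_pr_refs(["#786",
                         "https://github.com/smart-mcp-proxy/mcpproxy-go/pull/786",
                         "#790"]))  # [786, 790]
    print(parse_pr_refs(None))    # []
    print(parse_pr_refs("786"))   # [] (bare number: no "#" or "/pull/" prefix)
```

Because duplicates are dropped here and `gh_pr_state` caches per number, a PR referenced both by number and by URL is queried only once.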
