Skip to content

Commit 715e0e9

Browse files
committed
fix: address all 23 confirmed review findings (security, bugs, hardening)
Security - CRITICAL: gate the /codeboarding (issue_comment) path on author_association (OWNER/MEMBER/COLLABORATOR) before any checkout/analysis — closes the pwn-request/secret-exfiltration hole; README example + security note updated. - Pass base_ref/header/cta_base via env (not interpolated into bash); extract engine python -c into scripts/cb_engine.py (no ${{ }}->python-source interp); preflight prints only the error message, not the whole auth body. Bugs (diff_to_mermaid.py) - Escape Mermaid shape metacharacters (] [ ( ) { } | < > &) — a bare ] no longer breaks GitHub rendering. - n_changed is now recursive + a `changed` flag covers relation/nested-only changes; the action shows the diagram on `rendered` instead of suppressing it. - Size guard counts drawn edges and RETRIES changed-only before giving up. - _filter_changed ignores empty id/name strings. - Deleted parent keeps its subtree (renders as a deleted subgraph, symmetric with added). action.yml hardening - Guard: set -uo pipefail + graceful gh-api failure + assert base/head SHAs. - Generate-base: worktree remove/prune (not just rm -rf); cache the generated base keyed by base SHA (pay the cold full-analysis once). - Seed static_analysis.sha with the pkl (enables the committed-baseline warm-start that was previously unreachable). - Health step writes 0 first (no stale count on exception). - Post a failure comment (if: failure()) so a failed run isn't silent. Workflows - release-major-tag: anchor the semver regex ($) so prereleases don't move vN. - test-self + README example: concurrency cancel-in-progress. Tests: +cb_engine smoke tests, escaping/changed-flag cases (31 total, all green).
1 parent 6e183ea commit 715e0e9

16 files changed

Lines changed: 613 additions & 229 deletions

.github/workflows/release-major-tag.yml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -29,8 +29,8 @@ jobs:
2929
run: |
3030
set -euo pipefail
3131
ver="${TAG#v}"
32-
if ! printf '%s' "$ver" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+'; then
33-
echo "::notice::Tag '$TAG' is not vMAJOR.MINOR.PATCH semver; skipping major-tag move."
32+
if ! printf '%s' "$ver" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+$'; then
33+
echo "::notice::Tag '$TAG' is not a clean vMAJOR.MINOR.PATCH release (prerelease/suffix); skipping major-tag move."
3434
exit 0
3535
fi
3636
major="v${ver%%.*}"

.github/workflows/test-self.yml

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,10 @@ on:
1010
permissions:
1111
pull-requests: write
1212

13+
concurrency:
14+
group: self-test-${{ github.event.pull_request.number }}
15+
cancel-in-progress: true
16+
1317
jobs:
1418
diagram:
1519
runs-on: ubuntu-latest

README.md

Lines changed: 13 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -31,22 +31,32 @@ on:
3131
permissions:
3232
pull-requests: write # the only permission needed — nothing is pushed
3333

34+
# Cancel a superseded run when new commits land on the same PR (avoid stacking
35+
# multi-minute LLM jobs).
36+
concurrency:
37+
group: codeboarding-${{ github.event.pull_request.number || github.event.issue.number }}
38+
cancel-in-progress: true
39+
3440
jobs:
3541
diagram:
3642
runs-on: ubuntu-latest
37-
# Run on (non-draft) PR events, OR when someone comments "/codeboarding" on a PR.
38-
# The if-gate is important: without it a runner spins up for every comment.
43+
# Run on (non-draft) PR events, OR when a TRUSTED collaborator comments exactly
44+
# "/codeboarding" on a PR. The if-gate matters: (1) without it a runner spins up
45+
# for every comment; (2) the author_association check is a SECURITY gate — see below.
3946
if: >
4047
(github.event_name == 'pull_request' && github.event.pull_request.draft == false) ||
4148
(github.event_name == 'issue_comment' && github.event.issue.pull_request != null &&
42-
startsWith(github.event.comment.body, '/codeboarding'))
49+
(github.event.comment.body == '/codeboarding' || startsWith(github.event.comment.body, '/codeboarding ')) &&
50+
contains(fromJSON('["OWNER","MEMBER","COLLABORATOR"]'), github.event.comment.author_association))
4351
timeout-minutes: 60
4452
steps:
4553
- uses: codeboarding/codeboarding-action@v1
4654
with:
4755
llm_api_key: ${{ secrets.OPENROUTER_API_KEY }}
4856
```
4957
58+
> ⚠️ **Security — the `author_association` gate is required.** `issue_comment` workflows run from your default branch **with full repository secrets, for any commenter**. Without the `OWNER`/`MEMBER`/`COLLABORATOR` check, anyone could comment `/codeboarding` on a fork PR and have the action check out and run the engine over their PR-head code with your `OPENROUTER_API_KEY` present (a "pwn request"). The action's guard enforces this too, but gate it at the workflow level so a runner never even starts for an untrusted commenter.
59+
5060
You need **one secret**: an LLM API key. OpenRouter is the default; pass your own model via the `agent_model` / `parsing_model` inputs if you prefer.
5161

5262
### On-demand: the `/codeboarding` command

action.yml

Lines changed: 117 additions & 111 deletions
Large diffs are not rendered by default.
4.55 KB
Binary file not shown.
4.56 KB
Binary file not shown.
18 KB
Binary file not shown.

scripts/build_cta.py

Lines changed: 23 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -47,26 +47,34 @@ def detect_editors(repo_path: Path) -> list[str]:
4747

4848

4949
def build_cta(cta_base: str, owner: str, repo: str, pr: str, repo_path: Path, issues: int = 0) -> str:
50-
"""Return the markdown CTA footer, or '' when ``cta_base`` is unset."""
51-
if not cta_base:
52-
return ""
53-
base = cta_base.rstrip("/")
54-
55-
def link(path: str, **extra: str) -> str:
56-
return f"{base}/{path}?" + urlencode({"owner": owner, "repo": repo, "pr": pr, **extra})
57-
58-
editor_links = " · ".join(
59-
f"[**Open in {_EDITOR_LABEL[e]} →**]({link('open-in-editor', editor=e)})" for e in detect_editors(repo_path)
60-
)
50+
"""Return the markdown CTA footer (the warning banner shows even without a proxy URL).
6151
62-
lines = ["", "---"]
52+
The ⚠️ health banner is informational and needs no proxy, so it renders
53+
whenever ``issues > 0``; the editor/marketplace links require ``cta_base``.
54+
Returns '' only when there's nothing to show.
55+
"""
56+
parts: list[str] = []
6357
if issues > 0:
6458
noun = "issue" if issues == 1 else "issues"
65-
lines += [f"⚠️ **{issues} architecture {noun} found** — open CodeBoarding to explore them.", ""]
59+
parts.append(f"⚠️ **{issues} architecture {noun} found** — open CodeBoarding to explore them.")
6660

67-
lines += [f"🧭 See this architecture in your editor: {editor_links}", ""]
61+
if cta_base:
62+
base = cta_base.rstrip("/")
6863

69-
lines += [f"💡 New to CodeBoarding? [**Get the extension →**]({link('use-marketplace')})"]
64+
def link(path: str, **extra: str) -> str:
65+
return f"{base}/{path}?" + urlencode({"owner": owner, "repo": repo, "pr": pr, **extra})
66+
67+
editor_links = " · ".join(
68+
f"[**Open in {_EDITOR_LABEL[e]} →**]({link('open-in-editor', editor=e)})" for e in detect_editors(repo_path)
69+
)
70+
parts.append(f"🧭 See this architecture in your editor: {editor_links}")
71+
parts.append(f"💡 New to CodeBoarding? [**Get the extension →**]({link('use-marketplace')})")
72+
73+
if not parts:
74+
return ""
75+
lines = ["", "---"]
76+
for p in parts:
77+
lines += ["", p]
7078
return "\n".join(lines)
7179

7280

scripts/cb_engine.py

Lines changed: 129 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,129 @@
1+
"""Engine orchestration for the action — extracted from inline ``python -c`` blocks
2+
in action.yml so it is checked in, reviewable, and unit-testable.
3+
4+
Subcommands (all paths/refs come in as argv, never interpolated into source):
5+
6+
base --repo P --out D --name N --run-id ID --depth K --source-sha SHA
7+
head --repo P --out D --name N --run-id ID --depth K --base-ref B --target-ref T --source-sha SHA
8+
health --artifact-dir D --repo P --name N --issues-out FILE
9+
10+
``base`` runs a full analysis; ``head`` runs incremental, falling back to full on
11+
``IncrementalCacheMissingError``/``BaselineUnavailableError``; ``health`` writes the
12+
WARNING/CRITICAL finding count to ``--issues-out`` (and never fails the run).
13+
14+
The engine (``codeboarding_workflows`` etc.) is imported lazily inside each
15+
function so this module imports without the engine venv present — the tests stub
16+
those modules and assert we call the engine with the right arguments.
17+
"""
18+
19+
from __future__ import annotations
20+
21+
import argparse
22+
from pathlib import Path
23+
24+
_BASE_LOG = "/tmp/cb-base.log"
25+
_HEAD_LOG = "/tmp/cb-head.log"
26+
27+
28+
def run_base(repo: str, out: str, name: str, run_id: str, depth: int, source_sha: str) -> None:
29+
from codeboarding_workflows.analysis import run_full
30+
31+
res = run_full(
32+
repo_name=name,
33+
repo_path=Path(repo),
34+
output_dir=Path(out),
35+
run_id=run_id,
36+
log_path=_BASE_LOG,
37+
depth_level=int(depth),
38+
source_sha=source_sha,
39+
)
40+
print(f"Base analysis written: {res}")
41+
42+
43+
def run_head(repo: str, out: str, name: str, run_id: str, depth: int, base_ref: str, target_ref: str, source_sha: str) -> None:
44+
from codeboarding_workflows.analysis import BaselineUnavailableError, run_full, run_incremental
45+
from diagram_analysis.exceptions import IncrementalCacheMissingError
46+
47+
try:
48+
res = run_incremental(
49+
repo_path=Path(repo),
50+
output_dir=Path(out),
51+
project_name=name,
52+
run_id=run_id,
53+
log_path=_HEAD_LOG,
54+
base_ref=base_ref,
55+
target_ref=target_ref,
56+
source_sha=source_sha,
57+
)
58+
except (IncrementalCacheMissingError, BaselineUnavailableError) as exc:
59+
print(f"Incremental unavailable ({exc}); running full analysis on head.")
60+
for p in Path(out).glob("*"):
61+
if p.is_file():
62+
p.unlink()
63+
res = run_full(
64+
repo_name=name,
65+
repo_path=Path(repo),
66+
output_dir=Path(out),
67+
run_id=run_id,
68+
log_path=_HEAD_LOG,
69+
depth_level=int(depth),
70+
source_sha=source_sha,
71+
)
72+
print(f"Head analysis written: {res}")
73+
74+
75+
def run_health(artifact_dir: str, repo: str, name: str) -> int:
76+
"""Return the WARNING/CRITICAL finding count; 0 on any failure (best-effort)."""
77+
try:
78+
from health.models import Severity
79+
from health.runner import run_health_checks
80+
from static_analyzer.analysis_cache import StaticAnalysisCache
81+
except Exception as exc: # engine without the health module
82+
print(f"Health check skipped ({exc}).")
83+
return 0
84+
try:
85+
cache = StaticAnalysisCache(artifact_dir=Path(artifact_dir), repo_root=Path(repo))
86+
sa = cache.get()
87+
issues = 0
88+
if sa is not None:
89+
report = run_health_checks(sa, repo_name=name, repo_path=Path(repo))
90+
if report is not None:
91+
for cs in report.check_summaries:
92+
for fg in getattr(cs, "finding_groups", []):
93+
if getattr(fg, "severity", None) in (Severity.WARNING, Severity.CRITICAL):
94+
issues += len(fg.entities)
95+
print(f"Architecture issues found: {issues}")
96+
return issues
97+
except Exception as exc:
98+
print(f"Health check skipped ({exc}).")
99+
return 0
100+
101+
102+
def main(argv=None) -> int:
103+
p = argparse.ArgumentParser(description=__doc__)
104+
sub = p.add_subparsers(dest="cmd", required=True)
105+
106+
b = sub.add_parser("base")
107+
for a in ("--repo", "--out", "--name", "--run-id", "--depth", "--source-sha"):
108+
b.add_argument(a, required=True)
109+
110+
h = sub.add_parser("head")
111+
for a in ("--repo", "--out", "--name", "--run-id", "--depth", "--base-ref", "--target-ref", "--source-sha"):
112+
h.add_argument(a, required=True)
113+
114+
hc = sub.add_parser("health")
115+
for a in ("--artifact-dir", "--repo", "--name", "--issues-out"):
116+
hc.add_argument(a, required=True)
117+
118+
args = p.parse_args(argv)
119+
if args.cmd == "base":
120+
run_base(args.repo, args.out, args.name, args.run_id, args.depth, args.source_sha)
121+
elif args.cmd == "head":
122+
run_head(args.repo, args.out, args.name, args.run_id, args.depth, args.base_ref, args.target_ref, args.source_sha)
123+
elif args.cmd == "health":
124+
Path(args.issues_out).write_text(str(run_health(args.artifact_dir, args.repo, args.name)))
125+
return 0
126+
127+
128+
if __name__ == "__main__":
129+
raise SystemExit(main())

0 commit comments

Comments
 (0)