Commit aaeb90c

avrabe and claude authored
feat(ci): Mythos delta-pass auto-runner (single-actor, OAuth-token) (#162)
Automates the human-driven discover protocol that mythos-gate.yml currently
enforces by label. On every PR that touches a Tier-5 file, runs
anthropics/claude-code-action (SHA-pinned) per touched file with
scripts/mythos/discover.md as the prompt and captures a structured
`{verdict: NO_FINDINGS | FINDING}` JSON via the action's --json-schema
input. Posts a sticky <!-- mythos-auto-gate --> PR comment with per-file
results; applies mythos-pass-done on all-pass, fails the job (without the
label) on any FINDING.

Authorization stack (defense-in-depth, "only avrabe can trigger"):

1. Job-level if: requires both `github.actor == 'avrabe'` AND the
   immutable `github.actor_id == '10056645'`. Usernames can be reassigned
   after account deletion; numeric IDs cannot.
2. Trigger is pull_request (not pull_request_target). GitHub's default
   policy keeps secrets away from fork-repo PRs.
3. claude-code-action pinned by full commit SHA, not the floating v1 tag.
   Hijacking the tag does not change what we run.
4. Explicit minimal permissions: pull-requests write (sticky comment +
   label), contents read.
5. concurrency: cancel-in-progress per PR head, so rapid push cycles do
   not burn budget.
6. Detect job path-shape-validates every Tier-5 file (^[a-zA-Z0-9/_.-]+$)
   before piping into the matrix so a hostile filename cannot inject
   through ${{ matrix.file }} downstream; matrix.file is read via env: in
   run blocks, not direct interpolation.

Auth flow uses CLAUDE_CODE_OAUTH_TOKEN from avrabe's Max plan; no separate
API billing. Token usage draws from the subscription rate limit shared
with interactive Claude Code use.

Label-only mythos-gate.yml remains source-of-truth: the auto-runner is one
way the label gets applied, not the only way. Contributors without OAuth
access continue using the honor-system flow per AGENTS.md.

Setup (one-time, on maintainer machine):

    claude update        # ensure v1.0.44+
    claude setup-token   # prints CLAUDE_CODE_OAUTH_TOKEN

Then add the token as repo secret CLAUDE_CODE_OAUTH_TOKEN.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 2841325 commit aaeb90c
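
Item 6 of the authorization stack can be tried in isolation. The following is a minimal sketch, assuming Bash; `is_safe_path` is a hypothetical helper name that does not appear in the commit, and the two rejected sample paths are invented for illustration. It applies the same allow-list regex the commit message quotes.

```shell
#!/usr/bin/env bash
# Sketch only: is_safe_path is a hypothetical name, not part of the workflow.
# It applies the allow-list regex quoted in the commit message.
is_safe_path() {
  local path=$1
  # Alphanumerics, slash, underscore, dot, dash; at least one character.
  local re='^[a-zA-Z0-9/_.-]+$'
  [[ $path =~ $re ]]
}

# The second and third candidates are made-up hostile/odd paths.
for candidate in \
  'meld-core/src/parser.rs' \
  'meld-core/src/adapter/x$(id).rs' \
  'meld-core/src/a b.rs'; do
  if is_safe_path "$candidate"; then
    printf 'accept: %s\n' "$candidate"
  else
    printf 'reject: %s\n' "$candidate"
  fi
done
```

Because the check is an allow-list, any character outside the listed set (including spaces, quotes, `$`, and `;`) causes a rejection without having to enumerate dangerous characters.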

3 files changed

Lines changed: 359 additions & 0 deletions

.github/workflows/mythos-auto.yml

Lines changed: 306 additions & 0 deletions
@@ -0,0 +1,306 @@
+name: Mythos delta-pass (auto)
+
+# Runs the Mythos discover protocol automatically on every PR that
+# touches a Tier-5 file. Posts findings (or "NO FINDINGS") as a sticky
+# PR comment. Applies the `mythos-pass-done` label when all touched
+# files report NO FINDINGS, which clears the label-only gate in
+# `mythos-gate.yml`.
+#
+# Auth: uses Claude Code via CLAUDE_CODE_OAUTH_TOKEN (Max-plan OAuth
+# token, not a separate API key). Token usage draws from the
+# subscription's rate limits — see the budget knobs below.
+#
+# Authorization (defense-in-depth, "only avrabe can trigger this"):
+#
+#   1. Job-level `if:` checks both `github.actor == 'avrabe'` AND the
+#      immutable `github.actor_id == '10056645'`. Usernames can in
+#      principle be reassigned after account deletion; numeric IDs
+#      cannot. Both must match.
+#
+#   2. Trigger is `pull_request` (not `pull_request_target`). Per
+#      GitHub's default policy, secrets are not exposed to workflow
+#      runs from forked-repo PRs. Only same-repo branches see the
+#      OAuth token.
+#
+#   3. The Claude Code action is pinned by full commit SHA, not the
+#      `v1` tag. Even if `v1` is moved by an attacker who breaches the
+#      action repository, this workflow continues to run the
+#      SHA-pinned version.
+#
+#   4. Explicit minimal `permissions:` — only `pull-requests: write`
+#      (for the sticky comment + label) and `contents: read`. The
+#      OAuth token's powers are scoped by the user's Max plan, not by
+#      `GITHUB_TOKEN`.
+#
+#   5. `concurrency: cancel-in-progress` collapses sequential pushes
+#      to a single live run per PR head, preventing budget burn on
+#      rapid push cycles.
+#
+#   6. The detect step path-shape-validates every Tier-5 file before
+#      passing it into the matrix, so an attacker who manages to add
+#      a path with shell metacharacters cannot inject through the
+#      `matrix.file` interpolation downstream. In `run:` blocks we
+#      always read `matrix.file` via an `env:` variable rather than
+#      direct `${{ }}` substitution; see
+#      https://github.blog/security/vulnerability-research/how-to-catch-github-actions-workflow-injections-before-attackers-do/
+#
+# If you fork this repo and want to run the Mythos auto-gate yourself:
+# fork it, change the actor allow-list to your own user, set up your
+# own `CLAUDE_CODE_OAUTH_TOKEN`, and remove the avrabe ID. There is no
+# shared budget — every fork runs against its owner's token.
+
+on:
+  pull_request:
+    branches: [main]
+
+concurrency:
+  group: mythos-auto-${{ github.head_ref }}
+  cancel-in-progress: true
+
+permissions:
+  contents: read
+  pull-requests: write
+
+jobs:
+  detect:
+    name: Detect Tier-5 changes
+    # Single-actor lock: both username AND immutable user id must
+    # match. github.actor_id is a string in workflow context, so
+    # quote the numeric literal.
+    if: >-
+      github.actor == 'avrabe' &&
+      github.actor_id == '10056645'
+    runs-on: [self-hosted, linux, x64, light]
+    outputs:
+      files: ${{ steps.list.outputs.files }}
+      any: ${{ steps.list.outputs.any }}
+    steps:
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+        with:
+          fetch-depth: 0
+      - name: List + path-shape-validate Tier-5 files
+        id: list
+        env:
+          BASE_SHA: ${{ github.event.pull_request.base.sha }}
+          HEAD_SHA: ${{ github.event.pull_request.head.sha }}
+        run: |
+          set -euo pipefail
+          patterns=(
+            "meld-core/src/parser.rs"
+            "meld-core/src/merger.rs"
+            "meld-core/src/resolver.rs"
+            "meld-core/src/rewriter.rs"
+            "meld-core/src/component_wrap.rs"
+            "meld-core/src/p3_async.rs"
+            "meld-core/src/adapter/"
+            "meld-core/src/resource_graph.rs"
+            "meld-core/src/segments.rs"
+          )
+          changed=$(git diff --name-only "$BASE_SHA"..."$HEAD_SHA")
+          touched=()
+          while IFS= read -r f; do
+            [ -z "$f" ] && continue
+            for p in "${patterns[@]}"; do
+              case "$f" in
+                $p*)
+                  # Path-shape-validate: only alphanumerics, slash,
+                  # dot, underscore, dash. Anything else (quote,
+                  # backslash, semicolon, dollar, …) is rejected so
+                  # downstream `${{ matrix.file }}` interpolation
+                  # cannot inject shell or markdown.
+                  if [[ "$f" =~ ^[a-zA-Z0-9/_.-]+$ ]]; then
+                    touched+=("$f")
+                  else
+                    echo "::warning::Skipping Tier-5 path with non-portable shape: $f"
+                  fi
+                  break
+                  ;;
+              esac
+            done
+          done <<< "$changed"
+          if [ ${#touched[@]} -eq 0 ]; then
+            echo "any=false" >> "$GITHUB_OUTPUT"
+            echo "files=[]" >> "$GITHUB_OUTPUT"
+            echo "No Tier-5 files touched; nothing to scan."
+            exit 0
+          fi
+          # Emit a JSON array for matrix consumption.
+          printf -v joined '"%s",' "${touched[@]}"
+          echo "any=true" >> "$GITHUB_OUTPUT"
+          echo "files=[${joined%,}]" >> "$GITHUB_OUTPUT"
+          echo "Touched Tier-5 files:"
+          printf '  - %s\n' "${touched[@]}"
+
+  scan:
+    name: Mythos pass (${{ matrix.file }})
+    needs: detect
+    if: needs.detect.outputs.any == 'true'
+    runs-on: [self-hosted, linux, x64, rust-cpu]
+    timeout-minutes: 45
+    strategy:
+      fail-fast: false
+      max-parallel: 2
+      matrix:
+        file: ${{ fromJSON(needs.detect.outputs.files) }}
+    steps:
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+
+      - name: Run Mythos discover.md on ${{ matrix.file }}
+        id: discover
+        # claude-code-action v1 pinned by commit SHA (not the `v1`
+        # tag) so a hijack of the tag doesn't change the action we
+        # run. Bump by SHA when intentionally upgrading; do not move
+        # to a floating tag.
+        #
+        # `matrix.file` is path-shape-validated by the detect job
+        # (alphanumerics + /._-) so direct interpolation into the
+        # prompt cannot inject markdown or shell.
+        uses: anthropics/claude-code-action@51ea8ea73a139f2a74ff649e3092c25a904aed7e # v1.0.123
+        with:
+          claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
+          prompt: |
+            Read scripts/mythos/discover.md and apply it to the file
+            ${{ matrix.file }}. The {{file}} placeholder in
+            discover.md resolves to ${{ matrix.file }}.
+
+            Do not relax the oracle requirement. If you cannot
+            produce both a Kani harness and a failing PoC test for a
+            finding, do not report that finding.
+
+            Emit your result strictly as JSON matching the schema in
+            --json-schema. The "verdict" field is the only signal
+            this workflow's gate logic consumes; everything else is
+            for the PR comment.
+          claude_args: |
+            --max-turns 30
+            --json-schema '{"type":"object","required":["verdict"],"properties":{"verdict":{"type":"string","enum":["NO_FINDINGS","FINDING"]},"file":{"type":"string"},"function":{"type":"string"},"hypothesis":{"type":"string"},"impact":{"type":"string"},"candidate_uca":{"type":"string"},"kani_harness":{"type":"string"},"poc_test":{"type":"string"}}}'
+
+      - name: Slugify file path for artifact name
+        id: slug
+        env:
+          F: ${{ matrix.file }}
+        run: |
+          # actions/upload-artifact rejects '/' in names.
+          slug=${F//\//__}
+          echo "slug=${slug}" >> "$GITHUB_OUTPUT"
+
+      - name: Save structured output as artifact
+        if: always()
+        env:
+          RESULT_JSON: ${{ steps.discover.outputs.structured_output }}
+          F: ${{ matrix.file }}
+        run: |
+          mkdir -p mythos-out
+          # If the action failed before emitting structured output,
+          # synthesize a FINDING-shaped placeholder so the aggregator
+          # treats this file as blocking rather than silently passing.
+          if [ -z "${RESULT_JSON:-}" ]; then
+            RESULT_JSON='{"verdict":"FINDING","file":"'"$F"'","hypothesis":"discover step failed before emitting structured output — see workflow logs"}'
+          fi
+          printf '%s' "$RESULT_JSON" > "mythos-out/${{ steps.slug.outputs.slug }}.json"
+
+      - uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+        if: always()
+        with:
+          name: mythos-result-${{ steps.slug.outputs.slug }}
+          path: mythos-out/${{ steps.slug.outputs.slug }}.json
+          if-no-files-found: error
+          retention-days: 14
+
+  aggregate:
+    name: Aggregate findings + label
+    needs: [detect, scan]
+    if: always() && needs.detect.outputs.any == 'true'
+    runs-on: [self-hosted, linux, x64, light]
+    steps:
+      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+
+      - uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
+        with:
+          path: mythos-out
+          pattern: mythos-result-*
+          merge-multiple: true
+
+      - name: Compose verdict + sticky comment
+        id: compose
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          PR_NUMBER: ${{ github.event.pull_request.number }}
+          REPO: ${{ github.repository }}
+        run: |
+          set -euo pipefail
+          # Aggregate every per-file JSON into a single verdict.
+          findings=0
+          no_findings=0
+          rows=""
+          for f in mythos-out/*.json; do
+            [ -e "$f" ] || continue
+            verdict=$(jq -r '.verdict' "$f")
+            file=$(jq -r '.file // ""' "$f")
+            hyp=$(jq -r '.hypothesis // ""' "$f")
+            if [ "$verdict" = "FINDING" ]; then
+              findings=$((findings + 1))
+              rows+="| \`$file\` | ❌ FINDING | ${hyp//|/\\|} |"$'\n'
+            else
+              no_findings=$((no_findings + 1))
+              rows+="| \`$file\` | ✅ NO FINDINGS | — |"$'\n'
+            fi
+          done
+
+          if [ "$findings" -gt 0 ]; then
+            status="❌ **${findings}** finding(s) across $((findings + no_findings)) Tier-5 file(s)"
+            verdict=FAIL
+          else
+            status="✅ **NO FINDINGS** across ${no_findings} Tier-5 file(s)"
+            verdict=PASS
+          fi
+
+          body=$(cat <<MARKER
+          <!-- mythos-auto-gate -->
+          ## Mythos delta-pass (auto)
+
+          ${status}
+
+          | File | Verdict | Hypothesis |
+          |---|---|---|
+          ${rows}
+
+          <sub>Auto-run via \`anthropics/claude-code-action@v1\`
+          (SHA-pinned) on the touched Tier-5 files, using the
+          maintainer's Max-plan OAuth token. See
+          \`.github/workflows/mythos-auto.yml\` and
+          \`scripts/mythos/discover.md\`.</sub>
+          MARKER
+          )
+
+          printf '%s' "$body" > /tmp/mythos-body.md
+          echo "verdict=$verdict" >> "$GITHUB_OUTPUT"
+
+          # Sticky-comment upsert: find by marker, PATCH if found
+          # else POST. Marker is the literal HTML comment string.
+          marker='<!-- mythos-auto-gate -->'
+          existing=$(gh api "repos/${REPO}/issues/${PR_NUMBER}/comments" \
+            --paginate --jq ".[] | select(.body | contains(\"${marker}\")) | .id" | head -n1)
+          if [ -n "$existing" ]; then
+            gh api -X PATCH "repos/${REPO}/issues/comments/${existing}" \
+              -F body="@/tmp/mythos-body.md" >/dev/null
+            echo "updated comment $existing"
+          else
+            gh api -X POST "repos/${REPO}/issues/${PR_NUMBER}/comments" \
+              -F body="@/tmp/mythos-body.md" >/dev/null
+            echo "posted new comment"
+          fi
+
+      - name: Apply mythos-pass-done label (PASS only)
+        if: steps.compose.outputs.verdict == 'PASS'
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          PR_NUMBER: ${{ github.event.pull_request.number }}
+        run: |
+          gh pr edit "$PR_NUMBER" --add-label mythos-pass-done
+
+      - name: Fail job on FINDING verdict
+        if: steps.compose.outputs.verdict == 'FAIL'
+        run: |
+          echo "::error::Mythos discover reported at least one confirmed finding; see PR comment"
+          exit 1
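
Two small Bash idioms carry the hand-off between the detect and scan jobs: joining validated paths into a JSON array for `fromJSON`, and turning a path into an artifact-safe slug. The following is a rough sketch of both, assuming Bash; the function names `to_json_array` and `slugify` are hypothetical (the workflow inlines this logic), and `meld-core/src/adapter/mod.rs` is an invented sample path.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; the workflow inlines these steps.

# Join already-validated paths into a JSON array string.
# This is safe without a JSON encoder only because the allow-list
# regex has already excluded quotes and backslashes.
to_json_array() {
  if [ "$#" -eq 0 ]; then
    printf '[]'
    return
  fi
  local joined
  # printf reuses the format for each argument: "a","b",
  printf -v joined '"%s",' "$@"
  # Drop the trailing comma and wrap in brackets.
  printf '[%s]' "${joined%,}"
}

# Replace every '/' with '__' so the path can serve as an artifact name.
slugify() {
  local path=$1
  printf '%s' "${path//\//__}"
}

# Sample inputs; the second path is an assumption for illustration.
to_json_array 'meld-core/src/parser.rs' 'meld-core/src/adapter/mod.rs'; echo
slugify 'meld-core/src/adapter/mod.rs'; echo
```

The string-joining approach depends on the earlier validation step; if the allow-list were ever widened to admit quotes or backslashes, a real JSON encoder such as `jq` would be the safer choice.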

AGENTS.md

Lines changed: 32 additions & 0 deletions
@@ -703,6 +703,38 @@ Block the release if any `confirmed` finding lacks an `approved LS-N` in
 `safety/stpa/loss-scenarios.yaml` with a shipped fix or an explicit
 risk-acceptance note.
 
+#### Auto-runner (`.github/workflows/mythos-auto.yml`)
+
+The Mythos discover protocol is automated for the repository
+maintainer (`avrabe`, immutable user id `10056645`) via the
+`anthropics/claude-code-action` running against the maintainer's Max-
+plan OAuth token. On every PR that touches a Tier-5 file:
+
+1. The detect job lists touched Tier-5 paths (same path-list as
+   `mythos-gate.yml`) and **path-shape-validates** each one before
+   passing into the matrix.
+2. Per-file matrix runs the `claude-code-action`-pinned-by-SHA with
+   the discover.md prompt, asking for a structured JSON verdict
+   (`NO_FINDINGS` or `FINDING`).
+3. The aggregate job composes a sticky `<!-- mythos-auto-gate -->`
+   PR comment with the per-file table, and applies the
+   `mythos-pass-done` label when every file is `NO_FINDINGS`.
+4. If any file is `FINDING`, the job fails and the label is not
+   applied; the label-only `mythos-gate.yml` then keeps the PR
+   blocked until a human reviews the finding.
+
+**This auto-runner is single-actor scoped.** The job has a top-level
+`if: github.actor == 'avrabe' && github.actor_id == '10056645'`
+guard, and the `pull_request` trigger (not `pull_request_target`)
+means fork PRs don't get the OAuth token. Contributors should
+continue to expect the honor-system flow documented above (`Read
+scripts/mythos/discover.md ...`); the auto-runner is *one way* the
+label gets applied, not the only way.
+
+If you fork this repo and want to run the auto-runner under your own
+account: change the actor allow-list in `mythos-auto.yml`, set up
+your own `CLAUDE_CODE_OAUTH_TOKEN` secret, and remove `avrabe`'s id.
+
 ### LS-N verification gate
 
 CI workflow `.github/workflows/verification-gate.yml` enforces the
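
For the fork instructions in the Auto-runner section above, the replacement guard needs both a login and its numeric id. The following is a small sketch, assuming Bash; `actor_guard` is a hypothetical helper that only prints the `if:` expression to paste into `mythos-auto.yml`. It does not edit the workflow, and the character check on the login is a conservative assumption rather than GitHub's exact username rule.

```shell
#!/usr/bin/env bash
# Sketch only: actor_guard is a hypothetical helper, not part of the repo.
# It prints the job-level `if:` expression for a given login and numeric id.
actor_guard() {
  local login=$1 id=$2
  # Conservative shape checks so nothing unexpected lands in the expression.
  [[ $login =~ ^[A-Za-z0-9-]+$ ]] || { echo "unexpected login: $login" >&2; return 1; }
  [[ $id =~ ^[0-9]+$ ]] || { echo "id must be numeric: $id" >&2; return 1; }
  # actor_id is compared as a string in the workflow, hence the quotes.
  printf "github.actor == '%s' && github.actor_id == '%s'" "$login" "$id"
}

# The numeric id can be looked up with the GitHub CLI, for example:
#   gh api "users/<your-login>" --jq .id
actor_guard 'avrabe' '10056645'; echo
```

Run with the maintainer's values as shown, it reproduces the guard quoted in the section; a fork owner would pass their own login and id instead.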

CHANGELOG.md

Lines changed: 21 additions & 0 deletions
@@ -6,6 +6,27 @@ All notable changes to this project will be documented in this file.
 
 ### Added
 
+- **Mythos delta-pass auto-runner** (`.github/workflows/mythos-auto.yml`).
+  Automates the human-driven discover protocol that
+  `mythos-gate.yml` enforces by label. On every PR that touches a
+  Tier-5 file, runs `anthropics/claude-code-action` (SHA-pinned)
+  against each touched file with `scripts/mythos/discover.md` as
+  the prompt, captures a structured `{verdict: NO_FINDINGS | FINDING}`
+  JSON via `--json-schema`, and posts a sticky `<!-- mythos-auto-gate -->`
+  PR comment with per-file results. Applies `mythos-pass-done` when
+  every file is `NO_FINDINGS`; fails the job (without applying the
+  label) when any file is `FINDING`. Single-actor scoped — runs only
+  when both `github.actor == 'avrabe'` and the immutable
+  `github.actor_id == '10056645'` match, and only on
+  `pull_request` (not `pull_request_target`) so fork PRs never see
+  the OAuth token. Auth flow uses `CLAUDE_CODE_OAUTH_TOKEN` from the
+  maintainer's Max-plan subscription (no separate API billing). The
+  detect job path-shape-validates every Tier-5 file
+  (`^[a-zA-Z0-9/_.-]+$`) before piping into the matrix so a hostile
+  path cannot inject through `${{ matrix.file }}` interpolation
+  downstream. The label-only `mythos-gate.yml` remains the source of
+  truth; the auto-runner is *one way* the label gets applied.
+
 - **LS-N verification gate**
   (`.github/workflows/verification-gate.yml`,
   `tools/run_ls_verification.py`, `tools/post_verification_comment.py`).
