Commit 2841325

avrabe and claude authored
feat(ci): LS-N verification gate (spar-pattern port) (#161)
* feat(ci): LS-N verification gate (spar-pattern port)

PR-time gate that enforces meld's STPA test-naming contract: every
`status: approved` entry in `safety/stpa/loss-scenarios.yaml` must have at
least one `#[test] fn ls_<letter>_<num>_*` regression test in `meld-core`
(e.g. LS-A-11 -> `ls_a_11_*`).

Adapted from spar's rivet-driven verification gate
(pulseengine/spar@ba329f3d). meld has no rivet-style executable artifact,
but loss-scenarios pair with regression tests by the established naming
convention; this gate makes that pairing a verifiable contract.

Three files:

- tools/run_ls_verification.py: Python (stdlib + PyYAML). Iterates
  approved LS IDs, runs `cargo test --lib --no-fail-fast <prefix>` per ID,
  buckets results as passed / failed / missing, writes
  verification-results.json.
- tools/post_verification_comment.py: Marker-tagged sticky PR comment
  upsert via GitHub REST API. Pure stdlib (urllib). First run creates the
  comment, subsequent runs PATCH the body. Marker:
  `<!-- meld-ls-verification-gate -->`.
- .github/workflows/verification-gate.yml: PR + workflow_dispatch trigger.
  Fail-on-failure but advisory-on-missing so the 10 older approved entries
  with ad-hoc test names (e.g. PR #114's
  `test_canonical_abi_size_fixed_size_list_saturates_on_overflow` for
  LS-P-4) can be migrated incrementally rather than blocking every PR.

Smoke-tested locally against current main: 19 approved LS, 10 passed
(LS-A-7/11/15/17/18/20/12/13/14/16), 9 missing (the older v0.7.0-era and
PR-#114-era entries). No failures.

Same script runs locally:

    python3 tools/run_ls_verification.py

Inputs are integer/metadata only (PR number via env, head_ref in
concurrency); no untrusted free-form text from PR titles/bodies/comments
is read in `run:` blocks.

AGENTS.md gains a "LS-N verification gate" section under "Mythos Bug-Hunt
Pipeline".

Refs: pulseengine/spar@ba329f3d

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ci): pass --break-system-packages to pip install pyyaml

Self-hosted runners (Debian/Ubuntu Python 3.12) enforce PEP 668 and reject
`pip install --user pyyaml` with "externally-managed-environment".
`--break-system-packages` is the documented PEP 668 opt-out for CI
environments where the runner's Python install is disposable per workflow
run.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test: alias 5 existing regression tests to ls_<>_NN_* convention

The LS-N verification gate (this PR) discovered 9 approved loss-scenarios
without a matching `ls_<letter>_<num>_*` regression test. Five of those
already had regression tests pinning the fix under historical names; this
commit adds thin convention aliases so the gate's discovery query finds
them. The original tests stay in place (single source of truth, preserves
git blame / grep continuity); each alias is a `#[test] fn` that delegates
to the original test body.

| LS | Original test | Alias |
|-----|---------------|-------|
| LS-P-4 | test_canonical_abi_size_fixed_size_list_saturates_on_overflow | ls_p_4_canonical_abi_size_saturates_on_overflow |
| LS-P-5 | test_parser_rejects_truncated_module_section_issue_118 | ls_p_5_parser_rejects_truncated_module_section |
| LS-R-10 | test_issue112_item5_intra_adapter_preserves_from_import_module | ls_r_10_intra_adapter_preserves_from_import_module |
| LS-CP-3 | test_issue112_item4_sort_adapter_sites_is_canonical | ls_cp_3_sort_adapter_sites_is_canonical |
| LS-A-10 | cabi_alignment_stackful_retptr_writes_i64_at_offset_8 | ls_a_10_cabi_align_retptr_writeback |

Gate result drops from 10 passed / 9 missing to 15 passed / 4 missing.

The remaining four (LS-CP-4, LS-A-8, LS-A-9, LS-A-19) genuinely lack
regression tests and land in follow-up PRs:

- LS-CP-4: DWARF passthrough emits address-incorrect debug info
- LS-A-8 : Inner-list rep_func selected by HashMap iteration order
- LS-A-9 : Async callback POLL falls through to YIELD path
- LS-A-19: Resource import dedup uses ends_with() suffix match

The LS-CP-3 alias only covers the adapter_sites-order half of the
scenario; the caller_encoding_fallback half also still needs a dedicated
regression test (tracked alongside LS-A-8/9/19/CP-4).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
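The naming contract in the commit message (LS-A-11 -> `ls_a_11_*`) can be sketched as below. This is an illustration only, not the contents of `tools/run_ls_verification.py`: the function names `ls_id_to_test_prefix` and `cargo_command`, the ID regular expression, and the trailing underscore on the filter are all assumptions. The cargo flags are the ones quoted in the commit message.

```python
import re

# Assumed ID shape: "LS-<letters>-<number>", e.g. "LS-A-11" or "LS-CP-3".
_LS_ID = re.compile(r"LS-([A-Z]+)-(\d+)")


def ls_id_to_test_prefix(ls_id: str) -> str:
    """Map a loss-scenario ID to the test-name filter implied by the convention."""
    match = _LS_ID.fullmatch(ls_id)
    if match is None:
        raise ValueError(f"not a loss-scenario ID: {ls_id!r}")
    letters, number = match.groups()
    # Trailing underscore (an assumption) keeps LS-A-1 from selecting ls_a_11_*.
    return f"ls_{letters.lower()}_{number}_"


def cargo_command(ls_id: str) -> list[str]:
    """Build the per-ID cargo invocation quoted in the commit message."""
    return ["cargo", "test", "--lib", "--no-fail-fast", ls_id_to_test_prefix(ls_id)]
```

Because `cargo test` treats its filter argument as a substring of the test path rather than a strict prefix, a filter without the trailing underscore would let `ls_a_1` also run `ls_a_11_*` tests; that is the reason for the underscore in this sketch.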
1 parent 67873b4 commit 2841325

9 files changed

Lines changed: 577 additions & 0 deletions

.github/workflows/verification-gate.yml

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
+name: LS-N verification gate
+
+# Verifies that every approved loss-scenario in
+# `safety/stpa/loss-scenarios.yaml` has a passing regression test by
+# naming convention (`LS-A-11` -> `ls_a_11_*`). Posts a single sticky PR
+# comment summarising passed / failed / missing counts. Fails the job
+# only when an existing test fails; missing tests are reported as a
+# warning (advisory) so older approved scenarios with ad-hoc test names
+# can be migrated incrementally rather than blocking every PR.
+#
+# Adapted from spar's rivet-driven verification gate
+# (pulseengine/spar@ba329f3d). meld has no rivet-style executable
+# artifact, but `status: approved` LS entries pair with regression tests
+# by the established `ls_<letter>_<num>_*` naming convention; this gate
+# makes that pairing a verifiable contract.
+#
+# Inputs are all integer/metadata fields (PR number, head_ref); no
+# untrusted free-form text from PR titles/bodies/comments is read in
+# `run:` blocks, so the standard injection vectors do not apply.
+
+on:
+  pull_request:
+    branches: [main]
+  workflow_dispatch:
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref || github.ref }}
+  cancel-in-progress: true
+
+permissions:
+  contents: read
+  pull-requests: write
+
+jobs:
+  verify:
+    name: LS-N verification gate
+    runs-on: [self-hosted, linux, x64, rust-cpu]
+    timeout-minutes: 30
+    env:
+      CARGO_TERM_COLOR: always
+      CARGO_INCREMENTAL: 0
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: dtolnay/rust-toolchain@stable
+
+      - uses: Swatinem/rust-cache@v2
+
+      - name: Install PyYAML
+        # Self-hosted runners ship Debian/Ubuntu Python with PEP 668
+        # protection; `--break-system-packages` is the documented opt-out
+        # for CI environments where the runner's Python install is
+        # disposable per workflow run.
+        run: pip install --user --break-system-packages pyyaml
+
+      - name: Run LS-N verification
+        id: verify
+        continue-on-error: true
+        run: |
+          python3 tools/run_ls_verification.py \
+            --results-json verification-results.json
+
+      - name: Upload results artifact
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: verification-results
+          path: verification-results.json
+          if-no-files-found: warn
+
+      - name: Post sticky PR comment
+        if: github.event_name == 'pull_request' && always()
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          PR_NUMBER: ${{ github.event.pull_request.number }}
+        run: |
+          python3 tools/post_verification_comment.py "$PR_NUMBER"
+
+      - name: Fail job if any approved LS-N test failed
+        # Exit code 1 from run_ls_verification.py = a regression test
+        # for an approved LS entry failed. Exit 2 = missing tests only;
+        # treated as advisory. Exit 0 = all approved entries verified.
+        if: steps.verify.outcome == 'failure'
+        run: |
+          # Re-check: outcome == failure can mean exit 1 (real fail) or
+          # exit 2 (missing only). Inspect the JSON to decide.
+          failed=$(python3 -c "import json; print(json.load(open('verification-results.json'))['failed_count'])")
+          if [ "$failed" -gt 0 ]; then
+            echo "::error::$failed approved LS-N entries have failing regression tests; see PR comment"
+            exit 1
+          fi
+          echo "::warning::Some approved LS-N entries are missing regression tests (advisory only)"
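The exit-code contract that the last workflow step relies on (1 = a regression test failed, 2 = missing tests only, 0 = all verified) can be sketched as follows. This is a hedged illustration, not the real script: `classify`, `summarise`, and `exit_code` are hypothetical names, and of the JSON keys only `failed_count` appears in the workflow; `passed_count` and `missing_count` are assumed for symmetry.

```python
from typing import Dict


def classify(passed: int, failed: int) -> str:
    """Bucket one approved LS entry from its matching-test counts."""
    if failed > 0:
        return "failed"   # at least one matching test failed
    if passed > 0:
        return "passed"   # at least one matching test, all green
    return "missing"      # no test matched the naming convention


def summarise(buckets: Dict[str, str]) -> Dict[str, int]:
    """Count entries per bucket; only `failed_count` is a key known from the workflow."""
    values = list(buckets.values())
    return {
        "passed_count": values.count("passed"),
        "failed_count": values.count("failed"),
        "missing_count": values.count("missing"),
    }


def exit_code(summary: Dict[str, int]) -> int:
    """1 = hard failure, 2 = advisory (missing only), 0 = everything verified."""
    if summary["failed_count"] > 0:
        return 1
    if summary["missing_count"] > 0:
        return 2
    return 0
```

Failures take precedence over missing tests, which is why the workflow step re-reads `failed_count` from the JSON instead of trusting the step outcome alone: both exit 1 and exit 2 surface as `outcome == 'failure'`.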

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -41,6 +41,9 @@ credentials.json
 # Test output
 test-output/

+# Local LS-N verification gate output
+verification-results.json
+
 # Claude local files
 .claude/

AGENTS.md

Lines changed: 34 additions & 0 deletions
@@ -703,6 +703,40 @@ Block the release if any `confirmed` finding lacks an `approved LS-N` in
 `safety/stpa/loss-scenarios.yaml` with a shipped fix or an explicit
 risk-acceptance note.

+### LS-N verification gate
+
+CI workflow `.github/workflows/verification-gate.yml` enforces the
+test-naming contract on every PR: each `status: approved` entry in
+`safety/stpa/loss-scenarios.yaml` must have at least one `#[test] fn
+ls_<letter>_<num>_*` in `meld-core` (e.g. `LS-A-11` → `ls_a_11_*`).
+
+The gate runs `tools/run_ls_verification.py`, which iterates approved
+LS IDs and invokes `cargo test --lib --no-fail-fast <prefix>` per
+entry, then posts a single sticky PR comment with passed / failed /
+missing counts via `tools/post_verification_comment.py`.
+
+Same script runs locally:
+
+```bash
+python3 tools/run_ls_verification.py --results-json /tmp/ls.json
+```
+
+Buckets and gate behaviour:
+
+- **Passed** — ≥1 matching test, all green. Approved entry is verified.
+- **Failed** — ≥1 matching test failed. **Hard-fails the gate** (block merge).
+- **Missing** — zero tests match the `ls_<letter>_<num>_*` prefix.
+  Advisory only; surfaces as a warning so older approved scenarios
+  with ad-hoc test names (e.g. PR #114's
+  `test_canonical_abi_size_fixed_size_list_saturates_on_overflow` for
+  LS-P-4) can be migrated incrementally rather than blocking every PR.
+
+Adapted from spar's rivet-driven verification gate
+(pulseengine/spar@ba329f3d), with meld's STPA loss-scenario artifacts
+substituted for rivet's executable artifacts. Same sticky-comment
+pattern (marker `<!-- meld-ls-verification-gate -->`, upsert via
+GitHub REST API).
+
 ### Release Process

 #### Pre-Release Checklist (MANDATORY)
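The sticky-comment upsert mentioned in the AGENTS.md section (create on first run, PATCH afterwards, keyed on the marker) can be sketched without any network code. This is a minimal illustration under assumptions, not `tools/post_verification_comment.py` itself: `plan_upsert` is a hypothetical helper, and `existing` stands for the already-fetched list of issue comments on the PR. The two paths are the standard GitHub REST endpoints for creating and updating issue comments.

```python
from typing import Dict, List, Tuple

MARKER = "<!-- meld-ls-verification-gate -->"


def plan_upsert(repo: str, pr_number: int, existing: List[Dict], body: str) -> Tuple[str, str, str]:
    """Return (HTTP method, API path, comment body) for the sticky comment.

    `repo` is "owner/name"; `existing` is the decoded JSON list of comments
    already on the PR (hypothetical input, fetched elsewhere).
    """
    # Make sure the marker is present so later runs can find this comment.
    tagged = body if MARKER in body else f"{MARKER}\n{body}"
    for comment in existing:
        if MARKER in (comment.get("body") or ""):
            # A marked comment exists: update it in place.
            return ("PATCH", f"/repos/{repo}/issues/comments/{comment['id']}", tagged)
    # No marked comment yet: create one on the PR.
    return ("POST", f"/repos/{repo}/issues/{pr_number}/comments", tagged)
```

Keeping the decision separate from the HTTP call makes the create-versus-update choice easy to check locally; the real script presumably performs the request with `urllib`, as the commit message states.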

CHANGELOG.md

Lines changed: 21 additions & 0 deletions
@@ -4,6 +4,27 @@ All notable changes to this project will be documented in this file.

 ## [Unreleased]

+### Added
+
+- **LS-N verification gate**
+  (`.github/workflows/verification-gate.yml`,
+  `tools/run_ls_verification.py`, `tools/post_verification_comment.py`).
+  PR-time gate that enforces the test-naming contract: each
+  `status: approved` entry in `safety/stpa/loss-scenarios.yaml` must
+  have at least one `#[test] fn ls_<letter>_<num>_*` in `meld-core`
+  (e.g. `LS-A-11` → `ls_a_11_*`). Runs the matching tests via cargo,
+  buckets results as passed / failed / missing, and upserts a single
+  sticky PR comment (marker `<!-- meld-ls-verification-gate -->`).
+  Failed tests hard-fail the gate; missing tests are advisory so the
+  10 older approved entries with ad-hoc test names (e.g. PR #114's
+  `test_canonical_abi_size_fixed_size_list_saturates_on_overflow`
+  for LS-P-4) can migrate incrementally rather than blocking every
+  PR. Adapted from spar's rivet-driven verification gate
+  (pulseengine/spar@ba329f3d); meld substitutes its STPA loss-
+  scenario artifacts for rivet's executable artifacts, resolving
+  test linkage via naming convention. The same script runs locally
+  via `python3 tools/run_ls_verification.py`.
+
 ## [0.8.1] — 2026-05-16

 ### Fixed
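The changelog entry says the gate "runs the matching tests via cargo" and buckets the results. One way to obtain per-ID counts is to read the summary line that the Rust test harness prints per test binary (`test result: ok. N passed; M failed; ...`). Whether the real script parses output this way is an assumption; `count_results` is a hypothetical helper.

```python
import re
from typing import Tuple

# Matches harness summary lines such as:
#   test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 40 filtered out; ...
_SUMMARY = re.compile(r"^test result: \S+ (\d+) passed; (\d+) failed;", re.MULTILINE)


def count_results(cargo_stdout: str) -> Tuple[int, int]:
    """Sum (passed, failed) over every summary line in the captured output."""
    passed = failed = 0
    for match in _SUMMARY.finditer(cargo_stdout):
        passed += int(match.group(1))
        failed += int(match.group(2))
    return passed, failed
```

A result of `(0, 0)` means the filter matched no tests, which corresponds to the "missing" bucket; any nonzero failed count corresponds to "failed".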

meld-core/src/adapter/fact.rs

Lines changed: 10 additions & 0 deletions
@@ -5630,4 +5630,14 @@ mod tests {
              would see stale bytes",
         );
     }
+
+    /// LS-N verification gate convention alias. Pins LS-A-10
+    /// (async-lift retptr writeback skips CABI alignment padding)
+    /// via the discoverable `ls_a_10_*` name. Same body as the
+    /// pre-existing `cabi_alignment_stackful_retptr_writes_i64_at_offset_8`
+    /// regression test.
+    #[test]
+    fn ls_a_10_cabi_align_retptr_writeback() {
+        cabi_alignment_stackful_retptr_writes_i64_at_offset_8();
+    }
 }

meld-core/src/parser.rs

Lines changed: 17 additions & 0 deletions
@@ -3461,6 +3461,13 @@ mod tests {
         }
     }

+    /// LS-N verification gate convention alias for the truncated-
+    /// module-section regression above. Same body, canonical name.
+    #[test]
+    fn ls_p_5_parser_rejects_truncated_module_section() {
+        test_parser_rejects_truncated_module_section_issue_118();
+    }
+
     #[test]
     fn test_convert_canonical_options_default() {
         // Empty options list should produce defaults
@@ -4309,6 +4316,16 @@ mod tests {
         assert_eq!(flat_bytes, u32::MAX, "flat_byte_size must saturate");
     }

+    /// LS-N verification gate convention alias for the saturation
+    /// regression test above. Delegates to the original test body so
+    /// the canonical `ls_p_4_*` name is discoverable via
+    /// `tools/run_ls_verification.py` without renaming the historical
+    /// test that pins issue #112 / v0.4 follow-up coverage.
+    #[test]
+    fn ls_p_4_canonical_abi_size_saturates_on_overflow() {
+        test_canonical_abi_size_fixed_size_list_saturates_on_overflow();
+    }
+
     /// align_up must not panic when given a saturated u32::MAX size and
     /// a non-trivial alignment — the previous `(size + align - 1)` form
     /// would overflow.

meld-core/src/resolver.rs

Lines changed: 19 additions & 0 deletions
@@ -4832,6 +4832,17 @@ mod tests {
         );
     }

+    /// LS-N verification gate convention alias for the
+    /// adapter-sites canonical-sort regression above. Pins the
+    /// LS-CP-3 (HashMap iteration leaks into adapter_sites order)
+    /// fix via the discoverable `ls_cp_3_*` name. The
+    /// `caller_encoding_fallback` half of LS-CP-3 still needs a
+    /// dedicated regression test — tracked as a follow-up.
+    #[test]
+    fn ls_cp_3_sort_adapter_sites_is_canonical() {
+        test_issue112_item4_sort_adapter_sites_is_canonical();
+    }
+
     /// Item 5 unit-level PoC: when two `ModuleResolution`s share the
     /// same `import_name` but have different `from_import_module`s, the
     /// promoted adapter sites must preserve the `from_import_module` in
@@ -4976,6 +4987,14 @@ mod tests {
              (LS-R-10 / UCA-R-3 regression)"
         );
     }
+
+    /// LS-N verification gate convention alias. Pins LS-R-10
+    /// (intra-component adapter promotion drops from_import_module
+    /// disambiguator) via the discoverable `ls_r_10_*` name.
+    #[test]
+    fn ls_r_10_intra_adapter_preserves_from_import_module() {
+        test_issue112_item5_intra_adapter_preserves_from_import_module();
+    }
 }

 // ----------------------------------------------------------------------
