feat(ci): LS-N verification gate (spar-pattern port) (#161)

avrabe · claude · web-flow · commit 284132564e86 · 2026-05-16T23:45:27.000-05:00
* feat(ci): LS-N verification gate (spar-pattern port)

PR-time gate that enforces meld's STPA test-naming contract: every
`status: approved` entry in `safety/stpa/loss-scenarios.yaml` must have
at least one `#[test] fn ls_<letter>_<num>_*` regression test in
`meld-core` (e.g. LS-A-11 -> `ls_a_11_*`).

Adapted from spar's rivet-driven verification gate
(pulseengine/spar@ba329f3d). meld has no rivet-style executable
artifact, but loss-scenarios pair with regression tests by the
established naming convention; this gate makes that pairing a
verifiable contract.

Three files:

- tools/run_ls_verification.py: Python (stdlib + PyYAML). Iterates
  approved LS IDs, runs `cargo test --lib --no-fail-fast <prefix>` per
  ID, buckets results as passed / failed / missing, writes
  verification-results.json.
- tools/post_verification_comment.py: Marker-tagged sticky PR comment
  upsert via GitHub REST API. Pure stdlib (urllib). First run creates
  the comment, subsequent runs PATCH the body. Marker:
  `<!-- meld-ls-verification-gate -->`.
- .github/workflows/verification-gate.yml: PR + workflow_dispatch
  trigger. Fail-on-failure but advisory-on-missing so the 10 older
  approved entries with ad-hoc test names (e.g. PR #114's
  `test_canonical_abi_size_fixed_size_list_saturates_on_overflow` for
  LS-P-4) can be migrated incrementally rather than blocking every PR.

Smoke-tested locally against current main: 19 approved LS, 10 passed
(LS-A-7/11/15/17/18/20/12/13/14/16), 9 missing (the older v0.7.0-era
and PR-#114-era entries). No failures.

Same script runs locally:

    python3 tools/run_ls_verification.py

Inputs are integer/metadata only (PR number via env, head_ref in
concurrency); no untrusted free-form text from PR titles/bodies/
comments is read in run: blocks.

AGENTS.md gains a "LS-N verification gate" section under "Mythos
Bug-Hunt Pipeline".

Refs: pulseengine/spar@ba329f3d
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ci): pass --break-system-packages to pip install pyyaml

Self-hosted runners (Debian/Ubuntu Python 3.12) enforce PEP 668 and
reject `pip install --user pyyaml` with "externally-managed-environment".
`--break-system-packages` is the documented PEP 668 opt-out for CI
environments where the runner's Python install is disposable per
workflow run.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test: alias 5 existing regression tests to ls_<>_NN_* convention

The LS-N verification gate (this PR) discovered 9 approved
loss-scenarios without a matching `ls_<letter>_<num>_*` regression
test. Five of those already had regression tests pinning the fix under
historical names; this commit adds thin convention aliases so the
gate's discovery query finds them. The original tests stay in place
(single source of truth, preserves git blame / grep continuity); each
alias is a `#[test] fn` that delegates to the original test body.

| LS      | Original test                                                  | Alias                                              |
|---------|----------------------------------------------------------------|----------------------------------------------------|
| LS-P-4  | test_canonical_abi_size_fixed_size_list_saturates_on_overflow  | ls_p_4_canonical_abi_size_saturates_on_overflow    |
| LS-P-5  | test_parser_rejects_truncated_module_section_issue_118         | ls_p_5_parser_rejects_truncated_module_section     |
| LS-R-10 | test_issue112_item5_intra_adapter_preserves_from_import_module | ls_r_10_intra_adapter_preserves_from_import_module |
| LS-CP-3 | test_issue112_item4_sort_adapter_sites_is_canonical            | ls_cp_3_sort_adapter_sites_is_canonical            |
| LS-A-10 | cabi_alignment_stackful_retptr_writes_i64_at_offset_8          | ls_a_10_cabi_align_retptr_writeback                |

Gate result drops from 10 passed / 9 missing to 15 passed / 4 missing.

The remaining four (LS-CP-4, LS-A-8, LS-A-9, LS-A-19) genuinely lack
regression tests and land in follow-up PRs:

- LS-CP-4: DWARF passthrough emits address-incorrect debug info
- LS-A-8 : Inner-list rep_func selected by HashMap iteration order
- LS-A-9 : Async callback POLL falls through to YIELD path
- LS-A-19: Resource import dedup uses ends_with() suffix match

The LS-CP-3 alias only covers the adapter_sites-order half of the
scenario; the caller_encoding_fallback half also still needs a
dedicated regression test (tracked alongside LS-A-8/9/19/CP-4).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
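
The naming contract in the commit message (LS-A-11 -> `ls_a_11_*`) can be illustrated with a minimal sketch. This is not the contents of `tools/run_ls_verification.py`, which this page does not show; the names `LS_ID_PATTERN`, `ls_test_prefix`, and `cargo_test_command` are hypothetical, and the trailing underscore in the filter is an assumption. Because `cargo test` filters by substring, a filter of `ls_a_1` would also select `ls_a_11_*` tests, so the sketch ends the filter with `_`.

```python
import re

# Assumed ID shape: "LS-<letters>-<number>", e.g. "LS-A-11" or "LS-CP-3".
LS_ID_PATTERN = re.compile(r"LS-([A-Z]+)-(\d+)")


def ls_test_prefix(ls_id: str) -> str:
    """Map a loss-scenario ID to its test-name filter (hypothetical helper)."""
    match = LS_ID_PATTERN.fullmatch(ls_id)
    if match is None:
        raise ValueError(f"not a loss-scenario ID: {ls_id!r}")
    letters, number = match.groups()
    # The trailing underscore keeps "ls_a_1_" from also matching "ls_a_11_...".
    return f"ls_{letters.lower()}_{int(number)}_"


def cargo_test_command(ls_id: str) -> list[str]:
    """Build the per-ID cargo invocation quoted in the commit message."""
    return ["cargo", "test", "--lib", "--no-fail-fast", ls_test_prefix(ls_id)]


if __name__ == "__main__":
    for ls_id in ("LS-A-11", "LS-CP-3", "LS-P-4"):
        print(ls_id, "->", " ".join(cargo_test_command(ls_id)))
```

The command is returned as an argument list rather than a shell string, so it can be handed to `subprocess.run` without shell quoting.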
diff --git a/.github/workflows/verification-gate.yml b/.github/workflows/verification-gate.yml
@@ -0,0 +1,92 @@
+name: LS-N verification gate
+
+# Verifies that every approved loss-scenario in
+# `safety/stpa/loss-scenarios.yaml` has a passing regression test by
+# naming convention (`LS-A-11` -> `ls_a_11_*`). Posts a single sticky PR
+# comment summarising passed / failed / missing counts. Fails the job
+# only when an existing test fails; missing tests are reported as a
+# warning (advisory) so older approved scenarios with ad-hoc test names
+# can be migrated incrementally rather than blocking every PR.
+#
+# Adapted from spar's rivet-driven verification gate
+# (pulseengine/spar@ba329f3d). meld has no rivet-style executable
+# artifact, but `status: approved` LS entries pair with regression tests
+# by the established `ls_<letter>_<num>_*` naming convention; this gate
+# makes that pairing a verifiable contract.
+#
+# Inputs are all integer/metadata fields (PR number, head_ref); no
+# untrusted free-form text from PR titles/bodies/comments is read in
+# `run:` blocks, so the standard injection vectors do not apply.
+
+on:
+  pull_request:
+    branches: [main]
+  workflow_dispatch:
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref || github.ref }}
+  cancel-in-progress: true
+
+permissions:
+  contents: read
+  pull-requests: write
+
+jobs:
+  verify:
+    name: LS-N verification gate
+    runs-on: [self-hosted, linux, x64, rust-cpu]
+    timeout-minutes: 30
+    env:
+      CARGO_TERM_COLOR: always
+      CARGO_INCREMENTAL: 0
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: dtolnay/rust-toolchain@stable
+
+      - uses: Swatinem/rust-cache@v2
+
+      - name: Install PyYAML
+        # Self-hosted runners ship Debian/Ubuntu Python with PEP 668
+        # protection; `--break-system-packages` is the documented opt-out
+        # for CI environments where the runner's Python install is
+        # disposable per workflow run.
+        run: pip install --user --break-system-packages pyyaml
+
+      - name: Run LS-N verification
+        id: verify
+        continue-on-error: true
+        run: |
+          python3 tools/run_ls_verification.py \
+            --results-json verification-results.json
+
+      - name: Upload results artifact
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: verification-results
+          path: verification-results.json
+          if-no-files-found: warn
+
+      - name: Post sticky PR comment
+        if: github.event_name == 'pull_request' && always()
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          PR_NUMBER: ${{ github.event.pull_request.number }}
+        run: |
+          python3 tools/post_verification_comment.py "$PR_NUMBER"
+
+      - name: Fail job if any approved LS-N test failed
+        # Exit code 1 from run_ls_verification.py = a regression test
+        # for an approved LS entry failed. Exit 2 = missing tests only;
+        # treated as advisory. Exit 0 = all approved entries verified.
+        if: steps.verify.outcome == 'failure'
+        run: |
+          # Re-check: outcome == failure can mean exit 1 (real fail) or
+          # exit 2 (missing only). Inspect the JSON to decide.
+          failed=$(python3 -c "import json; print(json.load(open('verification-results.json'))['failed_count'])")
+          if [ "$failed" -gt 0 ]; then
+            echo "::error::$failed approved LS-N entries have failing regression tests; see PR comment"
+            exit 1
+          fi
+          echo "::warning::Some approved LS-N entries are missing regression tests (advisory only)"
diff --git a/.gitignore b/.gitignore
@@ -41,6 +41,9 @@ credentials.json
 # Test output
 test-output/
 
+# Local LS-N verification gate output
+verification-results.json
+
 # Claude local files
 .claude/
 
diff --git a/AGENTS.md b/AGENTS.md
@@ -703,6 +703,40 @@ Block the release if any `confirmed` finding lacks an `approved LS-N` in
 `safety/stpa/loss-scenarios.yaml` with a shipped fix or an explicit
 risk-acceptance note.
 
+### LS-N verification gate
+
+CI workflow `.github/workflows/verification-gate.yml` enforces the
+test-naming contract on every PR: each `status: approved` entry in
+`safety/stpa/loss-scenarios.yaml` must have at least one `#[test] fn
+ls_<letter>_<num>_*` in `meld-core` (e.g. `LS-A-11` → `ls_a_11_*`).
+
+The gate runs `tools/run_ls_verification.py`, which iterates approved
+LS IDs and invokes `cargo test --lib --no-fail-fast <prefix>` per
+entry, then posts a single sticky PR comment with passed / failed /
+missing counts via `tools/post_verification_comment.py`.
+
+Same script runs locally:
+
+```bash
+python3 tools/run_ls_verification.py --results-json /tmp/ls.json
+```
+
+Buckets and gate behaviour:
+
+- **Passed** — ≥1 matching test, all green. Approved entry is verified.
+- **Failed** — ≥1 matching test failed. **Hard-fails the gate** (block merge).
+- **Missing** — zero tests match the `ls_<letter>_<num>_*` prefix.
+  Advisory only; surfaces as a warning so older approved scenarios
+  with ad-hoc test names (e.g. PR #114's
+  `test_canonical_abi_size_fixed_size_list_saturates_on_overflow` for
+  LS-P-4) can be migrated incrementally rather than blocking every PR.
+
+Adapted from spar's rivet-driven verification gate
+(pulseengine/spar@ba329f3d), with meld's STPA loss-scenario artifacts
+substituted for rivet's executable artifacts. Same sticky-comment
+pattern (marker `<!-- meld-ls-verification-gate -->`, upsert via
+GitHub REST API).
+
 ### Release Process
 
 #### Pre-Release Checklist (MANDATORY)
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,27 @@ All notable changes to this project will be documented in this file.
 
 ## [Unreleased]
 
+### Added
+
+- **LS-N verification gate**
+  (`.github/workflows/verification-gate.yml`,
+  `tools/run_ls_verification.py`, `tools/post_verification_comment.py`).
+  PR-time gate that enforces the test-naming contract: each
+  `status: approved` entry in `safety/stpa/loss-scenarios.yaml` must
+  have at least one `#[test] fn ls_<letter>_<num>_*` in `meld-core`
+  (e.g. `LS-A-11` → `ls_a_11_*`). Runs the matching tests via cargo,
+  buckets results as passed / failed / missing, and upserts a single
+  sticky PR comment (marker `<!-- meld-ls-verification-gate -->`).
+  Failed tests hard-fail the gate; missing tests are advisory so
+  older approved entries with ad-hoc test names (e.g. PR #114's
+  `test_canonical_abi_size_fixed_size_list_saturates_on_overflow`
+  for LS-P-4) can migrate incrementally rather than blocking every
+  PR. Adapted from spar's rivet-driven verification gate
+  (pulseengine/spar@ba329f3d); meld substitutes its STPA loss-
+  scenario artifacts for rivet's executable artifacts, resolving
+  test linkage via naming convention. The same script runs locally
+  via `python3 tools/run_ls_verification.py`.
+
 ## [0.8.1] — 2026-05-16
 
 ### Fixed
diff --git a/meld-core/src/adapter/fact.rs b/meld-core/src/adapter/fact.rs
@@ -5630,4 +5630,14 @@ mod tests {
              would see stale bytes",
         );
     }
+
+    /// LS-N verification gate convention alias. Pins LS-A-10
+    /// (async-lift retptr writeback skips CABI alignment padding)
+    /// via the discoverable `ls_a_10_*` name. Delegates to the
+    /// pre-existing `cabi_alignment_stackful_retptr_writes_i64_at_offset_8`
+    /// regression test.
+    #[test]
+    fn ls_a_10_cabi_align_retptr_writeback() {
+        cabi_alignment_stackful_retptr_writes_i64_at_offset_8();
+    }
 }
diff --git a/meld-core/src/parser.rs b/meld-core/src/parser.rs
@@ -3461,6 +3461,13 @@ mod tests {
         }
     }
 
+    /// LS-N verification gate convention alias for the truncated-
+    /// module-section regression above. Delegates to it by canonical name.
+    #[test]
+    fn ls_p_5_parser_rejects_truncated_module_section() {
+        test_parser_rejects_truncated_module_section_issue_118();
+    }
+
     #[test]
     fn test_convert_canonical_options_default() {
         // Empty options list should produce defaults
@@ -4309,6 +4316,16 @@ mod tests {
         assert_eq!(flat_bytes, u32::MAX, "flat_byte_size must saturate");
     }
 
+    /// LS-N verification gate convention alias for the saturation
+    /// regression test above. Delegates to the original test body so
+    /// the canonical `ls_p_4_*` name is discoverable via
+    /// `tools/run_ls_verification.py` without renaming the historical
+    /// test that pins issue #112 / v0.4 follow-up coverage.
+    #[test]
+    fn ls_p_4_canonical_abi_size_saturates_on_overflow() {
+        test_canonical_abi_size_fixed_size_list_saturates_on_overflow();
+    }
+
     /// align_up must not panic when given a saturated u32::MAX size and
     /// a non-trivial alignment — the previous `(size + align - 1)` form
     /// would overflow.
diff --git a/meld-core/src/resolver.rs b/meld-core/src/resolver.rs
@@ -4832,6 +4832,17 @@ mod tests {
         );
     }
 
+    /// LS-N verification gate convention alias for the
+    /// adapter-sites canonical-sort regression above. Pins the
+    /// LS-CP-3 (HashMap iteration leaks into adapter_sites order)
+    /// fix via the discoverable `ls_cp_3_*` name. The
+    /// `caller_encoding_fallback` half of LS-CP-3 still needs a
+    /// dedicated regression test — tracked as a follow-up.
+    #[test]
+    fn ls_cp_3_sort_adapter_sites_is_canonical() {
+        test_issue112_item4_sort_adapter_sites_is_canonical();
+    }
+
     /// Item 5 unit-level PoC: when two `ModuleResolution`s share the
     /// same `import_name` but have different `from_import_module`s, the
     /// promoted adapter sites must preserve the `from_import_module` in
@@ -4976,6 +4987,14 @@ mod tests {
              (LS-R-10 / UCA-R-3 regression)"
         );
     }
+
+    /// LS-N verification gate convention alias. Pins LS-R-10
+    /// (intra-component adapter promotion drops from_import_module
+    /// disambiguator) via the discoverable `ls_r_10_*` name.
+    #[test]
+    fn ls_r_10_intra_adapter_preserves_from_import_module() {
+        test_issue112_item5_intra_adapter_preserves_from_import_module();
+    }
 }
 
 // ----------------------------------------------------------------------
diff --git a/tools/post_verification_comment.py b/tools/post_verification_comment.py
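
The body of this file is not shown here. As a rough illustration of the marker-tagged upsert the commit message describes (create on first run, PATCH afterwards), the sketch below builds the request without sending it. It assumes the standard GitHub REST endpoints for issue comments; `find_sticky_comment`, `build_upsert_request`, and their parameters are hypothetical names, not the script's actual API.

```python
import json
import urllib.request

MARKER = "<!-- meld-ls-verification-gate -->"
API_ROOT = "https://api.github.com"


def find_sticky_comment(comments, marker=MARKER):
    """Return the id of the first comment whose body carries the marker."""
    for comment in comments:
        if marker in (comment.get("body") or ""):
            return comment["id"]
    return None


def build_upsert_request(repo, pr_number, comments, body, token):
    """Build (but do not send) the POST-or-PATCH request for the sticky comment.

    `comments` is the already-fetched list of comments on the PR, as dicts
    with "id" and "body" keys (the shape the REST API returns).
    """
    payload = json.dumps({"body": f"{MARKER}\n{body}"}).encode("utf-8")
    comment_id = find_sticky_comment(comments)
    if comment_id is None:
        # No earlier comment: create one on the PR's issue thread.
        url = f"{API_ROOT}/repos/{repo}/issues/{pr_number}/comments"
        method = "POST"
    else:
        # A marked comment exists: replace its body in place.
        url = f"{API_ROOT}/repos/{repo}/issues/comments/{comment_id}"
        method = "PATCH"
    return urllib.request.Request(
        url,
        data=payload,
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
```

Keeping the marker as the first line of the body is what lets later runs find and overwrite the same comment instead of stacking new ones.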
diff --git a/tools/run_ls_verification.py b/tools/run_ls_verification.py
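
The body of this file is not shown here either. The bucket rules from AGENTS.md and the exit codes documented in the workflow (0 = all verified, 1 = a test failed, 2 = missing only) can be sketched as below. Only the `failed_count` key is confirmed by the workflow step that reads the JSON; every other key and all function names are assumptions made for illustration.

```python
def classify(passed: int, failed: int) -> str:
    """Bucket one LS ID from the counts of its matching tests."""
    if failed > 0:
        return "failed"    # at least one matching test failed
    if passed > 0:
        return "passed"    # at least one matching test, all green
    return "missing"       # zero tests matched the prefix


def summarize(per_id: dict) -> dict:
    """Fold {ls_id: (passed, failed)} into a results summary.

    Apart from "failed_count", the key names are hypothetical.
    """
    buckets = {"passed": [], "failed": [], "missing": []}
    for ls_id, (passed, failed) in sorted(per_id.items()):
        buckets[classify(passed, failed)].append(ls_id)
    return {
        **buckets,
        "passed_count": len(buckets["passed"]),
        "failed_count": len(buckets["failed"]),
        "missing_count": len(buckets["missing"]),
    }


def exit_code(summary: dict) -> int:
    """Exit codes as documented in the workflow's final step."""
    if summary["failed_count"] > 0:
        return 1  # hard failure: a regression test for an approved LS failed
    if summary["missing_count"] > 0:
        return 2  # advisory: some approved entries have no matching test
    return 0      # every approved entry is verified
```

A failure takes precedence over a missing entry, which matches the workflow re-reading `failed_count` before deciding whether to fail the job.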