fix(verify-claims): taint-flow surfaces unsupported-language claim passes (INV-javam)

jgstern-agent · jgstern-agent · commit 7c6c0dd1de5a · 2026-04-15T10:12:25.000-04:00
The io-boundaries side of INV-javam landed in the previous PR; this
PR completes the invariant on the taint-flow side.

When `hypergumbo verify-claims` evaluates a `taint_flow` constraint on
a repo whose languages have no sources / sinks in the taint catalog,
the claim trivially "confirms" — there are no propagation findings
because there was nothing to propagate. The verdict is a lie by
omission: the language wasn't analyzed at all.
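To make the vacuous pass concrete, here is a toy model. It is a minimal
sketch, not hypergumbo's `propagate_taint_structural`; `propagate` and
`verdict` are hypothetical names. Taint spreads along call edges, and
any sink reachable from a source is a finding. With no sources and no
sinks, the same graph "confirms":

```python
from collections import deque


# Hypothetical toy model, not hypergumbo's propagate_taint_structural:
# taint spreads along call edges from source nodes; a finding is any
# sink node reachable from a source.
def propagate(
    edges: dict[str, list[str]], sources: set[str], sinks: set[str]
) -> list[str]:
    findings = []
    seen = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        if node in sinks:
            findings.append(node)
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return findings


def verdict(findings: list[str]) -> str:
    # "No findings" is read as "confirmed", which is the vacuous pass.
    return "violated" if findings else "confirmed"


edges = {"main": ["write_file"]}
# A language with no catalog entries contributes no sources and no sinks.
print(verdict(propagate(edges, sources=set(), sinks=set())))  # confirmed
# With catalog coverage, the same graph produces a finding.
print(verdict(propagate(edges, sources={"main"}, sinks={"write_file"})))  # violated
```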

`cmd_verify_claims` now tracks `unsupported_taint_languages`: the
sorted list of repo languages for which the taint catalog has zero
sources AND zero sinks. When taint claims are present and at least one
unsupported language is detected, stderr carries an explicit notice:

  Note: no taint-flow catalog for language(s): brainfuck, nim.
  Claims touching these languages are NOT actually verified — taint-
  flow has no sources/sinks to trace. Treat 'confirmed' verdicts on
  these languages as inconclusive. (INV-javam)
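A minimal sketch of the detection rule and the first sentence of the
notice, assuming a toy catalog. `ToyTaintCatalog`,
`unsupported_taint_languages()`, and `notice_header()` are hypothetical;
only the `sources_for_language` / `sinks_for_language` method names come
from the diff.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTaintCatalog:
    # Hypothetical stand-in for the real taint catalog: maps a language
    # name to its source and sink entries.
    sources: dict[str, list[str]] = field(default_factory=dict)
    sinks: dict[str, list[str]] = field(default_factory=dict)

    def sources_for_language(self, lang: str) -> list[str]:
        return self.sources.get(lang, [])

    def sinks_for_language(self, lang: str) -> list[str]:
        return self.sinks.get(lang, [])


def unsupported_taint_languages(
    catalog: ToyTaintCatalog, languages: set[str]
) -> list[str]:
    # Same rule as this PR: unsupported means zero sources AND zero sinks.
    # Iterating in sorted order keeps the notice deterministic.
    return [
        lang
        for lang in sorted(languages)
        if not catalog.sources_for_language(lang)
        and not catalog.sinks_for_language(lang)
    ]


def notice_header(langs: list[str]) -> str:
    return f"Note: no taint-flow catalog for language(s): {', '.join(langs)}."


catalog = ToyTaintCatalog(
    sources={"python": ["os.environ"]},
    sinks={"python": ["open"]},
)
langs = unsupported_taint_languages(catalog, {"python", "nim", "brainfuck"})
print(notice_header(langs))
# Note: no taint-flow catalog for language(s): brainfuck, nim.
```

Because the rule is an AND, a language with sinks but no sources (or the
reverse) still counts as supported and does not trigger the notice.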

Why stderr, not a verdict change: changing `confirmed` to a new
status (e.g. `indeterminate`) would require plumbing language-scope
into individual ClaimVerdict records and could break downstream
consumers. The stderr signal is the lower-risk move — human reviewers
see it, programmatic consumers can capture stderr when they care,
and the JSON output schema stays stable. If we later decide a
per-verdict `language_supported` field is worth the break, that's a
natural follow-up.
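For programmatic consumers that do capture stderr, a sketch of
recovering the language list from the notice text.
`parse_unsupported_taint_languages` is a hypothetical helper, and it
assumes the wording quoted above stays fixed, which this PR does not
promise (only the JSON output schema is described as stable).

```python
import re

# Hypothetical consumer-side helper. Assumes the notice's first sentence
# keeps the wording "no taint-flow catalog for language(s): X, Y.";
# stderr text is not a stable interface.
_NOTICE_RE = re.compile(r"no taint-flow catalog for language\(s\): ([^.]+)\.")


def parse_unsupported_taint_languages(stderr_text: str) -> list[str]:
    match = _NOTICE_RE.search(stderr_text)
    if match is None:
        return []
    return [lang.strip() for lang in match.group(1).split(",")]


stderr_text = (
    "\nNote: no taint-flow catalog for language(s): brainfuck, nim. "
    "Claims touching these languages are NOT actually verified."
)
print(parse_unsupported_taint_languages(stderr_text))  # ['brainfuck', 'nim']
```

This is exactly the fragility that a per-verdict `language_supported`
field would remove: string-matching stderr works, but it couples the
consumer to human-oriented wording.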

3 tests:
- test_verify_claims_notice_for_unsupported_taint_language: notice
  fires for brainfuck-language repo with a taint_flow claim.
- test_verify_claims_no_notice_when_no_taint_claims: notice does NOT
  fire for boundary-only claims (only taint-flow ones trigger it).
- test_verify_claims_no_notice_when_taint_language_supported:
  notice does NOT fire when every language has catalog coverage
  (anti-regression).
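The three tests pin down a two-condition gate. The sketch below restates
it over claim dicts shaped like the tests' claims.yaml entries;
`has_taint_claims` and `should_emit_notice` are hypothetical names, and
the real code checks `c.constraint_taint_flow is not None` on parsed
claims rather than dict keys. In the diff, `unsupported_taint_languages`
is only populated inside the `if has_taint_claims:` branch, so the
explicit `has_taint_claims` check on the notice is belt-and-braces.

```python
# Hypothetical restatement of the notice gate. A claim counts as a taint
# claim when its constraint carries a "taint_flow" key.
def has_taint_claims(claims: list[dict]) -> bool:
    return any("taint_flow" in c.get("constraint", {}) for c in claims)


def should_emit_notice(claims: list[dict], unsupported: list[str]) -> bool:
    return has_taint_claims(claims) and bool(unsupported)


taint_claim = {
    "id": "TF-001",
    "constraint": {
        "taint_flow": {"source_taint": "secret", "prohibited_sink_zone": "host_fs"}
    },
}
boundary_claim = {
    "id": "SC-001",
    "constraint": {"boundary": "net_send", "must_not_exist": True},
}

# One entry per test in this PR: (claims, unsupported languages).
scenarios = {
    "unsupported language, taint claim": ([taint_claim], ["brainfuck"]),
    "unsupported language, boundary-only claim": ([boundary_claim], ["brainfuck"]),
    "supported language, taint claim": ([taint_claim], []),
}
for name, (claims, unsupported) in scenarios.items():
    print(f"{name}: {should_emit_notice(claims, unsupported)}")
```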

INV-javam stays at pending_validation; bakeoff will confirm the
two-part fix (io-boundaries + verify-claims) behaves correctly
end-to-end on unsupported-language repos.

Signed-off-by: jgstern-agent <josh-agent@iterabloom.com>
diff --git a/.ci/affected-tests.txt b/.ci/affected-tests.txt
@@ -1,8 +1,8 @@
 # Test selection manifest
-# Generated by smart-test at 2026-04-15T09:45:01-04:00
+# Generated by smart-test at 2026-04-15T10:12:20-04:00
 # Mode: targeted
 # Baseline: 02dba9744d2c86e26f06565aad4ebcae7ef0f4a8
-# Changed files: 37
+# Changed files: 38
 # Changed source files: 7
 # Selected tests: 66
 #
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -23,6 +23,7 @@ This changelog tracks the **tool version** (package releases). The **schema vers
 
 ### Fixed
 
+- **verify-claims surfaces languages where taint-flow has no catalog** (INV-javam, taint-flow side): when `hypergumbo verify-claims` evaluates a `taint_flow` constraint on a repo whose languages have no sources/sinks in the taint catalog, a stderr notice now fires ("no taint-flow catalog for language(s): X, Y. Claims touching these languages are NOT actually verified ... Treat 'confirmed' verdicts on these languages as inconclusive. INV-javam"). Before this, a trivially-passing claim against an unanalyzed language gave false security confidence — the verdict was "confirmed" because no propagation findings existed, not because the code was safe. The notice only fires when taint claims are present (not on pure boundary claims) and only when at least one detected language has zero coverage. JSON output schema unchanged for backward compatibility; the signal goes to stderr for human review.
 - **io-boundaries distinguishes "no I/O detected" from "language unsupported"** (INV-javam / UAT DQ-03+DQ-04): previously, `hypergumbo io-boundaries` on a codebase containing an unsupported language (pre-banaf TypeScript, pre-vibur Elixir, pre-rujos Kotlin, Solidity, Nim, ...) returned zero boundaries with no warning — output identical to a genuinely I/O-free codebase, and downstream taint-flow assertions trivially passed with false security confidence. `IoBoundaryCatalog` now carries an `is_supported: bool` field (False when no YAML / alias / parent resolves), `io_boundary.is_language_supported(lang)` exposes this to callers, `cmd_io_boundaries` emits a stderr notice ("no I/O primitive catalog for language(s): X, Y. Zero boundaries reported for these languages does NOT mean the code is I/O-free — INV-javam"), and the JSON output includes a stable `unsupported_languages: []` field so programmatic consumers can detect the condition. Complements the organic catalog expansion in WI-banaf/sakan/rujos/vibur: even after coverage grows, some languages will always lack catalogs and the invariant needs to hold.
 - **Laravel `apiResource()` no longer emits phantom HTML-form routes; `.except()` / `.only()` honored** (WI-jorim / UAT BUG-07): `Route::apiResource('posts', PostController::class)` now produces 5 routes (index/store/show/update/destroy) instead of 7 — the `GET /create` and `GET /{id}/edit` HTML-form routes that don't exist for an API resource are dropped. Chained `.except([...])` / `.only([...])` modifiers (variadic strings or array literal) are now parsed and applied to both `resource` and `apiResource`. Multiple modifiers compose in source-order. Variable args (e.g. `->except($actions)`) and unrelated chained methods (e.g. `->name(...)`, `->middleware(...)`) are correctly ignored. On koel: ~40 phantom routes were eliminated (~19% of the 207 originally reported). New constant `LARAVEL_RESOURCE_ACTIONS` and `LARAVEL_API_RESOURCE_EXCLUDED_ACTIONS` make the action set self-documenting.
 - **Subcommand parser cleanup** (WI-balij / UAT UX-03 + UX-04): two argparse plumbing rough edges.
diff --git a/packages/hypergumbo-core/src/hypergumbo_core/cli.py b/packages/hypergumbo-core/src/hypergumbo_core/cli.py
@@ -3270,6 +3270,11 @@ class _Edge:
 
     # Run taint-flow analysis if any claims have taint_flow constraints
     taint_findings = None
+    # INV-javam: track languages with no taint coverage so callers can
+    # distinguish "no taint-flow violations" from "language not analyzed".
+    # Without this, taint-flow trivially passes every claim on unsupported
+    # languages and the verify-claims output lies by omission.
+    unsupported_taint_languages: list[str] = []
     has_taint_claims = any(c.constraint_taint_flow is not None for c in claims)
     if has_taint_claims:
         from .taint import load_builtin_taint_catalog, propagate_taint_structural
@@ -3282,7 +3287,13 @@ class _Edge:
         all_sources = []
         all_sinks = []
         all_sanitizers = []
-        for lang in languages:
+        for lang in sorted(languages):
+            src_count = len(taint_catalog.sources_for_language(lang))
+            snk_count = len(taint_catalog.sinks_for_language(lang))
+            if src_count == 0 and snk_count == 0:
+                # Neither sources nor sinks for this language — taint-flow
+                # cannot meaningfully analyze it. Surface the gap.
+                unsupported_taint_languages.append(lang)
             all_sources.extend(taint_catalog.sources_for_language(lang))
             all_sinks.extend(taint_catalog.sinks_for_language(lang))
             all_sanitizers.extend(taint_catalog.sanitizers_for_language(lang))
@@ -3297,6 +3308,9 @@ class _Edge:
 
     # Output
     if getattr(args, "json_output", False):
+        # Preserve the legacy flat-list schema for programmatic consumers;
+        # INV-javam's unsupported_taint_languages signal goes to stderr to
+        # avoid breaking existing pipelines that parse verify-claims JSON.
         print(json.dumps([v.to_dict() for v in verdicts], indent=2))
     else:
         violated = 0
@@ -3314,6 +3328,19 @@ class _Edge:
         else:
             print(f"All {len(verdicts)} claim(s) CONFIRMED")
 
+    # INV-javam: warn to stderr when taint claims were evaluated against a
+    # repo whose languages have no taint catalog coverage. Even an "all
+    # confirmed" verdict is misleading when the language wasn't analyzed.
+    if has_taint_claims and unsupported_taint_languages:
+        langs = ", ".join(unsupported_taint_languages)
+        print(
+            f"\nNote: no taint-flow catalog for language(s): {langs}. "
+            "Claims touching these languages are NOT actually verified — "
+            "taint-flow has no sources/sinks to trace. Treat 'confirmed' "
+            "verdicts on these languages as inconclusive. (INV-javam)",
+            file=sys.stderr,
+        )
+
     has_violations = any(v.verdict == "violated" for v in verdicts)
     return 1 if has_violations else 0
 
diff --git a/packages/hypergumbo-core/tests/test_cli_verify_claims.py b/packages/hypergumbo-core/tests/test_cli_verify_claims.py
@@ -363,3 +363,149 @@ def test_verify_claims_taint_no_sources(tmp_path: Path, capsys) -> None:
 
     rc = cmd_verify_claims(args)
     assert rc == 0
+
+
+# ============================================================================
+# INV-javam: taint-flow surfaces languages it can't verify
+# ============================================================================
+
+
+def test_verify_claims_notice_for_unsupported_taint_language(
+    tmp_path: Path, capsys,
+) -> None:
+    """INV-javam: when a repo has taint-flow claims but the language has
+    no sources/sinks in the taint catalog, stderr carries an explicit
+    notice. Otherwise 'confirmed' is misleading — the language wasn't
+    actually analyzed.
+    """
+    # Brainfuck has no taint catalog entries whatsoever; a claim against
+    # a repo in that language will trivially 'confirm' without the notice.
+    bmap = _make_behavior_map(
+        nodes=[
+            {"id": "brainfuck:m.bf:1:main:function", "name": "main",
+             "kind": "function", "language": "brainfuck", "path": "m.bf",
+             "span": {"start_line": 1, "end_line": 5}},
+        ],
+        edges=[],
+    )
+    input_file = tmp_path / "hg.json"
+    input_file.write_text(json.dumps(bmap))
+
+    claims = {
+        "claims": [
+            {
+                "id": "TF-001",
+                "text": "No secrets to disk",
+                "constraint": {
+                    "taint_flow": {
+                        "source_taint": "secret",
+                        "prohibited_sink_zone": "host_fs",
+                    },
+                },
+            },
+        ],
+    }
+    claims_file = tmp_path / "claims.yaml"
+    claims_file.write_text(yaml.dump(claims))
+
+    args = FakeArgs()
+    args.path = str(tmp_path)
+    args.input = str(input_file)
+    args.claims = str(claims_file)
+    args.json_output = False
+
+    rc = cmd_verify_claims(args)
+    # Verdict is still "confirmed" (no taint findings) but the notice
+    # must be present so humans don't misread the verdict as a pass.
+    assert rc == 0
+    _, err = capsys.readouterr()
+    assert "brainfuck" in err
+    assert "no taint-flow catalog" in err
+    assert "NOT actually verified" in err
+    assert "INV-javam" in err
+
+
+def test_verify_claims_no_notice_when_no_taint_claims(
+    tmp_path: Path, capsys,
+) -> None:
+    """INV-javam anti-regression: the taint-flow notice only fires when
+    taint claims are actually evaluated. Pure boundary claims on an
+    unsupported language shouldn't trigger it.
+    """
+    bmap = _make_behavior_map(
+        nodes=[
+            {"id": "brainfuck:m.bf:1:main:function", "name": "main",
+             "kind": "function", "language": "brainfuck", "path": "m.bf",
+             "span": {"start_line": 1, "end_line": 5}},
+        ],
+        edges=[],
+    )
+    input_file = tmp_path / "hg.json"
+    input_file.write_text(json.dumps(bmap))
+
+    claims = {
+        "claims": [
+            {"id": "SC-001", "text": "No net",
+             "constraint": {"boundary": "net_send", "must_not_exist": True}},
+        ]
+    }
+    claims_file = tmp_path / "claims.yaml"
+    claims_file.write_text(yaml.dump(claims))
+
+    args = FakeArgs()
+    args.path = str(tmp_path)
+    args.input = str(input_file)
+    args.claims = str(claims_file)
+    args.json_output = False
+
+    rc = cmd_verify_claims(args)
+    assert rc == 0
+    _, err = capsys.readouterr()
+    assert "no taint-flow catalog" not in err
+
+
+def test_verify_claims_no_notice_when_taint_language_supported(
+    tmp_path: Path, capsys,
+) -> None:
+    """INV-javam anti-regression: don't fire the notice when every
+    detected language has taint-catalog coverage (no false alarm
+    on fully-supported codebases).
+    """
+    bmap = _make_behavior_map(
+        nodes=[
+            {"id": "python:a.py:1:f:function", "name": "f",
+             "kind": "function", "language": "python", "path": "a.py",
+             "span": {"start_line": 1, "end_line": 5}},
+        ],
+        edges=[],
+    )
+    input_file = tmp_path / "hg.json"
+    input_file.write_text(json.dumps(bmap))
+
+    claims = {
+        "claims": [
+            {
+                "id": "TF-001",
+                "text": "No plaintext to disk",
+                "constraint": {
+                    "taint_flow": {
+                        "source_taint": "plaintext",
+                        "prohibited_sink_zone": "host_fs",
+                    },
+                },
+            },
+        ],
+    }
+    claims_file = tmp_path / "claims.yaml"
+    claims_file.write_text(yaml.dump(claims))
+
+    args = FakeArgs()
+    args.path = str(tmp_path)
+    args.input = str(input_file)
+    args.claims = str(claims_file)
+    args.json_output = False
+
+    rc = cmd_verify_claims(args)
+    assert rc == 0
+    _, err = capsys.readouterr()
+    assert "no taint-flow catalog" not in err