
Commit 7c6c0dd

jgstern-agent committed
fix(verify-claims): taint-flow surfaces unsupported-language claim passes (INV-javam)
The io-boundaries side of INV-javam landed in the previous PR; this PR completes the invariant on the taint-flow side.

When `hypergumbo verify-claims` evaluates a `taint_flow` constraint on a repo whose languages have no sources or sinks in the taint catalog, the claim trivially "confirms", because there are no propagation findings when there was nothing to propagate. The verdict is a lie by omission: the language wasn't analyzed at all.

cmd_verify_claims now tracks `unsupported_taint_languages`, the set of repo languages for which the taint catalog has zero sources AND zero sinks. When taint claims are present and at least one unsupported language is detected, stderr carries an explicit notice:

    Note: no taint-flow catalog for language(s): brainfuck, nim. Claims touching these languages are NOT actually verified — taint-flow has no sources/sinks to trace. Treat 'confirmed' verdicts on these languages as inconclusive. (INV-javam)

Why stderr, not a verdict change: changing `confirmed` to a new status (e.g. `indeterminate`) would require plumbing language scope into individual ClaimVerdict records and could break downstream consumers. The stderr signal is the lower-risk move: human reviewers see it, programmatic consumers can capture stderr when they care, and the JSON output schema stays stable. If we later decide a per-verdict `language_supported` field is worth the break, that's a natural follow-up.

3 tests:
- test_verify_claims_notice_for_unsupported_taint_language: notice fires for a brainfuck-language repo with a taint_flow claim.
- test_verify_claims_no_notice_when_no_taint_claims: notice does NOT fire for boundary-only claims (only taint-flow claims trigger it).
- test_verify_claims_no_notice_when_taint_language_supported: notice does NOT fire when every language has catalog coverage (anti-regression).

INV-javam stays at pending_validation; the bakeoff will confirm that the two-part fix (io-boundaries + verify-claims) behaves correctly end-to-end on unsupported-language repos.
Signed-off-by: jgstern-agent <josh-agent@iterabloom.com>
1 parent 688fe97 commit 7c6c0dd
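The commit message notes that programmatic consumers can capture stderr when they care. As a minimal sketch of what that could look like, assuming the notice keeps exactly the wording printed by `cmd_verify_claims`, a consumer can pull the language list out of captured stderr text. The helper name `unsupported_taint_languages_from_stderr` is hypothetical and not part of hypergumbo.

```python
import re

# Hypothetical consumer-side helper; not part of hypergumbo.
# Keyed on the literal notice text printed by cmd_verify_claims (INV-javam).
# Assumes language names never contain "." (the first period ends the list).
_NOTICE_RE = re.compile(r"no taint-flow catalog for language\(s\): ([^.]+)\.")


def unsupported_taint_languages_from_stderr(stderr_text: str) -> list[str]:
    """Return the languages named in the INV-javam notice, or [] if it is absent."""
    match = _NOTICE_RE.search(stderr_text)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


if __name__ == "__main__":
    captured_stderr = (
        "\nNote: no taint-flow catalog for language(s): brainfuck, nim. "
        "Claims touching these languages are NOT actually verified"
    )
    # A non-empty result means 'confirmed' verdicts should be read as inconclusive.
    print(unsupported_taint_languages_from_stderr(captured_stderr))  # ['brainfuck', 'nim']
```

Because the signal is on stderr only, a caller running the CLI through `subprocess.run(..., capture_output=True, text=True)` would pass `result.stderr` to this helper while still parsing the unchanged JSON list from `result.stdout`.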

4 files changed

Lines changed: 177 additions & 3 deletions


.ci/affected-tests.txt

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
 # Test selection manifest
-# Generated by smart-test at 2026-04-15T09:45:01-04:00
+# Generated by smart-test at 2026-04-15T10:12:20-04:00
 # Mode: targeted
 # Baseline: 02dba9744d2c86e26f06565aad4ebcae7ef0f4a8
-# Changed files: 37
+# Changed files: 38
 # Changed source files: 7
 # Selected tests: 66
 #

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@ This changelog tracks the **tool version** (package releases). The **schema vers
 
 ### Fixed
 
+- **verify-claims surfaces languages where taint-flow has no catalog** (INV-javam, taint-flow side): when `hypergumbo verify-claims` evaluates a `taint_flow` constraint on a repo whose languages have no sources/sinks in the taint catalog, a stderr notice now fires ("no taint-flow catalog for language(s): X, Y. Claims touching these languages are NOT actually verified ... Treat 'confirmed' verdicts on these languages as inconclusive. INV-javam"). Before this, a trivially-passing claim against an unanalyzed language gave false security confidence — the verdict was "confirmed" because no propagation findings existed, not because the code was safe. The notice only fires when taint claims are present (not on pure boundary claims) and only when at least one detected language has zero coverage. JSON output schema unchanged for backward compatibility; the signal goes to stderr for human review.
 - **io-boundaries distinguishes "no I/O detected" from "language unsupported"** (INV-javam / UAT DQ-03+DQ-04): previously, `hypergumbo io-boundaries` on a codebase containing an unsupported language (pre-banaf TypeScript, pre-vibur Elixir, pre-rujos Kotlin, Solidity, Nim, ...) returned zero boundaries with no warning — output identical to a genuinely I/O-free codebase, and downstream taint-flow assertions trivially passed with false security confidence. `IoBoundaryCatalog` now carries an `is_supported: bool` field (False when no YAML / alias / parent resolves), `io_boundary.is_language_supported(lang)` exposes this to callers, `cmd_io_boundaries` emits a stderr notice ("no I/O primitive catalog for language(s): X, Y. Zero boundaries reported for these languages does NOT mean the code is I/O-free — INV-javam"), and the JSON output includes a stable `unsupported_languages: []` field so programmatic consumers can detect the condition. Complements the organic catalog expansion in WI-banaf/sakan/rujos/vibur: even after coverage grows, some languages will always lack catalogs and the invariant needs to hold.
 - **Laravel `apiResource()` no longer emits phantom HTML-form routes; `.except()` / `.only()` honored** (WI-jorim / UAT BUG-07): `Route::apiResource('posts', PostController::class)` now produces 5 routes (index/store/show/update/destroy) instead of 7 — the `GET /create` and `GET /{id}/edit` HTML-form routes that don't exist for an API resource are dropped. Chained `.except([...])` / `.only([...])` modifiers (variadic strings or array literal) are now parsed and applied to both `resource` and `apiResource`. Multiple modifiers compose in source-order. Variable args (e.g. `->except($actions)`) and unrelated chained methods (e.g. `->name(...)`, `->middleware(...)`) are correctly ignored. On koel: ~40 phantom routes were eliminated (~19% of the 207 originally reported). New constant `LARAVEL_RESOURCE_ACTIONS` and `LARAVEL_API_RESOURCE_EXCLUDED_ACTIONS` make the action set self-documenting.
 - **Subcommand parser cleanup** (WI-balij / UAT UX-03 + UX-04): two argparse plumbing rough edges.
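The Laravel entry in this hunk describes how resource actions are filtered: `apiResource` drops the two HTML-form actions, and literal `.except()` / `.only()` arguments are applied in source order. The sketch below is reconstructed from that description only and is not hypergumbo's code. The contents of the two constants are assumed to be Laravel's seven conventional resource actions and the `create`/`edit` pair, and `resource_actions` is a hypothetical helper name.

```python
from typing import Iterable, List, Sequence, Tuple

# Assumed contents: Laravel's seven conventional resource actions, in route order.
LARAVEL_RESOURCE_ACTIONS: Tuple[str, ...] = (
    "index", "create", "store", "show", "edit", "update", "destroy",
)
# Assumed contents: the HTML-form actions an API resource does not register.
LARAVEL_API_RESOURCE_EXCLUDED_ACTIONS = frozenset({"create", "edit"})


def resource_actions(
    api: bool,
    modifiers: Sequence[Tuple[str, Iterable[str]]],
) -> List[str]:
    """Hypothetical helper: actions registered for resource()/apiResource().

    `modifiers` lists chained calls in source order as (method, literal_args).
    Callers are expected to drop variable args such as ->except($actions)
    before calling, since their values are unknown statically.
    """
    actions = [
        a for a in LARAVEL_RESOURCE_ACTIONS
        if not (api and a in LARAVEL_API_RESOURCE_EXCLUDED_ACTIONS)
    ]
    for method, args in modifiers:
        names = set(args)
        if method == "only":
            actions = [a for a in actions if a in names]
        elif method == "except":
            actions = [a for a in actions if a not in names]
        # Any other chained method (->name(), ->middleware(), ...) is ignored.
    return actions
```

With `api=True` and no modifiers this yields the five routes the entry lists. Treating each modifier as a successive filter is one reading of "compose in source-order"; Laravel itself stores `only`/`except` as options on the pending registration, so repeated calls to the same modifier may not behave identically there.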

packages/hypergumbo-core/src/hypergumbo_core/cli.py

Lines changed: 28 additions & 1 deletion
@@ -3270,6 +3270,11 @@ class _Edge:
 
 # Run taint-flow analysis if any claims have taint_flow constraints
 taint_findings = None
+# INV-javam: track languages with no taint coverage so callers can
+# distinguish "no taint-flow violations" from "language not analyzed".
+# Without this, taint-flow trivially passes every claim on unsupported
+# languages and the verify-claims output lies by omission.
+unsupported_taint_languages: list[str] = []
 has_taint_claims = any(c.constraint_taint_flow is not None for c in claims)
 if has_taint_claims:
 from .taint import load_builtin_taint_catalog, propagate_taint_structural
@@ -3282,7 +3287,13 @@ class _Edge:
 all_sources = []
 all_sinks = []
 all_sanitizers = []
-for lang in languages:
+for lang in sorted(languages):
+src_count = len(taint_catalog.sources_for_language(lang))
+snk_count = len(taint_catalog.sinks_for_language(lang))
+if src_count == 0 and snk_count == 0:
+# Neither sources nor sinks for this language — taint-flow
+# cannot meaningfully analyze it. Surface the gap.
+unsupported_taint_languages.append(lang)
 all_sources.extend(taint_catalog.sources_for_language(lang))
 all_sinks.extend(taint_catalog.sinks_for_language(lang))
 all_sanitizers.extend(taint_catalog.sanitizers_for_language(lang))
@@ -3297,6 +3308,9 @@ class _Edge:
 
 # Output
 if getattr(args, "json_output", False):
+# Preserve the legacy flat-list schema for programmatic consumers;
+# INV-javam's unsupported_taint_languages signal goes to stderr to
+# avoid breaking existing pipelines that parse verify-claims JSON.
 print(json.dumps([v.to_dict() for v in verdicts], indent=2))
 else:
 violated = 0
@@ -3314,6 +3328,19 @@ class _Edge:
 else:
 print(f"All {len(verdicts)} claim(s) CONFIRMED")
 
+# INV-javam: warn to stderr when taint claims were evaluated against a
+# repo whose languages have no taint catalog coverage. Even a "all
+# confirmed" verdict is misleading when the language wasn't analyzed.
+if has_taint_claims and unsupported_taint_languages:
+langs = ", ".join(unsupported_taint_languages)
+print(
+f"\nNote: no taint-flow catalog for language(s): {langs}. "
+"Claims touching these languages are NOT actually verified — "
+"taint-flow has no sources/sinks to trace. Treat 'confirmed' "
+"verdicts on these languages as inconclusive. (INV-javam)",
+file=sys.stderr,
+)
+
 has_violations = any(v.verdict == "violated" for v in verdicts)
 return 1 if has_violations else 0
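The loop change in `cmd_verify_claims` boils down to a classification step plus a guarded print. Pulled out as pure functions, both can be unit-tested without writing a behavior map to disk. This is a minimal sketch under assumptions: `TaintCatalogLike`, `DictCatalog`, `find_unsupported_taint_languages`, and `taint_notice` are hypothetical names; only `sources_for_language` / `sinks_for_language` come from the diff, and the notice text is abbreviated.

```python
from typing import Iterable, List, Mapping, Optional, Protocol, Sequence


class TaintCatalogLike(Protocol):
    """Hypothetical minimal interface: the two per-language lookups the diff calls."""

    def sources_for_language(self, lang: str) -> Sequence[object]: ...

    def sinks_for_language(self, lang: str) -> Sequence[object]: ...


def find_unsupported_taint_languages(
    languages: Iterable[str], catalog: TaintCatalogLike
) -> List[str]:
    # sorted() keeps the notice order deterministic even when `languages` is a set.
    # A language is flagged only when it has zero sources AND zero sinks.
    return [
        lang
        for lang in sorted(languages)
        if not catalog.sources_for_language(lang) and not catalog.sinks_for_language(lang)
    ]


def taint_notice(has_taint_claims: bool, unsupported: List[str]) -> Optional[str]:
    # Same guard as the diff: no taint claims, or full coverage, means no notice.
    if not (has_taint_claims and unsupported):
        return None
    return (
        f"Note: no taint-flow catalog for language(s): {', '.join(unsupported)}. "
        "Claims touching these languages are NOT actually verified. (INV-javam)"
    )


class DictCatalog:
    """Toy catalog backed by dicts, standing in for the real taint catalog."""

    def __init__(self, sources: Mapping[str, List[str]], sinks: Mapping[str, List[str]]) -> None:
        self._sources = sources
        self._sinks = sinks

    def sources_for_language(self, lang: str) -> List[str]:
        return self._sources.get(lang, [])

    def sinks_for_language(self, lang: str) -> List[str]:
        return self._sinks.get(lang, [])
```

The three new tests map onto `taint_notice`'s guard: taint claims plus an uncovered language (fires), boundary-only claims (silent), and full coverage (silent). In a toy catalog where `go` has sinks but no sources, `go` is not flagged, which matches the diff's `src_count == 0 and snk_count == 0` condition.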

packages/hypergumbo-core/tests/test_cli_verify_claims.py

Lines changed: 146 additions & 0 deletions
@@ -363,3 +363,149 @@ def test_verify_claims_taint_no_sources(tmp_path: Path, capsys) -> None:
 
 rc = cmd_verify_claims(args)
 assert rc == 0
+
+
+# ============================================================================
+# INV-javam: taint-flow surfaces languages it can't verify
+# ============================================================================
+
+
+def test_verify_claims_notice_for_unsupported_taint_language(
+tmp_path: Path, capsys,
+) -> None:
+"""INV-javam: when a repo has taint-flow claims but the language has
+no sources/sinks in the taint catalog, stderr carries an explicit
+notice. Otherwise 'confirmed' is misleading — the language wasn't
+actually analyzed.
+"""
+# Brainfuck has no taint catalog entries whatsoever; a claim against
+# a repo in that language will trivially 'confirm' without the notice.
+bmap = _make_behavior_map(
+nodes=[
+{"id": "brainfuck:m.bf:1:main:function", "name": "main",
+"kind": "function", "language": "brainfuck", "path": "m.bf",
+"span": {"start_line": 1, "end_line": 5}},
+],
+edges=[],
+)
+input_file = tmp_path / "hg.json"
+input_file.write_text(json.dumps(bmap))
+
+claims = {
+"claims": [
+{
+"id": "TF-001",
+"text": "No secrets to disk",
+"constraint": {
+"taint_flow": {
+"source_taint": "secret",
+"prohibited_sink_zone": "host_fs",
+},
+},
+},
+],
+}
+claims_file = tmp_path / "claims.yaml"
+claims_file.write_text(yaml.dump(claims))
+
+args = FakeArgs()
+args.path = str(tmp_path)
+args.input = str(input_file)
+args.claims = str(claims_file)
+args.json_output = False
+
+rc = cmd_verify_claims(args)
+# Verdict is still "confirmed" (no taint findings) but the notice
+# must be present so humans don't misread the verdict as a pass.
+assert rc == 0
+_, err = capsys.readouterr()
+assert "brainfuck" in err
+assert "no taint-flow catalog" in err
+assert "NOT actually verified" in err
+assert "INV-javam" in err
+
+
+def test_verify_claims_no_notice_when_no_taint_claims(
+tmp_path: Path, capsys,
+) -> None:
+"""INV-javam anti-regression: the taint-flow notice only fires when
+taint claims are actually evaluated. Pure boundary claims on an
+unsupported language shouldn't trigger it.
+"""
+bmap = _make_behavior_map(
+nodes=[
+{"id": "brainfuck:m.bf:1:main:function", "name": "main",
+"kind": "function", "language": "brainfuck", "path": "m.bf",
+"span": {"start_line": 1, "end_line": 5}},
+],
+edges=[],
+)
+input_file = tmp_path / "hg.json"
+input_file.write_text(json.dumps(bmap))
+
+claims = {
+"claims": [
+{"id": "SC-001", "text": "No net",
+"constraint": {"boundary": "net_send", "must_not_exist": True}},
+]
+}
+claims_file = tmp_path / "claims.yaml"
+claims_file.write_text(yaml.dump(claims))
+
+args = FakeArgs()
+args.path = str(tmp_path)
+args.input = str(input_file)
+args.claims = str(claims_file)
+args.json_output = False
+
+rc = cmd_verify_claims(args)
+assert rc == 0
+_, err = capsys.readouterr()
+assert "no taint-flow catalog" not in err
+
+
+def test_verify_claims_no_notice_when_taint_language_supported(
+tmp_path: Path, capsys,
+) -> None:
+"""INV-javam anti-regression: don't fire the notice when every
+detected language has taint-catalog coverage (no false alarm
+on fully-supported codebases).
+"""
+bmap = _make_behavior_map(
+nodes=[
+{"id": "python:a.py:1:f:function", "name": "f",
+"kind": "function", "language": "python", "path": "a.py",
+"span": {"start_line": 1, "end_line": 5}},
+],
+edges=[],
+)
+input_file = tmp_path / "hg.json"
+input_file.write_text(json.dumps(bmap))
+
+claims = {
+"claims": [
+{
+"id": "TF-001",
+"text": "No plaintext to disk",
+"constraint": {
+"taint_flow": {
+"source_taint": "plaintext",
+"prohibited_sink_zone": "host_fs",
+},
+},
+},
+],
+}
+claims_file = tmp_path / "claims.yaml"
+claims_file.write_text(yaml.dump(claims))
+
+args = FakeArgs()
+args.path = str(tmp_path)
+args.input = str(input_file)
+args.claims = str(claims_file)
+args.json_output = False
+
+rc = cmd_verify_claims(args)
+assert rc == 0
+_, err = capsys.readouterr()
+assert "no taint-flow catalog" not in err
