Commit 688fe97

author
jgstern-agent
committed
fix(io-boundaries): distinguish "no I/O" from "language unsupported" (INV-javam)
UAT-2026-04-13 DQ-03 + DQ-04 flagged a silent-failure class: `hypergumbo io-boundaries` on a codebase containing a language with no I/O primitive catalog returned zero boundaries with no warning. The output was identical to that of a genuinely I/O-free codebase, and downstream taint-flow trivially passed every claim (false security confidence).

Even as organic catalog expansion lands (WI-banaf TypeScript, WI-vibur Elixir, WI-sakan Java, WI-rujos Kotlin), the invariant has to hold independently: hypergumbo will always have some language with no catalog, and "zero" must never silently mean "we didn't look."

The fix is narrow:

1. `IoBoundaryCatalog` gains an `is_supported: bool` field. Default True. `load_catalog()` sets it to False when no YAML file (and no alias / parent) resolves.
2. New `io_boundary.is_language_supported(lang)` helper exposes the flag so callers do not have to handle the catalog object themselves (it calls `load_catalog()` internally).
3. `cmd_io_boundaries` tracks unsupported languages separately from supported-but-zero-matches languages and:
   - Prints a stderr notice in the human-readable output modes ("no I/O primitive catalog for language(s): X, Y. Zero boundaries reported for these languages does NOT mean the code is I/O-free", tagged INV-javam) so humans aren't misled.
   - Adds `unsupported_languages: [...]` to the JSON output (stable schema: always present, empty list when every detected language is supported) so programmatic consumers like taint-flow can refuse to assert success on unsupported code.

Taint-flow consumption of this signal is deferred to a sibling PR. The current change delivers the io-boundaries side, which is what every downstream checker needs anyway.

9 new tests:

- Catalog-level: a nonexistent language returns is_supported=False, supported languages return True, aliased languages (typescript, cpp) inherit True from their alias target, parent-inherited languages (scala, kotlin, elixir) return True, and the module-level helper mirrors the flag.
- cmd-level: the stderr notice fires for a brainfuck-language node, does NOT fire on a python-only codebase, the JSON output surfaces `unsupported_languages: ["brainfuck", "nim"]`, and emits `unsupported_languages: []` when every language is supported (stable schema guard).

Tracker: moved INV-javam from `violated` to `pending_validation`; bakeoff will confirm the notice fires on real unsupported repos (TypeScript/Nim/Solidity/Elixir) and doesn't on fully-supported ones.

Signed-off-by: jgstern-agent <josh-agent@iterabloom.com>
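The commit defers taint-flow's use of `unsupported_languages` to a sibling PR, so what follows is only a minimal sketch of how a JSON consumer might gate on the field. The function name `boundary_report_is_conclusive` is hypothetical and not part of hypergumbo; the only things taken from this commit are the `unsupported_languages` key and its always-present, possibly-empty list shape.

```python
import json


def boundary_report_is_conclusive(report_json: str) -> tuple[bool, list[str]]:
    """Return (conclusive, unsupported) for an io-boundaries JSON report.

    Hypothetical helper, not hypergumbo API. A missing key (for example,
    output produced before this commit) is treated as inconclusive,
    because absence of the field cannot prove full language coverage.
    """
    report = json.loads(report_json)
    unsupported = report.get("unsupported_languages")
    if unsupported is None:
        return False, []
    # An empty list means every detected language had a catalog.
    return len(unsupported) == 0, list(unsupported)
```

A checker built this way would decline to report success whenever the second element is non-empty, rather than reading "zero boundaries" as "no I/O".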
1 parent c3d049f commit 688fe97

6 files changed

Lines changed: 227 additions & 4 deletions

.ci/affected-tests.txt

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
 # Test selection manifest
-# Generated by smart-test at 2026-04-15T09:17:20-04:00
+# Generated by smart-test at 2026-04-15T09:45:01-04:00
 # Mode: targeted
 # Baseline: 02dba9744d2c86e26f06565aad4ebcae7ef0f4a8
-# Changed files: 35
+# Changed files: 37
 # Changed source files: 7
 # Selected tests: 66
 #

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@ This changelog tracks the **tool version** (package releases). The **schema vers
 
 ### Fixed
 
+- **io-boundaries distinguishes "no I/O detected" from "language unsupported"** (INV-javam / UAT DQ-03+DQ-04): previously, `hypergumbo io-boundaries` on a codebase containing an unsupported language (pre-banaf TypeScript, pre-vibur Elixir, pre-rujos Kotlin, Solidity, Nim, ...) returned zero boundaries with no warning — output identical to a genuinely I/O-free codebase, and downstream taint-flow assertions trivially passed with false security confidence. `IoBoundaryCatalog` now carries an `is_supported: bool` field (False when no YAML / alias / parent resolves), `io_boundary.is_language_supported(lang)` exposes this to callers, `cmd_io_boundaries` emits a stderr notice ("no I/O primitive catalog for language(s): X, Y. Zero boundaries reported for these languages does NOT mean the code is I/O-free — INV-javam"), and the JSON output includes a stable `unsupported_languages: []` field so programmatic consumers can detect the condition. Complements the organic catalog expansion in WI-banaf/sakan/rujos/vibur: even after coverage grows, some languages will always lack catalogs and the invariant needs to hold.
 - **Laravel `apiResource()` no longer emits phantom HTML-form routes; `.except()` / `.only()` honored** (WI-jorim / UAT BUG-07): `Route::apiResource('posts', PostController::class)` now produces 5 routes (index/store/show/update/destroy) instead of 7 — the `GET /create` and `GET /{id}/edit` HTML-form routes that don't exist for an API resource are dropped. Chained `.except([...])` / `.only([...])` modifiers (variadic strings or array literal) are now parsed and applied to both `resource` and `apiResource`. Multiple modifiers compose in source-order. Variable args (e.g. `->except($actions)`) and unrelated chained methods (e.g. `->name(...)`, `->middleware(...)`) are correctly ignored. On koel: ~40 phantom routes were eliminated (~19% of the 207 originally reported). New constant `LARAVEL_RESOURCE_ACTIONS` and `LARAVEL_API_RESOURCE_EXCLUDED_ACTIONS` make the action set self-documenting.
 - **Subcommand parser cleanup** (WI-balij / UAT UX-03 + UX-04): two argparse plumbing rough edges.
   - `hypergumbo foobar` no longer silently inserts `sketch` and reports `path does not exist`. The dispatch in `main()` now detects when the first positional looks like a subcommand attempt (no path separators, no leading `.`/`~`, doesn't exist on disk) and prints `'foobar' is not a valid subcommand` plus a `Did you mean: ...` line via `difflib.get_close_matches`. Exits 2.

packages/hypergumbo-core/src/hypergumbo_core/cli.py

Lines changed: 38 additions & 1 deletion
@@ -2925,9 +2925,17 @@ class _Edge:
             languages.add(lang)
 
     # Load catalogs for detected languages
+    # INV-javam: track unsupported languages (no catalog) separately from
+    # supported-but-zero-matches languages. The former must be surfaced
+    # to callers so "zero I/O detected" isn't silently indistinguishable
+    # from "language has no catalog at all".
     catalogs = {}
-    for lang in languages:
+    unsupported_languages: list[str] = []
+    for lang in sorted(languages):
         catalog = load_catalog(lang)
+        if not catalog.is_supported:
+            unsupported_languages.append(lang)
+            continue
         if catalog.primitives:
             catalogs[lang] = catalog
             # Also key by the catalog's base language so edge-prefix lookups
@@ -3002,15 +3010,44 @@ def _is_test_chain(chain: Any) -> bool:
             }
         else:
             output = bmap.to_dict()
+        # INV-javam: expose unsupported-language signal to programmatic
+        # consumers. Empty list when every detected language has a catalog.
+        output["unsupported_languages"] = unsupported_languages
         print(json.dumps(output, indent=2, sort_keys=True))
     elif getattr(args, "by_file", False):
         _print_io_boundaries_by_file(filtered_entries, nodes_by_id, repo_root)
+        _print_unsupported_languages_notice(unsupported_languages)
     else:
         _print_io_boundaries_by_type(filtered_entries, nodes_by_id, bmap, repo_root)
+        _print_unsupported_languages_notice(unsupported_languages)
 
     return 0
 
 
+def _print_unsupported_languages_notice(
+    unsupported_languages: list[str],
+) -> None:
+    """INV-javam: emit an explicit notice when the repo contains languages
+    with no I/O primitive catalog.
+
+    Without this, the human-readable output for an unsupported language
+    is indistinguishable from a genuinely I/O-free codebase — and
+    downstream taint-flow trivially passes every claim on those
+    languages (false security confidence). The notice runs to stderr so
+    piping the boundary report to a file / grep / jq is unaffected.
+    """
+    if not unsupported_languages:
+        return
+    langs = ", ".join(unsupported_languages)
+    print(
+        f"\nNote: no I/O primitive catalog for language(s): {langs}. "
+        "Zero boundaries reported for these languages does NOT mean "
+        "the code is I/O-free — it means hypergumbo cannot detect I/O "
+        "for this language yet. (INV-javam)",
+        file=sys.stderr,
+    )
+
+
 def _format_io_caller(
     symbol_id: str,
     nodes_by_id: Dict[str, Any],
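The loop in this hunk sorts each detected language into one of three states: unsupported, supported with no primitives, or supported with primitives. A self-contained sketch of that partition follows, under stated assumptions: `FakeCatalog`, `_FAKE_CATALOGS`, `fake_load_catalog` and `partition_languages` are hypothetical stand-ins written for illustration, not the real `IoBoundaryCatalog` or `load_catalog()`.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCatalog:
    """Stand-in carrying only the fields the partition loop reads."""
    language: str
    primitives: list[str] = field(default_factory=list)
    is_supported: bool = True


# Hypothetical registry: "go" has a catalog that happens to be empty.
_FAKE_CATALOGS: dict[str, list[str]] = {"python": ["open"], "go": []}


def fake_load_catalog(lang: str) -> FakeCatalog:
    if lang not in _FAKE_CATALOGS:
        # No catalog at all: flag it instead of returning a silent empty.
        return FakeCatalog(language=lang, is_supported=False)
    return FakeCatalog(language=lang, primitives=list(_FAKE_CATALOGS[lang]))


def partition_languages(
    languages: set[str],
) -> tuple[dict[str, FakeCatalog], list[str]]:
    catalogs: dict[str, FakeCatalog] = {}
    unsupported: list[str] = []
    # Sorting makes the unsupported list deterministic for JSON consumers.
    for lang in sorted(languages):
        catalog = fake_load_catalog(lang)
        if not catalog.is_supported:
            unsupported.append(lang)
            continue
        if catalog.primitives:
            catalogs[lang] = catalog
    return catalogs, unsupported
```

With the registry above, `partition_languages({"nim", "python", "brainfuck", "go"})` keeps only `python` in the catalog map and reports `brainfuck` and `nim` as unsupported; `go` lands in neither, which is the "supported but zero matches" case.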

packages/hypergumbo-core/src/hypergumbo_core/io_boundary.py

Lines changed: 19 additions & 1 deletion
@@ -120,6 +120,13 @@ class IoBoundaryCatalog:
     language: str
     primitives: list[IoPrimitive] = field(default_factory=list)
     ambiguous_names: frozenset[str] = field(default_factory=frozenset)
+    # INV-javam: True when a YAML catalog (or alias/parent) was loaded
+    # for this language. False when no catalog exists — this is the
+    # signal callers (io-boundaries, taint-flow) use to distinguish
+    # "found zero I/O" from "language unsupported". Silent zeros are
+    # the class of bug the invariant guards against: output identical
+    # to a clean codebase, plus false security confidence in taint-flow.
+    is_supported: bool = True
     _by_qualified: dict[str, IoPrimitive] = field(
         default_factory=dict, repr=False,
     )
@@ -321,6 +328,14 @@ def _from_dict(cls, data: dict) -> IoBoundaryCatalog:
         }
 
 
+def is_language_supported(language: str) -> bool:
+    """True if ``language`` has an I/O primitive catalog (directly, via
+    alias, or with a parent). Callers use this to distinguish "found
+    zero I/O" from "language unsupported" — the INV-javam invariant.
+    """
+    return load_catalog(language).is_supported
+
+
 def load_catalog(language: str) -> IoBoundaryCatalog:
     """Load the I/O primitive catalog for a language.
 
@@ -337,7 +352,10 @@ def load_catalog(language: str) -> IoBoundaryCatalog:
         if alias:
             path = _CATALOG_DIR / f"{alias}.yaml"
     if not path.exists():
-        return IoBoundaryCatalog(language=language)
+        # INV-javam: no catalog file (and no alias resolving to one) —
+        # callers use is_supported to emit explicit "language
+        # unsupported" output instead of silently returning zero I/O.
+        return IoBoundaryCatalog(language=language, is_supported=False)
     catalog = IoBoundaryCatalog.from_yaml(path)
 
     # Merge parent catalog if defined (e.g. scala inherits java entries)
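The fallback in `load_catalog()` can be pictured with a stripped-down sketch that only checks file existence. Everything here is an assumption made for illustration: `_SKETCH_ALIASES`, `CatalogSketch` and `load_catalog_sketch` are hypothetical names, the alias table is invented, the lines of `load_catalog()` above the hunk are not shown in this diff, and YAML parsing and parent merging are left out entirely.

```python
from dataclasses import dataclass
from pathlib import Path

# Hypothetical alias table; the real one lives in io_boundary.py and is
# not visible in this diff.
_SKETCH_ALIASES = {"cpp": "c"}


@dataclass
class CatalogSketch:
    language: str
    is_supported: bool = True


def load_catalog_sketch(language: str, catalog_dir: Path) -> CatalogSketch:
    """Resolve a catalog file by name, then by alias, else flag unsupported."""
    path = catalog_dir / f"{language}.yaml"
    if not path.exists():
        alias = _SKETCH_ALIASES.get(language)
        if alias:
            path = catalog_dir / f"{alias}.yaml"
    if not path.exists():
        # Nothing resolved: return an explicit "unsupported" marker rather
        # than an empty catalog that looks like "no I/O found".
        return CatalogSketch(language=language, is_supported=False)
    return CatalogSketch(language=language)
```

The design point is that the empty result and the unsupported result are different values, so a caller never has to infer support from an empty primitive list.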

packages/hypergumbo-core/tests/test_cli_io_boundaries.py

Lines changed: 131 additions & 0 deletions
@@ -961,3 +961,134 @@ def test_test_chains_excluded_by_default(self, tmp_path, capsys):
         )
         chains = data["boundaries"]["fs_write"]["chains"]
         assert "test" not in chains[0]["io_edge_src"]
+
+
+# ============================================================================
+# INV-javam: distinguish "no I/O detected" from "language unsupported"
+# ============================================================================
+
+
+def test_cmd_io_boundaries_emits_notice_for_unsupported_language(
+    tmp_path: Path, capsys,
+) -> None:
+    """INV-javam: when the behavior map contains nodes in a language with
+    no IO primitive catalog, io-boundaries prints an explicit unsupported-
+    language notice to stderr. Without this, zero boundaries is
+    indistinguishable from a truly I/O-free codebase.
+    """
+    # A node in a language ("brainfuck") hypergumbo has no catalog for.
+    bmap = _make_behavior_map(
+        nodes=[
+            {
+                "id": "brainfuck:main.bf:1-3:main:function",
+                "name": "main",
+                "kind": "function",
+                "language": "brainfuck",
+                "path": "main.bf",
+                "span": {"start_line": 1, "end_line": 3},
+            },
+        ],
+        edges=[],
+    )
+    args = _make_args(tmp_path, bmap)
+    rc = cmd_io_boundaries(args)
+    assert rc == 0
+
+    _, err = capsys.readouterr()
+    assert "brainfuck" in err
+    assert "no I/O primitive catalog" in err
+    assert "does NOT mean" in err  # the key "don't misread zero" framing
+    assert "INV-javam" in err
+
+
+def test_cmd_io_boundaries_json_output_exposes_unsupported_languages(
+    tmp_path: Path, capsys,
+) -> None:
+    """INV-javam: JSON consumers get an `unsupported_languages` field so
+    programmatic pipelines (e.g., taint-flow) can refuse to assert
+    success when the language isn't actually supported.
+    """
+    bmap = _make_behavior_map(
+        nodes=[
+            {
+                "id": "brainfuck:main.bf:1-3:main:function",
+                "name": "main",
+                "kind": "function",
+                "language": "brainfuck",
+                "path": "main.bf",
+                "span": {"start_line": 1, "end_line": 3},
+            },
+            {
+                "id": "nim:src/thing.nim:1-5:main:function",
+                "name": "main",
+                "kind": "function",
+                "language": "nim",
+                "path": "src/thing.nim",
+                "span": {"start_line": 1, "end_line": 5},
+            },
+        ],
+        edges=[],
+    )
+    args = _make_args(tmp_path, bmap, json_output=True)
+    rc = cmd_io_boundaries(args)
+    assert rc == 0
+
+    data = json.loads(capsys.readouterr().out)
+    assert "unsupported_languages" in data
+    # Both languages have no catalog; list is sorted alphabetically
+    assert data["unsupported_languages"] == ["brainfuck", "nim"]
+
+
+def test_cmd_io_boundaries_no_notice_when_all_languages_supported(
+    tmp_path: Path, capsys,
+) -> None:
+    """INV-javam: the notice must NOT fire when every detected language
+    has a catalog. Anti-regression — we don't want to spam users on
+    fully-supported codebases.
+    """
+    bmap = _make_behavior_map(
+        nodes=[
+            {
+                "id": "python:src/api.py:1-5:main:function",
+                "name": "main",
+                "kind": "function",
+                "language": "python",
+                "path": "src/api.py",
+                "span": {"start_line": 1, "end_line": 5},
+            },
+        ],
+        edges=[],
+    )
+    args = _make_args(tmp_path, bmap)
+    rc = cmd_io_boundaries(args)
+    assert rc == 0
+
+    _, err = capsys.readouterr()
+    assert "no I/O primitive catalog" not in err
+    assert "INV-javam" not in err
+
+
+def test_cmd_io_boundaries_json_output_empty_unsupported_when_all_supported(
+    tmp_path: Path, capsys,
+) -> None:
+    """INV-javam: programmatic consumers always get the field, even when
+    empty — keeps the JSON schema stable."""
+    bmap = _make_behavior_map(
+        nodes=[
+            {
+                "id": "python:src/api.py:1-5:main:function",
+                "name": "main",
+                "kind": "function",
+                "language": "python",
+                "path": "src/api.py",
+                "span": {"start_line": 1, "end_line": 5},
+            },
+        ],
+        edges=[],
+    )
+    args = _make_args(tmp_path, bmap, json_output=True)
+    rc = cmd_io_boundaries(args)
+    assert rc == 0
+
+    data = json.loads(capsys.readouterr().out)
+    assert data["unsupported_languages"] == []
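These tests read the notice from the `err` half of `capsys.readouterr()`, which relies on the notice going to stderr while the report stays on stdout. That split can be seen in isolation with the standard library alone. In this sketch `print_notice_sketch` and `capture_streams` are hypothetical names, the notice text is abbreviated, and the `fs_write: 0 boundaries` line is a placeholder rather than real hypergumbo output.

```python
import contextlib
import io
import sys


def print_notice_sketch(unsupported: list[str]) -> None:
    """Abbreviated stand-in for the stderr notice added in cli.py."""
    if not unsupported:
        return
    langs = ", ".join(unsupported)
    print(
        f"Note: no I/O primitive catalog for language(s): {langs}.",
        file=sys.stderr,
    )


def capture_streams(unsupported: list[str]) -> tuple[str, str]:
    """Run a fake report plus the notice and return (stdout, stderr)."""
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        print("fs_write: 0 boundaries")  # placeholder report line
        print_notice_sketch(unsupported)
    return out.getvalue(), err.getvalue()
```

Because the notice never touches stdout, piping the report into another tool sees the same bytes whether or not an unsupported language was detected.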

packages/hypergumbo-core/tests/test_io_boundary.py

Lines changed: 36 additions & 0 deletions
@@ -439,6 +439,42 @@ def test_load_nonexistent_language_returns_empty(self) -> None:
         assert catalog.language == "brainfuck"
         assert len(catalog.primitives) == 0
 
+    def test_unsupported_language_flagged_is_supported_false(self) -> None:
+        """INV-javam: a language with no catalog/alias/parent returns
+        is_supported=False so callers can distinguish "found zero I/O"
+        from "language unsupported".
+        """
+        catalog = load_catalog("brainfuck")
+        assert catalog.is_supported is False
+
+    def test_supported_language_is_supported_true(self) -> None:
+        """INV-javam: a language with a catalog returns is_supported=True."""
+        assert load_catalog("python").is_supported is True
+        assert load_catalog("java").is_supported is True
+
+    def test_alias_language_is_supported(self) -> None:
+        """INV-javam: an aliased language (typescript → javascript) is
+        considered supported because the alias catalog loads.
+        """
+        assert load_catalog("typescript").is_supported is True
+        assert load_catalog("cpp").is_supported is True
+
+    def test_parent_language_is_supported(self) -> None:
+        """INV-javam: a language with a parent catalog (scala → java,
+        kotlin → java, elixir → erlang) is considered supported.
+        """
+        assert load_catalog("scala").is_supported is True
+        assert load_catalog("kotlin").is_supported is True
+        assert load_catalog("elixir").is_supported is True
+
+    def test_is_language_supported_helper(self) -> None:
+        """INV-javam: module-level helper mirrors the catalog flag for
+        callers that don't want to materialize the full catalog.
+        """
+        from hypergumbo_core.io_boundary import is_language_supported
+        assert is_language_supported("python") is True
+        assert is_language_supported("brainfuck") is False
+
     def test_cpp_alias_loads_c_catalog(self) -> None:
         """C++ has no dedicated catalog but falls back to C via alias."""
         catalog = load_catalog("cpp")
