
Commit 2ece852

Authored and committed by jgstern-agent
fix(io_boundary): reclassify Python stdio from ipc_send to logging (WI-tolif)
The 2026-04-23 self-audit found that 70 of hypergumbo's 77 ipc_send chains were just sys.stderr writes (cli.py progress output, warnings, error messages). This is the same false-positive class that drove Go's log / log/slog / fmt out of ipc_send and into the dedicated logging boundary back when alertmanager was producing 134 such FPs (see test_go_catalog_slog_logging).

Move sys.stdout and sys.stderr from python.yaml#ipc_send to a new python.yaml#logging block. sys.stdin stays in ipc_recv: it CAN receive untrusted piped input from the parent process, which is a real IPC threat-model concern, not a cosmetic one.

No taint-flow regression: AUTO_SINK_ZONE_MAP intentionally does not include 'logging' (matching how Go's log writes are treated), so once stdio moves there, hypergumbo no longer auto-derives stdio as a taint sink. Project-local catalogs that want stdout/stderr treated as a disclosure sink can still declare their own taint_sinks entries via verify-claims --taint-sinks (WI-votan flag).

Cross-language analogs (c.yaml, javascript.yaml, rust.yaml, scala.yaml, elixir.yaml, haskell.yaml) are tracked as scope-expansion WI-dutah for a follow-up PR.

Signed-off-by: jgstern-agent <josh-agent@iterabloom.com>
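The taint-flow argument above rests on one property: only boundaries that appear in the auto-sink zone map become derived sinks. The following is a minimal sketch of that mechanism under stated assumptions, not hypergumbo's code. `derive_sinks`, the flattened catalog shape, and `AUTO_SINK_ZONE_MAP_SKETCH` (including its zone names) are hypothetical stand-ins; the commit only states that the real `AUTO_SINK_ZONE_MAP` has no `logging` entry.

```python
from typing import Dict, Tuple

# Hypothetical flattened catalog: (module, attribute) -> boundary name.
Catalog = Dict[Tuple[str, str], str]

CATALOG_AFTER: Catalog = {
    ("sys", "stdout"): "logging",
    ("sys", "stderr"): "logging",
    ("sys", "stdin"): "ipc_recv",
    ("multiprocessing.Queue", "put"): "ipc_send",
}
# Before WI-tolif, stdout/stderr were classified as ipc_send.
CATALOG_BEFORE: Catalog = {
    **CATALOG_AFTER,
    ("sys", "stdout"): "ipc_send",
    ("sys", "stderr"): "ipc_send",
}

# Illustrative stand-in for AUTO_SINK_ZONE_MAP; the zone names are invented.
# The property that matters (per the commit) is that "logging" has no entry.
AUTO_SINK_ZONE_MAP_SKETCH: Dict[str, str] = {
    "net_send": "network",
    "ipc_send": "ipc",
}


def derive_sinks(catalog: Catalog, zone_map: Dict[str, str]) -> Dict[str, str]:
    """Map 'module.attr' -> sink zone for entries whose boundary is in zone_map."""
    sinks: Dict[str, str] = {}
    for (module, attr), boundary in catalog.items():
        zone = zone_map.get(boundary)
        if zone is not None:  # unmapped boundaries (here: logging, ipc_recv) never become sinks
            sinks[f"{module}.{attr}"] = zone
    return sinks


print(derive_sinks(CATALOG_BEFORE, AUTO_SINK_ZONE_MAP_SKETCH))
# {'sys.stdout': 'ipc', 'sys.stderr': 'ipc', 'multiprocessing.Queue.put': 'ipc'}
print(derive_sinks(CATALOG_AFTER, AUTO_SINK_ZONE_MAP_SKETCH))
# {'multiprocessing.Queue.put': 'ipc'}
```

Under these assumptions, stdout/stderr disappear from the derived sinks as soon as they are classified as `logging`, which is why a project that still wants them as disclosure sinks has to declare them explicitly (the commit points to `verify-claims --taint-sinks`).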
1 parent b93678e commit 2ece852

4 files changed

Lines changed: 35 additions & 4 deletions


.ci/affected-tests.txt

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 # Test selection manifest
-# Generated by smart-test at 2026-04-23T16:34:08-04:00
+# Generated by smart-test at 2026-04-23T16:51:30-04:00
 # Mode: targeted
 # Baseline: 7155820d35e44c8ef6a9551be48e9acc316241bc
 # Changed files: 8

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -35,6 +35,10 @@ New `module_attr_ref` edge type emitted across six languages for attribute reads
 
 - **`io_primitives/python.yaml#net_send`** gains `huggingface_hub.{snapshot_download, hf_hub_download}`, `huggingface_hub.HfApi.{model_info, list_repo_files, download_file}`, and `sentence_transformers.SentenceTransformer`. Surfaced by the 2026-04-23 self-audit: hypergumbo's embeddings extra demonstrably contacts HF Hub (the dogfood run printed an "unauthenticated requests to the HF Hub" warning), but io-boundaries reported zero matching `net_send` chains because the HF stack lives one layer above the `requests` / `httpx` clients the catalog already covered. Adding the wrapper-layer entries lets `verify-claims` reason about the embeddings install's network surface without having to walk into third-party code.
 
+#### IO catalog — Python stdio reclassified from ipc_send to logging (WI-tolif)
+
+- **`io_primitives/python.yaml`**: `sys.stdout` and `sys.stderr` move out of `ipc_send` into a new `logging` block. Same fix Go's `log` / `log/slog` / `fmt` already received (see `test_go_catalog_slog_logging`): writing to stdout/stderr is terminal/log output, not inter-process communication in any threat-model sense, and classifying it as `ipc_send` produced 70 false-positive chains in hypergumbo's own self-analysis (out of 77 ipc_send chains total). `sys.stdin` stays in `ipc_recv` — it can carry untrusted piped input from the parent process, which is a real IPC threat-model concern. Cross-language analogs (`c.yaml`, `javascript.yaml`, `rust.yaml`, `scala.yaml`) tracked separately.
+
 ### Changed
 
 #### Taint catalog auto-derivation (WI-lokuv)
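The stdin/stdout asymmetry in the entry above can be made concrete with the standard library alone. This demonstration is not part of the commit, and `run_child` and `CHILD` are illustrative names: whatever the parent writes to a child's stdin arrives in the child as input it did not produce, whereas the child's stderr line mimics the cli.py-style progress output the audit flagged and only reaches the parent because this demo chose to capture it.

```python
import subprocess
import sys

# Child program: reads whatever the parent pipes in, emits a progress line on
# stderr (like cli.py status output), and reports how many characters arrived.
CHILD = (
    "import sys; data = sys.stdin.read(); "
    "sys.stderr.write('progress\\n'); print(len(data))"
)


def run_child(untrusted: str) -> subprocess.CompletedProcess:
    # The parent fully controls the child's stdin; from the child's side,
    # sys.stdin is a receive channel carrying externally supplied data.
    return subprocess.run(
        [sys.executable, "-c", CHILD],
        input=untrusted,
        capture_output=True,
        text=True,
        check=True,
    )


result = run_child("payload from the parent process")
print(result.stdout.strip())  # 31
print(result.stderr.strip())  # progress
```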

packages/hypergumbo-core/src/hypergumbo_core/io_primitives/python.yaml

Lines changed: 10 additions & 3 deletions
@@ -157,14 +157,21 @@ net_recv:
     notes: "aiohttp web server"
 
 ipc_send:
-  - module: sys
-    attributes: [stdout, stderr]
-    notes: "Writing to stdout/stderr sends data to the parent process or terminal"
   - module: multiprocessing.Queue
     methods: [put, put_nowait]
   - module: multiprocessing.Pipe
     methods: [send, send_bytes]
 
+logging:
+  # WI-tolif: stdio writes are terminal output / log destinations, not
+  # inter-process communication in any threat-model sense — same reason
+  # Go's log/slog/fmt sit in `logging` rather than `ipc_send`. Keeping
+  # them under ipc_send produced 70+ false positives per repo
+  # (hypergumbo's own self-audit on 2026-04-23 surfaced this).
+  - module: sys
+    attributes: [stdout, stderr]
+    notes: "Terminal / log output; redirectable to file or pipe but not by itself an IPC channel"
+
 ipc_recv:
   - module: sys
     attributes: [stdin]
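The test added in `test_io_boundary.py` reads the reclassification back through `catalog.lookup_with_module`. As a rough, dependency-free illustration of what such a lookup has to do, here is a hypothetical in-memory mirror of only the post-change blocks visible in this hunk, with a simplified lookup. It is not hypergumbo's loader; `CATALOG`, `Hit`, and `lookup_sketch` are invented names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical mirror of the post-change python.yaml blocks shown in this hunk
# (the real file has many more entries and is loaded from YAML).
CATALOG: Dict[str, List[dict]] = {
    "ipc_send": [
        {"module": "multiprocessing.Queue", "methods": ["put", "put_nowait"]},
        {"module": "multiprocessing.Pipe", "methods": ["send", "send_bytes"]},
    ],
    "logging": [
        {"module": "sys", "attributes": ["stdout", "stderr"]},
    ],
    "ipc_recv": [
        {"module": "sys", "attributes": ["stdin"]},
    ],
}


@dataclass(frozen=True)
class Hit:
    boundary: str
    module: str
    name: str


def lookup_sketch(name: str, module: str) -> Optional[Hit]:
    """Return the first boundary whose entry for `module` lists `name`."""
    for boundary, entries in CATALOG.items():
        for entry in entries:
            if entry["module"] != module:
                continue
            names = entry.get("attributes", []) + entry.get("methods", [])
            if name in names:
                return Hit(boundary, module, name)
    return None


# Same checks the new test performs, against the sketch.
for attr in ("stdout", "stderr"):
    hit = lookup_sketch(attr, "sys")
    assert hit is not None and hit.boundary == "logging"
stdin_hit = lookup_sketch("stdin", "sys")
assert stdin_hit is not None and stdin_hit.boundary == "ipc_recv"
```

In this sketch the first match wins, so an entry left behind in `ipc_send` would shadow the new `logging` one; removing it, as the diff does, keeps the lookup unambiguous.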

packages/hypergumbo-core/tests/test_io_boundary.py

Lines changed: 20 additions & 0 deletions
@@ -80,6 +80,26 @@ def test_python_catalog_has_net_send(self) -> None:
         names = {p.qualified_name for p in net_sends}
         assert "socket.socket.send" in names
 
+    def test_python_catalog_stdio_is_logging_not_ipc_send(self) -> None:
+        # WI-tolif: 2026-04-23 self-audit found that 70 of hypergumbo's 77
+        # ipc_send chains were just sys.stderr writes (cli.py progress
+        # output, warnings) — the same false-positive class that drove
+        # Go's log/slog/fmt to be moved out of ipc_send into logging
+        # (see test_go_catalog_slog_logging). Same fix here for Python:
+        # stdout/stderr are terminal output, not inter-process communication.
+        catalog = load_catalog("python")
+        for attr in ("stdout", "stderr"):
+            hit = catalog.lookup_with_module(attr, "sys")
+            assert hit is not None, f"sys.{attr} should be in the Python IO catalog"
+            assert hit.boundary == "logging", (
+                f"sys.{attr} should be classified as logging, not {hit.boundary}"
+            )
+        # sys.stdin stays in ipc_recv — it can carry untrusted piped input
+        # from the parent process (a real IPC threat model, not just terminal echo).
+        hit = catalog.lookup_with_module("stdin", "sys")
+        assert hit is not None
+        assert hit.boundary == "ipc_recv"
+
     def test_python_catalog_has_huggingface_hub_net_send(self) -> None:
         # WI-jihuj: 2026-04-23 self-audit found that hypergumbo's embeddings
         # extra demonstrably hits HuggingFace Hub (the dogfood run printed
