
Commit b93678e

jgstern-agent committed
fix(io_boundary): add huggingface_hub + sentence_transformers to net_send catalog (WI-jihuj)
The 2026-04-23 self-audit found that hypergumbo's embeddings extra demonstrably contacts HuggingFace Hub (the dogfood run printed an "unauthenticated requests to the HF Hub" warning), but io-boundaries reported zero matching net_send chains.

Root cause: python.yaml#net_send covered the low-level HTTP clients (requests, aiohttp.ClientSession, httpx.{Client,AsyncClient}), but the HF stack lives one layer above them: the sentence_transformers.SentenceTransformer constructor calls into huggingface_hub, which calls requests internally. The io-boundary tracer can only see edges into hypergumbo's own source, not into third-party dependencies, so the wrapper-layer constructor and huggingface_hub.* entries are required to surface the chain.

Additions to python.yaml#net_send:
- huggingface_hub.snapshot_download / hf_hub_download (functions)
- huggingface_hub.HfApi.{model_info, list_repo_files, download_file}
- sentence_transformers.SentenceTransformer (constructor)

The test test_python_catalog_has_huggingface_hub_net_send asserts that huggingface_hub.snapshot_download, huggingface_hub.hf_hub_download, and sentence_transformers.SentenceTransformer are loaded as net_send primitives. All 243 io_boundary + catalog tests pass.

Signed-off-by: jgstern-agent <josh-agent@iterabloom.com>
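The root cause can be illustrated with a toy model. This is a minimal sketch, assuming a simplified tracer that expands only first-party callers and treats every third-party callee as an opaque leaf, which is the limitation the commit message describes. `CallGraph`, `net_send_chains`, the `hypergumbo.embed.load_model` entry point, and the toy catalogs (including the `requests.Session.request` stand-in) are hypothetical illustrations, not hypergumbo's io-boundary implementation.

```python
from dataclasses import dataclass, field


@dataclass
class CallGraph:
    """Hypothetical call graph: caller -> callees, plus the set of first-party symbols."""
    first_party: set[str]
    edges: dict[str, list[str]] = field(default_factory=dict)


def net_send_chains(graph: CallGraph, entry: str, catalog: set[str]) -> list[list[str]]:
    """Return call chains from `entry` that end at a catalogued net_send primitive.

    Only first-party symbols are expanded; a third-party callee that is not
    itself in the catalog is a dead end, so deeper calls are never seen.
    """
    chains: list[list[str]] = []

    def walk(node: str, path: list[str]) -> None:
        for callee in graph.edges.get(node, []):
            new_path = path + [callee]
            if callee in catalog:
                chains.append(new_path)
            elif callee in graph.first_party and callee not in path:
                walk(callee, new_path)  # `not in path` guards against cycles

    walk(entry, [entry])
    return chains


# Hypothetical first-party entry point that builds an embedding model.
graph = CallGraph(
    first_party={"hypergumbo.embed.load_model"},
    edges={
        "hypergumbo.embed.load_model": ["sentence_transformers.SentenceTransformer"],
        # These third-party edges exist at runtime but are never walked.
        "sentence_transformers.SentenceTransformer": ["huggingface_hub.snapshot_download"],
        "huggingface_hub.snapshot_download": ["requests.Session.request"],
    },
)

# Toy catalogs: only low-level clients before, plus the wrapper-layer entry after.
OLD_CATALOG = {"requests.Session.request", "httpx.Client.get"}
NEW_CATALOG = OLD_CATALOG | {"sentence_transformers.SentenceTransformer"}
```

With `OLD_CATALOG` the walk stops at the `SentenceTransformer` edge and reports no chains; adding the wrapper-layer name lets the chain surface without the tracer ever entering third-party code.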
1 parent 0628bb1 commit b93678e

4 files changed

Lines changed: 36 additions & 57 deletions


.ci/affected-tests.txt

Lines changed: 7 additions & 57 deletions
```diff
@@ -1,62 +1,12 @@
 # Test selection manifest
-# Generated by smart-test at 2026-04-23T14:09:10-04:00
+# Generated by smart-test at 2026-04-23T16:34:08-04:00
 # Mode: targeted
-# Baseline: d2c72ee02dc6adad34b7567209b7c0de210db152
-# Changed files: 6
-# Changed source files: 1
-# Selected tests: 51
+# Baseline: 7155820d35e44c8ef6a9551be48e9acc316241bc
+# Changed files: 8
+# Changed source files: 0
+0
+# Selected tests: 1
 #
 # === CHANGED_SOURCE_FILES ===
-packages/hypergumbo-core/src/hypergumbo_core/framework_patterns.py
 # === SELECTED_TESTS ===
-packages/hypergumbo-core/tests/BRANCHES_test_framework_patterns.py
-packages/hypergumbo-core/tests/test_backend_cli_flag.py
-packages/hypergumbo-core/tests/test_build_grammars.py
-packages/hypergumbo-core/tests/test_cli_basic.py
-packages/hypergumbo-core/tests/test_cli_cache.py
-packages/hypergumbo-core/tests/test_cli_commands.py
-packages/hypergumbo-core/tests/test_cli_config.py
-packages/hypergumbo-core/tests/test_cli_dead_code.py
-packages/hypergumbo-core/tests/test_cli_explain.py
-packages/hypergumbo-core/tests/test_cli_io_boundaries.py
-packages/hypergumbo-core/tests/test_cli_routes.py
-packages/hypergumbo-core/tests/test_cli_run_behavior_map.py
-packages/hypergumbo-core/tests/test_cli_search.py
-packages/hypergumbo-core/tests/test_cli_symbols.py
-packages/hypergumbo-core/tests/test_cli_test_coverage.py
-packages/hypergumbo-core/tests/test_cli_verify_claims.py
-packages/hypergumbo-core/tests/test_fastapi_patterns.py
-packages/hypergumbo-core/tests/test_file_excludes.py
-packages/hypergumbo-core/tests/test_framework_patterns.py
-packages/hypergumbo-core/tests/test_frameworks_flag.py
-packages/hypergumbo-core/tests/test_gitleaks.py
-packages/hypergumbo-core/tests/test_locale.py
-packages/hypergumbo-core/tests/test_max_tier.py
-packages/hypergumbo-core/tests/test_no_first_party_priority.py
-packages/hypergumbo-core/tests/test_profile.py
-packages/hypergumbo-core/tests/test_run_behavior_map.py
-packages/hypergumbo-core/tests/test_schema_compliance.py
-packages/hypergumbo-core/tests/test_sketch.py
-packages/hypergumbo-core/tests/test_sketch_sanity.py
-packages/hypergumbo-core/tests/test_slice_tier_filter.py
-packages/hypergumbo-core/tests/test_stable_shape_ids.py
-packages/hypergumbo-lang-common/tests/BRANCHES_test_dart.py
-packages/hypergumbo-lang-common/tests/BRANCHES_test_elixir.py
-packages/hypergumbo-lang-mainstream/tests/BRANCHES_test_cpp.py
-packages/hypergumbo-lang-mainstream/tests/BRANCHES_test_c.py
-packages/hypergumbo-lang-mainstream/tests/BRANCHES_test_csharp.py
-packages/hypergumbo-lang-mainstream/tests/BRANCHES_test_go.py
-packages/hypergumbo-lang-mainstream/tests/BRANCHES_test_java.py
-packages/hypergumbo-lang-mainstream/tests/BRANCHES_test_js_ts.py
-packages/hypergumbo-lang-mainstream/tests/BRANCHES_test_kotlin.py
-packages/hypergumbo-lang-mainstream/tests/BRANCHES_test_php.py
-packages/hypergumbo-lang-mainstream/tests/BRANCHES_test_python_ast_analysis.py
-packages/hypergumbo-lang-mainstream/tests/BRANCHES_test_ruby.py
-packages/hypergumbo-lang-mainstream/tests/BRANCHES_test_rust.py
-packages/hypergumbo-lang-mainstream/tests/BRANCHES_test_scala.py
-packages/hypergumbo-lang-mainstream/tests/BRANCHES_test_swift.py
-packages/hypergumbo-lang-mainstream/tests/test_html_analysis.py
-packages/hypergumbo-lang-mainstream/tests/test_java.py
-packages/hypergumbo-lang-mainstream/tests/test_js_ts.py
-packages/hypergumbo-lang-mainstream/tests/test_polyglot_call_site_coverage.py
-packages/hypergumbo-lang-mainstream/tests/test_python_ast_analysis.py
+packages/hypergumbo-core/tests/test_io_boundary.py
```
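The manifest has a simple line-oriented shape: `# Key: value` header comments, then `# === CHANGED_SOURCE_FILES ===` and `# === SELECTED_TESTS ===` sections listing one path per line. A minimal parser sketch follows, with the format inferred only from the two versions in this diff; `Manifest` and `parse_manifest` are hypothetical helpers, not smart-test's own API.

```python
from dataclasses import dataclass, field

# Section marker line -> Manifest attribute that collects the following paths.
SECTION_MARKERS = {
    "# === CHANGED_SOURCE_FILES ===": "changed_source_files",
    "# === SELECTED_TESTS ===": "selected_tests",
}


@dataclass
class Manifest:
    headers: dict[str, str] = field(default_factory=dict)
    changed_source_files: list[str] = field(default_factory=list)
    selected_tests: list[str] = field(default_factory=list)


def parse_manifest(text: str) -> Manifest:
    manifest = Manifest()
    section = None  # stays None until the first section marker is seen
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line in SECTION_MARKERS:
            section = SECTION_MARKERS[line]
        elif line.startswith("#"):
            # Headers look like "# Key: value"; the title, timestamp and bare "#" have no ": ".
            key, sep, value = line.lstrip("# ").partition(": ")
            if sep and section is None:
                manifest.headers[key] = value
        elif section is not None:
            getattr(manifest, section).append(line)
        # Non-comment lines before any section are ignored.
    return manifest


# Abridged from the regenerated manifest in this commit.
SAMPLE = """\
# Test selection manifest
# Generated by smart-test at 2026-04-23T16:34:08-04:00
# Mode: targeted
# Baseline: 7155820d35e44c8ef6a9551be48e9acc316241bc
# Changed files: 8
# Changed source files: 0
# Selected tests: 1
#
# === CHANGED_SOURCE_FILES ===
# === SELECTED_TESTS ===
packages/hypergumbo-core/tests/test_io_boundary.py
"""

m = parse_manifest(SAMPLE)
# Sanity check: the header count agrees with the listed paths.
assert int(m.headers["Selected tests"]) == len(m.selected_tests)
```

A consistency check like the final `assert` is one cheap way a CI step could catch a manifest whose header and body disagree.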

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
```diff
@@ -31,6 +31,10 @@ New `module_attr_ref` edge type emitted across six languages for attribute reads
 - **New flags `--taint-sources PATH` / `--taint-sinks PATH` / `--taint-sanitizers PATH`** (repeatable; each PATH is a YAML file or directory globbed as `*.yaml` in sorted order). Claims YAML can carry the same paths under a top-level `extra_catalogs: {sources, sinks, sanitizers}` key; relative paths resolve against the claims-file directory. User source/sink entries matching `(module, name, kind)` replace the auto-derived or built-in entry; user sanitizers concatenate.
 - **Public helper `hypergumbo_core.taint.load_full_taint_catalog(extra_source_paths, extra_sink_paths, extra_sanitizer_paths)`**; one-line stderr summary when extra catalogs are loaded.
 
+#### IO catalog — HuggingFace Hub network primitives (WI-jihuj)
+
+- **`io_primitives/python.yaml#net_send`** gains `huggingface_hub.{snapshot_download, hf_hub_download}`, `huggingface_hub.HfApi.{model_info, list_repo_files, download_file}`, and `sentence_transformers.SentenceTransformer`. Surfaced by the 2026-04-23 self-audit: hypergumbo's embeddings extra demonstrably contacts HF Hub (the dogfood run printed an "unauthenticated requests to the HF Hub" warning), but io-boundaries reported zero matching `net_send` chains because the HF stack lives one layer above the `requests` / `httpx` clients the catalog already covered. Adding the wrapper-layer entries lets `verify-claims` reason about the embeddings install's network surface without having to walk into third-party code.
+
 ### Changed
 
 #### Taint catalog auto-derivation (WI-lokuv)
```
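The path rules in the `--taint-*` changelog context (a PATH is a YAML file or a directory globbed as `*.yaml` in sorted order; `extra_catalogs` paths resolve against the claims-file directory) can be sketched as follows. `expand_catalog_paths` is a hypothetical helper written only from that changelog text; it is not the signature or behavior of `load_full_taint_catalog`, and for CLI flags the caller is assumed to pass the working directory as `base_dir`.

```python
from pathlib import Path


def expand_catalog_paths(paths: list[str], base_dir: Path) -> list[Path]:
    """Expand catalog PATH arguments into concrete YAML files.

    Relative paths are resolved against `base_dir` (assumed: the claims-file
    directory for extra_catalogs, the working directory for CLI flags).
    A directory contributes its *.yaml files in sorted order; anything else
    is taken as a single file.
    """
    files: list[Path] = []
    for raw in paths:
        p = Path(raw)
        if not p.is_absolute():
            p = base_dir / p
        if p.is_dir():
            files.extend(sorted(p.glob("*.yaml")))
        else:
            files.append(p)
    return files
```

Sorting inside each directory keeps the load order deterministic, which matters because user source/sink entries replace built-in ones on a `(module, name, kind)` match.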

packages/hypergumbo-core/src/hypergumbo_core/io_primitives/python.yaml

Lines changed: 12 additions & 0 deletions
```diff
@@ -112,6 +112,18 @@ net_send:
     notes: "httpx sync/async HTTP client"
   - module: httpx.AsyncClient
     methods: [get, post, put, delete, head, patch, request, send]
+  # WI-jihuj: high-level network clients that wrap requests/httpx but live one
+  # layer above what the io-boundary tracer can see by walking edges in
+  # third-party code. Covers the embeddings extra's HF Hub download surface
+  # (sentence_transformers.SentenceTransformer constructor → hf_hub).
+  - module: huggingface_hub
+    functions: [snapshot_download, hf_hub_download]
+    notes: "HF Hub model/file download"
+  - module: huggingface_hub.HfApi
+    methods: [model_info, list_repo_files, download_file]
+  - module: sentence_transformers
+    functions: [SentenceTransformer]
+    notes: "Constructor downloads from HF Hub on first use (model not cached locally)"
 
 net_recv:
   - module: socket.socket
```
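Each catalog entry pairs a `module` with a list of `functions` or `methods`, and the tests look primitives up by names such as `socket.socket.send` and `huggingface_hub.snapshot_download`. A minimal sketch of that expansion, assuming `qualified_name` is simply `<module>.<name>` for both keys; `Primitive` and `expand_entries` are hypothetical stand-ins for whatever `load_catalog` does internally, and the YAML is transcribed as plain dicts to stay dependency-free.

```python
from typing import NamedTuple


class Primitive(NamedTuple):
    qualified_name: str
    boundary: str


def expand_entries(boundary: str, entries: list[dict]) -> list[Primitive]:
    """Turn `module` + `functions`/`methods` entries into one primitive per name.

    Assumes qualified_name = "<module>.<name>" for both keys, which matches
    the names the catalog tests look up.
    """
    primitives = []
    for entry in entries:
        module = entry["module"]
        for name in entry.get("functions", []) + entry.get("methods", []):
            primitives.append(Primitive(f"{module}.{name}", boundary))
    return primitives


# The three net_send entries added in this commit, transcribed as dicts (notes omitted).
HF_ENTRIES = [
    {"module": "huggingface_hub", "functions": ["snapshot_download", "hf_hub_download"]},
    {"module": "huggingface_hub.HfApi",
     "methods": ["model_info", "list_repo_files", "download_file"]},
    {"module": "sentence_transformers", "functions": ["SentenceTransformer"]},
]

net_sends = {p.qualified_name for p in expand_entries("net_send", HF_ENTRIES)}
```

Under this reading the three entries yield six qualified names. Comparing against the full set would also pin the three `HfApi` methods, which `test_python_catalog_has_huggingface_hub_net_send` does not assert.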

packages/hypergumbo-core/tests/test_io_boundary.py

Lines changed: 13 additions & 0 deletions
```diff
@@ -80,6 +80,19 @@ def test_python_catalog_has_net_send(self) -> None:
         names = {p.qualified_name for p in net_sends}
         assert "socket.socket.send" in names
 
+    def test_python_catalog_has_huggingface_hub_net_send(self) -> None:
+        # WI-jihuj: 2026-04-23 self-audit found that hypergumbo's embeddings
+        # extra demonstrably hits HuggingFace Hub (the dogfood run printed
+        # "unauthenticated requests to the HF Hub"), but io-boundaries
+        # reported zero net_send chains for it because huggingface_hub /
+        # sentence_transformers were missing from the catalog.
+        catalog = load_catalog("python")
+        net_sends = {p.qualified_name for p in catalog.primitives
+                     if p.boundary == "net_send"}
+        assert "huggingface_hub.snapshot_download" in net_sends
+        assert "huggingface_hub.hf_hub_download" in net_sends
+        assert "sentence_transformers.SentenceTransformer" in net_sends
+
     def test_python_catalog_has_subprocess(self) -> None:
         catalog = load_catalog("python")
         subprocs = [p for p in catalog.primitives if p.boundary == "subprocess"]
```
