docs: scrub internal references from CHANGELOG Summary + migration guide

Follow-up to the previous (rename-only) commit. This one actually
applies the content scrub to:
- ``docs/MIGRATION-IO-BOUNDARIES.md``: removed internal plan / PR
ordinal codenames, tracker IDs, and the agent-workspace plan-file
path from the Reference section. The technical content (strict-
stdlib rule, status declarations, stdlib_provenance shape, allowlist,
external_potential bucket, JSON diff, per-consumer "what to do if
you..." sections) is unchanged.
- ``CHANGELOG.md`` ``[Unreleased]`` Summary: removed the internal plan
/ PR-ordinal codenames and two stray tracker IDs that were
distracting a reader scanning the release notes. The UX-delta
bullets (chain-count drops, external_potential section, per-
language reliability declaration, JSON shape change) and the
reader-facing framing are unchanged. Updated the pointer to the
renamed migration guide.

No impact on code or tests; docs-only change.

Signed-off-by: jgstern-agent <josh-agent@iterabloom.com>

CHANGELOG.md (7 additions, 5 deletions)
@@ -12,16 +12,18 @@ This changelog tracks the **tool version** (package releases). The **schema vers
### Summary
-**`hypergumbo io-boundaries` reports differently than before.** Plan C — a three-PR refactor of the I/O catalog system — reshapes the output. When you re-run the tool on a repo you've analyzed before, expect:
+**`hypergumbo io-boundaries` reports differently than before.** The I/O catalog system has been refactored. When you re-run the tool on a repo you've analyzed before, expect:
1. **`net_send` / `fs_read` / `fs_write` / `db_*` / `logging` chain counts drop** on any repo using third-party I/O wrappers (Python `requests` / `httpx` / `aiohttp`; Java OkHttp / Spring Web / Apache HttpClient / Hibernate / SLF4J / Log4j; JavaScript `axios` / `node-fetch` / `express` / Fastify / Koa; Rust `tokio::fs` / `hyper` / `reqwest` / `axum`; Scala fs2 / cats-effect / sttp / akka / ZIO / Play; Kotlin ktor / Exposed / `kotlin-logging`). Those wrappers were removed from the catalog under a strict-stdlib rule — adding one entry per popular wrapper was a maintenance treadmill with no principled stopping point.
2. **A new `external_potential` boundary section appears** in both text and `--json` output, containing every first-party call that reaches into a tier-3 external boundary node not classified by any catalog. That covers the wrappers culled in (1) plus everything that was never in the catalog (HuggingFace Hub, sentence-transformers, any other third-party library you happen to use). The bucket is the structural answer to "the catalog can't enumerate every popular wrapper" — surface untrusted-territory reach as its own first-class signal instead of growing the catalog.
-3. **Per-language reliability is now declared.** Python's catalog declares `status: complete` with a `stdlib_provenance` citation pointing at `docs.python.org/3.13`. The other 12 catalogs (`c`, `elixir`, `erlang`, `go`, `haskell`, `java`, `javascript`, `kotlin`, `objc`, `rust`, `scala`, `swift`) declare `status: in_progress` until follow-up PRs audit them against their official stdlib documentation. For `external_potential` chains whose source language is `in_progress`, an `[unreliable]` marker (text) / `dst_classification_unreliable: true` field (JSON) flags the chain so consumers know absence-of-catalog-hit is not authoritative for that language yet.
-4. **JSON shape change.** `boundaries.external_potential` is a new top-level key, and every `boundaries.<type>.chains[]` dict gains a `dst_classification_unreliable` field. Catalogs declaring `status: complete` (explicit OR defaulted) without a valid `stdlib_provenance` URL now hard-error at load time; URLs must be `https://` and their hostname must suffix-match `ALLOWED_PROVENANCE_HOSTNAME_SUFFIXES` (an allowlist of official stdlib documentation hosts in `io_boundary.py`). See **`docs/MIGRATION-PLAN-C.md`** for the full migration guide.
+3. **Per-language reliability is now declared.** Python's catalog declares `status: complete` with a `stdlib_provenance` citation pointing at `docs.python.org/3.13`. The other 12 catalogs (`c`, `elixir`, `erlang`, `go`, `haskell`, `java`, `javascript`, `kotlin`, `objc`, `rust`, `scala`, `swift`) declare `status: in_progress` until follow-up work audits them against their official stdlib documentation. For `external_potential` chains whose source language is `in_progress`, an `[unreliable]` marker (text) / `dst_classification_unreliable: true` field (JSON) flags the chain so consumers know absence-of-catalog-hit is not authoritative for that language yet.
+4. **JSON shape change.** `boundaries.external_potential` is a new top-level key, and every `boundaries.<type>.chains[]` dict gains a `dst_classification_unreliable` field. Catalogs declaring `status: complete` (explicit OR defaulted) without a valid `stdlib_provenance` URL now hard-error at load time; URLs must be `https://` and their hostname must suffix-match an allowlist of official stdlib documentation hosts. See **`docs/MIGRATION-IO-BOUNDARIES.md`** for the full migration guide.
-For `verify-claims` users specifically: `IoChain.dst_tier` (PR2 of stop-stripping) plus `external_potential` (Plan C, PR C) together let you reason about untrusted-territory reach without per-library catalog growth. The original WI-jihuj concern that drove the audit — "can I `verify-claims` that hypergumbo doesn't make network calls if it uses HuggingFace Hub?" — is now structurally answered: HF Hub doesn't appear in `net_send`, but its calls surface in `external_potential` with `dst_tier=3`.
+For `verify-claims` users specifically: the destination supply-chain tier on each chain, plus the new `external_potential` bucket, together let you reason about untrusted-territory reach without per-library catalog growth. An earlier open question — "can I `verify-claims` that hypergumbo doesn't make network calls if it uses HuggingFace Hub?" — is now structurally answered: HF Hub doesn't appear in `net_send`, but its calls surface in `external_potential` with `dst_tier=3`.
-**Earlier in [Unreleased] (pre-Plan-C work):** External boundary nodes — synthetic `Symbol` records minted for every dangling edge endpoint (stdlib calls, third-party imports) — are now serialized into `behavior_map["nodes"]` with `kind="external_symbol"`, `path="<external>"`, `meta.external_boundary=True`, and `supply_chain.tier` populated. They were previously stripped before output, which silently re-introduced the dangling-reference problem WI-sikur was meant to fix for every disk-load consumer (`cmd_slice`, `cmd_test_coverage`, `cmd_verify_claims`, etc.). INV-miniz transitions to satisfied. Display surfaces (sketch, compact, search) continue to filter boundary nodes via the new `ir.is_external_boundary()` helper, so display-side output is unchanged. Schema bumped 0.2.2 → 0.2.3 (additive: `external_symbol` added to `Symbol.kind` enum; Span `start_line` / `end_line` minimum loosened from 1 to 0 for synthetic zero-span nodes).
+Separately in this release:
+
+External boundary nodes — synthetic `Symbol` records minted for every dangling edge endpoint (stdlib calls, third-party imports) — are now serialized into `behavior_map["nodes"]` with `kind="external_symbol"`, `path="<external>"`, `meta.external_boundary=True`, and `supply_chain.tier` populated. They were previously stripped before output, which silently re-introduced the dangling-reference problem for every disk-load consumer (`cmd_slice`, `cmd_test_coverage`, `cmd_verify_claims`, etc.). Display surfaces (sketch, compact, search) continue to filter boundary nodes via the new `ir.is_external_boundary()` helper, so display-side output is unchanged. Schema bumped 0.2.2 → 0.2.3 (additive: `external_symbol` added to `Symbol.kind` enum; Span `start_line` / `end_line` minimum loosened from 1 to 0 for synthetic zero-span nodes).
`hypergumbo io-boundaries` also gained detection of attribute-style IO primitives — `os.environ`, `sys.argv`, `process.env`, `System.out`, `os.Stdout`, `std::env::consts::OS`, and bare `stdout` / `stderr` / `stdin` in C — across Python, Go, JavaScript/TypeScript, Java, Rust (tree-sitter), and C. These entries were declared in the YAML catalog but had no matching analyzer edges, so every chain through them was invisible. `verify-claims` also gains `--taint-sources`, `--taint-sinks`, `--taint-sanitizers` flags (and an `extra_catalogs:` key in the claims YAML) for project-local trust zones, sanitizers, and label maps. Browser-local reads (`localStorage.getItem`, `indexedDB.open`, `caches.*`) are no longer misreported as host-filesystem reads.
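The load-time rule in item 4 (a catalog whose `status` is `complete`, explicitly or by default, must carry an `https://` `stdlib_provenance` URL whose hostname suffix-matches an allowlist) and the consumer-facing `dst_classification_unreliable` flag can be sketched in Python as follows. This is an illustrative sketch, not the tool's code: the allowlist contents, the `stdlib_provenance` shape (assumed to be a bare URL string or a mapping with a `url` key), the assumption that `boundaries.external_potential` keeps its chains in a `chains` list, and the names `check_catalog_provenance`, `CatalogLoadError`, and `split_external_potential` are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist for illustration only; the real one ships with the tool.
ALLOWED_SUFFIXES = ("docs.python.org", "doc.rust-lang.org", "pkg.go.dev")


class CatalogLoadError(ValueError):
    """Raised when a catalog claims completeness without valid provenance."""


def _host_allowed(hostname, suffixes=ALLOWED_SUFFIXES):
    # Dot-boundary suffix match: "docs.python.org" and "x.docs.python.org"
    # pass, but "evildocs.python.org" does not.
    host = hostname.lower().rstrip(".")
    return any(host == s or host.endswith("." + s) for s in suffixes)


def check_catalog_provenance(catalog):
    """Hard-error if a complete (explicit or defaulted) catalog lacks provenance."""
    status = catalog.get("status", "complete")  # absent status counts as complete
    if status != "complete":
        return  # e.g. "in_progress": no provenance required yet
    prov = catalog.get("stdlib_provenance")
    # Assumed shape: a bare URL string or a mapping with a "url" key.
    url = prov.get("url") if isinstance(prov, dict) else prov
    if not isinstance(url, str) or not url:
        raise CatalogLoadError("status: complete requires a stdlib_provenance URL")
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise CatalogLoadError(f"provenance URL must be https://, got {url!r}")
    if not parts.hostname or not _host_allowed(parts.hostname):
        raise CatalogLoadError(f"provenance host not allowlisted: {parts.hostname!r}")


def split_external_potential(report):
    """Split external_potential chains into (reliable, unreliable) lists."""
    # Assumes external_potential stores chains under "chains", like the other
    # boundary types do.
    section = report.get("boundaries", {}).get("external_potential", {})
    chains = section.get("chains", [])
    unreliable = [c for c in chains if c.get("dst_classification_unreliable")]
    reliable = [c for c in chains if not c.get("dst_classification_unreliable")]
    return reliable, unreliable
```

The sketch matches hostnames on a dot boundary so that a look-alike such as `evildocs.python.org` does not pass for `docs.python.org`; the changelog does not say whether the real `ALLOWED_PROVENANCE_HOSTNAME_SUFFIXES` check does the same. A `verify-claims`-style consumer would treat only the `reliable` half as evidence that a call is genuinely unclassified, and read the `unreliable` half as "not yet audited".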