Commit ad02593

author
jgstern-agent
committed
docs: scrub internal references from CHANGELOG Summary + migration guide
Follow-up to the previous commit (rename-only). Actually applies the content scrub to:

- ``docs/MIGRATION-IO-BOUNDARIES.md``: removed internal plan / PR ordinal codenames, tracker IDs, and the agent-workspace plan-file path from the Reference section. The technical content (strict-stdlib rule, status declarations, stdlib_provenance shape, allowlist, external_potential bucket, JSON diff, per-consumer "what to do if you..." sections) is unchanged.
- ``CHANGELOG.md`` ``[Unreleased]`` Summary: removed the internal plan / PR-ordinal codenames and two stray tracker IDs that were distracting a reader scanning the release notes. The UX-delta bullets (chain-count drops, external_potential section, per-language reliability declaration, JSON shape change) and the reader-facing framing are unchanged. Updated the pointer to the renamed migration guide.

No impact on code or tests; docs-only change.

Signed-off-by: jgstern-agent <josh-agent@iterabloom.com>
1 parent 98c6243 commit ad02593

3 files changed

Lines changed: 50 additions & 57 deletions

.ci/affected-tests.txt

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
11
# Test selection manifest
2-
# Generated by smart-test at 2026-04-24T12:54:52-04:00
2+
# Generated by smart-test at 2026-04-24T13:13:15-04:00
33
# Mode: targeted
44
# Baseline: db497c028ce764d25808f87a205e8a01c315766e
55
# Reason: no Python source files changed

CHANGELOG.md

Lines changed: 7 additions & 5 deletions
@@ -12,16 +12,18 @@ This changelog tracks the **tool version** (package releases). The **schema vers
1212

1313
### Summary
1414

15-
**`hypergumbo io-boundaries` reports differently than before.** Plan C — a three-PR refactor of the I/O catalog system — reshapes the output. When you re-run the tool on a repo you've analyzed before, expect:
15+
**`hypergumbo io-boundaries` reports differently than before.** The I/O catalog system has been refactored. When you re-run the tool on a repo you've analyzed before, expect:
1616

1717
1. **`net_send` / `fs_read` / `fs_write` / `db_*` / `logging` chain counts drop** on any repo using third-party I/O wrappers (Python `requests` / `httpx` / `aiohttp`; Java OkHttp / Spring Web / Apache HttpClient / Hibernate / SLF4J / Log4j; JavaScript `axios` / `node-fetch` / `express` / Fastify / Koa; Rust `tokio::fs` / `hyper` / `reqwest` / `axum`; Scala fs2 / cats-effect / sttp / akka / ZIO / Play; Kotlin ktor / Exposed / `kotlin-logging`). Those wrappers were removed from the catalog under a strict-stdlib rule — adding one entry per popular wrapper was a maintenance treadmill with no principled stopping point.
1818
2. **A new `external_potential` boundary section appears** in both text and `--json` output, containing every first-party call that reaches into a tier-3 external boundary node not classified by any catalog. That covers the wrappers culled in (1) plus everything that was never in the catalog (HuggingFace Hub, sentence-transformers, any other third-party library you happen to use). The bucket is the structural answer to "the catalog can't enumerate every popular wrapper" — surface untrusted-territory reach as its own first-class signal instead of growing the catalog.
19-
3. **Per-language reliability is now declared.** Python's catalog declares `status: complete` with a `stdlib_provenance` citation pointing at `docs.python.org/3.13`. The other 12 catalogs (`c`, `elixir`, `erlang`, `go`, `haskell`, `java`, `javascript`, `kotlin`, `objc`, `rust`, `scala`, `swift`) declare `status: in_progress` until follow-up PRs audit them against their official stdlib documentation. For `external_potential` chains whose source language is `in_progress`, an `[unreliable]` marker (text) / `dst_classification_unreliable: true` field (JSON) flags the chain so consumers know absence-of-catalog-hit is not authoritative for that language yet.
20-
4. **JSON shape change.** `boundaries.external_potential` is a new top-level key, and every `boundaries.<type>.chains[]` dict gains a `dst_classification_unreliable` field. Catalogs declaring `status: complete` (explicit OR defaulted) without a valid `stdlib_provenance` URL now hard-error at load time; URLs must be `https://` and their hostname must suffix-match `ALLOWED_PROVENANCE_HOSTNAME_SUFFIXES` (an allowlist of official stdlib documentation hosts in `io_boundary.py`). See **`docs/MIGRATION-PLAN-C.md`** for the full migration guide.
19+
3. **Per-language reliability is now declared.** Python's catalog declares `status: complete` with a `stdlib_provenance` citation pointing at `docs.python.org/3.13`. The other 12 catalogs (`c`, `elixir`, `erlang`, `go`, `haskell`, `java`, `javascript`, `kotlin`, `objc`, `rust`, `scala`, `swift`) declare `status: in_progress` until follow-up work audits them against their official stdlib documentation. For `external_potential` chains whose source language is `in_progress`, an `[unreliable]` marker (text) / `dst_classification_unreliable: true` field (JSON) flags the chain so consumers know absence-of-catalog-hit is not authoritative for that language yet.
20+
4. **JSON shape change.** `boundaries.external_potential` is a new top-level key, and every `boundaries.<type>.chains[]` dict gains a `dst_classification_unreliable` field. Catalogs declaring `status: complete` (explicit OR defaulted) without a valid `stdlib_provenance` URL now hard-error at load time; URLs must be `https://` and their hostname must suffix-match an allowlist of official stdlib documentation hosts. See **`docs/MIGRATION-IO-BOUNDARIES.md`** for the full migration guide.
2121

22-
For `verify-claims` users specifically: `IoChain.dst_tier` (PR2 of stop-stripping) plus `external_potential` (Plan C, PR C) together let you reason about untrusted-territory reach without per-library catalog growth. The original WI-jihuj concern that drove the audit — "can I `verify-claims` that hypergumbo doesn't make network calls if it uses HuggingFace Hub?" — is now structurally answered: HF Hub doesn't appear in `net_send`, but its calls surface in `external_potential` with `dst_tier=3`.
22+
For `verify-claims` users specifically: the destination supply-chain tier on each chain, plus the new `external_potential` bucket, together let you reason about untrusted-territory reach without per-library catalog growth. An earlier open question — "can I `verify-claims` that hypergumbo doesn't make network calls if it uses HuggingFace Hub?" — is now structurally answered: HF Hub doesn't appear in `net_send`, but its calls surface in `external_potential` with `dst_tier=3`.
2323

24-
**Earlier in [Unreleased] (pre-Plan-C work):** External boundary nodes — synthetic `Symbol` records minted for every dangling edge endpoint (stdlib calls, third-party imports) — are now serialized into `behavior_map["nodes"]` with `kind="external_symbol"`, `path="<external>"`, `meta.external_boundary=True`, and `supply_chain.tier` populated. They were previously stripped before output, which silently re-introduced the dangling-reference problem WI-sikur was meant to fix for every disk-load consumer (`cmd_slice`, `cmd_test_coverage`, `cmd_verify_claims`, etc.). INV-miniz transitions to satisfied. Display surfaces (sketch, compact, search) continue to filter boundary nodes via the new `ir.is_external_boundary()` helper, so display-side output is unchanged. Schema bumped 0.2.2 → 0.2.3 (additive: `external_symbol` added to `Symbol.kind` enum; Span `start_line` / `end_line` minimum loosened from 1 to 0 for synthetic zero-span nodes).
24+
Separately in this release:
25+
26+
External boundary nodes — synthetic `Symbol` records minted for every dangling edge endpoint (stdlib calls, third-party imports) — are now serialized into `behavior_map["nodes"]` with `kind="external_symbol"`, `path="<external>"`, `meta.external_boundary=True`, and `supply_chain.tier` populated. They were previously stripped before output, which silently re-introduced the dangling-reference problem for every disk-load consumer (`cmd_slice`, `cmd_test_coverage`, `cmd_verify_claims`, etc.). Display surfaces (sketch, compact, search) continue to filter boundary nodes via the new `ir.is_external_boundary()` helper, so display-side output is unchanged. Schema bumped 0.2.2 → 0.2.3 (additive: `external_symbol` added to `Symbol.kind` enum; Span `start_line` / `end_line` minimum loosened from 1 to 0 for synthetic zero-span nodes).
2527

2628
`hypergumbo io-boundaries` also gained detection of attribute-style IO primitives — `os.environ`, `sys.argv`, `process.env`, `System.out`, `os.Stdout`, `std::env::consts::OS`, and bare `stdout` / `stderr` / `stdin` in C — across Python, Go, JavaScript/TypeScript, Java, Rust (tree-sitter), and C. These entries were declared in the YAML catalog but had no matching analyzer edges, so every chain through them was invisible. `verify-claims` also gains `--taint-sources`, `--taint-sinks`, `--taint-sanitizers` flags (and an `extra_catalogs:` key in the claims YAML) for project-local trust zones, sanitizers, and label maps. Browser-local reads (`localStorage.getItem`, `indexedDB.open`, `caches.*`) are no longer misreported as host-filesystem reads.
2729
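
Reviewer note on the JSON shape change (item 4 of the CHANGELOG Summary above): a consumer of `--json` output could separate authoritative from not-yet-authoritative `external_potential` chains roughly as follows. This is a minimal sketch, not the tool's own code. It assumes the chains sit at `boundaries.external_potential.chains[]`, mirroring the `boundaries.<type>.chains[]` shape the changelog describes; the helper name and the `label` key in the sample data are hypothetical.

```python
def split_external_potential(report):
    """Split external_potential chains by the dst_classification_unreliable flag.

    `report` is the parsed --json output as a dict. Missing keys are treated
    as "no chains" so the helper also works on output from older versions.
    """
    chains = (
        report.get("boundaries", {})
        .get("external_potential", {})
        .get("chains", [])
    )
    reliable = []
    unreliable = []
    for chain in chains:
        # A missing flag is treated as reliable (an assumption of this sketch).
        if chain.get("dst_classification_unreliable", False):
            unreliable.append(chain)
        else:
            reliable.append(chain)
    return reliable, unreliable


# Hand-written sample data; "label" is a hypothetical key used only here.
sample_report = {
    "boundaries": {
        "external_potential": {
            "chains": [
                {"label": "python-chain", "dst_classification_unreliable": False},
                {"label": "go-chain", "dst_classification_unreliable": True},
            ]
        }
    }
}

reliable, unreliable = split_external_potential(sample_report)
```

Keeping the two groups apart lets a dashboard or CI gate treat chains from `status: in_progress` languages as provisional rather than as confirmed third-party reach.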

docs/MIGRATION-IO-BOUNDARIES.md

Lines changed: 42 additions & 51 deletions
@@ -1,9 +1,8 @@
11
<!-- SPDX-License-Identifier: AGPL-3.0-or-later -->
2-
# Migration Guide: Plan C (I/O catalog refactor)
2+
# Migration Guide: I/O catalog refactor
33

4-
Plan C is a three-PR refactor of hypergumbo's I/O catalog system that
5-
ships in the next release after the current `[Unreleased]` window. It
6-
changes what `hypergumbo io-boundaries` reports — and, indirectly, what
4+
This release refactors hypergumbo's I/O catalog system. It changes
5+
what `hypergumbo io-boundaries` reports — and, indirectly, what
76
`hypergumbo verify-claims` will accept — for every repo that uses
87
third-party I/O wrappers.
98

@@ -24,22 +23,23 @@ and what to do if you depend on the old behavior.
2423

2524
## Why the change
2625

27-
Before Plan C, the I/O catalog grandfathered popular third-party
26+
Previously, the I/O catalog grandfathered popular third-party
2827
wrappers (`requests`, `axios`, `okhttp3.*`, `tokio::fs`, akka,
2928
`huggingface_hub`, ...) so they would show up under `net_send` /
3029
`fs_read` / etc. instead of being silently invisible. That carve-out
3130
was a slippery slope: every popular library wanted its own entry, and
3231
there was no principled stopping point.
3332

34-
The structural answer is to stop trying to enumerate every wrapper and
35-
instead expose the **shape** of untrusted-territory reach as its own
36-
signal. That is what the new `external_potential` bucket is.
33+
The structural answer is to stop trying to enumerate every wrapper
34+
and instead expose the **shape** of untrusted-territory reach as its
35+
own signal. That is what the new `external_potential` bucket is.
3736

38-
The catalog now enumerates **only** stdlib symbols, with a per-language
39-
`status` declaration (`complete` or `in_progress`) so absence-from-the-
40-
catalog has a clear meaning: for `status: complete` languages, "not in
41-
the catalog" = "not stdlib, probably third-party"; for `status:
42-
in_progress` languages, the absence is flagged as not-yet-authoritative.
37+
The catalog now enumerates **only** stdlib symbols, with a per-
38+
language `status` declaration (`complete` or `in_progress`) so
39+
absence-from-the-catalog has a clear meaning: for `status: complete`
40+
languages, "not in the catalog" = "not stdlib, probably third-party";
41+
for `status: in_progress` languages, the absence is flagged as
42+
not-yet-authoritative.
4343

4444
## What the output looks like now
4545

@@ -70,10 +70,10 @@ chain in `external_potential` carries an `[unreliable]` marker:
7070
```
7171

7272
The `[unreliable]` marker means: "this language's stdlib catalog
73-
hasn't been audited end-to-end yet, so the absence-of-catalog-hit that
74-
caused this chain to land in `external_potential` isn't authoritative
75-
— after the catalog is promoted to `status: complete`, the chain may
76-
either stay here or move into a classical bucket."
73+
hasn't been audited end-to-end yet, so the absence-of-catalog-hit
74+
that caused this chain to land in `external_potential` isn't
75+
authoritative — after the catalog is promoted to `status: complete`,
76+
the chain may either stay here or move into a classical bucket."
7777

7878
### JSON mode
7979

@@ -158,13 +158,13 @@ Two additive changes:
158158
Total chain counts will change. If you compute health metrics off
159159
chain counts, recalibrate against the post-cull numbers.
160160

161-
### ...maintain a project-local taint catalog (WI-votan, `--taint-sources` / `--taint-sinks` / `--taint-sanitizers`)
161+
### ...maintain a project-local taint catalog (`--taint-sources` / `--taint-sinks` / `--taint-sanitizers`)
162162

163163
No change. Project-local catalogs continue to override the built-in
164164
catalog the same way they did before. If a third-party wrapper your
165165
project cares about is no longer in the global catalog, you can
166-
re-add it for your repo via a project-local catalog without affecting
167-
anyone else.
166+
re-add it for your repo via a project-local catalog without
167+
affecting anyone else.
168168

169169
### ...are a catalog contributor for a non-Python language
170170

@@ -181,9 +181,9 @@ language: <lang>
181181
status: complete | in_progress
182182

183183
# REQUIRED for status: complete. Optional for status: in_progress.
184-
# Cited at load time; URL hostname must suffix-match
185-
# ALLOWED_PROVENANCE_HOSTNAME_SUFFIXES (a curated allowlist of
186-
# official-stdlib documentation hosts).
184+
# Cited at load time; URL hostname must suffix-match an allowlist
185+
# of official-stdlib documentation hosts declared in
186+
# io_boundary.py.
187187
stdlib_provenance:
188188
source_url: https://docs.<authority>/<version>/library/index.html
189189
version: "<stdlib release>"
@@ -193,46 +193,37 @@ stdlib_provenance:
193193
a language's "list all modules" page, etc.).
194194
195195
# OPTIONAL. Stdlib symbols that are NOT I/O primitives.
196-
# Used by the external_potential filter to drop "first-party calls a
197-
# stdlib non-IO symbol" — math.sqrt, collections.deque, etc. — from
198-
# the bucket. Empty until you populate it.
196+
# Used by the external_potential filter to drop "first-party calls
197+
# a stdlib non-IO symbol" — math.sqrt, collections.deque, etc. —
198+
# from the bucket. Empty until you populate it.
199199
stdlib_other:
200200
- module: math
201201
functions: [sqrt, sin, cos, ...]
202202
```
203203
204204
Promoting a language from `in_progress` to `complete` is a regular
205-
PR. Adding to `ALLOWED_PROVENANCE_HOSTNAME_SUFFIXES` is a governance
206-
change requiring PR review (same shape as `ALLOWED_WEBSITES.md`).
207-
208-
The 12 follow-up tracker items (`WI-ganid` plus 11 siblings) track
209-
the catalog-completion backlog. Pick one, audit the catalog against
210-
the language's official docs, add `stdlib_provenance`, and flip
211-
`status` to `complete`.
205+
PR: audit the catalog against the language's official stdlib
206+
documentation, add `stdlib_provenance`, and flip `status` to
207+
`complete`. Adding a hostname to the provenance allowlist is a
208+
governance change requiring PR review (same shape as
209+
`ALLOWED_WEBSITES.md`).
212210

213211
### ...care about the strict-stdlib rule itself
214212

215213
The catalog principle is now: **catalog membership = stdlib
216214
(language ships it); absence = probably third-party, not certain**.
217-
The validator (`_validate_catalog_dict` in `io_boundary.py`) hard-
218-
errors at load time on any catalog declaring `status: complete`
219-
without provenance, so completeness claims are auditable. The
220-
allowlist (`ALLOWED_PROVENANCE_HOSTNAME_SUFFIXES`) defends against
221-
typos and unofficial sources.
215+
A load-time validator hard-errors on any catalog declaring
216+
`status: complete` without provenance, so completeness claims are
217+
auditable. The hostname allowlist defends against typos and
218+
unofficial sources.
222219

223-
Project-local catalogs (WI-votan) remain the escape hatch for
224-
"my project depends on a wrapper not in the global catalog and I
225-
want it classified" — they always take precedence over built-in
226-
entries.
220+
Project-local catalogs remain the escape hatch for "my project
221+
depends on a wrapper not in the global catalog and I want it
222+
classified" — they always take precedence over built-in entries.
227223

228224
## Reference
229225

230-
- The original plan: `/home/jgstern_agent/.claude/plans/yeah-we-should-do-parallel-pnueli.md`
231-
(in the agent workspace; not committed).
232-
- Per-PR detail: `[Unreleased]` section of `CHANGELOG.md`,
233-
subsections for "IO catalog — strict-stdlib cull (Plan C, PR A)",
234-
"IO catalog — `status` / `stdlib_provenance` / ... (Plan C, PR B)",
235-
and "IO boundaries — `external_potential` bucket (Plan C, PR C)".
236-
- Tracker: `WI-koluz` (PR B), `WI-tanas` (PR C), `INV-pomir`
237-
(provenance / allowlist invariant), `WI-ganid` + 11 siblings
238-
(per-language promotion backlog).
226+
Per-change detail — down to the API-level and YAML-level shape of
227+
each subsystem covered above — lives in the `[Unreleased]` section
228+
of `CHANGELOG.md`, under the subsections for the strict-stdlib cull,
229+
the catalog schema additions, and the `external_potential` bucket.
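
Reviewer note on the load-time rule the migration guide describes (a catalog with `status: complete`, explicit or defaulted, must carry a `stdlib_provenance` URL that is `https://` and whose hostname suffix-matches an allowlist): the check might look roughly like the sketch below. This is an illustration under stated assumptions, not the validator in `io_boundary.py`. The allowlist contents, the function names, the error type, and the exact suffix-matching semantics (exact host, or a subdomain of a listed suffix) are all assumptions; only `docs.python.org` is taken from the changelog.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist for illustration; the real one lives in the tool.
EXAMPLE_ALLOWED_SUFFIXES = ("docs.python.org",)


class CatalogError(ValueError):
    """Hypothetical error type used by this sketch."""


def is_allowed_provenance_url(url, allowed_suffixes=EXAMPLE_ALLOWED_SUFFIXES):
    """Return True if url is https and its host matches an allowed suffix."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    # Match the suffix itself or any subdomain of it, so that
    # "evildocs.python.org" does not pass for "docs.python.org".
    return any(host == s or host.endswith("." + s) for s in allowed_suffixes)


def check_catalog(catalog):
    """Raise CatalogError if a status: complete catalog lacks valid provenance."""
    # The changelog says an omitted status counts as complete.
    status = catalog.get("status", "complete")
    if status != "complete":
        return
    provenance = catalog.get("stdlib_provenance") or {}
    url = provenance.get("source_url") or ""
    if not is_allowed_provenance_url(url):
        raise CatalogError(
            f"catalog {catalog.get('language')!r} declares status: complete "
            "without a valid stdlib_provenance source_url"
        )
```

Under these assumptions, an `in_progress` catalog passes without provenance, while a catalog that omits `status` is held to the same requirement as one that declares `complete`.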
