
Commit ae08e68

Merge pull request #39 from AdaWorldAPI/claude/adr-024-palette256-hhtl-codec
docs(adr): add ADR-024 — Palette256 + HHTL codec as universal compression primitive
2 parents a9fbfcb + 5e9e55b commit ae08e68

1 file changed

Lines changed: 180 additions & 0 deletions


docs/ARCHITECTURAL-DECISIONS-2026-06-04.md

@@ -51,6 +51,7 @@
| ADR-021 | **Meta-hygiene**: always grep peer crates before copying manifest patterns (the `[lints] workspace = true` cascade lesson) | **Pinned** | OGAR PR #15 + PR #17/#18 follow-ups |
| ADR-022 | **The Firewall** — absolute inner/outer boundary; no serialization in hot path; inner = compile-time HHTL; outer = contract-trait pluggable | **Pinned** | OGAR (this PR); `docs/THE-FIREWALL.md` |
| ADR-023 | **IR-as-wire-truth** — the source-language AST is *input dialect*; the canonical `Class`/`Attribute`/`Association`/`EnumDecl`/`ActionDef` IR is *wire truth*. Adapters lift dialects into IR; the IR routes everything (registry key, actor mailbox, Lance version, audit-log dimension) | **Pinned** | OGAR (this PR); `crates/ogar-vocab/`; `bardioc/substrate-b-shadow::EdgeDecoder<E>` (PR #19) |
| ADR-024 | **Palette256 + HHTL codec** — the substrate's universal compression primitive. HHTL prefix establishes a frame; within the frame, values cluster; clustered values quantize to 256-index palette + const-table lookup. Names an existing primitive (Binary16K perms + bgz-tensor attention + arm-discovery aerial codebook, ρ=0.9973 vs cosine) rather than proposing one | **Pinned** | OGAR (this PR); `MedCare-rs/crates/medcare-analytics/src/{graph_contract.rs,column_mask_bridge.rs}`; `bgz-tensor/examples/compare_stacked_vs_i16.rs`; `lance-graph-arm-discovery` |

## ADR-001: `State = ActionState` (lifecycle), not domain state, for Rubicon binding

@@ -1167,6 +1168,185 @@ lance-graph) before merge.
- `docs/RDF-OWL-ALIGNMENT.md` §3 (OGAR's position in L1-L5) — the
IR sits at the AR-pattern lift seam.

## ADR-024: Palette256 + HHTL codec — the substrate's universal compression primitive

**Status:** Pinned (2026-06-05). Names an existing primitive (three
independent deployments + one empirical anchor) rather than proposing
one. Companion to ADR-022 (The Firewall — this ADR specifies one of
its inner-side primitives) and ADR-023 (IR-as-wire-truth — palette256
is the codec on the IR's wire form).

**Context.** The substrate has accumulated three independent
palette256 deployments, each developed for its own domain:

- **Security mesh:** `Binary16K = [u64; 256]` in
  `MedCare-rs/crates/medcare-analytics/src/graph_contract.rs:36`
  (canonical home). It holds the per-row `_effectiveReaders` bitmap;
  auth is a Hamming-popcount bit-intersection on the inner / hot path
  (`HEALTHCARE-TRANSCODING.md §3.1`). Wired into production via
  `MedCare-rs/crates/medcare-analytics/src/column_mask_bridge.rs`,
  whose column-mask registry is installed at
  `MedCare-rs/crates/medcare-server/src/state.rs:167, 265, 439`.
- **Attention:** `bgz-tensor` `WeightPalette::build(…, 256)` +
  `AttentionTable::build`
  (`crates/bgz-tensor/examples/compare_stacked_vs_i16.rs:90-92`).
  Replaces dense FP weights with a 256-index palette + precomputed
  distance table on the model's hot path.
- **Distance:** `lance-graph-arm-discovery` aerial codebook, with a
  measured **ρ = 0.9973 vs cosine**. This is the empirical anchor:
  distances computed through palette256 correlate with full-precision
  cosine distances at 0.9973. On a scale where ρ = 1.0 means the
  palette tracks cosine perfectly, palette256 falls short by ~0.003.
  That gap measures residual disagreement in correlation, not a
  per-pair distance error.

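Here is a minimal sketch of the security-mesh check described above. It assumes one bit per reader grant. A row is readable when its `_effectiveReaders` bitmap and the caller's grant bitmap share a set bit, and the popcount of the bitwise AND counts the shared grants. Only the `Binary16K = [u64; 256]` layout comes from `graph_contract.rs:36`. The names `with_bits`, `shared_grants`, and `can_read` are hypothetical, not the MedCare-rs API.

```rust
/// 256 words × 64 bits = 16,384 bits; layout as in `graph_contract.rs:36`.
pub type Binary16K = [u64; 256];

/// Hypothetical helper: build a bitmap with the given bit indices set.
pub fn with_bits(bits: &[usize]) -> Binary16K {
    let mut map = [0u64; 256];
    for &b in bits {
        assert!(b < 256 * 64, "bit index out of range");
        map[b / 64] |= 1u64 << (b % 64);
    }
    map
}

/// Hypothetical: number of grants row and caller have in common,
/// i.e. the popcount of the word-by-word AND. No heap allocation.
pub fn shared_grants(row_readers: &Binary16K, caller: &Binary16K) -> u32 {
    row_readers
        .iter()
        .zip(caller.iter())
        .map(|(r, c)| (r & c).count_ones())
        .sum()
}

/// Hypothetical auth predicate: readable iff at least one shared bit.
pub fn can_read(row_readers: &Binary16K, caller: &Binary16K) -> bool {
    row_readers
        .iter()
        .zip(caller.iter())
        .any(|(r, c)| (r & c) != 0)
}

fn main() {
    let row = with_bits(&[3, 70, 16_383]);
    let alice = with_bits(&[70, 500]);
    let bob = with_bits(&[4, 9_000]);
    println!("alice: {} shared, readable = {}", shared_grants(&row, &alice), can_read(&row, &alice));
    println!("bob: {} shared, readable = {}", shared_grants(&row, &bob), can_read(&row, &bob));
}
```

Both checks walk 256 words of fixed-size data and allocate nothing. That is what lets this kind of check sit on the hot path.
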
Cross-domain analysis revealed that all three are instances of the *same
codec*: an HHTL prefix establishes a frame; within the frame, values
cluster; clustered values quantize to a 256-index palette; decode is
a const-table lookup. The runtime side's BindSpace dissolution work
(bardioc PR #18 / lance-graph PR #470) hinted at this with the
Quintenzirkel qualia codebook ("frozen set + circle-of-fifths
progression → 8 B → 1-2 B per row") — same compression strategy,
different domain.

The proposal in the cross-session conversation (2026-06-05) was to
name the primitive explicitly so that:

1. Future adopters don't reinvent it per domain.
2. New adopters report a falsifiable measurement (ρ-vs-reference)
   at adoption time rather than after the fact.
3. The 256-ceiling escape hatches are documented before reviewers
   ask.

**Decision.** **The codec is:**

```text
HHTL prefix (NiblePath / quadkey / class identity)
  ↓ establishes spatial / semantic frame
within-frame values cluster
  ↓ quantize to 256-index palette
  ↓ const-table lookup (compile-time HHTL where possible)
1-byte index per element, sub-microsecond decode, zero heap allocation
```

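The codec shape can be sketched in Rust as follows. The sketch assumes a static palette over a scalar domain normalized to [0, 1]. The uniform 256-point grid stands in for a fitted palette; a real adopter would place entries where the frame's values cluster. `PALETTE`, `encode`, and `decode` are hypothetical names, not the `bgz-tensor` API. Because the table is a `const` item, decode is a compile-time const-table lookup.

```rust
/// Hypothetical static palette: 256 evenly spaced values on [0, 1],
/// computed at compile time. Stand-in for a palette fitted to clusters.
const PALETTE: [f32; 256] = {
    let mut table = [0.0f32; 256];
    let mut i = 0;
    while i < 256 {
        table[i] = i as f32 / 255.0;
        i += 1;
    }
    table
};

/// Encode: index of the palette entry nearest to `x`.
/// Assumes `palette` is sorted ascending (true for PALETTE).
fn encode(palette: &[f32; 256], x: f32) -> u8 {
    // First index whose entry is >= x; the nearest entry is there or just before.
    let hi = palette.partition_point(|&p| p < x);
    if hi == 0 {
        return 0;
    }
    if hi == palette.len() {
        return 255;
    }
    let lo = hi - 1;
    if x - palette[lo] <= palette[hi] - x {
        lo as u8
    } else {
        hi as u8
    }
}

/// Decode: one indexed load. A u8 index can never exceed a
/// 256-entry table, and nothing is allocated.
fn decode(palette: &[f32; 256], idx: u8) -> f32 {
    palette[idx as usize]
}

fn main() {
    for &v in [0.0f32, 0.25, 0.731, 1.0].iter() {
        let idx = encode(&PALETTE, v);
        println!("{v} -> index {idx} -> {}", decode(&PALETTE, idx));
    }
}
```

Encode does a binary search once, at write time. Decode, the hot-path side, is a single table load indexed by one byte.
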
**Adoption checklist** for a new domain:

1. **Identify the prefix.** The NiblePath / quadkey / class identity
   that establishes the frame the values live in.
2. **Identify the palette domain.** Which values cluster within the
   frame? (Closed-keyspace tags, quantized continuous values,
   enumerated state, etc.)
3. **Build the palette + measure ρ-vs-reference.** The reference is
   the domain's full-precision metric (cosine for embeddings, L2 for
   coordinates, exact-match for tags). Report ρ at adoption time as
   the falsifiable property. Target: **ρ ≥ 0.99**, in line with the
   arm-discovery anchor (0.9973).
4. **Decode = const-table lookup.** Use compile-time HHTL if the palette
   is static. If the palette is per-frame / per-tile, use a table built
   once at runtime and read-only thereafter. Either way, the decode path
   is zero-allocation.

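Here is a sketch of the step-3 measurement. It assumes ρ is Spearman's rank correlation; the ADR does not say which correlation the arm-discovery figure uses. ρ is computed over one sample of pairs measured twice: once with the reference metric and once through the palette. Tied values get their average rank, which matters here because quantization creates ties. `ranks`, `pearson`, and `spearman_rho` are hypothetical helpers.

```rust
/// 1-based ranks; tied values share the mean of their positions.
fn ranks(xs: &[f64]) -> Vec<f64> {
    let mut idx: Vec<usize> = (0..xs.len()).collect();
    idx.sort_by(|&a, &b| xs[a].partial_cmp(&xs[b]).expect("NaN in input"));
    let mut out = vec![0.0; xs.len()];
    let mut i = 0;
    while i < idx.len() {
        // Extend j over the run of values equal to xs[idx[i]].
        let mut j = i;
        while j + 1 < idx.len() && xs[idx[j + 1]] == xs[idx[i]] {
            j += 1;
        }
        let avg = (i + j) as f64 / 2.0 + 1.0;
        for k in i..=j {
            out[idx[k]] = avg;
        }
        i = j + 1;
    }
    out
}

/// Pearson correlation of two equal-length samples.
fn pearson(a: &[f64], b: &[f64]) -> f64 {
    let n = a.len() as f64;
    let ma = a.iter().sum::<f64>() / n;
    let mb = b.iter().sum::<f64>() / n;
    let (mut cov, mut va, mut vb) = (0.0, 0.0, 0.0);
    for (x, y) in a.iter().zip(b) {
        cov += (x - ma) * (y - mb);
        va += (x - ma).powi(2);
        vb += (y - mb).powi(2);
    }
    cov / (va.sqrt() * vb.sqrt())
}

/// Spearman's ρ: the Pearson correlation of the two rank vectors.
fn spearman_rho(reference: &[f64], palette: &[f64]) -> f64 {
    assert_eq!(reference.len(), palette.len());
    assert!(reference.len() >= 2);
    pearson(&ranks(reference), &ranks(palette))
}

fn main() {
    // Toy data: reference distances for five pairs, and the same pairs'
    // distances after palette quantization.
    let reference = [0.10, 0.40, 0.35, 0.80, 0.55];
    let palette = [0.125, 0.375, 0.375, 0.75, 0.5];
    println!("rho = {:.4}", spearman_rho(&reference, &palette));
}
```

On this toy data, the tie between the two quantized distances pulls ρ to about 0.975, below the ρ ≥ 0.99 target. A real adoption report would use a large sample of pairs drawn from the domain.
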
**The 256-ceiling escape hatches** (documented to avoid the
predictable reviewer question):

- **Per-tile / per-frame palettes** — the cheapest answer. Different
  spatial frame, different 256 entries. Used by Cesium tile codecs;
  matches the quadkey-prefix discipline. Long-tail OSM tags inside a
  zoom-21 tile rarely exceed 256.
- **Hierarchical palettes** — coarser palette at higher quadkey
  levels, finer per leaf. Mirrors the standard tile pyramid; the SH
  L0/L1 vs L2/L3 split in `splat-fit` is the same pattern.
- **Palette-64K upgrade** — 2-byte index instead of 1, for hot
  palettes that genuinely exceed 256 distinct values (rare; reserve
  for measured cases, not speculation).

The escape hatches are part of the primitive, not exceptions to it.

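Here is a sketch of how per-tile and hierarchical palettes can share one lookup path. It assumes palettes are keyed by quadkey strings: one digit `0`-`3` per zoom level, with `""` as the root. Resolution tries the tile itself, then each coarser ancestor prefix. A leaf without its own palette therefore falls back to the nearest ancestor's palette. `PaletteRegistry`, `resolve`, `encode_tag`, and `decode_tag` are hypothetical, and the OSM tag strings are illustrative.

```rust
use std::collections::HashMap;

/// Hypothetical registry of palettes keyed by quadkey prefix.
struct PaletteRegistry {
    palettes: HashMap<String, Vec<&'static str>>,
}

impl PaletteRegistry {
    fn new() -> Self {
        Self { palettes: HashMap::new() }
    }

    /// Register a palette; a 1-byte index caps it at 256 entries.
    fn insert(&mut self, quadkey: &str, palette: Vec<&'static str>) {
        assert!(palette.len() <= 256, "palette exceeds the 1-byte index range");
        self.palettes.insert(quadkey.to_string(), palette);
    }

    /// Most specific palette covering `quadkey`: the tile itself, then each
    /// coarser ancestor prefix, ending at the root "". Assumes ASCII keys.
    fn resolve(&self, quadkey: &str) -> Option<(&str, &[&'static str])> {
        (0..=quadkey.len()).rev().find_map(|len| {
            self.palettes
                .get_key_value(&quadkey[..len])
                .map(|(k, p)| (k.as_str(), p.as_slice()))
        })
    }
}

/// Encode a tag as its index in the resolved palette (None = not present).
fn encode_tag(palette: &[&str], tag: &str) -> Option<u8> {
    palette.iter().position(|t| *t == tag).map(|i| i as u8)
}

/// Decode is a plain index into the resolved palette.
fn decode_tag<'a>(palette: &[&'a str], idx: u8) -> &'a str {
    palette[idx as usize]
}

fn main() {
    let mut reg = PaletteRegistry::new();
    reg.insert("", vec!["building=yes", "highway=residential"]);
    reg.insert("0231", vec!["amenity=cafe", "highway=footway"]);

    // A deeper tile under "0231" uses that finer palette;
    // an unrelated tile falls back to the root palette.
    for tile in ["02310", "1"] {
        let (level, pal) = reg.resolve(tile).unwrap();
        println!("tile {tile:?} uses palette {level:?}: {:?}", pal);
    }
}
```

Because the HHTL prefix selects the palette, an index byte only has to be unique within its tile. The same `u8` can mean different tags in different tiles.
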
**Alternatives considered.**

- *Continuous distributions that don't cluster* (e.g. timestamps in
  microseconds, free-form text). Rejected as a counterargument to
  the codec — these are out-of-domain. For them, use delta encoding,
  VarInt, or a different codec entirely. The codec applies to
  *clustered* domains; the adoption checklist's step 2 is the filter.
- *Domain-specific codecs per domain.* Rejected. Three independent
  re-derivations of the same primitive (security / attention /
  distance) are the receipt that the abstraction is real, not a
  receipt that each domain should have its own. ADR-024 reduces
  per-domain re-derivation.
- *Skip the ρ-vs-reference measurement.* Rejected. The arm-discovery
  ρ = 0.9973 is the existing FINDING-grade stake; having each new
  domain report ρ at adoption time keeps the empirical floor honest
  as the primitive spreads.

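For the out-of-domain case, here is a minimal sketch of delta encoding followed by unsigned LEB128 varints, one common VarInt format. It assumes non-decreasing microsecond timestamps and well-formed input. It illustrates the alternative route named above and is not part of the palette256 codec. The function names are hypothetical.

```rust
/// Append `v` as an unsigned LEB128 varint: 7 payload bits per byte,
/// high bit set on every byte except the last.
fn put_varint(out: &mut Vec<u8>, mut v: u64) {
    loop {
        let byte = (v & 0x7f) as u8;
        v >>= 7;
        if v == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Read one varint starting at `*pos` and advance `*pos` past it.
fn get_varint(buf: &[u8], pos: &mut usize) -> u64 {
    let mut v = 0u64;
    let mut shift = 0u32;
    loop {
        let byte = buf[*pos];
        *pos += 1;
        v |= u64::from(byte & 0x7f) << shift;
        if (byte & 0x80) == 0 {
            return v;
        }
        shift += 7;
    }
}

/// Delta-encode non-decreasing timestamps, storing each delta as a varint.
fn encode_timestamps(ts: &[u64]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut prev = 0u64;
    for &t in ts {
        assert!(t >= prev, "timestamps must be non-decreasing");
        put_varint(&mut out, t - prev);
        prev = t;
    }
    out
}

/// Inverse of `encode_timestamps`: sum the deltas back up.
fn decode_timestamps(buf: &[u8]) -> Vec<u64> {
    let mut out = Vec::new();
    let mut pos = 0;
    let mut prev = 0u64;
    while pos < buf.len() {
        prev += get_varint(buf, &mut pos);
        out.push(prev);
    }
    out
}

fn main() {
    let ts = [1_700_000_000_000_000u64, 1_700_000_000_000_250, 1_700_000_000_001_000];
    let enc = encode_timestamps(&ts);
    println!("{} raw bytes -> {} encoded bytes", ts.len() * 8, enc.len());
    assert_eq!(decode_timestamps(&enc), ts.to_vec());
}
```

Small gaps between consecutive timestamps cost one or two bytes each, however large the absolute values are. Only the first value pays for its full magnitude.
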
**Consequences.**

- **The primitive is named.** Cross-domain reuse is now load-bearing,
  not coincidental. New domains adopt the codec instead of inventing
  their own quantization.
- **ρ-vs-reference becomes the adoption contract.** Reported once at
  adoption per domain. The arm-discovery 0.9973 is the existing
  anchor; new adopters target ≥ 0.99 and document if they fall short.
- **Two next-domain adopters are queued** (planned, not yet wired):
  - **D-OSM-2** (OSM tag palette + tile-local coordinate
    quantization) — per `lance-graph` PR #473
    (`cesium-osm-substrate-v1.md`). Reports ρ-vs-reference on the
    first per-country PBF run, per the runtime session's §11
    follow-up commitment.
  - **D-SPLAT-4** (SH-aware palette extension on the
    `Gaussian3D` carrier) — per the splat-native arc. Same codec; SH
    coefficients are the long-tail-budget challenger.
- **The 256-ceiling has three explicit escapes** in the ADR body
  (per-tile / hierarchical / palette-64K). Reviewers don't need to
  re-derive the answer.
- **Cross-arc reuse argument is sharpened.** The substrate-reuse
  framing in `docs/RDF-OWL-ALIGNMENT.md §10` (geographic litmus
  complements anatomical) cashes out as: FMA-bones and OSM-vectors
  use *the same codec* (palette256 + HHTL prefix), not just the same
  IR. The §6 callout in `DOMAIN-INSTANCES.md` (queued, awaiting the
  landing of lance-graph PR #473) will reference ADR-024 as the
  falsifiable property.
- **The falsifiable property** that ties the substrate-reuse claim
  down: *"the same compile-time HHTL prefix + palette256 codec
  decodes (a) `_effectiveReaders` for row auth, (b) OSM way
  attributes at zoom-21 tile, and (c) FMA-bone SH coefficients at
  sub-microsecond per element with zero heap allocation."* If that
  property holds across all three, the substrate is doing its job.
  If it fails on one, the substrate is leaking dialect into the codec.

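Here is one way to check the decode half of the falsifiable property on a given machine. The sketch assumes a crude wall-clock loop is acceptable as a first pass; a statistics-aware benchmark harness would be the rigorous choice. The output buffer is allocated once, outside the timed loop, so the measured path performs no heap allocation. `decode_into` and `ns_per_element` are hypothetical names, and the uniform palette is a stand-in.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Decode a batch of 1-byte indices into a caller-provided buffer:
/// one table load per element, no allocation.
fn decode_into(palette: &[f32; 256], indices: &[u8], out: &mut [f32]) {
    assert_eq!(indices.len(), out.len());
    for (dst, &i) in out.iter_mut().zip(indices) {
        *dst = palette[i as usize];
    }
}

/// Average wall-clock nanoseconds per decoded element over `rounds` passes.
/// `black_box` keeps the optimizer from discarding the work.
fn ns_per_element(palette: &[f32; 256], indices: &[u8], out: &mut [f32], rounds: u32) -> f64 {
    assert!(!indices.is_empty() && rounds > 0);
    let start = Instant::now();
    for _ in 0..rounds {
        decode_into(black_box(palette), black_box(indices), out);
        black_box(&*out);
    }
    start.elapsed().as_nanos() as f64 / (f64::from(rounds) * indices.len() as f64)
}

fn main() {
    let mut palette = [0.0f32; 256];
    for (i, p) in palette.iter_mut().enumerate() {
        *p = i as f32 / 255.0;
    }
    let indices: Vec<u8> = (0..1_000_000u32).map(|k| (k % 256) as u8).collect();
    // Allocated once, before timing starts.
    let mut out = vec![0.0f32; indices.len()];
    let ns = ns_per_element(&palette, &indices, &mut out, 10);
    println!("{ns:.2} ns per element");
}
```

The printed figure is machine-dependent. The property only asks that it stay under 1,000 ns per element.
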
**Change policy.** Adding a new palette256 adopter (new domain) is
routine — follow the adoption checklist and report ρ-vs-reference.
Changing the codec itself (e.g. making palette-64K the default, or
adding a new escape hatch) is a substrate-wide concern and requires
consultation with the runtime session.

**References.**

- `lance-graph/.claude/board/EPIPHANIES.md:28`: FINDING-grade
  anchor for palette256 + Hamming popcount on `_effectiveReaders`.
- `lance-graph/.claude/knowledge/old-stack-capability-parity.md §3.39`:
  knowledge-doc record of the same primitive.
- `MedCare-rs/crates/medcare-analytics/src/graph_contract.rs:36`:
  `Binary16K = [u64; 256]` canonical home.
- `MedCare-rs/crates/medcare-analytics/src/column_mask_bridge.rs`:
  production wire-up; `redaction_mode_for` (line 128),
  `column_mask_policy_for_table` (line 165),
  `build_medcare_column_mask_registry` (line 192).
- `MedCare-rs/crates/medcare-server/src/state.rs:167, 265, 439`:
  F2-E install sites consuming the column-mask registry.
- `bgz-tensor/examples/compare_stacked_vs_i16.rs:90-92`:
  `WeightPalette::build(…, 256)` + `AttentionTable::build`.
- `lance-graph-arm-discovery`: aerial codebook with the ρ = 0.9973 vs
  cosine measurement.
- ADR-022 (The Firewall): the inner-side discipline this ADR
  specifies a primitive for.
- ADR-023 (IR-as-wire-truth): palette256 is the codec on the IR's
  wire form.
- `docs/THE-FIREWALL.md` §3 (the inner/hot side): palette256 + HHTL
  is one of its load-bearing primitives.
- `docs/HEALTHCARE-TRANSCODING.md §3.1`: palette256 + Hamming
  popcount on Binary16K named as the inner-side security mesh.
- `docs/RDF-OWL-ALIGNMENT.md §10`: the brutal-upgrade sequencing
  context (Phase 2c geospatial adopts the codec).
- `bardioc` PR #18 + `lance-graph` PR #470: Quintenzirkel qualia
  codebook (8 B → 1-2 B per row) as the same compression strategy in
  a different domain.
- `lance-graph` PR #473 (forthcoming) `cesium-osm-substrate-v1.md`
  §11: runtime-side commitment to a follow-up callout on this ADR
  once D-OSM-2 / D-SPLAT-4 wire.

## Implementation receipts — ADR ↔ commit cross-reference

> **Added in follow-up addendum (2026-06-05).** Records the implementation
