Commit f96fa0c

docs(synergies): §7.5 immaterialized cascade as storage — Morton-keyed columnar grid-pyramid
Operator synthesis (2026-06-08): the Morton cascade is a coordinate transform, not a stored grid, just as (lat,lon) -> quadkey is a cheap closed-form bit-interleave with no materialized grid. The cascade computes cell membership on demand; it is never built. That makes it a reusable addressing layer any payload can hang off.

New §7.5 captures the storage-layer consequence:

- Address vs payload (ADR-023) at storage scale: ADDRESS = Morton prefix (immaterialized, arithmetic, amortizes to ~0; §7 best case); PAYLOAD = columnar (Lance / Parquet-family / PR #477 SoA), rows in Morton order.
- The columnar-pushdown identity [G: deployed lakehouse tech]: Z-order/Hilbert row clustering for data-skipping is production tech (Delta ZORDER, Iceberg/Hudi clustering, BigQuery clustering). row=cell, row-group=tile, prefix-pushdown=tile-fetch, column-chunk=SoA role/level column, page=leaf nibble. The format's own machinery IS the cascade's storage + tile-fetch, free.
- Honest sharpening: classic Parquet row-groups are scan-optimized; the cascade wants random tile access (any prefix, any version) = Lance's advantage. "parquet-shaped" = the columnar family; Lance is the substrate's actual instance.
- Four payloads on one immaterialized address: delta frames (version-diff = changed Morton cells = codec P-frame; I-frame = materialized version; B-frame doesn't map onto an append-only log, an honest limit) [H]; radix trie (VART lazy paths) [G]; HHTL/OGIT/helix (one address, three roles) [G]; CAM-PQ (semantic columns) [G].
- The unifying shape "parquet-shaped grid-pyramid shader": columnar storage (parquet) + Morton-ordered rows (grid) + level cascade (pyramid) + closed-form per-cell (shader). Every layer production-proven, just composed.
- Honest limit: the immaterialization isn't total. The address is immaterialized (arithmetic); the payload is materialized (columns). The grid is free; the data is the SoA-headroom spend (§7).

Cross-refs (§11) gain the deployed Z-order/lakehouse anchor + the Lance-vs-Parquet random-access note. Docs-only; epiphany-capture status unchanged. PII abort-guard (word-boundary): CLEAN. cargo check: clean.

https://claude.ai/code/session_01PBTGaPCSnnt6u3pjXpbLwY
1 parent 25f3d09 commit f96fa0c

1 file changed

Lines changed: 83 additions & 0 deletions

docs/CASCADE-SYNERGIES-EPIPHANY.md

@@ -285,6 +285,83 @@ amortization gate" is that same discipline, named.

---

## 7.5 The immaterialized cascade as storage — the Morton-keyed columnar grid-pyramid **[storage synthesis, 2026-06-08]**
**The operator's framing:** the Morton cascade is a **coordinate transform,
not a stored grid**, exactly as `(lat, lon) → quadkey` is a cheap
closed-form bit-interleave with no materialized grid. The cascade computes
cell membership on demand; it is never *built*. That makes it a **reusable
addressing layer** any payload can hang off.

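To make the "closed-form bit-interleave" concrete, here is a minimal sketch of a 2D Morton transform. The function names (`spread_bits`, `morton_encode`, `morton_decode`) and the 32-bit-per-axis width are assumptions for illustration, not the substrate's actual API; the point is only that the address is a few shifts and masks, with no grid allocated anywhere.

```rust
/// Spread the 32 bits of `v` onto the even bit positions of a u64.
fn spread_bits(v: u32) -> u64 {
    let mut x = v as u64;
    x = (x | (x << 16)) & 0x0000_FFFF_0000_FFFF;
    x = (x | (x << 8)) & 0x00FF_00FF_00FF_00FF;
    x = (x | (x << 4)) & 0x0F0F_0F0F_0F0F_0F0F;
    x = (x | (x << 2)) & 0x3333_3333_3333_3333;
    x = (x | (x << 1)) & 0x5555_5555_5555_5555;
    x
}

/// Inverse of `spread_bits`: gather the even bit positions back into a u32.
fn compact_bits(v: u64) -> u32 {
    let mut x = v & 0x5555_5555_5555_5555;
    x = (x | (x >> 1)) & 0x3333_3333_3333_3333;
    x = (x | (x >> 2)) & 0x0F0F_0F0F_0F0F_0F0F;
    x = (x | (x >> 4)) & 0x00FF_00FF_00FF_00FF;
    x = (x | (x >> 8)) & 0x0000_FFFF_0000_FFFF;
    x = (x | (x >> 16)) & 0x0000_0000_FFFF_FFFF;
    x as u32
}

/// Interleave `x` (even bits) and `y` (odd bits) into one Morton key.
fn morton_encode(x: u32, y: u32) -> u64 {
    spread_bits(x) | (spread_bits(y) << 1)
}

/// Recover `(x, y)` from a Morton key.
fn morton_decode(key: u64) -> (u32, u32) {
    (compact_bits(key), compact_bits(key >> 1))
}
```

For example, `morton_encode(3, 5)` yields `39` (binary `100111`), and `morton_decode(39)` returns `(3, 5)`; cell membership at any coarser level is then just a prefix of that key.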
**Address vs payload (ADR-023) at storage scale:**
- **ADDRESS** = the Morton prefix: immaterialized, cheaply computed, never
stored as a grid. Amortizes to ≈ 0 (it's arithmetic, not data), which is the §7
gate's best case.
- **PAYLOAD** = columnar (Lance / Parquet-family / the PR #477 SoA register
file), with rows in Morton order.

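The address/payload split can be sketched as a struct-of-arrays table whose rows are reordered by Morton key. `SoaTable`, its fields, and its methods are hypothetical names for illustration; this is not the PR #477 register file or any Lance/Parquet API, only the shape of "columns as payload, Morton order as row layout".

```rust
/// Hypothetical SoA table: one key column plus one payload column per role.
struct SoaTable {
    /// Morton key of each row (the address; computed, never a stored grid).
    keys: Vec<u64>,
    /// `roles[r][row]` is the payload byte of role `r` for that row.
    roles: Vec<Vec<u8>>,
}

impl SoaTable {
    /// Reorder every column so rows appear in ascending Morton-key order.
    fn sort_by_key(&mut self) {
        // Compute the sorting permutation once, then apply it per column.
        let mut order: Vec<usize> = (0..self.keys.len()).collect();
        order.sort_by_key(|&i| self.keys[i]);
        self.keys = order.iter().map(|&i| self.keys[i]).collect();
        for col in &mut self.roles {
            *col = order.iter().map(|&i| col[i]).collect();
        }
    }

    /// Column projection: read one role column without touching the others.
    fn project(&self, role: usize) -> &[u8] {
        &self.roles[role]
    }
}
```

After `sort_by_key`, rows that share a Morton prefix are contiguous in every column, which is what lets a prefix range map onto a contiguous run of rows.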
**The columnar-pushdown identity [G: deployed lakehouse tech]:** ordering
a columnar table's rows by a space-filling curve (Z-order / Hilbert) so
range predicates skip non-matching files, row groups, or pages is
**production tech**: Databricks Delta `ZORDER BY` and Iceberg/Hudi
clustering ship it, and BigQuery clustered tables prune blocks on sorted
clustering columns in the same spirit.
The substrate is the same move:

| Columnar concept | Cascade concept |
|---|---|
| row keyed by Morton prefix | a cell |
| row-group / fragment over a Morton-prefix range | **a tile** |
| predicate pushdown on a Morton-prefix range | **the tile fetch** (subtree scan = data-skipping) |
| column-chunk | a per-role / per-level SoA column (PR #477 `SoaEnvelope`) |
| page | the leaf nibble |
| column projection | fetch only the roles you need |

So the columnar format's **own machinery** (row-groups, pushdown,
projection, data-skipping) *is* the cascade's storage + tile-fetch: free,
on production-proven tech.

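The "tile fetch = pushdown on a Morton-prefix range" row can be sketched in memory. A prefix at a given level covers one contiguous key range, so over Morton-sorted rows a tile is found with two binary searches. `MAX_LEVEL`, `prefix_range`, and `tile_fetch` are assumed names, and the sorted slices stand in for a real columnar file; no Lance or Parquet call is shown.

```rust
/// Assumed pyramid depth: leaf keys carry 2 bits per level, 16 levels.
const MAX_LEVEL: u32 = 16;

/// Half-open range `[lo, hi)` of leaf keys that fall under `prefix` at `level`.
fn prefix_range(prefix: u64, level: u32) -> (u64, u64) {
    assert!(level <= MAX_LEVEL);
    let shift = 2 * (MAX_LEVEL - level);
    (prefix << shift, (prefix + 1) << shift)
}

/// Return the payload rows of one tile.
/// `keys` must be sorted ascending and aligned row-for-row with `payload`.
fn tile_fetch<'a>(keys: &[u64], payload: &'a [u8], prefix: u64, level: u32) -> &'a [u8] {
    let (lo, hi) = prefix_range(prefix, level);
    // Two binary searches bound the tile; rows outside it are never read.
    let start = keys.partition_point(|&k| k < lo);
    let end = keys.partition_point(|&k| k < hi);
    &payload[start..end]
}
```

With leaf keys `[0, 1, 2, 3, 4, 7, 8, 15]`, the tile for prefix `1` at level 15 covers keys `[4, 8)` and returns the two rows keyed `4` and `7`. A columnar engine achieves the same skipping from per-chunk min/max statistics rather than an explicit search.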
**One sharpening (honest):** classic Parquet row-groups are
**scan**-optimized; the cascade wants **random** tile access (jump to any
prefix, any version). That is precisely **Lance's** advantage over classic
Parquet (fragment/page random-access for ML workloads). So "parquet-shaped"
is the right *family* (columnar + Z-ordered); **Lance is the substrate's
actual instance**, chosen because the shader needs random access, not
sequential scans.

**Four payloads, one immaterialized address:**

| Payload | What it is | Grade |
|---|---|---|
| **delta frames** | a version-diff = the Morton cells whose payload changed since the reference version (Lance versioning), i.e. the codec P-frame (§1). I-frame = a materialized version (vacuum point); P-frame = a version-delta. *(B-frames / bidirectional refs don't map onto an append-only version log: an honest limit.)* | [H] |
| **radix trie** | VART lazy-materialized prefix nodes: only touched paths stored; the trie *is* the cascade | [G] |
| **HHTL / OGIT / helix** | one Morton address = compile-time HHTL identity + OGIT class identity + helix placement seed (ADR-023/024/025) | [G] |
| **CAM-PQ** | 6 roles × 256 centroids = the semantic column values (§3, §7) | [G] |

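The delta-frame row can be illustrated with a small sketch: a P-frame is the set of Morton cells whose payload differs from a reference version, and a later version is rebuilt from a materialized I-frame plus that delta. `Version`, `CellDelta`, `diff`, and `apply` are hypothetical names, and the `BTreeMap` stands in for a versioned column; this does not use or imitate Lance's versioning API.

```rust
use std::collections::BTreeMap;

/// One version of a payload column: Morton key -> palette byte (assumed layout).
type Version = BTreeMap<u64, u8>;

/// A change to a single cell relative to the reference version.
#[derive(Debug, Clone, PartialEq)]
enum CellDelta {
    Set(u8),
    Removed,
}

/// P-frame: only the cells whose payload differs from the reference.
fn diff(reference: &Version, next: &Version) -> BTreeMap<u64, CellDelta> {
    let mut frame = BTreeMap::new();
    for (&key, &value) in next {
        if reference.get(&key) != Some(&value) {
            frame.insert(key, CellDelta::Set(value));
        }
    }
    for &key in reference.keys() {
        if !next.contains_key(&key) {
            frame.insert(key, CellDelta::Removed);
        }
    }
    frame
}

/// Rebuild a version from a materialized reference (I-frame) plus one P-frame.
fn apply(reference: &Version, frame: &BTreeMap<u64, CellDelta>) -> Version {
    let mut out = reference.clone();
    for (&key, delta) in frame {
        match delta {
            CellDelta::Set(v) => {
                out.insert(key, *v);
            }
            CellDelta::Removed => {
                out.remove(&key);
            }
        }
    }
    out
}
```

Each frame refers only backward to its reference, which matches the append-only constraint noted in the table: nothing here can express a B-frame's forward reference.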
**The unifying shape ("parquet-shaped grid-pyramid shader"):**

```
columnar storage (Lance / Parquet-family / SoA)   ← the parquet
+ rows in Morton order (Z-ordered)                ← the grid
+ level cascade (mip pyramid)                     ← the pyramid
+ closed-form per-cell from address (§5)          ← the shader
```

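The "pyramid" and "shader" lines can be sketched together: each coarser level of a 2D Morton key is a 2-bit right shift, and a cell's grid position at any level is recoverable from the address alone. `cascade` and `cell_origin` are assumed names for illustration; the actual per-cell function of §5 is not reproduced here.

```rust
/// Walk from a leaf cell up to the root, returning `(level, prefix)` pairs.
/// Each step up the pyramid drops one 2-bit digit from the key.
fn cascade(leaf_key: u64, leaf_level: u32) -> Vec<(u32, u64)> {
    assert!(leaf_level <= 31, "keeps every shift below 64 bits");
    (0..=leaf_level)
        .rev()
        .map(|level| (level, leaf_key >> (2 * (leaf_level - level))))
        .collect()
}

/// Grid coordinates of the cell `prefix` at `level`, computed from the
/// address alone (x on even bits, y on odd bits).
fn cell_origin(prefix: u64, level: u32) -> (u32, u32) {
    let (mut x, mut y) = (0u32, 0u32);
    for i in 0..level {
        x |= (((prefix >> (2 * i)) & 1) as u32) << i;
        y |= (((prefix >> (2 * i + 1)) & 1) as u32) << i;
    }
    (x, y)
}
```

For the level-3 leaf key `39`, `cascade` yields `[(3, 39), (2, 9), (1, 2), (0, 0)]`, and `cell_origin(9, 2)` gives `(1, 2)`, the leaf position `(3, 5)` halved. No level of the pyramid is stored to produce any of this.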
The address is the cheap deterministic transform (free: §7 best case); the
columns are the amortized payload (build-once, query-many: §7 gate). **The
whole substrate storage layer is one Z-ordered columnar grid-pyramid,
queried by Morton-prefix pushdown, executed shader-style**: every layer is
production-proven (Lance columnar + lakehouse Z-order + GPU shader), just
composed.

**Honest limit (the immaterialization isn't total):** the *address* is
immaterialized (arithmetic); the *payload* is materialized in the columns
(it has to live somewhere). "Immaterialized cascade" = immaterialized
*addressing* over materialized *columnar payload*. The grid is free; the
data is the SoA-headroom spend (§7). The precise claim: you never store the
grid; you always store the (Morton-ordered, palette-coded, amortized)
columns.

---

## 8. The synergy matrix (everything against everything)

| | Morton cascade | golden helix | palette256/CAM | attention | Cesium | x265/x266 |
@@ -360,6 +437,12 @@ Ordered by leverage (highest first):
mode; ITU‑T H.266 (VVC/x266) CTU + QTMT; Fuji X‑Trans CFA;
Vogel/phyllotaxis golden‑angle anti‑aliasing; Product Quantization
(Jégou et al.).
- Deployed columnar / space‑filling‑curve clustering (the §7.5
storage anchor): Databricks Delta `ZORDER BY`, Apache Iceberg /
Hudi clustering, BigQuery clustering. All are production data‑skipping
by clustered row ordering (Z‑order/Hilbert, or sorted clustering
columns in BigQuery's case). Lance (the substrate's columnar
instance): a fragment/page random‑access columnar format, chosen over
classic Parquet row‑groups for random tile access.

---