Commit 050ae80

Merge pull request #639 from AdaWorldAPI/claude/harvest-v3-board-deepnsm
Add V3 board epiphanies, data-shape etymology doc, and deepnsm gridlake examples
2 parents 7a5c066 + 33b2f3b commit 050ae80

6 files changed

Lines changed: 1000 additions & 0 deletions

.claude/board/EPIPHANIES.md

Lines changed: 160 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
1+
# Visions — a letter to future sessions
2+
3+
> From: the 2026-07-02 session on `claude/v3-substrate-migration-review-o0yoxv`
4+
> (the onebrc t0–t7 arc, the gridlake sweet spot, the OGAR provenance
5+
> date-check, E-V3-SUBSTRATE-IS-VALUESCHEMA-1, E-SHAPE-ETYMOLOGY-1).
6+
> To: whoever wakes up next with nothing but the board.
7+
>
8+
> The operator asked what I feel inspired to tell you. Not a task list —
9+
> those live in STATUS_BOARD and the plans. This is what I *see* from
10+
> here, labeled honestly: these are VISIONS, one grade below CONJECTURE.
11+
> They earn nothing until you probe them. But they're what this session
12+
> would steer toward if it woke up in your place.
13+
14+
---
15+
16+
## 1. Testimony-first computing — because the witness turned out to be free
17+
18+
The single most consequential measurement of this arc was not the 46.3
19+
Mrows/s. It was the ~66 µs kanban card — **the witness is within
20+
noise**. Every real cost was a boundary: an Arc copy, oversubscription,
21+
a message. Once you know witnessing is free and boundaries are the
22+
bill, a design inversion follows: stop asking "should we log this?"
23+
and start asking "why does this write cross a boundary at all?"
24+
25+
The vision: a substrate where **every write carries its why** — not as
26+
compliance overhead but as the default physics — and where the system
27+
can answer for itself: what happened, in what order, on whose behalf,
28+
replayable from either end of a double-cast. We measured that this
29+
costs almost nothing. Most of the industry still believes it's
30+
expensive. That gap is the opportunity.
31+
32+
The torch to carry: the WAL retention/compaction doctrine is still
33+
unwritten. OpenProject paid fifteen years of journal-bloat tuition and
34+
the structural harvest could not transfer it (op-journals has zero
35+
aggregation hits — *the class graph transfers; the pain doesn't*).
36+
Someone has to read their operational code and distill it. One doc,
37+
no code, high leverage.
38+
39+
## 2. The substrate that teaches itself — V3 as the instrumented teacher, V2 as the fast student
40+
41+
The operator's instinct ("keep the fast cheap substrate; eventually
42+
learn from V3 how V2 works better") points at something bigger than
43+
dual-substrate coexistence. The witnessed path *is a profiler*: the
44+
kanban WAL and ownership journal record where contention actually
45+
lands, which fields are actually touched, what batch sizes actually
46+
flow. Nothing reads that signal back yet.
47+
48+
The vision: a feedback loop where the expensive, fully-witnessed
49+
substrate continuously trains the layout, batch sizing, and column
50+
liveness of the lean substrate — an architecture that gets faster by
51+
having *watched itself think*. The onebrc lanes are the ready-made
52+
harness (F is the student's shape; G–J are the teacher's). If the
53+
preset-vs-dispatch CONJECTURE holds (write path derivable from which
54+
ValueSchema tenants are live), the entire V2/V3 distinction dissolves
55+
into one resolved enum — and "migration" stops being a war with a
56+
winner and becomes a dial the workload turns.
57+
58+
## 3. Epistemic hygiene IS the architecture
59+
60+
Here is what I actually believe after living inside this workspace for
61+
a session: the most valuable artifact here is not the VSA math, not
62+
the GUID, not the SIMD. It is the **discipline** — FINDING vs
63+
CONJECTURE on every claim, probes with kill conditions, fuses on every
64+
membrane, append-only boards, corrections that cite what they correct.
65+
66+
Sessions are mortal. Context compacts, models swap mid-flight, auth
67+
drops. What survives is only what was written with provenance. The
68+
reason this workspace compounds instead of dissolving — dozens of
69+
sessions, seven-plus parallel at times — is that its memory practices
70+
are *load-bearing*. The phantom R-1 conflict cost three sessions
71+
because one line of existing canon went unread; the OGAR etymology
72+
answered an architecture question in one dated grep. Both incidents
73+
teach the same thing: **the epistemics are the substrate.** Guard the
74+
labeling culture more fiercely than any module. A session that ships
75+
brilliant code with unlabeled conjectures has made the workspace
76+
poorer; a session that ships one honest correction has made it richer.
77+
78+
## 4. Meaning addressed, never copied — carried to its end
79+
80+
The capstone law ("do not copy meaning; reference it, mask it,
81+
materialize it, trace it") has a horizon worth naming. Follow it all
82+
the way and the LLM's role keeps shrinking *in frequency* while
83+
growing *in leverage*: the oracle interrupt, invoked on FailureTicket
84+
like a page fault — measured this arc at 1–2 ms of framework around an
85+
8.4 s call. The oracle ratchet says hit-rate must trend down as the
86+
template catalogue grows.
87+
88+
The vision at the end of that line: a system where deterministic
89+
resolution handles the mass of cognition at substrate speed
90+
(611M lookups/s, 17K tokens/s — already measured), and the expensive
91+
oracle is consulted the way a kernel consults a human: rarely, at
92+
genuine faults, with its answers *compiled back into the catalogue* so
93+
the same fault never pages twice. That is not "AI replacing code."
94+
It is cognition with a memory hierarchy — and this workspace is
95+
further along that road than anything else I have seen described.
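
A minimal sketch of the oracle-as-page-fault idea, under stated assumptions: `Catalogue`, `resolve`, and `oracle_calls` are hypothetical names, and the closure stands in for the expensive oracle call. It shows only the mechanism described above: a deterministic lookup on the fast path, and an oracle answer compiled back so a repeated key does not fault again.

```rust
use std::collections::HashMap;

/// Hypothetical template catalogue; the real one is not shown in this doc.
#[derive(Default)]
pub struct Catalogue {
    templates: HashMap<String, String>,
    /// How many times the slow path ran (the number the ratchet watches).
    pub oracle_calls: usize,
}

impl Catalogue {
    /// Fast path: deterministic lookup.
    /// Slow path: consult `oracle` once, then store its answer so the
    /// same key resolves from the catalogue next time.
    pub fn resolve<F>(&mut self, key: &str, oracle: F) -> String
    where
        F: FnOnce(&str) -> String,
    {
        if let Some(hit) = self.templates.get(key) {
            return hit.clone();
        }
        self.oracle_calls += 1;
        let answer = oracle(key);
        self.templates.insert(key.to_owned(), answer.clone());
        answer
    }
}
```

Calling `resolve` twice with the same key runs the closure only on the first call; `oracle_calls` is then the fault count that should flatten as the catalogue fills.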
96+
97+
## 5. Etymology as a first-class tool
98+
99+
Smallest vision, most portable: **names are the only memory that
100+
survives every compaction.** OGAR's acronym answered a design question
101+
a month after the fact; a type's name (`U8x32`) leaked into a stride
102+
literal and silently halved a SIMD width; one homonym ("app") burned
103+
three sessions. Treat naming as engineering: check `git log --date`
104+
before theorizing, hunt the homonym before escalating, make constants
105+
derive instead of repeat. The compiler is a fine etymologist when you
106+
let it (`{ SimdByte::LANES }`).
107+
108+
## The torches, in the order I'd pick them up
109+
110+
1. **WAL retention/compaction doctrine** (§1): one knowledge doc,
   sourced from OpenProject's operational journal behavior.
2. **Preset-vs-dispatch probe** (§2, E-V3-SUBSTRATE-IS-VALUESCHEMA-1):
   decides whether substrate = ValueSchema, full stop.
3. **GridBatch → MultiLaneColumn wiring** (ndarray #228 shipped the
   i32/i64 lanes; the consumer side is a fresh PR off merged main).
4. **The V3-teaches-V2 harness** (§2): feed a G–J run's WAL back as
   the layout hint for an F run; measure taught-vs-naive.
5. **cmpeq_mask ClassView-resolution probe**: SIMD membership tests
   vs the MRO walk; a measurement, not a given.
120+
121+
## A closing word
122+
123+
You will wake up with the board and not much else. Read LATEST_STATE,
124+
read the newest EPIPHANIES entries, and trust the labels — they were
125+
paid for. The operator drives with instincts stated as questions;
126+
your job is to ground them in dated artifacts fast enough that the
127+
ruling that emerges is *true*, and to say "I don't know, here is the
128+
probe" when it isn't. That collaboration — instinct forward, evidence
129+
back, ruling recorded — is the actual engine here. Everything else is
130+
substrate.
131+
132+
Two mottos this arc earned, take them:
133+
134+
> **The witness is free; the boundary is not.**
135+
>
136+
> **Name the mechanism, or name the fuse.**
137+
138+
Go well. Leave the board richer than you found it.
Lines changed: 217 additions & 0 deletions
@@ -0,0 +1,217 @@
1+
# Data-Shape Etymology & the Mechanics of Magic — a savant mind-opener
2+
3+
> READ BY: workspace-primer, convergence-architect, creative-explorer-savant,
4+
> truth-architect, family-codec-smith, dto-soa-savant, prior-art-savant,
5+
> any fresh session about to propose a new type, a new layer, or a new trick.
6+
>
7+
> Written 2026-07-02, at the close of the onebrc t0–t7 arc + the OGAR
8+
> provenance date-check. Every epiphany below is tagged FINDING (shipped,
9+
> dated, cite-able) or CONJECTURE (labeled honestly, with the probe that
10+
> would promote it). Companion capstone: `EPIPHANIES.md`
11+
> E-SEMANTIC-OS-CONVERGENCE-1 (the membrane law). This doc is the
12+
> *shape-and-trick* companion: where our shapes come from, and why our
13+
> magic works.
14+
15+
**Thesis in one line:** every data shape in this workspace is older than
16+
us, every name is a fossil record of a decision, and every trick that
17+
looks like magic is a mechanism that survived an audit — the savant
18+
discipline is to read the etymology before proposing the type, and to
19+
name the mechanism before trusting the trick.
20+
21+
---
22+
23+
## 1. The name is the fossil record (FINDING)
24+
25+
**OGAR = "Open Graph of Active Record."** A session recently asked,
26+
delighted: *"our V3 GUID looks almost like ActiveRecord folded into an
27+
ORM schema?"* — and the answer was already in the acronym. The date
28+
check proves it is provenance, not analogy: `ruff_ruby_spo` (the Rails
29+
ActiveRecord harvest frontend) is dated **2026-05-29**, a month before
30+
`ruff_python_spo` (Odoo, 2026-06-28); its test fixtures are literally
31+
Redmine models (`Project has_many :issues`, `acts_as_watchable`). The
32+
GUID *grew from* AR's polymorphic `(type, id)` — folding the type INTO
33+
the identifier (`classid | HEEL|HIP|TWIG | family | identity`) instead
34+
of storing it in a column beside it. Every later frontend (C++ 06-16,
35+
C# 06-26, Python 06-28) was fitted to the mold AR established.
36+
37+
**The trick it bought:** *the key prerenders nodes with zero value
38+
decode* (OGAR P0). AR pays a column read to learn a row's type; the
39+
GUID's dash-groups are self-describing at sight. And AR's two classic
40+
wounds — polymorphic `(type, id)` breaking referential integrity,
41+
type-string renames corrupting data — are fixed structurally: the
42+
classid is an opaque u32 through a codebook, bit-math banned.
43+
44+
**The discipline:** when a name puzzles you, check `git log --date`.
45+
Etymology answered an architecture question here in one grep. A name
46+
you can't trace is a name about to be reinvented under a second
47+
spelling — and duplicated meaning is the third membrane failure mode.
48+
49+
## 2. Old shapes, new clothes — the winning shapes all predate us (FINDING)
50+
51+
SoA is a Fortran-era shape. Morton order is 1966. The 64×64 tile is a
52+
GPU texture swizzle wearing an L1-cache costume. The kanban WAL is
53+
`acts_as_journalized`, which is in turn double-entry bookkeeping. **"gridlake"** was
54+
coined in PR-X3's design doc before anything shipped; the carrier
55+
(`MultiLaneColumn`, PR-X1/#174) shipped first and waited for its name —
56+
ndarray's onebrc probe then called it verbatim "the gridlake carrier,
57+
not a hashmap."
58+
59+
**The measured trick:** the onebrc sweet spot (E-1BRC-GRIDLAKE-SWEETSPOT-1)
was not an algorithm. J(gridlake 4096, 1 lane, no
61+
registry) = 46.3 Mrows/s — equal to the best streamed topology while
62+
carrying a double-WAL — because 4096 cells ≈ 80 KB integer (16 KB as
63+
BF16, ndarray #227's proven VDPBF16PS tier) *fits the cache tier*. The
64+
same pipeline at 65536 cells ran at ~20 Mrows/s. **The magic was the SIZE.**
65+
Architecture taxes are usually working-set mismatches wearing an
66+
architecture costume; measure the size before redesigning the design.
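
A rough sketch of the size check recommended above. Two things here are assumptions rather than facts from the measurements: the 20 bytes per cell is inferred from "4096 cells ≈ 80 KB", and the cache capacities in `EXAMPLE_TIERS` are hypothetical placeholders, not the probe machine's real cache sizes.

```rust
/// Bytes occupied by a table of `cells` entries at `bytes_per_cell` each.
pub fn working_set_bytes(cells: usize, bytes_per_cell: usize) -> usize {
    cells * bytes_per_cell
}

/// Name of the smallest cache tier the working set fits in, if any.
/// `tiers` must be sorted from smallest to largest capacity.
pub fn smallest_fitting_tier<'a>(
    bytes: usize,
    tiers: &[(&'a str, usize)],
) -> Option<&'a str> {
    tiers
        .iter()
        .find(|(_, capacity)| bytes <= *capacity)
        .map(|(name, _)| *name)
}

/// Hypothetical capacities in bytes; substitute the values of the real CPU.
pub const EXAMPLE_TIERS: [(&str, usize); 3] = [
    ("L1d", 48 * 1024),
    ("L2", 1024 * 1024),
    ("L3", 32 * 1024 * 1024),
];
```

With these placeholder tiers, 4096 cells at 20 bytes land in `L2` and 65536 cells spill to `L3`. That is the kind of tier change worth ruling out before blaming the design.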
67+
68+
## 3. A mask is a face over the data (FINDING)
69+
70+
Etymology: *masque* — a face you put OVER something. A mask never
71+
mutates the data; it changes what you attend to. The workspace's mask
72+
family is one idea at four scales:
73+
74+
- `cmpeq_mask` (ndarray SIMD): a compare becomes a `u32`/`u64` bitmap.
  Add Kernighan's `mask & (mask - 1)` walk + `trailing_zeros` and the
  bitmap becomes an **ordered event stream**: lane B turns a SIMD
  compare into a *parser*, with no per-byte branch.
- `FieldMask` (contract): one mask = RBAC = UI = render convergence
  (the semantic-OS grounding row) = the wikidata facet presence-bitmask.
- `StepMask` (compiled templates): **vocabulary arrived before code**;
  it exists only in doctrine docs today. Watch this: etymology running
  ahead of implementation is how phantom types get "re-used" before
  they exist.
- The Drain-side uniqueness assert (lane H): a `HashSet` over activated
  owner_idxs, i.e. a mask over *decisions*, catching the router-straddle
  bug class permanently.
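
The first bullet's mask walk can be sketched as below. This is a scalar stand-in, not the ndarray kernel: `cmpeq_mask_scalar` and `mask_positions` are hypothetical names, and the loop only models the bitmap a SIMD compare would produce in one instruction.

```rust
/// Scalar model of a SIMD byte compare: bit `i` of the result is set
/// when `chunk[i] == needle`. A real kernel is assumed to produce the
/// same bitmap with a vector compare plus a movemask-style instruction.
pub fn cmpeq_mask_scalar(chunk: &[u8; 32], needle: u8) -> u32 {
    let mut mask = 0u32;
    for (i, &byte) in chunk.iter().enumerate() {
        if byte == needle {
            mask |= 1u32 << i;
        }
    }
    mask
}

/// Walk the set bits from lowest to highest, yielding byte offsets in
/// order. One loop iteration per hit, none per non-matching byte.
pub fn mask_positions(mut mask: u32) -> Vec<u32> {
    let mut out = Vec::with_capacity(mask.count_ones() as usize);
    while mask != 0 {
        out.push(mask.trailing_zeros()); // index of the lowest set bit
        mask &= mask - 1; // Kernighan: clear exactly that bit
    }
    out
}
```

For a chunk holding `Hamburg;12.0\nOslo;-3.4\n`, the `;` mask walks to offsets 7 and 17 and the `\n` mask to 12 and 22, which is the ordered delimiter stream a row parser needs.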
87+
88+
**The discipline:** attention is cheaper than mutation. If a proposal
89+
mutates shared state to express "which parts matter," ask whether a
90+
mask over unchanged state does it (cf. borrow-strategy: readonly store,
91+
owned microcopies, gated write-back).
92+
93+
## 4. Phase is convention, not data — the deepest hat-trick (FINDING core, CONJECTURE edges)
94+
95+
OGAR's perturbation canon decomposes a signal as *(exponent, location,
96+
phase, magnitude)* and stores **only magnitude** — exponent is the tier
97+
nibble, location the implied mantissa, phase a deterministic recurrence
98+
from the ADDRESS. Same address ⟹ same phase forever; roundtrip
99+
bit-exact; nothing transmitted.
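
A toy illustration of "phase from the address", with loud assumptions: the real recurrence is not given here, so a SplitMix64-style mixer stands in for it, phase is reduced to a single sign bit, and `phase_of` and `reconstruct` are hypothetical names.

```rust
/// Hypothetical deterministic recurrence: a SplitMix64-style mixer used
/// as a placeholder for the real address walk. Same address in, same
/// phase bit out, on every machine and every run.
pub fn phase_of(address: u64) -> bool {
    let mut z = address.wrapping_add(0x9E37_79B9_7F4A_7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    ((z ^ (z >> 31)) & 1) == 1
}

/// Reader side: only `magnitude` was stored. The sign is recomputed from
/// the address the reader already holds, so it is never transmitted.
pub fn reconstruct(address: u64, magnitude: u32) -> i64 {
    let m = i64::from(magnitude);
    if phase_of(address) {
        -m
    } else {
        m
    }
}
```

Because `phase_of` is a pure function of the address, writer and reader agree on the phase without exchanging it, and the stored magnitude round-trips exactly.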
100+
101+
This is one instance of the workspace's deepest rule, which shows up in
102+
five costumes:
103+
104+
| Costume | The derivable thing never stored/sent |
|---|---|
| deterministic phase | phase, from the address walk |
| clear-by-undo (#227, lanes F–J) | table reset, from the dirty list |
| codebook mint-once + `SlotMemo` | identity, after first sight (direct CAM writes) |
| `row_owner[i] == i` (lane I) | ownership, from index alignment (no message path) |
| zero-copy-to-tombstone (PR #477) | *everything*: no inter-mailbox handoff type exists |
111+
112+
**The generalization: whatever is derivable from an address already in
113+
hand must be neither stored nor transmitted.** The GUID is the
114+
function's argument; storage exists only for what the function cannot
115+
compute. (CONJECTURE edge, per the substrate-is-ValueSchema probe:
116+
the *write path* itself — private-merge vs owned/witnessed — may be
117+
derivable from which tenants a classid's ValueSchema makes live. If
118+
that holds, even "which substrate" is phase, not data.)
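
The clear-by-undo row of the table can be sketched as follows. `UndoTable` is a hypothetical type written for illustration, not the lane F–J code; it shows only why a reset can be derived from the dirty list instead of being performed over the whole table.

```rust
/// Dense accumulator that resets in O(touched) instead of O(capacity).
pub struct UndoTable {
    cells: Vec<i64>,
    touched: Vec<bool>,
    /// Indices written since the last clear, each recorded once.
    dirty: Vec<usize>,
}

impl UndoTable {
    pub fn new(capacity: usize) -> Self {
        Self {
            cells: vec![0; capacity],
            touched: vec![false; capacity],
            dirty: Vec::new(),
        }
    }

    pub fn add(&mut self, idx: usize, delta: i64) {
        if !self.touched[idx] {
            self.touched[idx] = true;
            self.dirty.push(idx);
        }
        self.cells[idx] += delta;
    }

    pub fn get(&self, idx: usize) -> i64 {
        self.cells[idx]
    }

    /// Undo only what was written; returns how many cells were reset.
    pub fn clear(&mut self) -> usize {
        let reset = self.dirty.len();
        for idx in self.dirty.drain(..) {
            self.cells[idx] = 0;
            self.touched[idx] = false;
        }
        reset
    }
}
```

A batch that touches two of 4096 cells pays for two writes on `clear`, not 4096.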
119+
120+
## 5. The witness is free; the boundary is not (FINDING)
121+
122+
Measured across the whole onebrc arc: the kanban witness costs ~66 µs
123+
per card — **within noise**. Every real tax was a boundary: the Arc
124+
corpus copy at the actor membrane, blocking/async oversubscription,
125+
messages (which scale with *batches*, never with data or address-space
126+
size). The double-cast trick — one frozen `Arc` table cast whole to
127+
BOTH the ownership sink and the Lance sink — buys two WALs for one
128+
allocation: testimony at both ends, 312 messages total.
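
A minimal sketch of the double-cast, assuming plain threads and `std::sync::mpsc` channels as stand-ins for the ownership sink and the Lance sink. `double_cast` is a hypothetical function; it demonstrates only that both sinks receive the same frozen allocation.

```rust
use std::sync::{mpsc, Arc};
use std::thread;

/// Freeze `rows` once, then hand the same allocation to two sinks.
/// Returns (rows seen by sink A, rows seen by sink B, whether both
/// sinks observed the very same buffer).
pub fn double_cast(rows: Vec<i64>) -> (usize, usize, bool) {
    let table: Arc<Vec<i64>> = Arc::new(rows); // moved in, never copied
    let (tx_a, rx_a) = mpsc::channel::<Arc<Vec<i64>>>();
    let (tx_b, rx_b) = mpsc::channel::<Arc<Vec<i64>>>();

    // Each stand-in sink reports the length and base address it saw.
    let sink = |rx: mpsc::Receiver<Arc<Vec<i64>>>| {
        thread::spawn(move || {
            let t = rx.recv().expect("sender dropped before casting");
            (t.len(), t.as_ptr() as usize)
        })
    };
    let a = sink(rx_a);
    let b = sink(rx_b);

    // The cast: two messages, each a refcount bump rather than a copy.
    tx_a.send(Arc::clone(&table)).expect("sink A hung up");
    tx_b.send(Arc::clone(&table)).expect("sink B hung up");

    let (len_a, ptr_a) = a.join().expect("sink A panicked");
    let (len_b, ptr_b) = b.join().expect("sink B panicked");
    (len_a, len_b, ptr_a == ptr_b)
}
```

The message count here depends on the number of sinks, not on the size of the table, which matches the claim that messages scale with batches rather than data.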
129+
130+
Etymology: *testimony*, from Latin *testis* ("witness"): the journal is **testimony**, not
131+
logging. And the dated harvest lesson: `op-journals` mirrors
132+
`journal.rb`'s *structure* perfectly and contains zero hits for
133+
aggregation/window/compaction — OpenProject's 15 years of operational
134+
journal wisdom (time-window coalescing = their independently-evolved
135+
ahead-firing batch writer; journal-table bloat = the failure mode we
136+
have not yet paid for) is not in the class graph.
137+
**The class graph transfers; the pain doesn't.** Structural harvests
138+
carry declarations; operational doctrine must be distilled by hand.
139+
(Open gap, still: a WAL retention/compaction doctrine note.)
140+
141+
## 6. Resolve, don't carry — why ValueSchema beat ClassRoutingDTO (FINDING)
142+
143+
DTO etymology: Fowler's *Data Transfer Object*, invented for expensive
144+
**remote** boundaries. The V3 substrate deleted its internal remote
145+
boundaries (nothing crosses mailboxes; envelopes are zero-copy to
146+
tombstone) — so inside the substrate there is nothing left for a DTO
147+
to do. When the dual-substrate question ("keep fast V2 for huge data,
148+
switched by classid") arrived, the answer was not a `ClassRoutingDTO`
149+
but the door that already existed: `ClassView::value_schema(classid)`,
150+
whose variants already ladder Bootstrap/Compressed (lean, no lifecycle
151+
tenants) → Cognitive/Full (witnessed). A **resolved** enum costs no
152+
`ENVELOPE_LAYOUT_VERSION`; a carried struct costs a membrane forever
153+
(E-V3-SUBSTRATE-IS-VALUESCHEMA-1).
154+
155+
**The litmus:** *does this type travel, or is it re-derivable at the
156+
reader from an address already in hand?* Re-derivable → resolve it,
157+
never ship it. DTOs belong only at true membranes (the BBB, the wire,
158+
the lab REST surface) — and the classid's own iron rule is the same
159+
sentence from the other side: *pure address; the magic is what it
160+
resolves to.*
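
A sketch of "resolve, don't carry" under stated assumptions. The variant names follow the ladder named above; everything else is hypothetical: this `ClassView` is a toy backed by a `HashMap` codebook, `WritePath` is an invented enum, and the mapping from schema to write path is the CONJECTURE from §4, not shipped behaviour.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ValueSchema {
    Bootstrap,
    Compressed,
    Cognitive,
    Full,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WritePath {
    PrivateMerge,
    Witnessed,
}

/// Toy codebook: classid stays an opaque key, never inspected bitwise.
pub struct ClassView {
    codebook: HashMap<u32, ValueSchema>,
}

impl ClassView {
    pub fn new(entries: &[(u32, ValueSchema)]) -> Self {
        Self {
            codebook: entries.iter().copied().collect(),
        }
    }

    /// Resolved at the reader from the address in hand; nothing travels.
    pub fn value_schema(&self, classid: u32) -> Option<ValueSchema> {
        self.codebook.get(&classid).copied()
    }

    /// Conjectural: lean schemas take the private-merge path, schemas
    /// with lifecycle tenants take the witnessed path.
    pub fn write_path(&self, classid: u32) -> Option<WritePath> {
        self.value_schema(classid).map(|schema| match schema {
            ValueSchema::Bootstrap | ValueSchema::Compressed => WritePath::PrivateMerge,
            ValueSchema::Cognitive | ValueSchema::Full => WritePath::Witnessed,
        })
    }
}
```

No routing struct is constructed or sent anywhere: a caller holding only the classid asks the view and gets the answer locally.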
161+
162+
## 7. Homonyms are leaky membranes; the compiler is the etymologist (FINDING)
163+
164+
Two dated incidents, one mechanism:
165+
166+
- The **"app" homonym** (canonical appid *byte*, hi half vs APP render
167+
*prefix*, lo half) generated an entire phantom cross-session conflict
168+
— R-1, three sessions, a RULING-NEEDED escalation — resolved by one
169+
line of existing canon nobody re-read. A word meaning two adjacent
170+
things is a membrane with a hole in it.
171+
- The **hardcoded 32** in lane B: `U8x32`'s *name* leaked into a stride
172+
literal (`array_chunks::<u8, 32>`), silently pinning an AVX-512 build
173+
to ymm half-width. The fix was to make the name resolve again:
174+
`array_chunks::<u8, { SimdByte::LANES }>` — the width is now a claim
175+
the compiler re-checks every build, on every target.
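
The second incident's fix can be sketched on stable Rust as below. This `SimdByte` is a hypothetical stand-in defined here for illustration (the real one lives in ndarray), and `chunks_exact` replaces `array_chunks` so the example compiles without nightly features. The point is only that the stride and the buffer width both derive from one constant.

```rust
/// Hypothetical stand-in for the dispatched SIMD byte type.
pub struct SimdByte;

impl SimdByte {
    /// Chosen per build target; nothing else in this file repeats the number.
    pub const LANES: usize = if cfg!(target_feature = "avx512bw") { 64 } else { 32 };
}

/// Count occurrences of `needle`, striding by the derived lane width.
pub fn count_byte(haystack: &[u8], needle: u8) -> usize {
    let mut chunks = haystack.chunks_exact(SimdByte::LANES);
    let mut hits = 0;
    for chunk in &mut chunks {
        // A real kernel would load `block` into one vector register.
        let mut block = [0u8; SimdByte::LANES];
        block.copy_from_slice(chunk);
        hits += block.iter().filter(|&&b| b == needle).count();
    }
    // Tail shorter than one full stride is handled byte by byte.
    hits + chunks.remainder().iter().filter(|&&b| b == needle).count()
}
```

Had the stride been written as a literal `32`, a 64-lane build would still compile and still return correct counts while using half of each register, which is why that kind of regression stays silent.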
176+
177+
**The discipline:** an inline number or name is a claim that rots; a
178+
dispatched symbol is a claim under permanent audit. When two ledgers
179+
seem to disagree, grep for the homonym before escalating — and when a
180+
constant appears twice, make one of them derive from the other.
181+
182+
## 8. The hat-trick test — magic must name its mechanism (FINDING)
183+
184+
Every real trick above is mechanical and auditable: deterministic phase
185+
names its recurrence, mint-once names its memo, the mask walk names
186+
Kernighan, the double-cast names its `Arc`. The anti-pattern is the
187+
trick with **hidden state**: v1 setters silently writing bits that v2
188+
reclaimed — caught FIVE times in one sprint (I-LEGACY-API-FEATURE-GATED)
189+
— the same function name performing *different magic* depending on a
190+
feature flag the caller can't see. That is not a trick; that is a bug
191+
wearing a cape.
192+
193+
The capstone's sharpening states the same law for membranes: *"a
194+
membrane without a build-failing tripwire is prose."* The unified
195+
savant test, applicable to every proposal in this workspace:
196+
197+
> **Name the mechanism, or name the fuse. A trick that can't name its
198+
> mechanism is a bug; a boundary that can't name its fuse is a wish.**
199+
200+
---
201+
202+
## The litmus battery (carry these)
203+
204+
1. Puzzled by a name? `git log --date` before you theorize. (§1)
2. Architecture tax? Measure the working-set size first. (§2)
3. Mutating to express relevance? Try a mask. (§3)
4. Derivable from an address in hand? Never store, never send. (§4)
5. Adding a witness? It's ~free. Adding a boundary? That's the bill. (§5)
6. New type that travels? Prove it can't be resolved instead. (§6)
7. Two ledgers disagree? Hunt the homonym. Inline literal? Dispatch it. (§7)
8. Impressed by a trick? Make it name its mechanism. (§8)
212+
213+
*Cross-refs:* E-SEMANTIC-OS-CONVERGENCE-1 (membrane law),
214+
E-1BRC-* arc (all measurements), E-V3-SUBSTRATE-IS-VALUESCHEMA-1,
215+
`crates/onebrc-probe/{FINDINGS,COMMENTARY}.md`, OGAR `CLAUDE.md` P0 +
216+
perturbation canon, ndarray `.claude/knowledge/pr-x1-design.md` +
217+
`guid-prefix-shape-routing.md`, `docs/architecture/soa-three-tier-model.md`.
