File: `.claude/board/AGENT_LOG.md` (10 additions & 0 deletions)
@@ -1,3 +1,13 @@
## 2026-06-16 — 5-specialist framing of #497 OCR-transcode plans → plans rebaselined to #498 + probes spec'd
**Main thread (Opus 4.8 1M) + 5 Opus specialists in parallel** (cascade-architect / family-codec-smith / palette-engineer / dto-soa-savant / truth-architect), each read the 7 merged #497 plans + post-#498 source in full (Rule 7 — read, don't grep-judge). Operator: *"review the plans against your awareness of the new architecture incl. the last 15 PR arc (Morton Cascade + Helix 48 + turbovec residue) — send 5 specialist framing it."* See `EPIPHANIES.md` E-OCR-PLAN-DRIFT-1 for the consolidated framing.
**Two showstoppers:** (1) the "reversible without a hash" migration rationale is FALSE in code (no `residue→rank` inverse; `vocabulary.rs` is a stored string-table keyed by rank) — truth-architect; (2) the "Morton-tile stacked-pyramid perturbation-shader cascade" does NOT exist (0 hits; Morton rejected for Hilbert) — cascade-architect. **Convergent drift (≥4 lenses):** dead 48 B HelixResidue (now 6 B), D-OCR-50 already shipped (#498), `ValueSchema::Ocr`/`Meta`-5-jobs/`TurbovecResidue`-wrong-carrier §0 tripwires, HHTL = coherent address-trie not a blur.
**Outcome:** all 7 plans corrected on `claude/wonderful-hawking-lodtql` (rebaselined #496→#498, Morton purged, reversibility reframed, §0 tripwires fixed, master critical-path fixed = the open CodeRabbit Major on #497). New `ocr-probes-v1.md` (4 gating probes OCR-RT/DET/POST/SCHEMA + 3 cascade perf probes). **OCR-SCHEMA shipped as a contract test** (`ocr::tests::ocr_schema_fit_rides_existing_preset_no_new_variant`). contract 620 lib green; fmt clean. Both #497 + #498 review threads resolved/dispositioned.
**Next:** open the follow-up PR; run OCR-DET (deepnsm example) + OCR-RT (needs deepnsm+helix wiring) before any transcode code is funded.
**Main thread (Opus 4.8 1M) + 3 Opus brutal hardeners** (PP-13 brutally-honest-tester / PP-15 baton-handoff-auditor / PP-16 preflight-drift-auditor), all pinned to the plan by `file:line`. Verdicts: **HOLD / CATCH-LATENT / READY-TO-DISPATCH** — all fixes spec-text, no architectural rewrite; all three confirmed the grounding + dependency-wall claims + measure-first ratio.
File: `.claude/board/EPIPHANIES.md` (23 additions & 0 deletions)
@@ -1,3 +1,26 @@
## 2026-06-16 — E-OCR-PLAN-DRIFT-1 — the #497 OCR-transcode plans drifted from the substrate in 6 ways; 2 were showstoppers
**Status:** FINDING (5-specialist framing — cascade-architect / family-codec-smith / palette-engineer / dto-soa-savant / truth-architect, each read the merged plans + source in full).
**Confidence:** High — every claim cited plan file:line vs current source file:line; convergent across ≥4 lenses for the load-bearing ones.
**Context.** #497 (Tesseract→tesseract-rs transcode plan family, 7 design docs) and #498 (helix `Signed360` + GUID keystone) merged within hours. The #497 plans were authored against the pre-#498 branch, so they reason against a substrate that shifted under them. Five specialists framed the merged plans against the post-#498 architecture.
**The two showstoppers:**
1. **The "reversible without a hash" rationale is false in code** (truth-architect). The migration's headline — "OCR text reconstructs from residue + codebook, no string column" — has no support: `deepnsm/vocabulary.rs` maps `rank→&str` via a stored table, every decode entry point takes a *known* rank as input, and there is no `residue→rank` inverse (helix encode is lossy). The "reversible residue" was a renamed stored string-table keyed by index — the very thing it claimed to avoid.
2. **The "Morton-tile stacked-pyramid perturbation-shader cascade" does not exist** (cascade-architect). 0 hits in either repo; Morton is explicitly *rejected* for Hilbert (`linalg/hilbert.rs:50`). Three deliverables (D-OCR-52, the reconstruction round-trip, the whole soa-centroid synthesis plan) were built on a fabricated subsystem name.
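The first showstopper can be illustrated with a toy model. The sketch below is not the `deepnsm` API: `Vocabulary`, `lossy_residue`, and `ranks_for_residue` are hypothetical names, and the 2-bit residue is an arbitrary stand-in for any lossy encode. Under those assumptions it shows two things: a rank-keyed table decodes only when the rank is already known, and a lossy residue maps back to several candidate ranks, so no `residue→rank` inverse exists.

```rust
/// Hypothetical stand-in for a stored string table keyed by rank.
/// (Illustrative only; not the real `deepnsm/vocabulary.rs` types.)
pub struct Vocabulary {
    words: Vec<&'static str>,
}

impl Vocabulary {
    pub fn new(words: Vec<&'static str>) -> Self {
        Vocabulary { words }
    }

    /// Decoding requires a *known* rank as input.
    pub fn word(&self, rank: usize) -> Option<&'static str> {
        self.words.get(rank).copied()
    }

    pub fn len(&self) -> usize {
        self.words.len()
    }
}

/// Toy lossy encode (assumption): keep only the low 2 bits of the rank.
pub fn lossy_residue(rank: usize) -> u8 {
    (rank & 0b11) as u8
}

/// Every rank that collapses onto the same residue.
/// More than one result means the residue alone cannot recover the rank.
pub fn ranks_for_residue(vocab: &Vocabulary, residue: u8) -> Vec<usize> {
    (0..vocab.len())
        .filter(|&rank| lossy_residue(rank) == residue)
        .collect()
}
```

With an eight-word vocabulary, residue `1` is shared by ranks 1 and 5, so the text can only be recovered if the rank (that is, the table index) is stored somewhere.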
**Convergent drift (≥4 lenses):**
- Plans argue "HelixResidue is 48 B, category-wrong, don't use it" — #498 made it **6 B** (a stored `Signed360` place index), which IS the keep-the-index design the plan wanted. Every byte budget was dead (Full 154→112, carve `[32,186)`→`[32,144)`).
- D-OCR-50 (`LayoutBlock::to_node_row`) already SHIPPED in #498, yet the plans still describe it as future work.
- HHTL Doc→Page→Block→Line→Token onto HEEL/HIP/TWIG+family is a *coherent address-trie, NOT a Frankenstein* (family-codec + cascade) — but it spends the similarity-basin semantics on layout, so OCR nodes must be `classid`-marked as layout-addressed.
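The byte-budget figures in the first bullet follow from one subtraction. The sketch below assumes the whole delta comes from the `HelixResidue` shrink (48 B to 6 B), which is consistent with the quoted numbers; the constant and function names are hypothetical.

```rust
use std::ops::Range;

/// Tenant size the #497 plans assumed (pre-#498).
pub const OLD_HELIX_RESIDUE_BYTES: usize = 48;
/// Tenant size after #498 (a 48-bit `Signed360` place index).
pub const NEW_HELIX_RESIDUE_BYTES: usize = 6;
/// Bytes freed by the shrink: 48 - 6 = 42.
pub const BYTES_SAVED: usize = OLD_HELIX_RESIDUE_BYTES - NEW_HELIX_RESIDUE_BYTES;

/// Rebaseline a total byte budget that included the old tenant size.
pub const fn rebaseline_len(old_len: usize) -> usize {
    old_len - BYTES_SAVED
}

/// Rebaseline a half-open carve `[start, end)`: the start stays, the end moves in.
pub fn rebaseline_carve(old: Range<usize>) -> Range<usize> {
    old.start..(old.end - BYTES_SAVED)
}
```

Applied to the plan figures, `rebaseline_len(154)` gives 112 and `rebaseline_carve(32..186)` gives `32..144`, matching the corrected budgets.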
**Disposition.** All 7 plans corrected (rebaselined to #498; Morton purged → real primitives `framebuffer::build_mipmap_pyramid` / `splat3d/depth_cascade` / CAKES; reversibility reframed to identity→content-store + codebook-as-repair-signal; §0 tripwires fixed; master critical-path fixed per the open CodeRabbit Major). Unmeasured claims (int8-exact LSTM, bit-reproducible diff, 200k-LOC 1:1 layout) gated behind 4 probes in `ocr-probes-v1.md` (OCR-RT/DET/POST/SCHEMA); **OCR-SCHEMA shipped as a contract test** proving OCR rides an existing preset (no new `ValueSchema` variant).
**Lesson.** When two PRs touch the same substrate within hours, the later merge silently invalidates the earlier plan's premises. Plans citing sizes/budgets/file:line must be rebaselined the moment a substrate PR lands — and "reversible / never-stored" claims must be PROVEN against the actual decode path before becoming a migration's rationale.
> **Skip-by-rule:** OCR introduces NO bespoke row geometry. It rides the existing value-tenant carve.
@@ -35,41 +35,69 @@ point of the splat-native / "one representation, many views" doctrine, applied t
 - Mint an OCR class family in OGAR (`ogar-ontology`): `Document → Page → Block →
 Line → Token`, with leaf token subtypes (`Word`, `Number`, `Date`, `Currency`,
-`Glyph`, `TableCell`). Until OGAR mints them, hardcode the classid prefix space
-per the reserve-don't-reclaim ladder (the classid bytes stay reserved at offset 0).
+`Glyph`, `TableCell`). **Stay at classid `0x0000_0000` (bootstrap address,
+identity-only discrimination) until OGAR actually mints the class**; do NOT
+hardcode a non-zero classid prefix, which would wake prefix-routing with no
+registry entry and fall silently to `ReadMode::DEFAULT` (matches shipped `ocr.rs`,
+which writes `NodeGuid::new(classid, 0,0,0, FAMILY_DEFAULT, identity)`).
+- **HHTL = a layout-address trie for OCR nodes, NOT a similarity cascade.** The
+5 layout levels map onto 3 key tiers + family + identity as a *prefix*
+decomposition: Document/Page/Block → HEEL/HIP/TWIG (radix-walk prefix), Line →
+family (locality basin), Token → identity. This deliberately forgoes the
+*similarity-basin* reading of HEEL/HIP/TWIG/family (canon's coarse→fine
+neighbourhood tiers); the OCR `classid` marks these nodes as layout-addressed so no
+cross-document family-purity / two-basin benchmark runs against these coordinates.
 - `ClassView` for the OCR class declares `edge_codec_flavor` (`CoarseOnly`) and
-`value_schema` (the OCR preset, §3).
+`value_schema` (ride `Full` POC / `Compressed`, no new variant, §3).
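The prefix decomposition described in the HHTL bullet can be sketched as a plain field mapping. Everything here is hypothetical: `LayoutPath`, `LayoutAddress`, the field widths, and `address_of` are illustrative and are not the `NodeGuid` layout (the shipped `ocr.rs` is described above as writing `0,0,0` for the three key tiers). The point is only that Document/Page/Block form a walkable prefix, Line lands in the family slot, and Token becomes the identity.

```rust
/// Hypothetical position of a token in the OCR layout hierarchy.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LayoutPath {
    pub document: u16,
    pub page: u16,
    pub block: u16,
    pub line: u16,
    pub token: u32,
}

/// Hypothetical layout-addressed coordinate (field widths are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LayoutAddress {
    pub heel: u16,     // Document
    pub hip: u16,      // Page
    pub twig: u16,     // Block
    pub family: u16,   // Line (locality basin)
    pub identity: u32, // Token
}

/// Map the five layout levels onto three key tiers + family + identity.
pub fn address_of(path: LayoutPath) -> LayoutAddress {
    LayoutAddress {
        heel: path.document,
        hip: path.page,
        twig: path.block,
        family: path.line,
        identity: path.token,
    }
}

/// Two tokens sit in the same block exactly when their radix-walk prefix matches.
pub fn same_block(a: &LayoutAddress, b: &LayoutAddress) -> bool {
    (a.heel, a.hip, a.twig) == (b.heel, b.hip, b.twig)
}
```

Because the prefix encodes position rather than similarity, two tokens with equal `heel`/`hip`/`twig` are neighbours on the page, not necessarily neighbours in meaning, which is why such nodes are marked as layout-addressed.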
 ## 3. OCR `ValueSchema` preset over EXISTING tenants (D-OCR-51)

-The 480-byte value slab already carves into `VALUE_TENANTS`. An OCR token is **not
-a stored string and not a hash**; it is the *terminal of the perturbation cascade*,
-reconstructed exactly like every other node. Text = codebook index + residue.
+The 480-byte value slab already carves into `VALUE_TENANTS`. An OCR token's
+**recognized string is NOT stored in the node** (I-VSA-IDENTITIES, enforced by
+shipped `ocr.rs:97-101`): the node is the *identity that points to* OCR content;
+the string + pixel geometry live in an external content store keyed by `identity`.
+The value tenants carry typed scalars + a compressed similarity coordinate for
+*repair / disambiguation*, never a reversible text payload.
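The identity-points-to-content arrangement in the added paragraph can be sketched with an ordinary map. This is a minimal model under stated assumptions: `OcrNode`, `OcrContent`, and `ContentStore` are hypothetical names, the `u64` identity width and the `bbox` layout are invented for illustration, and the `confidence` field merely stands in for a typed scalar carried by a value tenant.

```rust
use std::collections::HashMap;

/// Hypothetical external record: recognized string plus pixel geometry.
#[derive(Debug, Clone, PartialEq)]
pub struct OcrContent {
    pub text: String,
    pub bbox: [u32; 4], // x0, y0, x1, y1 (assumed layout)
}

/// Hypothetical node: holds an identity and typed scalars, never the string.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct OcrNode {
    pub identity: u64,
    pub confidence: f32,
}

/// Hypothetical content store keyed by node identity.
#[derive(Default)]
pub struct ContentStore {
    by_identity: HashMap<u64, OcrContent>,
}

impl ContentStore {
    /// Store (or overwrite) the content for an identity.
    pub fn put(&mut self, identity: u64, content: OcrContent) {
        self.by_identity.insert(identity, content);
    }

    /// Resolve a node to its content; `None` if nothing was stored for it.
    pub fn resolve(&self, node: &OcrNode) -> Option<&OcrContent> {
        self.by_identity.get(&node.identity)
    }
}
```

The node stays fixed-size regardless of how long the recognized text is, and a missing store entry surfaces as `None` rather than as a wrong reconstruction.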
-| Tenant (existing) | OCR role |
+| Tenant (existing, post-#498 sizes) | OCR role |
 |---|---|
-| helix residue = **centroid attention field** (NOT a stored code) | The 24-bit golden index is the **query↔centroid alignment** (φ-spiral direction = how this point attends to its place-centroid); the Morton-tile stacked-pyramid perturbation-shader is **multi-scale attention** (coarse centroid → fine perturbation = HHTL cascade in residue space). The field is **evaluated from the φ-template, never stored** ("8K resolution at Super-8 cost"; only the index is kept). Place=HHTL centroid; residue=perturbation off it. The 48-byte `ValueTenant::HelixResidue` is category-wrong (stores a field that must be computed); do NOT use it. |
-| `TurbovecResidue` (16 B, PQ) | PQ edge residue → CAKES nearest-valid-token search over the codebook |
+| `HelixResidue` (**6 B = 48 bit `Signed360`**, NOT 48 B) | A **stored** golden-spiral *place index* (rim 3 B + sign-partition polar 1 B + golden azimuth 2 B; `helix/src/residue.rs:63-116`). The 6 B IS the kept index; the multi-scale field is the *deterministic decode* of it (`RollingFloor::quantize` / `HemispherePoint::lift`, pure `&self`): "8K resolution at Super-8 cost." It is a place code, **not** a confidence carrier. (The old "48-byte, category-wrong, do NOT use it" line was written pre-#498 against a bits→bytes slip; the tenant is **6 B** and is exactly the keep-the-index design, so use it.) |
+| `TurbovecResidue` (16 B, `Pq32x4`) | The **edge-block** PQ residue (`EdgeCodecFlavor::Pq32x4`, rank-preserving / absolute-distance-lossy, ICC 0.11–0.29). NOT the glyph→word carrier: nearest-**valid**-token needs absolute distance, so the glyph→word search uses **DeepNSM's `Codebook` CamCodes** (6×256×16, 6 B; `deepnsm/src/codebook.rs`) + `vocabulary.rs` reverse, not this tenant. |
+| `Meta` (u64) | A SMALL codebook anchor only (a ≤12-bit vocab rank fits). It does NOT carry confidence (→ `Energy`, shipped `ocr.rs:112-114`), repair flags (→ `Plasticity`), or the OOV recoder-code (→ external content store). `Meta` is the cognitive `MetaWord`; overloading it 5 ways is an I-LEGACY-API-FEATURE-GATED hazard (one u64, different meaning per class). Prefer a future `ValueTenant::OcrEvidence` (OD-1) for OCR-specific evidence. |
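The 6 B figure in the `HelixResidue` row is just 3 + 1 + 2 bytes. The sketch below packs those three fields into six bytes and back. It is not the `helix` crate's `Signed360`: the `PlaceIndex` name, the field order, and the little-endian byte order are all assumptions made for illustration.

```rust
/// Hypothetical 48-bit place index: rim (3 B) + polar (1 B) + azimuth (2 B).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PlaceIndex {
    pub rim: u32,     // only the low 24 bits are representable
    pub polar: u8,    // sign-partition polar
    pub azimuth: u16, // golden azimuth
}

impl PlaceIndex {
    /// Largest rim value that fits in 3 bytes.
    pub const RIM_MAX: u32 = 0x00FF_FFFF;

    /// Pack into 6 bytes (little-endian, an assumed layout).
    /// Returns `None` if `rim` does not fit in 24 bits.
    pub fn pack(self) -> Option<[u8; 6]> {
        if self.rim > Self::RIM_MAX {
            return None;
        }
        let rim = self.rim.to_le_bytes(); // fourth byte is zero here
        let az = self.azimuth.to_le_bytes();
        Some([rim[0], rim[1], rim[2], self.polar, az[0], az[1]])
    }

    /// Inverse of `pack`: rebuild the three fields from 6 bytes.
    pub fn unpack(bytes: [u8; 6]) -> Self {
        PlaceIndex {
            rim: u32::from_le_bytes([bytes[0], bytes[1], bytes[2], 0]),
            polar: bytes[3],
            azimuth: u16::from_le_bytes([bytes[4], bytes[5]]),
        }
    }
}
```

Packing and unpacking are exact inverses for any in-range `rim`, which is the sense in which the 6 B index is stored rather than recomputed; the multi-scale field decoded from it is a separate, deterministic step that this sketch does not model.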