Commit 4acea26

Merge pull request #498 from AdaWorldAPI/claude/wonderful-hawking-lodtql
feat(contract): GUID decode→read-mode keystone + helix Signed360 right-size + OCR→NodeRow transcode
2 parents cfcd4af + ef5a362 commit 4acea26

12 files changed

Lines changed: 820 additions & 30 deletions

.claude/board/LATEST_STATE.md

Lines changed: 12 additions & 0 deletions
@@ -10,6 +10,18 @@

---

> **2026-06-15 — REVERTED (operator)** — the tesseract-rs `soa` wiring below was **deleted** (branch reset to master `420de08`). Operator: *"we don't want to use original Tesseract, we want to transcode it into Rust — delete everything you copied from original Tesseract into tesseract-rs."* Wrapping the original Tesseract C engine + parsing its TSV is the wrong direction; the real goal is a **pure-Rust OCR**. The contract-side transcode (`LayoutBlock::to_node_row`) + keystone STAY — they are OCR-engine-agnostic (a pure-Rust OCR feeds the same `LayoutBlock` → `NodeRow`); only the original-Tesseract coupling was removed. The strike-through entry below is retained per APPEND-ONLY.
>
> ~~**2026-06-15 — cross-repo landed** — **tesseract-rs fork wired to the transcode.**~~ *(REVERTED — see above)* `AdaWorldAPI/tesseract-rs` branch `claude/wonderful-hawking-lodtql` commit `1687c718`: opt-in `soa` feature (default-OFF — standalone OCR build untouched) + `src/soa.rs::tsv_to_nodes(tsv, classid, min_conf) -> Vec<NodeRow>` parsing tesseract `get_tsv_text` word rows → `contract::ocr::LayoutBlock` → `to_node_row`. Contract dep is a path dep mirroring smb-office-rs (sibling checkout). **Edition-2015 compatible** (the fork has no `edition` field → 2015: root `extern crate` + submodule root-relative `use` + explicit `TryInto` — all caught + fixed by verifying in a 2015 scratch crate against the real contract before pushing, 2 tests green). Pushed via `GH_TOKEN`+pygithub (out-of-MCP-scope fork). Could NOT compile the full crate here (no tesseract C-lib) — the transcode LOGIC is what's verified; the fork's own CI needs a co-located lance-graph for `--features soa`.
>
> **2026-06-15 — branch work (post-#496)** — **tesseract OCR → NodeRow transcode POC (keystone payoff).** `lance_graph_contract::ocr::LayoutBlock::to_node_row(classid, identity) -> NodeRow` — the reference transcode any `OcrProvider` (tesseract-rs + others) reuses, the keystone end-to-end: `classid → classid_read_mode → ValueSchema` gates WHICH tenants land; `BlockKind::entity_type() -> u16` → `ValueTenant::EntityType`, `confidence: f32` → `ValueTenant::Energy`, each written at its canon slab offset via the new `ValueTenant::{value_offset(), byte_len()}` (derived accessors over the locked carve — not new properties). **`text`/`bbox` are NOT bundled** (`I-VSA-IDENTITIES`: node = identity + typed scalars; the string + pixel geometry live in an external content store keyed by `identity`). Schema-gated (`schema.has(t)` before each write) so a Bootstrap-resolving class writes an empty slab; transcoded rows ride the `SoaEnvelope` zero-copy (verified). §0 anti-invention: reuses the existing EntityType/Energy tenants, no "ocr_kind" field. +4 tests; **623 contract lib green; clippy `-D warnings` + fmt clean.** Lives in the contract (next to the `ocr` types it uses, zero-dep, testable here — no OCR C-lib, no fork); tesseract-rs just adds the contract dep + calls it (integration step). Branch, not yet a PR.
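The schema-gated write described in the entry above can be sketched as follows. This is a minimal illustration, not the contract's code: the types are stand-ins, the offsets (Energy 134, EntityType 142) come from the right-sized carve recorded in this log, and the byte lengths, the 144-byte row buffer, the little-endian encoding, and the names `write_tenant` / `to_row_bytes` are assumptions made for the example.

```rust
// Illustrative stand-ins only; NOT the real `lance_graph_contract` definitions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ValueTenant {
    Energy,
    EntityType,
}

impl ValueTenant {
    /// Byte offset of the tenant in the row (values quoted from the log).
    const fn value_offset(self) -> usize {
        match self {
            ValueTenant::Energy => 134,
            ValueTenant::EntityType => 142,
        }
    }
    /// Width in bytes (ASSUMED: f32 energy, u16 entity type).
    const fn byte_len(self) -> usize {
        match self {
            ValueTenant::Energy => 4,
            ValueTenant::EntityType => 2,
        }
    }
}

/// Hypothetical schema: just the set of tenants a class materialises.
struct ValueSchema {
    tenants: &'static [ValueTenant],
}

impl ValueSchema {
    fn has(&self, t: ValueTenant) -> bool {
        self.tenants.contains(&t)
    }
}

const BOOTSTRAP: ValueSchema = ValueSchema { tenants: &[] };
const FULL: ValueSchema = ValueSchema {
    tenants: &[ValueTenant::Energy, ValueTenant::EntityType],
};

const ROW_BYTES: usize = 144; // ASSUMED buffer size: end of the value carve

/// Only the typed scalars; text and bbox deliberately stay outside the node.
struct LayoutBlock {
    entity_type: u16,
    confidence: f32,
}

/// Write one tenant, but only if the resolved schema carries it.
fn write_tenant(row: &mut [u8; ROW_BYTES], schema: &ValueSchema, t: ValueTenant, bytes: &[u8]) {
    if !schema.has(t) {
        return;
    }
    debug_assert_eq!(bytes.len(), t.byte_len());
    let off = t.value_offset();
    row[off..off + t.byte_len()].copy_from_slice(bytes);
}

fn to_row_bytes(block: &LayoutBlock, schema: &ValueSchema) -> [u8; ROW_BYTES] {
    let mut row = [0u8; ROW_BYTES];
    write_tenant(&mut row, schema, ValueTenant::EntityType, &block.entity_type.to_le_bytes());
    write_tenant(&mut row, schema, ValueTenant::Energy, &block.confidence.to_le_bytes());
    row
}

fn main() {
    let block = LayoutBlock { entity_type: 7, confidence: 0.5 };
    let full = to_row_bytes(&block, &FULL);
    assert_eq!(full[142..144], 7u16.to_le_bytes());
    // A Bootstrap-resolving class leaves the slab empty.
    assert!(to_row_bytes(&block, &BOOTSTRAP).iter().all(|&b| b == 0));
}
```

The gate sits in one helper so that every tenant write goes through the same `schema.has(t)` check, which is what makes a Bootstrap class produce an all-zero slab.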
>
> **2026-06-15 — branch work (post-#496)** — **keystone (contract half): GUID decode + classid→read-mode `LazyLock`.** `lance_graph_contract::canonical_node::{GuidParts, ReadMode, classid_read_mode}` + `NodeGuid::{heel(), hip(), twig(), decode() -> GuidParts, read_mode() -> ReadMode}` (re-exported from `lib.rs`). **The "read the GUID as a GUID" surface** the operator spec'd: `decode()` returns all six canon groups (classid + HHT·HEEL/HIP/TWIG + family·"Leaf" + identity) in one read; `ReadMode` bundles the two *already-existing* read-mode axes (`ValueSchema` + `EdgeCodecFlavor`) — **NOT a new node property, NOT a SoA column** (§0 anti-invention; it's the resolution lens, nothing stored on the row); `classid_read_mode(u32)` is the **single source both the consumer and OGAR inherit** — a `LazyLock<HashMap<u32,ReadMode>>` builtin registry (same immutable-after-init pattern `lance-graph-ontology` uses for its seed namespace registry), zero-fallback to `ReadMode::DEFAULT` for any unconfigured classid. `ReadMode::DEFAULT = {Full, CoarseOnly}` mirrors the `ClassView::value_schema` POC default (paired revert; `read_mode_default_is_full_poc` guards it). `Display` deduped onto the new HHT accessors. +6 tests (decode round-trip, HHT↔Display, read-mode single-source, carrier delegation, full-slab connect); **619 contract lib green; clippy `-D warnings` + fmt clean.** Delivers the contract-side half of the #496 keystone; the ontology-side `NiblePath::from_guid_prefix` (20→≤16-nibble subset) meets it at the classid (follow-up). Branch, not yet a PR.
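The registry pattern named in the entry above (an immutable-after-init `LazyLock<HashMap<u32, ReadMode>>` with a fallback to `ReadMode::DEFAULT`) might look roughly like this. It is a sketch under assumptions: the enum and field definitions are stand-ins, the registered classid `1` and its `Compressed` mode are invented for illustration, and the real builtin table is not shown in this log.

```rust
use std::collections::HashMap;
use std::sync::LazyLock;

// Stand-in enums; only the variant names quoted in the log are used.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ValueSchema {
    Bootstrap,
    Cognitive,
    Compressed,
    Full,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum EdgeCodecFlavor {
    CoarseOnly,
}

/// Resolution lens only: bundles the two existing axes, stores nothing on the row.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ReadMode {
    value_schema: ValueSchema,
    edge_codec: EdgeCodecFlavor,
}

impl ReadMode {
    /// `{Full, CoarseOnly}` per the log (temporary POC default).
    const DEFAULT: ReadMode = ReadMode {
        value_schema: ValueSchema::Full,
        edge_codec: EdgeCodecFlavor::CoarseOnly,
    };
}

/// Built once on first access, never mutated afterwards.
static BUILTIN: LazyLock<HashMap<u32, ReadMode>> = LazyLock::new(|| {
    let mut m = HashMap::new();
    // HYPOTHETICAL entry, for illustration only.
    m.insert(
        1,
        ReadMode {
            value_schema: ValueSchema::Compressed,
            edge_codec: EdgeCodecFlavor::CoarseOnly,
        },
    );
    m
});

/// Single lookup point; unconfigured classids fall back to the default.
fn classid_read_mode(classid: u32) -> ReadMode {
    BUILTIN.get(&classid).copied().unwrap_or(ReadMode::DEFAULT)
}

fn main() {
    assert_eq!(classid_read_mode(1).value_schema, ValueSchema::Compressed);
    assert_eq!(classid_read_mode(0xDEAD_BEEF), ReadMode::DEFAULT);
}
```

Because the lookup returns a value rather than an `Option`, callers cannot accidentally diverge on what an unknown classid means.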
>
> **2026-06-15 — branch work (post-#496)** — **helix `Signed360` codec + `HelixResidue` right-sized 48 B → 6 B.** Operator caught a slab over-allocation: `HelixResidue` reserved **48 *bytes*** but the intent was a 24-bit equal-area hemisphere **doubled = 48 *bit* = 6 B** (a bits→bytes slip; 42 dead bytes), and the tenant used **none** of the `helix` crate (zero-dep contract — only a doc string). Fixed: **(1) `helix::Signed360`** — the signed full-sphere codec: `HemispherePoint::signed_lift(n,N,sign)` (`y = sign·√(1−u)` → full sphere, `r²+y²=1`), `Sign{Pos,Neg}`, and `Signed360 {rim: ResidueEdge, polar: signed-lift centred@128 (sign recoverable), azimuth: u16 over 360°}` + `ResidueEncoder::encode_signed`. +9 tests; **helix 72 lib + 7 doctests green; lib clippy `-D warnings` + fmt clean.** **(2) contract** `HelixResidue.elems_per_row` 48→6, downstream tenants shifted (Turbovec 118 / Energy 134 / Plasticity 138 / EntityType 142), budgets re-locked (**Full 154→112, Compressed 98→56**); **613 contract green.** **NO `HelixFlavour` enum** — one canonical encoding, one tenant size (a fixed-offset SoA can't vary width per-class; Hemisphere = degenerate `sign=+`); the contract stays zero-dep, the producer writes `Signed360::to_bytes` into the 6 B. Cheap NOW (POC FULL default, no persisted real instances); after instances persist it's a version bump. Branch, not yet a PR. New: `TD-HELIX-PROBE-CLIPPY` (pre-existing `probe_mantissa_fill` clippy/fmt drift, NOT introduced here — helix is excluded so CI-invisible, same class as the standing `causal-edge` 47/1 red).
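The signed lift quoted in the entry above (`y = sign·√(1−u)`, with `r² + y² = 1`) can be illustrated numerically. The mapping from the index pair `(n, N)` to `u` is not given in this log; the sketch below assumes the midpoint rule `u = (n + 0.5) / N` and `r = √u`, and returns a plain `(r, y)` tuple rather than the crate's `HemispherePoint` type.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Sign {
    Pos,
    Neg,
}

/// Returns `(r, y)` on the unit circle of the meridian plane.
/// ASSUMPTION: `u = (n + 0.5) / total`; the real `signed_lift` may differ.
fn signed_lift(n: u32, total: u32, sign: Sign) -> (f64, f64) {
    let u = (n as f64 + 0.5) / total as f64;
    let r = u.sqrt(); // radial distance in the equatorial plane
    let s = match sign {
        Sign::Pos => 1.0,
        Sign::Neg => -1.0,
    };
    // The sign selects the hemisphere; the magnitude is the hemisphere lift.
    (r, s * (1.0 - u).sqrt())
}

fn main() {
    for n in 0..8 {
        let (r, y) = signed_lift(n, 8, Sign::Neg);
        assert!((r * r + y * y - 1.0).abs() < 1e-12);
        assert!(y < 0.0);
    }
}
```

With `sign = Pos` the function reduces to the one-hemisphere case, which matches the log's remark that the hemisphere is the degenerate `sign=+` form.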
>
> **2026-06-15 — MERGED #496** (integrated-cognitive-planner reference map + ValueSchema + FULL POC default): `lance_graph_contract::canonical_node::{ValueSchema, ValueTenant, VALUE_TENANTS}` — the value-side `EdgeCodecFlavor` analog (9 append-only tenants carving `[32,186)`; presets Bootstrap/Cognitive/Compressed/Full). `ClassView::value_schema()` default flipped **Bootstrap→Full (TEMPORARY POC** — every unconfigured class materialises the full slab so consumers transcode against it; `TD-VALUESCHEMA-FULL-POC-DEFAULT` revert-when-POC-concludes; type-level `ValueSchema::default()` stays Bootstrap, only class→schema *resolution* flips). New reference plan `.claude/plans/integrated-cognitive-planner-v1.md` — **§0 ANTI-INVENTION GUARDRAIL (READ FIRST)**, §1–§7 grounded file:line map, §8 7-item additive ledger, §9 3-hardener verdicts; the SPEC for the integrated-planner refactor (~90% exists; remaining = the keystone + 6 seams, NOT a new build). CI 5/5 green; contract 613 lib tests; merge `2e58e034`. **The keystone = `NiblePath::from_guid_prefix` (the 20→≤16-nibble subset) + classid→ClassView read-mode on `lance-graph-ontology::registry` (already an immutable conflict-refusing `entity_type↔NiblePath` bijection)** — the single next unblock that converges the refactor, the tesseract-rs OCR transcode (`contract::ocr` → NodeRow), AND the OGAR-identity migration (`soa-migration-diff-resolution-2026-06-13.md`). HEEL=cache `dolce_id` / HIP·TWIG=deterministic subClassOf descent / registry=recorder-not-minter (verified `registry.rs`+`wikidata_hhtl.rs`). New: `TD-COARSERESIDUE-NO-VALUE-TENANT`, `TD-LAZY-IMPORT-VERSION-PIN`; IDEAS CLAM-residue-ladder TODO.
>
> **2026-06-13 — shipped (autoattended, cross-repo)** (turbovec ⇄ ndarray): new excluded standalone crate **`crates/lance-graph-turbovec`** — Google TurboQuant (arXiv 2504.19874, the AdaWorldAPI `turbovec` fork) bridged onto the spine. `TurboVec` wraps `turbovec::TurboQuantIndex` with a `Kernel::{NativeLut, PolyfillGemm}` A/B switch. **Cross-repo (branch `claude/wonderful-hawking-lodtql` in turbovec + ndarray + lance-graph):** turbovec re-pointed from crates.io `ndarray 0.17` → the AdaWorldAPI fork (path, P0 forks-only; `blas` opt-in so default builds BLAS-free; `rust-toolchain.toml` = 1.95.0); new `turbovec::search_polyfill` (feature `ndarray-simd`) expresses scoring as a batched int8 GEMM via **`ndarray::simd::matmul_i8_to_i32`** (re-exported through `simd.rs` — AMX `TDPBUSD` tile → AVX-512 VPDPBUSD → AVX-VNNI → scalar, dispatched inside ndarray, zero intrinsics in turbovec). **Measured finding (E-TURBOVEC-AMX-WRONG-TOOL-1):** the polyfill GEMM is 11.4× SLOWER than the native nibble-LUT (TurboQuant trades the matmul away → AMX accelerates the op it removed); native LUT stays production, polyfill is the AMX-ready baseline. Placement: index → spine, kernel-math → ndarray (already owns clam/cam_pq/cascade/amx_matmul). Synergy map (HDR popcount stacking early-exit, Belichtungsmesser σ thresholds, preheating vs palette256) in `crates/lance-graph-turbovec/KNOWLEDGE.md`. Tests green in all three repos; benchmark via `examples/kernel_speed.rs`. NOT a merged PR yet (branch work).
>
> **2026-06-03 — hardened (follow-up after #460)** (D-HELIX-1 wiring): `crates/helix` now takes **ndarray as a MANDATORY, non-optional git dependency** (`git = AdaWorldAPI/ndarray @ master`), replacing the optional `path` dep + `ndarray-hpc` feature. Why: (1) codex P2 — an optional *path* dep still forces Cargo to read the local sibling manifest at resolution, so a clean checkout failed before feature selection; (2) directive "ndarray is mandatory for lance-graph". `simd.rs` always uses `ndarray::simd` (no scalar fallback); the self-contained fork → no import cycle. 63 unit + 6 doctests green; clippy/fmt clean. See E-HELIX-NDARRAY-MANDATORY.

.claude/board/PR_ARC_INVENTORY.md

Lines changed: 28 additions & 0 deletions
@@ -35,6 +35,34 @@

---

## #498 GUID decode→read-mode keystone + helix Signed360 right-size + OCR→NodeRow transcode
**Status:** OPEN 2026-06-15 (branch `claude/wonderful-hawking-lodtql`, 8 commits post-#496). In review (CodeRabbit + codex). **NOTE:** this entry documents the helix / keystone / OCR / causal-edge work that CodeRabbit on PR #498 mis-attributed to #496 — those changes are **#498's, not #496's**. #496 shipped only ValueSchema presets + the reference plan (its immutable entry below correctly shows the pre-right-size 154/98 B budgets).
**Added:** (1) **Keystone** `canonical_node::{GuidParts, ReadMode, classid_read_mode}` + `NodeGuid::{heel/hip/twig, decode()→GuidParts, read_mode()}` — read-the-GUID-as-a-GUID decode + a `LazyLock<HashMap>` classid→read-mode registry (the single source consumer + OGAR inherit); `ReadMode` bundles the two existing axes (`ValueSchema` + `EdgeCodecFlavor`), no new property. (2) `helix::{Sign, Signed360}` + `HemispherePoint::signed_lift` + `ResidueEncoder::encode_signed` — signed full-sphere codec; **`HelixResidue` value-tenant right-sized 48 B → 6 B** (bits→bytes slip fix) → downstream offsets shifted (Turbovec 160→118, Energy 176→134, …), budgets re-locked (Full 154→112, Compressed 98→56), value carve now `[32,144)`. (3) `ocr::{BlockKind::entity_type, LayoutBlock::to_node_row}` + `ValueTenant::{value_offset, byte_len}` — OCR-engine-agnostic transcode. (4) causal-edge `test_build_fast` boundary `<`→`<=` (standing red on main fixed). (5) **`ENVELOPE_LAYOUT_VERSION` 1→2** — gates the value-slab offset shift (codex P2). Tests: contract 623 lib, helix 73 lib + 7 doc, causal-edge green.
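The offset arithmetic and the version gate in item (5) can be checked with a short sketch. All numbers are the ones quoted in this entry; the `LayoutError` type and `check_layout_version` function are hypothetical names for illustration, not the contract's API.

```rust
/// HelixResidue shrank from 48 B to 6 B, so everything after it moves by 42.
const SHIFT: usize = 48 - 6;

/// Layout version after the shift, as recorded in the log.
const ENVELOPE_LAYOUT_VERSION: u32 = 2;

// Compile-time checks that the quoted before/after numbers agree.
const _: () = assert!(160 - SHIFT == 118); // Turbovec offset
const _: () = assert!(176 - SHIFT == 134); // Energy offset
const _: () = assert!(154 - SHIFT == 112); // Full budget
const _: () = assert!(98 - SHIFT == 56); // Compressed budget
const _: () = assert!(186 - SHIFT == 144); // end of the value carve

#[derive(Debug, PartialEq, Eq)]
enum LayoutError {
    VersionMismatch { found: u32, expected: u32 },
}

/// HYPOTHETICAL gate: refuse to read a slab written under another layout.
fn check_layout_version(found: u32) -> Result<(), LayoutError> {
    if found == ENVELOPE_LAYOUT_VERSION {
        Ok(())
    } else {
        Err(LayoutError::VersionMismatch {
            found,
            expected: ENVELOPE_LAYOUT_VERSION,
        })
    }
}

fn main() {
    assert!(check_layout_version(2).is_ok());
    assert!(check_layout_version(1).is_err());
}
```

A version check of this kind fails loudly on a version-1 envelope instead of silently reading Energy or EntityType from the old offsets.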
**Locked:** (1) **one `NodeGuid` only** — the #490-retired `identity::NodeGuid` (UUIDv8) stays retired; the keystone extends the canon `canonical_node::NodeGuid`. (2) `ReadMode::DEFAULT = {Full, CoarseOnly}` mirrors the ClassView POC default; both flip back to Bootstrap together (guard `read_mode_default_is_full_poc`). (3) **`Signed360` sign-partition** — `|y|`-in-7-bits + sign in the partition (Pos ⇒ polar [128,255], Neg ⇒ [0,127]); sign is exact at `|y|≈0` at the rim (codex P2 fix, regression test `signed360_neg_sign_survives_near_rim_at_high_total`). (4) text/bbox never bundled into the node — content store keyed by identity (I-VSA-IDENTITIES). (5) a value-slab offset shift is **version-gated, not reserved-gap** — safe pre-persistence (FULL is POC-only; codex P2 disposition).
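One way to read the sign-partition in item (3) is sketched below. The partition ranges (Pos ⇒ [128,255], Neg ⇒ [0,127]) are from this entry; the exact quantisation (`round(|y|·127)`) and the mirrored placement of the negative half are assumptions, and `encode_polar` / `decode_polar` are hypothetical names rather than the helix crate's functions.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Sign {
    Pos,
    Neg,
}

/// Pack `|y|` (7-bit magnitude) and the sign into one polar byte.
/// ASSUMED layout: y = -1 maps to 0, y = 0- to 127, y = 0+ to 128, y = +1 to 255.
fn encode_polar(y_abs: f64, sign: Sign) -> u8 {
    let mag = (y_abs.clamp(0.0, 1.0) * 127.0).round() as u8; // 0..=127
    match sign {
        Sign::Pos => 128 + mag,
        Sign::Neg => 127 - mag,
    }
}

/// The partition alone decides the sign, so it survives even when `mag == 0`.
fn decode_polar(byte: u8) -> (Sign, f64) {
    if byte >= 128 {
        (Sign::Pos, (byte - 128) as f64 / 127.0)
    } else {
        (Sign::Neg, (127 - byte) as f64 / 127.0)
    }
}

fn main() {
    // Near the rim |y| rounds to 0, yet the two signs stay distinguishable.
    assert_eq!(encode_polar(0.0, Sign::Pos), 128);
    assert_eq!(encode_polar(0.0, Sign::Neg), 127);
    assert_eq!(decode_polar(127).0, Sign::Neg);
}
```

A scheme that stored a signed value centred on a single code for zero would collapse both hemispheres onto that code at the rim; splitting the byte range avoids this, which is the failure mode the named regression test guards against.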
**Deferred:** ontology-side `NiblePath::from_guid_prefix` (the keystone's other half); pure-Rust OCR via `ocrs`/`rten` (the tesseract-rs C-wrapper POC was reverted — wrong direction). `TD-VALUESCHEMA-FULL-POC-DEFAULT` paired-revert note updated (ReadMode::DEFAULT pairs with ClassView).
**Docs:** board `LATEST_STATE` + `TECH_DEBT` updated; this entry.
**Confidence (2026-06-15):** open — both codex P2s dispositioned (ENVELOPE_LAYOUT_VERSION bump for the offset shift; Signed360 sign-partition fix + regression test); CodeRabbit's #496-vs-#498 misattribution corrected here.
## #496 integrated-cognitive-planner reference map + ValueSchema presets + FULL POC default
**Status:** MERGED 2026-06-15 (merge commit `2e58e034`), branch `claude/wonderful-hawking-lodtql`. CI 5/5 green (format/clippy/linux-build/test/test-with-coverage). CodeRabbit 2 threads resolved; codex 2×P2 dispositioned (FULL-default intentional; CoarseResidue tracked as TD).
**Added:** `lance_graph_contract::canonical_node::{ValueSchema, ValueTenant, VALUE_TENANTS}` — value-side analog of `EdgeCodecFlavor`; 9 append-only tenants carve `[32,186)`; 4 presets (Bootstrap EMPTY / Cognitive 58 B / Compressed 98 B / Full 154 B). `ClassView::value_schema()` default flipped **Bootstrap→Full (TEMPORARY POC)** + guard test `value_schema_default_is_full_temporary_poc`. New `.claude/plans/integrated-cognitive-planner-v1.md` (file:line reference map). Lance pin doc-sweep 6→7 / 0.29→0.30 across CLAUDE.md + boards + plans. Contract 613 lib tests.
**Locked:** (1) **§0 ANTI-INVENTION GUARDRAIL** — no new skewed SoA properties; the 9 ValueTenants + 4 BindSpace columns are closed; new capability = new column/class, never a new layer; specialisation is opt-IN (mint a class). (2) FULL POC default is class→schema *resolution* only; type-level `ValueSchema::default()` stays Bootstrap (substrate zero-fallback intact). (3) emit channels `emitted_edges`(CausalEdge64 words) vs `emitted_moves`(KanbanMove) are SEPARATE — no `KanbanMove→u64` cast. (4) `cycle()` stays inherent (object-safety, keeps `Box<dyn>` consumers). (5) seam #2 as-of read is closure-injected (planner ⊥ async `at_version`). (6) dual `RungLevel` — mirror thinking-engine's `should_elevate`, don't duplicate.
**Deferred:** the §8 7-item additive ledger (CognitiveCycle sequencer / RungLevel constructors / temporal.rs A→contract + B core temporal_read / ScopedReference / MarkingRow / NiblePath `from_guid_prefix` / ExecTarget::can_drive) — gated on D-MBX-A6-P3 + the keystone. `TD-VALUESCHEMA-FULL-POC-DEFAULT` (revert FULL→Bootstrap when POC concludes), `TD-COARSERESIDUE-NO-VALUE-TENANT`, `TD-LAZY-IMPORT-VERSION-PIN`.
**Docs:** `integrated-cognitive-planner-v1.md` (§0 guardrail, §1–§7 grounded map, §2.1 ExecTarget, §3.1 causal-arc, §4.1 0-friction, §8 cross-savant synthesis, §9 hardening verdicts). 5-savant expansion + 3-hardener (PP-13/15/16) folded.
**Confidence (2026-06-15):** working — merged clean, CI green, 613 contract lib tests. The plan is the SPEC for the integrated-planner refactor; the keystone (`from_guid_prefix` + classid→ClassView read-mode on `registry.rs`) is the single next unblock for refactor + tesseract + OGAR-identity migration.
## #459 helix-place-residue-codec — golden-spiral Place/Residue codec (zero-dep + optional ndarray-hpc)

**Status:** MERGED 2026-06-03 (merge commit `ef35ff1`), branch `claude/gallant-rubin-Y9pQd`. New standalone crate; autoattended wave (5 read-only research agents + 4 parallel Sonnet leaf workers + central consolidation). 63 unit + 6 doctests green on both feature configs; clippy -D warnings + fmt clean. One CodeRabbit review round resolved pre-merge.
