docs(plan): Tesseract → tesseract-rs 1:1 transcode (LSTM hosted via embedanything) — v2#497
📝 Walkthrough
Adds seven design-only markdown documents to
Changes
OCR Integration Architecture Design Plans
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6ddbd9775e
🧹 Nitpick comments (9)
.claude/plans/ocr-canonical-soa-integration-v1.md (2)
79-79: 💤 Low value
Clarify CHAODA reference.
Line 79 mentions "CHAODA flags anomalous tokens" without introducing what CHAODA is. If it's an external algorithm or crate, add a brief note or parenthetical definition for readers unfamiliar with it (e.g., "(Clustered Hierarchical Outlier Detection via Aggregation — see crate X)").
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.claude/plans/ocr-canonical-soa-integration-v1.md at line 79, The reference to CHAODA on line 79 lacks context for unfamiliar readers. Add a brief clarification by including a parenthetical note or expanded definition immediately after the first mention of CHAODA that explains what it stands for (e.g., "Clustered Hierarchical Outlier Detection via Aggregation") and optionally references the relevant crate or documentation where it is defined, so readers understand the purpose and origin of this algorithm.
57-65: ⚖️ Poor tradeoff
Validate ValueSchema preset selection.
The plan states "define `ValueSchema::Ocr` (or select `Cognitive` if its mask already covers the above)." Looking at the provided context, `Cognitive` includes Meta/Qualia/Fingerprint/Energy/Plasticity/EntityType but not TurbovecResidue or HelixResidue. The OCR use table (lines 48-55) lists both TurbovecResidue and HelixResidue as OCR tenants. This means either:
- Define a new `ValueSchema::Ocr` variant that includes all required tenants, or
- Adjust the OCR tenant list to match an existing schema.
The current tentative wording ("or select") is good design hygiene for a deferred decision, but the D-OCR-51 acceptance should clarify which path was chosen (new schema vs existing), since the choice affects the FieldMask declaration and the value-slab carve.
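The coverage question raised here can be checked mechanically. The following is a minimal sketch, assuming tenants are represented as bits in a mask; `FieldMask` here is a hypothetical stand-in (the real declaration is not shown in this thread), the bit positions are invented for illustration, and the required set contains only the two tenants this comment names.

```rust
/// Hypothetical tenant bitmask; the real `FieldMask` layout is not shown in this thread.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct FieldMask(u16);

impl FieldMask {
    // Bit positions are illustrative assumptions, not the crate's actual layout.
    const META: Self = Self(1 << 0);
    const QUALIA: Self = Self(1 << 1);
    const FINGERPRINT: Self = Self(1 << 2);
    const ENERGY: Self = Self(1 << 3);
    const PLASTICITY: Self = Self(1 << 4);
    const ENTITY_TYPE: Self = Self(1 << 5);
    const TURBOVEC_RESIDUE: Self = Self(1 << 6);
    const HELIX_RESIDUE: Self = Self(1 << 7);

    const fn union(self, other: Self) -> Self {
        Self(self.0 | other.0)
    }

    /// True when every bit in `required` is also set in `self`.
    const fn covers(self, required: Self) -> bool {
        (self.0 & required.0) == required.0
    }

    /// The bits of `required` that `self` does not provide.
    const fn missing(self, required: Self) -> Self {
        Self(required.0 & !self.0)
    }
}

/// The tenants the review comment attributes to the `Cognitive` preset.
fn cognitive_mask() -> FieldMask {
    FieldMask::META
        .union(FieldMask::QUALIA)
        .union(FieldMask::FINGERPRINT)
        .union(FieldMask::ENERGY)
        .union(FieldMask::PLASTICITY)
        .union(FieldMask::ENTITY_TYPE)
}

/// Assumed OCR requirement: only the two tenants named in the comment.
fn ocr_required() -> FieldMask {
    FieldMask::TURBOVEC_RESIDUE.union(FieldMask::HELIX_RESIDUE)
}

fn main() {
    let preset = cognitive_mask();
    let required = ocr_required();
    if preset.covers(required) {
        println!("preset covers the OCR tenants");
    } else {
        println!("preset is missing bits {:#010b}", preset.missing(required).0);
    }
}
```

Under these assumptions the `Cognitive` mask does not cover the OCR tenants, which is the gap the comment points at. A test of this shape could serve as the D-OCR-51 acceptance check whichever path is chosen.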
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.claude/plans/ocr-canonical-soa-integration-v1.md around lines 57 - 65, Verify and clarify the ValueSchema selection for OCR tenants in the plan. Check that the chosen ValueSchema variant (either the new ValueSchema::Ocr or an existing preset like Cognitive) includes all OCR-required tenants listed in the OCR use table (TurbovecResidue and HelixResidue). Update the plan text to explicitly state which path was selected: either define a new ValueSchema::Ocr variant that encompasses all required tenants or adjust the OCR tenant list to match an existing schema. Ensure this decision is documented in D-OCR-51 acceptance criteria so the FieldMask declaration and value-slab carve implementation are unambiguous.
.claude/plans/tesseract-rs-traineddata-ndarray-v1.md (1)
37-46: 💤 Low value
Add language specifier to directory-tree code block.
Line 37 opens a code fence without a language tag. Add ```text or ```tree for clarity.
🔧 Proposed fix
- ```
+ ```text
  traineddata/
    container.rs // TessdataManager: offset table parse → component byte slices (CODEGEN: D-OCR-40)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.claude/plans/tesseract-rs-traineddata-ndarray-v1.md around lines 37 - 46, The opening code fence at line 37 in the markdown file is missing a language specifier, making it unclear what format the code block represents. Change the opening backticks from ``` to ```text to properly identify the directory tree structure and improve readability in the rendered markdown.
Source: Linters/SAST tools
.claude/plans/tesseract-rs-transcode-master-v1.md (2)
3-19: 💤 Low value
Fix blockquote formatting: remove extra spaces after `>`.
Lines 3-19 (the metadata header) have multiple spaces after the blockquote symbol `>`, violating MD027. Single space is standard Markdown.
🔧 Proposed fix: normalize blockquote spacing
- > **Type:** plan family root (forward marker / co-architecture). Plants the + > **Type:** plan family root (forward marker / co-architecture). Plants the - sub-plans; owns the deliverable index, the dependency DAG, and the + sub-plans; owns the deliverable index, the dependency DAG, and the - skip-list rationale. + skip-list rationale. - > **Status:** PLANTED 2026-06-15 — design only, no code. Layout/contracts proposed + > **Status:** PLANTED 2026-06-15 — design only, no code. Layout/contracts proposed - against the post-#496 front. + against the post-#496 front.(Apply similarly to lines 7, 9-11, 13-14, 15-17, 19.)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.claude/plans/tesseract-rs-transcode-master-v1.md around lines 3 - 19, The metadata header block in the file uses blockquote formatting with extra spaces after the `>` symbol, violating MD027 formatting standards. Fix this by normalizing all blockquote lines throughout the header section to use exactly one space after the `>` symbol instead of multiple spaces. This applies to all blockquote lines in the metadata header, including the Type, Status, Front, Canon anchors, and Skip-by-rule sections.
Source: Linters/SAST tools
41-54: 💤 Low value
Add language specifier to diagram code block.
Line 41 opens a fenced code block without a language tag. Since this is a textual flowchart/diagram, add ```text or ```diagram for consistency and future syntax-highlighting.
🔧 Proposed fix
- ```
+ ```text
  PDF / image
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.claude/plans/tesseract-rs-transcode-master-v1.md around lines 41 - 54, The fenced code block beginning with "PDF / image" is missing a language specifier on the opening fence. Add a language tag (text or diagram) to the opening backticks of this code block to enable proper syntax highlighting and maintain consistency with markdown best practices. Change the opening ``` to ```text or ```diagram.
Source: Linters/SAST tools
.claude/plans/tesseract-rs-ast-dll-codegen-v1.md (2)
30-33: 💤 Low value
Add language specifier to flow diagram code block.
Line 30 opens a fenced code block (the `C++ source → ... → formatted .rs` flow) without a language tag. Add ```text for consistency.
🔧 Proposed fix
- ```
+ ```text
  C++ source ──(libclang)──► Clang AST ──► [AST DLL: stable IR dump] ──► RustAst builder
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.claude/plans/tesseract-rs-ast-dll-codegen-v1.md around lines 30 - 33, The opening fence for the flow diagram code block (starting with the C++ source → Clang AST → ... flow) is missing a language specifier. Change the opening ``` to ```text to properly tag the code block language for consistency with markdown formatting standards.
Source: Linters/SAST tools
19-28: ⚖️ Poor tradeoff
Clarify ruff codegen adaptation scope.
The plan states we reuse ruff's patterns (codegen/formatter/dto_check) to emit Rust (not Python). This is honest, but §5 ("Module assignment") does not explicitly address the effort of adapting `ruff_python_codegen`/`ruff_formatter` to emit Rust source instead of Python. A brief note on whether those crates are language-agnostic or require shims would help scope the D-OCR-41 work.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.claude/plans/tesseract-rs-ast-dll-codegen-v1.md around lines 19 - 28, Add a brief note in section 5 ("Module assignment") that explicitly scopes the adaptation effort for reusing ruff's codegen and formatter crates. Clarify whether ruff_python_codegen and ruff_formatter are language-agnostic enough to emit Rust source with minimal changes, or whether they require shims/wrapper layers to decouple them from Python-specific logic. This note should directly address the D-OCR-41 work scope and help readers understand the actual effort required to adapt these crates from Python emission to Rust emission.
.claude/plans/tesseract-rs-neural-layout-ocrs-v1.md (2)
35-38: 💤 Low value
Add language specifier to pipeline diagram code block.
Line 35 opens a fenced code block (the `preprocess → ocrs::detection → ... → tokens+confidence` flow) without a language tag. Add ```text for clarity.
🔧 Proposed fix
- ```
+ ```text
  preprocess (image/imageproc) ─► ocrs::detection ─► ocrs::layout_analysis (reading order)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.claude/plans/tesseract-rs-neural-layout-ocrs-v1.md around lines 35 - 38, The fenced code block containing the pipeline diagram (starting with preprocess and flowing through ocrs::detection, ocrs::layout_analysis, etc.) is missing a language specifier. Change the opening triple backticks from ``` to ```text to explicitly declare the code block language type for proper markdown rendering and clarity.
Source: Linters/SAST tools
40-42: ⚖️ Poor tradeoff
Clarify rten model asset availability.
The plan says "Confirm the converter + current model assets are present in the fork before relying on them (D-OCR-30 acceptance)." This is a prudent gating criterion, but the acceptance gate should explicitly state: (1) which ONNX models are the source (detection + recognition), (2) whether `rten-convert` is confirmed to work on them, and (3) whether `.rten` blobs are vendored in the fork or fetched/converted at build time. Currently deferred to acceptance; recommend documenting the exact check-list in a follow-up.
Would you like me to open a follow-up ticket to detail the rten asset inventory and conversion steps as part of D-OCR-30 acceptance?
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.claude/plans/tesseract-rs-neural-layout-ocrs-v1.md around lines 40 - 42, The D-OCR-30 acceptance gate described in the Models section is too vague about what must be confirmed before relying on the rten conversion approach. Expand this acceptance criterion to explicitly document three specific checks: (1) identify which ONNX models serve as the source for detection and recognition, (2) confirm that rten-convert successfully processes those models, and (3) clarify whether the resulting .rten blobs will be vendored directly in the fork or fetched and converted at build time. This ensures the acceptance gate provides a concrete checklist rather than leaving the verification steps ambiguous.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 7f730765-e8d2-491b-bd1f-8414c331fab8
📒 Files selected for processing (6)
- .claude/plans/ocr-canonical-soa-integration-v1.md
- .claude/plans/tesseract-rs-ast-dll-codegen-v1.md
- .claude/plans/tesseract-rs-lstm-recodebeam-v1.md
- .claude/plans/tesseract-rs-neural-layout-ocrs-v1.md
- .claude/plans/tesseract-rs-traineddata-ndarray-v1.md
- .claude/plans/tesseract-rs-transcode-master-v1.md
…CR-53; pure-Rust front-end — addresses Codex
Addressed the Codex findings:
… crate's own framing
…8 tenant is the dilution gap
…ld, unify perception/reasoning/grammar (ocr-canonical-soa-integration-v1.md)
…ld, unify perception/reasoning/grammar (soa-centroid-attention-field-synthesis-v1.md)
…ance rule mining), not AriGraph — the in-tree proof of field-as-learner
Taking ownership of this PR. All three codex findings are dispositioned against the current branch (
Title note: "embedanything" is the deliberate LSTM-hosting layer (
CodeRabbit's initial pass was rate-limited and never ran; re-triggering now.
@coderabbitai review
Generated by Claude Code
Triggering a fresh review now.
✅ Action performed
Review finished.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.claude/plans/tesseract-rs-transcode-master-v1.md:
- Around line 74-76: The critical path on line 76 currently shows "40 → {10,30}
→ 16 → 21 → 42 → 53" but D-OCR-53's dependency list on line 74 includes D-OCR-50
and D-OCR-51 which are not represented in this path. Resolve this inconsistency
by either updating the critical path to explicitly include nodes D-OCR-50 and
D-OCR-51 in the sequence leading to D-OCR-53 (ensuring they appear between node
42 and 53), or remove D-OCR-50 and D-OCR-51 from the dependency list on line 74
and add a clear explanation of why these dependencies are not required for the
critical path.
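The inconsistency described above can be caught with a small check. This is a minimal sketch, assuming the critical path is modelled as an ordered list of stages and each node carries a list of direct dependencies. `missing_from_path` is a hypothetical helper, and the dependency list for D-OCR-53 is partly assumed: the comment names D-OCR-50 and D-OCR-51, while the inclusion of 42 is a guess for illustration.

```rust
/// Returns the dependencies of `node` that do not appear in any stage
/// strictly before the stage containing `node`.
/// `path` is an ordered list of stages; each stage is a set of node ids
/// that may run in parallel (for example `{10, 30}`).
fn missing_from_path(path: &[Vec<u32>], node: u32, deps: &[u32]) -> Vec<u32> {
    // Collect every node scheduled before `node` on the path.
    let mut seen: Vec<u32> = Vec::new();
    for stage in path {
        if stage.contains(&node) {
            break;
        }
        seen.extend(stage.iter().copied());
    }
    // Keep only the dependencies that were never scheduled.
    deps.iter().copied().filter(|d| !seen.contains(d)).collect()
}

fn main() {
    // Assumed dependency list for D-OCR-53 (50 and 51 come from the review
    // comment; 42 is an illustrative assumption).
    let deps_53 = [42, 50, 51];

    // The path as quoted in the comment: 40 → {10,30} → 16 → 21 → 42 → 53.
    let quoted = vec![vec![40], vec![10, 30], vec![16], vec![21], vec![42], vec![53]];

    // One possible repair: insert {50,51} between 42 and 53.
    let repaired = vec![
        vec![40], vec![10, 30], vec![16], vec![21], vec![42], vec![50, 51], vec![53],
    ];

    println!("quoted path is missing: {:?}", missing_from_path(&quoted, 53, &deps_53));
    println!("repaired path is missing: {:?}", missing_from_path(&repaired, 53, &deps_53));
}
```

The check only looks at direct dependencies of one node; a fuller version would walk the whole DAG, which is outside what this comment asks for.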
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 04fb155f-267b-46a1-85bb-5ef10b509f50
📒 Files selected for processing (7)
- .claude/plans/ocr-canonical-soa-integration-v1.md
- .claude/plans/soa-centroid-attention-field-synthesis-v1.md
- .claude/plans/tesseract-rs-ast-dll-codegen-v1.md
- .claude/plans/tesseract-rs-layout-transcode-v1.md
- .claude/plans/tesseract-rs-recodebeam-transcode-v1.md
- .claude/plans/tesseract-rs-traineddata-gguf-v1.md
- .claude/plans/tesseract-rs-transcode-master-v1.md
✅ Files skipped from review due to trivial changes (5)
- .claude/plans/tesseract-rs-layout-transcode-v1.md
- .claude/plans/tesseract-rs-recodebeam-transcode-v1.md
- .claude/plans/tesseract-rs-traineddata-gguf-v1.md
- .claude/plans/soa-centroid-attention-field-synthesis-v1.md
- .claude/plans/tesseract-rs-ast-dll-codegen-v1.md
🚧 Files skipped from review as they are similar to previous changes (1)
- .claude/plans/ocr-canonical-soa-integration-v1.md
…5-specialist framing)

Five specialists (cascade / family-codec / palette / dto-soa / truth-architect) framed the merged #497 OCR-transcode plans against the post-#498 substrate. Two showstoppers + 6-way drift; all 7 plans corrected:
- HelixResidue 48 B → 6 B everywhere (a stored Signed360 index, not a 48-byte field); budgets/carve rebaselined (Full 112, [32,144)); headers #496 → #498.
- "Morton-tile stacked-pyramid perturbation-shader" purged (does not exist; Morton rejected for Hilbert) → real primitives (mipmap pyramid / HHTL depth-cascade / CAKES).
- "reversible without a hash" reframed: no residue→rank inverse exists; node = identity → content-store lookup, codebook = repair signal (I-VSA-IDENTITIES).
- §0 tripwires: no ValueSchema::Ocr variant (ride Full/Compressed); Meta de-overloaded (confidence→Energy, provenance→Plasticity, OOV→content-store); TurbovecResidue is the edge codec, glyph→word uses DeepNSM CamCodes.
- master critical path 42→53 becomes 42→{50,51}→53 (resolves the open #497 CodeRabbit Major).

New ocr-probes-v1.md specs the 4 gating probes (OCR-RT/DET/POST/SCHEMA) for the unmeasured claims (int8-exact LSTM, bit-reproducible diff, 200k-LOC 1:1 layout). OCR-SCHEMA shipped as a contract test proving OCR rides an existing preset.

EPIPHANIES E-OCR-PLAN-DRIFT-1 + AGENT_LOG entry. contract lib green; fmt clean.

https://claude.ai/code/session_01D2WSmezQBNC3bUdHuGfGmo
…oaOwner cherry-pick + LanceVersionScheduler + SurrealMailboxView (D-PG-6)

Lands four tasks from the shortest-unblocking-path list surfaced after PR #497-#501 + the AdaWorldAPI/surrealdb fork bump (lance/lance-index =7.0.0, lancedb =0.30.0, ndarray exact-rev). All four meet at the single contract trait `MailboxSoaView`, closing the cascade in one commit (E-UNBLOCK-CASCADE-1).

## Task 3: `NiblePath::{from_guid_prefix, prefix}` (zero-dep, foundational)

Ontology-side keystone follow-up of PR #498's `classid → ReadMode` LE contract. The 20-nibble `classid(8) | HEEL(4) | HIP(4) | TWIG(4)` prefix overflows the 16-nibble MAX_DEPTH: the deterministic fold drops the canon-reserved high u16 of classid (root-first pack: `classid_lo(4) | HEEL(4) | HIP(4) | TWIG(4)`), returning None when the fold would be lossy. `prefix(d)` is the O(1) ancestor view; `prefix(d).is_ancestor_of(self)` holds for every d ≤ self.depth (the routing-cache view of a deeper class path). +7 tests in `hhtl::tests`; contract lib 619 → 632 green.

## Task 2: `impl MailboxSoaView + MailboxSoaOwner for MailboxSoA<N>`

Cherry-pick of jolly-cori-clnf9 commit 463d71b (integrated-cognitive-planner-v1 §2 Seam #3, +149 LOC). Adds `pub phase: KanbanColumn` field + zero-copy repr(transparent) slice impls (edges_raw, meta_raw) + the in-RAM Rubicon driving-loop test (`test_in_ram_driving_loop_walks_rubicon_to_commit`). The contract spine (#437/#439) now drives an actual loop end-to-end: no surreal, no ractor bus needed for the in-process case. +1 driving-loop test; cognitive-shader-driver lib 85 → 86 green.

## Task 1: `LanceVersionScheduler` over `VersionedGraph::versions()`

D-MBX-9-IN core impl (the CI-gated twin of the contract slice shipped 2026-05-31). Lives in `crates/lance-graph/src/graph/scheduler.rs`.
Wraps a `VersionedGraph` + inner `VersionScheduler<S = NextPhaseScheduler>` and exposes:
- `drive_once(view, exec)`: read current Lance version, lower to a move
- `drive_at_latest(view, exec)`: fold `versions().last()` into a move
- `current_dataset_version()`: typed `DatasetVersion` over nodes head

Closes `E-SUBSTRATE-IS-THE-SCHEDULER`'s OUT-direction end-to-end. The OUT direction stays propose-not-dispose (R1): returned `KanbanMove` is for the caller's `MailboxSoaOwner::try_advance_phase` to apply. +5 tests with real on-disk tempdir Lance (no mocks).

## Task 4: `SurrealMailboxView<'a>` (D-PG-6 contract slice)

Read-only `MailboxSoaView` adapter the SurrealQL projection populates via `from_columns(...)`: pure zero-copy borrow over the kv-lance scan's byte buffers. Imports `MailboxSoaView` but NOT `MailboxSoaOwner` (compile-time enforcement of `kanban.rs:1-21` "surreal=project-read-only, callcenter=commit"). `read_via_kv_lance()` returns the new typed `SurrealContainerError::BlockedColdBuild` until the surrealdb fork dep in `Cargo.toml` is uncommented; it is kept off by default to avoid the ~10 min cold surrealdb build for contributors who don't need it. The contract surface is available today; the integration is one Cargo.toml edit + a SurrealQL projection body in `view.rs`. +4 tests; new `lance-graph-contract` dep in surreal_container/Cargo.toml; BLOCKED(C) marker flipped to RESOLVED.

## What this unblocks
- **D-MBX-9-IN-impl**: SHIPPED (the contract trait now has a Lance-backed implementor).
- **D-MBX-A6-P3**: still queued, BUT Seam #3 (the in-RAM loop) is now in-tree; a downstream session can wire the emit-side without depending on the unmerged jolly branch.
- **D-PG-6 (Rubicon kanban VIEW)**: contract slice SHIPPED; impl-side gated on `BlockedColdBuild` flip-on (one Cargo.toml uncomment + projection body).
- **Identity-architecture v1 §3 P-SCOPE-CLASSIFY**: solved (the bijection-width fix is deterministic + ancestor-preserving + falsifiable by tests).
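The read-only versus owner split and the propose-not-dispose rule described in Tasks 1, 2 and 4 can be illustrated with a small trait pair. This is a sketch only: `View`, `Owner`, `Phase`, `ProposedMove`, `InRamMailbox`, `BorrowedView` and `propose_next` are hypothetical stand-ins, and the real `MailboxSoaView` / `MailboxSoaOwner` signatures are not shown in this thread.

```rust
// All names below are hypothetical stand-ins, not the crate's real API.

/// Illustrative phases; the real `KanbanColumn` variants are not listed here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Phase {
    Planned,
    Active,
    Committed,
}

/// A proposed transition, returned to the caller rather than applied.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ProposedMove {
    from: Phase,
    to: Phase,
}

/// Read-only surface: enough to project and to schedule.
trait View {
    fn phase(&self) -> Phase;
    fn meta_raw(&self) -> &[u8];
}

/// Write surface: only owners can apply a move.
trait Owner: View {
    fn try_advance_phase(&mut self, mv: ProposedMove) -> Result<(), String>;
}

/// In-RAM mailbox: implements both traits.
struct InRamMailbox {
    phase: Phase,
    meta: Vec<u8>,
}

impl View for InRamMailbox {
    fn phase(&self) -> Phase {
        self.phase
    }
    fn meta_raw(&self) -> &[u8] {
        &self.meta
    }
}

impl Owner for InRamMailbox {
    fn try_advance_phase(&mut self, mv: ProposedMove) -> Result<(), String> {
        // Reject stale proposals whose source phase no longer matches.
        if mv.from != self.phase {
            return Err(format!("stale move: at {:?}, move from {:?}", self.phase, mv.from));
        }
        self.phase = mv.to;
        Ok(())
    }
}

/// Borrowed projection: implements `View` only, so there is no
/// `try_advance_phase` to call on it.
struct BorrowedView<'a> {
    phase: Phase,
    meta: &'a [u8],
}

impl<'a> View for BorrowedView<'a> {
    fn phase(&self) -> Phase {
        self.phase
    }
    fn meta_raw(&self) -> &[u8] {
        self.meta
    }
}

/// Scheduler side: proposes the next move from any view and never applies it.
fn propose_next(view: &dyn View) -> Option<ProposedMove> {
    let to = match view.phase() {
        Phase::Planned => Phase::Active,
        Phase::Active => Phase::Committed,
        Phase::Committed => return None,
    };
    Some(ProposedMove { from: view.phase(), to })
}

fn main() {
    let mut owner = InRamMailbox { phase: Phase::Planned, meta: vec![1, 2, 3] };
    // In-process driving loop: the scheduler proposes, the owner disposes.
    while let Some(mv) = propose_next(&owner) {
        owner.try_advance_phase(mv).expect("move matches current phase");
    }
    let view = BorrowedView { phase: owner.phase(), meta: owner.meta_raw() };
    println!("final phase {:?}, {} meta bytes", view.phase(), view.meta_raw().len());
    // `view.try_advance_phase(..)` would not compile: `BorrowedView` is not an `Owner`.
}
```

Keeping the mutation method on a separate trait means a projection type that only implements the read trait has no way to commit, which is the kind of compile-time enforcement the commit message describes.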
## Tests + clippy
- lance-graph-contract: **632** (+7 hhtl)
- cognitive-shader-driver: **86** (+1 driving-loop)
- lance-graph::scheduler: **5** (new module, real Lance tempdir)
- surreal_container::view: **4** (new module)

All clippy `-D warnings` clean on the new files. Pre-existing lints in lance-graph-ontology / lance-graph-planner / ndarray_bridge.rs are out of session scope.

## Board hygiene (mandatory rule)
- LATEST_STATE.md: Contract Inventory PREPEND for the new types.
- EPIPHANIES.md: E-UNBLOCK-CASCADE-1, three independent landings converge on one trait surface, closing four queued deliverables in one commit.
- AGENT_LOG.md: task-by-task summary with test counts.

https://claude.ai/code/session_01Xzyc27Nx3f8WC5KzwfWfjx
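The fold described under Task 3 can be sketched as follows. This is an illustration under stated assumptions, not the crate's implementation: `NibblePathSketch` is a hypothetical type, the path is assumed to be 16 nibbles packed root-first into a `u64`, and "lossy" is assumed to mean that the dropped high u16 of `classid` is non-zero.

```rust
/// Hypothetical stand-in for the path type. Nibbles are packed root-first,
/// so the root nibble occupies the most significant 4 bits of `bits`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NibblePathSketch {
    pub bits: u64,
    /// Number of valid nibbles, in 0..=16.
    pub depth: u8,
}

pub const MAX_DEPTH: u8 = 16;

impl NibblePathSketch {
    /// Folds `classid(8) | HEEL(4) | HIP(4) | TWIG(4)` (20 nibbles) into
    /// 16 nibbles by dropping the high u16 of `classid`.
    /// Assumption: the fold counts as lossy when those dropped bits are non-zero.
    pub fn from_guid_prefix(classid: u32, heel: u16, hip: u16, twig: u16) -> Option<Self> {
        if classid >> 16 != 0 {
            return None;
        }
        let bits = (u64::from(classid) << 48)
            | (u64::from(heel) << 32)
            | (u64::from(hip) << 16)
            | u64::from(twig);
        Some(Self { bits, depth: MAX_DEPTH })
    }

    /// Ancestor view at depth `d`: keeps the top `d` nibbles, zeroes the rest.
    /// Depths beyond `self.depth` are clamped (an assumption of this sketch).
    pub fn prefix(&self, d: u8) -> Self {
        let d = d.min(self.depth);
        let mask = if d == 0 {
            0
        } else {
            u64::MAX << (64 - 4 * u32::from(d))
        };
        Self { bits: self.bits & mask, depth: d }
    }

    /// True when `self` is a (non-strict) ancestor of `other`.
    pub fn is_ancestor_of(&self, other: &Self) -> bool {
        self.depth <= other.depth && other.prefix(self.depth).bits == self.bits
    }
}

fn main() {
    let path = NibblePathSketch::from_guid_prefix(0x0000_ABCD, 0x1234, 0x5678, 0x9ABC)
        .expect("high u16 of classid is zero, so the fold is lossless");
    println!("full path: {:#018x}", path.bits);
    println!("depth-4 prefix: {:#018x}", path.prefix(4).bits);

    // The ancestor property stated in the commit message, checked for every depth.
    for d in 0..=path.depth {
        assert!(path.prefix(d).is_ancestor_of(&path));
    }
}
```

Because the prefix is a mask over a fixed-width word, it costs one shift and one AND, which matches the O(1) claim in the commit message under this packing assumption.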
v2, corrected. 1:1 behavioral transcode of ALL Tesseract (layout included); the LSTM forward is the ONLY swapped component, HOSTED on the existing runbook (`.traineddata` → GGUF → `embedanything` DTO/candle → `ndarray` AMX, `bgz_tensor` store, per `.grok/NDARRAY_BGZ_EMBEDANYTHING_INTEGRATION.md`).

Plans
- `tesseract-rs-transcode-master-v1` (v2): 1:1 everything; LSTM-only swap; D-OCR-NN index + DAG.
- `tesseract-rs-traineddata-gguf-v1`: `.traineddata` → GGUF → embedanything(candle) host; bgz_tensor weight store.
- `tesseract-rs-layout-transcode-v1`: textord/ccstruct 1:1, raw-pointer faithful (the ~200k LOC bulk); minimal Leptonica ops.
- `tesseract-rs-recodebeam-transcode-v1`: decoder transcoded over HOSTED posteriors.
- `tesseract-rs-ast-dll-codegen-v1` (v2): clang→IR→Rust via ruff; layout now in-scope (raw-pointer), not replaced.
- `ocr-canonical-soa-integration-v1`: OCR token = canonical NodeRow + DeepNSM/CAM-PQ repair.

Corrections vs v1
- `unsafe`/raw-pointer is the accepted faithful image of intrusive C++; safe-refactor is a later oracle-gated pass.
- `infer_sequence → [T,C]` per-timestep posteriors (D-OCR-15).

Retired v1 plans (`traineddata-ndarray`, `lstm-recodebeam`, `neural-layout-ocrs`) deleted in this branch.