docs(plans)+test: rebaseline #497 OCR plans to #498 + gating probes (5-specialist framing)#500

Merged
AdaWorldAPI merged 1 commit into
mainfrom
claude/wonderful-hawking-lodtql
Jun 16, 2026

@AdaWorldAPI AdaWorldAPI commented Jun 16, 2026

Follow-up to the merged #497 (OCR-transcode plans) + #498 (helix Signed360 + GUID keystone). Five specialists (cascade-architect / family-codec-smith / palette-engineer / dto-soa-savant / truth-architect) framed the merged #497 plans against the post-#498 substrate; this PR applies the corrections + specs the gating probes. Consolidated framing: EPIPHANIES.md E-OCR-PLAN-DRIFT-1.

Two showstoppers the framing caught

  1. "reversible without a hash" is false in code — no residue→rank inverse exists; deepnsm/vocabulary.rs is a stored string-table keyed by rank, every decode takes a known rank as input. Reframed: node = identity → content-store lookup; codebook = repair signal (I-VSA-IDENTITIES), not reversible text.
  2. "Morton-tile stacked-pyramid perturbation-shader cascade" does not exist (0 hits; Morton is explicitly rejected for Hilbert). Purged from 3 deliverables → real primitives (framebuffer::build_mipmap_pyramid / splat3d/depth_cascade / CAKES ladder).
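
The first showstopper can be illustrated with a minimal sketch. Everything below is hypothetical (`Vocabulary`, `residue`, `ContentStore` are toy stand-ins, not the real `deepnsm/vocabulary.rs` API); the only points taken from the description above are that decode is keyed by a known rank and that text is recovered by an identity → content-store lookup.

```rust
use std::collections::HashMap;

/// Toy stand-in for a rank-keyed string table (hypothetical; not the real
/// deepnsm API). The index into `words` is the rank.
pub struct Vocabulary {
    words: Vec<String>,
}

impl Vocabulary {
    pub fn new(words: &[&str]) -> Self {
        Self { words: words.iter().map(|w| w.to_string()).collect() }
    }

    /// Decode needs the rank as input; there is no text-from-residue path.
    pub fn word(&self, rank: u32) -> Option<&str> {
        self.words.get(rank as usize).map(String::as_str)
    }

    pub fn rank_count(&self) -> u32 {
        self.words.len() as u32
    }
}

/// Hypothetical lossy residue: several ranks collapse to the same value.
/// The real codec is not shown in this PR; this only models "many to one".
pub fn residue(rank: u32) -> u8 {
    (rank % 4) as u8
}

/// All ranks that share a residue. More than one candidate means the
/// residue alone cannot be inverted to a single rank.
pub fn ranks_for_residue(vocab: &Vocabulary, r: u8) -> Vec<u32> {
    (0..vocab.rank_count()).filter(|&rank| residue(rank) == r).collect()
}

/// Toy content store: node identity -> original text.
#[derive(Default)]
pub struct ContentStore {
    by_id: HashMap<u64, String>,
}

impl ContentStore {
    pub fn put(&mut self, id: u64, text: &str) {
        self.by_id.insert(id, text.to_string());
    }

    pub fn get(&self, id: u64) -> Option<&str> {
        self.by_id.get(&id).map(String::as_str)
    }
}
```

With a six-word vocabulary, `ranks_for_residue(&vocab, 0)` returns two candidates (ranks 0 and 4), so the toy residue cannot name a word on its own, while `ContentStore::get` returns the text directly from the node identity.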

Plan corrections (all 7 #497 docs)

  • HelixResidue 48 B → 6 B everywhere (it's a stored Signed360 place index, not a 48-byte field); budgets/carve rebaselined (Full 154→112, [32,186)→[32,144)); headers post-#496 → post-#498.
  • §0 anti-invention tripwires: dropped ValueSchema::Ocr (ride Full/Compressed or mint a class); de-overloaded Meta (confidence→Energy, provenance→Plasticity, OOV→content-store); flagged TurbovecResidue as the edge codec (rank-only) — glyph→word uses DeepNSM CamCodes.
  • D-OCR-50 marked PARTIALLY SHIPPED (LayoutBlock::to_node_row landed in #498, "feat(contract): GUID decode→read-mode keystone + helix Signed360 right-size + OCR→NodeRow transcode"); re-cast as "extend", not "build".
  • HHTL = a coherent layout-address trie, but classid-marked as layout-addressed (forgoing the similarity-basin reading).
  • master critical path 42 → 53 becomes 42 → {50,51} → 53; resolves the open CodeRabbit Major on #497 ("docs(plan): Tesseract → tesseract-rs 1:1 transcode (LSTM hosted via embedanything) — v2").
  • bit-repro caveat: DeepNSM repair is f32 → carve out of frozen-mode + pin floor_version.
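
The rebaselined numbers in the HelixResidue bullet follow from a single subtraction, which can be checked with a short sketch. The constant and function names are made up for illustration; only the byte counts and the carve bounds come from the bullet above, and the sketch assumes HelixResidue is the only field whose size changed.

```rust
use std::ops::Range;

// Figures quoted in the plan corrections; the names are hypothetical.
pub const HELIX_RESIDUE_OLD_BYTES: usize = 48;
pub const HELIX_RESIDUE_NEW_BYTES: usize = 6;
pub const FULL_OLD_BYTES: usize = 154;
pub const CARVE_OLD: Range<usize> = 32..186;

/// Bytes freed by storing a 6-byte Signed360 index instead of a 48-byte field.
pub fn saved_bytes() -> usize {
    HELIX_RESIDUE_OLD_BYTES - HELIX_RESIDUE_NEW_BYTES
}

/// Full budget after the right-size.
pub fn rebaselined_full_bytes() -> usize {
    FULL_OLD_BYTES - saved_bytes()
}

/// Carve after the right-size: the start offset stays put, the end moves in.
pub fn rebaselined_carve() -> Range<usize> {
    CARVE_OLD.start..(CARVE_OLD.end - saved_bytes())
}
```

The saving is 42 bytes, which gives Full = 112 and a carve of [32,144). In these figures the carve width equals the Full budget both before (154) and after (112).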

Probes (gate the unmeasured claims)

New ocr-probes-v1.md specs 4 gating probes — OCR-RT (residue→rank round-trip), OCR-DET (repair determinism), OCR-POST (GGUF posterior parity), OCR-SCHEMA (ValueSchema fit) — plus 3 cascade perf probes. The big claims (int8-exact LSTM, bit-reproducible diff, ~200k-LOC 1:1 layout) are CONJECTURE until these run.

  • OCR-SCHEMA shipped as a contract test (ocr::tests::ocr_schema_fit_rides_existing_preset_no_new_variant) — proves OCR rides Compressed/Full, no new enum variant.
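
On the OCR-DET probe (repair determinism) and the f32 bit-repro caveat in the plan corrections: one reason an f32 path cannot be assumed bit-reproducible is that f32 addition is not associative, so any change in accumulation order can change the result bits. The sketch below is a generic illustration of that property, not DeepNSM's repair code; `sum_in_order` and `bit_identical` are hypothetical helpers.

```rust
/// Sum left to right in exactly the order given.
pub fn sum_in_order(xs: &[f32]) -> f32 {
    xs.iter().fold(0.0_f32, |acc, &x| acc + x)
}

/// Bit-level equality, which is what a "bit-reproducible" claim has to meet.
pub fn bit_identical(a: f32, b: f32) -> bool {
    a.to_bits() == b.to_bits()
}
```

Summing `[1.0e8, 1.0, -1.0e8]` in that order yields `0.0`, because `1.0` is below half a unit in the last place at `1.0e8` and is rounded away, while `[1.0e8, -1.0e8, 1.0]` yields `1.0`. The same inputs in the same order stay bit-identical, which is the property a determinism probe would need to pin down.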

Board: EPIPHANIES.md E-OCR-PLAN-DRIFT-1, AGENT_LOG.md entry. contract lib green; clippy/fmt clean.

https://claude.ai/code/session_01D2WSmezQBNC3bUdHuGfGmo


Generated by Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Rebaseline and correct OCR transcode planning; resolved identified architectural inconsistencies and migration issues.
  • Documentation

    • Updated integration plans to reflect post-release architecture updates.
    • Added new OCR gating probes specification document for staged validation framework.
    • Updated agent logs and epiphany records documenting resolved issues and corrected plans.
  • Tests

    • Added validation test confirming OCR functionality integrates with existing schema presets without introducing new variants.

…5-specialist framing)

Five specialists (cascade / family-codec / palette / dto-soa / truth-architect)
framed the merged #497 OCR-transcode plans against the post-#498 substrate. Two
showstoppers + 6-way drift; all 7 plans corrected:

- HelixResidue 48 B → 6 B everywhere (a stored Signed360 index, not a 48-byte field);
  budgets/carve rebaselined (Full 112, [32,144)); headers #496 → #498.
- "Morton-tile stacked-pyramid perturbation-shader" purged (does not exist; Morton
  rejected for Hilbert) → real primitives (mipmap pyramid / HHTL depth-cascade / CAKES).
- "reversible without a hash" reframed: no residue→rank inverse exists; node =
  identity → content-store lookup, codebook = repair signal (I-VSA-IDENTITIES).
- §0 tripwires: no ValueSchema::Ocr variant (ride Full/Compressed); Meta de-overloaded
  (confidence→Energy, provenance→Plasticity, OOV→content-store); TurbovecResidue is the
  edge codec, glyph→word uses DeepNSM CamCodes.
- master critical path 42→53 becomes 42→{50,51}→53 (resolves the open #497 CodeRabbit Major).

New ocr-probes-v1.md specs the 4 gating probes (OCR-RT/DET/POST/SCHEMA) for the
unmeasured claims (int8-exact LSTM, bit-reproducible diff, 200k-LOC 1:1 layout).
OCR-SCHEMA shipped as a contract test proving OCR rides an existing preset.
EPIPHANIES E-OCR-PLAN-DRIFT-1 + AGENT_LOG entry.

contract lib green; fmt clean.

https://claude.ai/code/session_01D2WSmezQBNC3bUdHuGfGmo

coderabbitai Bot commented Jun 16, 2026

Walkthrough

Rebaselines seven OCR transcode plan documents from post-#497 to post-#498 architecture, removing two showstoppers (false reversibility rationale and nonexistent Morton-tile cascade). Adds ocr-probes-v1.md with four gating probes and three cascade-performance probes. Ships the OCR-SCHEMA deliverable as a new contract test in crates/lance-graph-contract/src/ocr.rs.

Changes

OCR Transcode Rebaseline and OCR-SCHEMA Shipment

| Layer / File(s) | Summary |
|---|---|
| OCR-SCHEMA contract test: crates/lance-graph-contract/src/ocr.rs | Adds ocr_schema_fit_rides_existing_preset_no_new_variant asserting HelixResidue, TurbovecResidue, EntityType, and Fingerprint are already present in ValueSchema::Compressed, that ValueSchema::Full covers all OCR-touched tenants, and that both schemas are layout-preserving, proving no new ValueSchema::Ocr variant is needed. |
| New gating probe specification: .claude/plans/ocr-probes-v1.md | Introduces the staged probe queue: four primary gating probes (OCR-RT, OCR-DET, OCR-POST, OCR-SCHEMA) with explicit claims, pass/fail criteria, and costs, plus three secondary cascade-performance probes and a DAG honesty section establishing OCR-RT as the highest-leverage gate before funding the full layout transcode. |
| ocr-canonical-soa-integration-v1.md rewrite: .claude/plans/ocr-canonical-soa-integration-v1.md | Updates front-matter to post-#498 (ENVELOPE_LAYOUT_VERSION = 2, Signed360 48B→6B); revises HHTL/OCR class addressing to bootstrap on classid = 0x0000_0000; rewrites ValueSchema preset section to forbid a new ValueSchema::Ocr variant and correct tenant semantics; replaces Hamming with L1 CAM-PQ cascade repair; tightens bit-reproducibility harness; rewrites D-OCR-50/51 deliverables. |
| Supporting plan and board updates: .claude/plans/soa-centroid-attention-field-synthesis-v1.md, .claude/plans/tesseract-rs-ast-dll-codegen-v1.md, .claude/plans/tesseract-rs-transcode-master-v1.md, .claude/board/AGENT_LOG.md, .claude/board/EPIPHANIES.md | Removes Morton-tile framing from the centroid attention field plan and adds CONJECTURE gate; advances post references to #498 and adds CONJECTURE qualifications on determinism in codegen and master transcode plans; records the rebaseline outcome, showstoppers found, and next steps in agent log and epiphanies board. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 Hop hop, the plans are set straight,
No Morton tiles, no phantom cascade gate,
Six bytes of Signed360 lead the way,
OCR-SCHEMA shipped green today!
The probes stand guard before the big transcode—
This rabbit won't fund work on a broken road. 🗺️

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: rebaselining OCR plans (#497) to post-#498 architecture with gating probes and a 5-specialist review framing. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 24d3fd843a

Comment thread .claude/plans/ocr-canonical-soa-integration-v1.md

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.claude/plans/ocr-probes-v1.md:
- Around line 30-33: The 99% pass threshold in the OCR-RT probe definition
allows lossy residue→rank mappings to incorrectly pass the reversibility gate.
Replace the threshold-based pass criterion (≥ 99%) with an exact requirement
that fails on any miss, ensuring only perfect round-trips satisfy this gate. If
tolerance is desired, separate it into a distinct quality probe rather than
mixing it into the reversibility assertion, so the corrected plans' claim about
text-as-identity and codebook-as-repair-signal can be properly validated.
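
The difference between the threshold criterion and the exact criterion can be sketched as follows. This is an assumed harness shape, not the probe spec in ocr-probes-v1.md: `round_trip`, `RoundTripReport`, and the closure-based codec are hypothetical, and the codec under test is passed in as two plain functions.

```rust
/// Outcome of a residue→rank round-trip run (hypothetical report type).
pub struct RoundTripReport {
    pub total: usize,
    pub misses: usize,
}

impl RoundTripReport {
    pub fn hit_rate(&self) -> f64 {
        if self.total == 0 {
            return 0.0;
        }
        (self.total - self.misses) as f64 / self.total as f64
    }

    /// Reversibility gate: any miss fails, and an empty run never passes.
    pub fn passes_exact(&self) -> bool {
        self.total > 0 && self.misses == 0
    }

    /// Tolerance belongs in a separate quality probe, not in the gate.
    pub fn passes_threshold(&self, min_rate: f64) -> bool {
        self.hit_rate() >= min_rate
    }
}

/// Encode each rank, decode it again, and count the ranks that do not come
/// back unchanged.
pub fn round_trip<E, D>(ranks: &[u32], encode: E, decode: D) -> RoundTripReport
where
    E: Fn(u32) -> u32,
    D: Fn(u32) -> Option<u32>,
{
    let misses = ranks
        .iter()
        .filter(|&&rank| decode(encode(rank)) != Some(rank))
        .count();
    RoundTripReport { total: ranks.len(), misses }
}
```

A codec that collapses one rank out of 200 has a hit rate of 0.995: it clears a 0.99 threshold but fails `passes_exact`, which is the case the comment is warning about.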

📥 Commits

Reviewing files that changed from the base of the PR and between 4acea26 and 24d3fd8.

📒 Files selected for processing (8)
  • .claude/board/AGENT_LOG.md
  • .claude/board/EPIPHANIES.md
  • .claude/plans/ocr-canonical-soa-integration-v1.md
  • .claude/plans/ocr-probes-v1.md
  • .claude/plans/soa-centroid-attention-field-synthesis-v1.md
  • .claude/plans/tesseract-rs-ast-dll-codegen-v1.md
  • .claude/plans/tesseract-rs-transcode-master-v1.md
  • crates/lance-graph-contract/src/ocr.rs

Comment thread .claude/plans/ocr-probes-v1.md
@AdaWorldAPI AdaWorldAPI merged commit adbcbdc into main Jun 16, 2026
6 checks passed
AdaWorldAPI pushed a commit that referenced this pull request Jun 16, 2026
…gate is exact

Two review threads on the merged #500:
- codex P2: "post-POC OCR rides Compressed" was wrong — Compressed lacks
  Energy+Plasticity, so the schema-gated transcode would silently drop confidence
  (→Energy) and repair-provenance (→Plasticity). Corrected: OCR rides Full (the only
  preset with the codec residues AND the hot lifecycle columns). The OCR-SCHEMA
  contract test now asserts Compressed lacks Energy/Plasticity (regression guard).
- CodeRabbit Major: OCR-RT reversibility gate tightened 99% → 100% exact (a lossy
  residue→rank map is NOT "reversible"; tolerance moved to a separate quality probe).
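
The preset argument in the first thread can be sketched as a set-coverage check. The enum shapes below are assumptions for illustration: the real `ValueSchema` and tenant types in crates/lance-graph-contract may have more variants and a different API, Full is assumed to be a superset of Compressed, and `OCR_TENANTS` is a guess at the OCR-touched set based on the mappings named in this PR (codec residues, confidence→Energy, provenance→Plasticity).

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Tenant {
    HelixResidue,
    TurbovecResidue,
    EntityType,
    Fingerprint,
    Energy,
    Plasticity,
}

/// Only the two presets discussed in the thread (hypothetical subset).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ValueSchema {
    Compressed,
    Full,
}

impl ValueSchema {
    pub const ALL: [ValueSchema; 2] = [ValueSchema::Compressed, ValueSchema::Full];

    /// Tenants carried by each preset, per the review thread: Compressed has
    /// the codec residues but no hot lifecycle columns.
    pub fn tenants(self) -> &'static [Tenant] {
        use Tenant::*;
        match self {
            ValueSchema::Compressed => {
                &[HelixResidue, TurbovecResidue, EntityType, Fingerprint]
            }
            ValueSchema::Full => &[
                HelixResidue, TurbovecResidue, EntityType, Fingerprint, Energy, Plasticity,
            ],
        }
    }

    pub fn has(self, tenant: Tenant) -> bool {
        self.tenants().contains(&tenant)
    }
}

/// Assumed OCR-touched tenants: residues plus confidence and provenance.
pub const OCR_TENANTS: [Tenant; 4] = [
    Tenant::HelixResidue,
    Tenant::TurbovecResidue,
    Tenant::Energy,
    Tenant::Plasticity,
];

/// Presets that carry every needed tenant, so nothing is silently dropped.
pub fn presets_covering(needed: &[Tenant]) -> Vec<ValueSchema> {
    ValueSchema::ALL
        .iter()
        .copied()
        .filter(|schema| needed.iter().all(|&t| schema.has(t)))
        .collect()
}
```

Under these assumptions `presets_covering(&OCR_TENANTS)` returns only `Full`, and asserting `!ValueSchema::Compressed.has(Tenant::Energy)` is the kind of regression guard the commit message describes.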

https://claude.ai/code/session_01D2WSmezQBNC3bUdHuGfGmo
AdaWorldAPI added a commit that referenced this pull request Jun 16, 2026
fix(plans)+test: #500 review — OCR rides Full not Compressed; OCR-RT gate exact