lance-graph-contract: C++ codegen target (MethodSig) + UniCharSet content store #521
Warning: review limit reached.
Configuration used: Organization UI. Review profile: CHILL. Plan: Pro Plus. Files selected for processing: 8.
Walkthrough: Two new public modules are added to lance-graph-contract. Changes: lance-graph-contract new modules and planning docs.
Estimated code review effort: 2 (Simple), ~12 minutes.
Pre-merge checks: 5 passed.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e82f202cde
Branch updated from 21f325f to fbb7b11.
Records the full 5-consolidate + 3-brutal-critique council on the Tesseract C++->Rust transcode next-move decision.

5 consolidation agents: core-first-architect (TARGETS-CORE; found the ocr.rs classid-keyed-registry precedent), container-architect (ADDITIVE-CONFIRMED, zero locked-node impact), adapter-shaper (THIN-CONFIRMED, scoped to old_style_included_), truth-architect (PREMATURE), integration-lead (SEQUENCE C->A->D->B).

The 3 brutal critics converge: truth-architect, brutally-honest-tester and adk-behavior-monitor each independently found the original "self-authored golden + run-twice determinism" gate to be a tautology. The replacement gate, named by all three, is a round-trip structural-equivalence falsifier (expand -> ndjson -> from_ndjson -> reassemble, assert ~= original). It is immune to harvest-freshness drift (live 2032 triples vs committed 880) and it CAN fail (the real UNICHARMAP vs UNICHARSET unichar_to_id collision). baton-handoff-auditor: CATCH-LATENT, no CATCH-CRITICAL; it bakes in the design constraint that reassembly derives per-overload identity from the index-prefixed has_param_type triples, never from the (params) IRI suffix, because comma-bearing templated types have no clean inverse.

Final 8/8 decision: execute re-scoped C-FIRST. The OCR-SCHEMA mis-cite is dropped; Frankenstein-refusal becomes an honest deny-list test; PARITY: UNRUN honesty markers are required; byte-parity promotion stays operator-gated.
Co-Authored-By: Claude <noreply@anthropic.com>
https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
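The design constraint above (per-overload identity from index-prefixed `has_param_type` triples, never from the `(params)` IRI suffix) is easiest to see with a templated parameter. A minimal sketch, assuming a hypothetical `Class::method(T1,T2)` IRI shape and a hypothetical `"<index>:<type>"` encoding for the triple objects; neither is the harvester's actual format, and both helper names are illustrative only.

```rust
/// Naive inverse of a `name(p1,p2,...)` IRI suffix: split the parameter list on commas.
/// The IRI shape is a hypothetical illustration.
fn params_from_iri_suffix(iri: &str) -> Vec<String> {
    let open = iri.find('(').expect("IRI has a parameter list");
    // Strip the surrounding "(" and ")"; assumes the IRI ends with ')'.
    let inner = &iri[open + 1..iri.len() - 1];
    if inner.is_empty() {
        return Vec::new();
    }
    inner.split(',').map(|s| s.trim().to_string()).collect()
}

/// Recover parameters from index-prefixed objects such as "0:std::map<int,int>".
/// The "<index>:<type>" encoding is a hypothetical stand-in for has_param_type triples.
fn params_from_indexed_triples(objects: &[&str]) -> Vec<String> {
    let mut indexed: Vec<(usize, String)> = objects
        .iter()
        .map(|o| {
            // Split at the first ':' only, so "std::map" keeps its own colons.
            let (idx, ty) = o.split_once(':').expect("index prefix present");
            (idx.parse::<usize>().expect("numeric index"), ty.to_string())
        })
        .collect();
    // Triples arrive in no particular order; the index, not the order, is authoritative.
    indexed.sort_by_key(|(i, _)| *i);
    indexed.into_iter().map(|(_, ty)| ty).collect()
}

fn main() {
    let naive = params_from_iri_suffix("Foo::bar(std::map<int,int>,int)");
    println!("{naive:?}"); // ["std::map<int", "int>", "int"]
    let exact = params_from_indexed_triples(&["1:int", "0:std::map<int,int>"]);
    println!("{exact:?}"); // ["std::map<int,int>", "int"]
}
```

Splitting the suffix on commas turns `std::map<int,int>` into two bogus parameters, while the index prefix recovers both the parameter count and the order regardless of triple order.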
…er green
The re-scoped C-FIRST gate is built in ruff: reassemble() (generator stage 1, the inverse of expand) plus the round-trip falsifier that replaced the council-rejected self-golden. CPP-REASSEMBLE-RT runs green on real Tesseract ccutil (67 classes; class-set preservation + idempotence). The falsifier found a real bug class: 19/67 const-overload IRI collisions (GAP-CONST-OVERLOAD, queued). truth-architect's PREMATURE flag is resolved by a real measurement on real data.
Co-Authored-By: Claude <noreply@anthropic.com>
https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
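The two properties CPP-REASSEMBLE-RT checks (class-set preservation and idempotence) can be sketched on a toy model. Everything here is hypothetical: `Model`, `expand`, `reassemble` and `round_trip_holds` are stand-ins rather than the ruff types, and the ndjson hop is omitted for brevity.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical toy model: class name -> set of method signatures.
type Model = BTreeMap<String, BTreeSet<String>>;
/// A (subject, predicate, object) triple.
type Triple = (String, String, String);

/// Flatten the model into triples. Every class also gets a type triple,
/// so a class with no methods still survives the round trip.
fn expand(model: &Model) -> Vec<Triple> {
    let mut out: Vec<Triple> = Vec::new();
    for (class, methods) in model {
        out.push((class.clone(), "rdf:type".to_string(), "cpp:Class".to_string()));
        for m in methods {
            out.push((class.clone(), "has_method".to_string(), m.clone()));
        }
    }
    out
}

/// Inverse of `expand`: regroup triples by subject.
fn reassemble(triples: &[Triple]) -> Model {
    let mut model = Model::new();
    for (s, p, o) in triples {
        let methods = model.entry(s.clone()).or_default();
        if p == "has_method" {
            methods.insert(o.clone());
        }
    }
    model
}

/// The falsifier: reassembly must preserve the class set, reproduce the
/// model, and be idempotent under a second expand/reassemble pass.
fn round_trip_holds(model: &Model) -> bool {
    let once = reassemble(&expand(model));
    let twice = reassemble(&expand(&once));
    once.keys().eq(model.keys()) && once == *model && twice == once
}

fn main() {
    let mut model = Model::new();
    model.insert(
        "UNICHARSET".to_string(),
        BTreeSet::from(["size()".to_string(), "unichar_to_id(const char*)".to_string()]),
    );
    model.insert("Empty".to_string(), BTreeSet::new());
    println!("round trip holds: {}", round_trip_holds(&model)); // true
}
```

The check has teeth: if `expand` stopped emitting the `rdf:type` triple, a class with no methods would vanish on reassembly and `round_trip_holds` would return `false`.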
Records the step-2 emitter council. Major reframe (3/5 agents independently): the plan's premise was false — ClassView is a field/render vocabulary with NO method-resolution surface (has_function does not appear in lance-graph-contract). 5-agent consolidation: manifest-first cut (not stubs/bodies), additive (container ADDITIVE-CONFIRMED), placement = a new ruff_cpp_codegen crate that emits TEXT only (no ruff->lance-graph compile edge), D (const-overload fix) before the emitter, PARITY markers + Frankenstein deny-list. Unresolved fork handed to the 3-brutal panel: (A) mint a minimal MethodSig POD + classid_methods LazyLock registry (method-axis sibling of classid_read_mode, EXTEND-CORE but additive) vs (B) ride the SHIPPED codegen_spine::TripletProjection + roundtrip_eq (no new type; warns MethodSig re-implements CppMethod and that re-emitting the harvest is a tautology). codegen_spine.rs verified real (632 lines, TripletProjection trait + roundtrip_eq fn + Genericity enum). Co-Authored-By: Claude <noreply@anthropic.com> https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
…are IRI)
The brutal panel resolved the A-vs-B fork as a false binary (honest-tester): B's codegen_spine::roundtrip_eq pattern is the build-time GATE, A's MethodSig shape is the emitted-text target, and A's runtime registry is a deferred additive EXTEND-CORE. baton-auditor's decisive correction (confirmed by honest-tester): the const-overload merge happens UPSTREAM, in expand's (s,p,o) dedup, so the round-trip cannot observe it; the merged graph is a fixed point. Therefore D (cv-aware method IRI) must run FIRST, unanimously (behavior-monitor's emitter-first rationale refuted). Final order: D (cv-aware IRI, autonomous correctness fix; adds no predicate; falsifier CPP-REASSEMBLE-RT 48/67 -> 67/67) -> emitter scaffold (ruff_cpp_codegen, emit-text-only, ruff-side round-trip gate, no classid mint) -> MethodSig EXTEND-CORE in lance-graph (additive) -> wire/byte-parity (operator).
Co-Authored-By: Claude <noreply@anthropic.com>
https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
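Why the round trip cannot see the const-overload merge: the collapse happens inside expand's `(s,p,o)` dedup, so the merged graph reassembles and re-expands to itself. A minimal sketch, assuming a hypothetical `Method` record and IRI spelling (`Class::name(params)` plus a trailing ` const`), with the dedup simulated by a `BTreeSet`; the class and method names are illustrative.

```rust
use std::collections::BTreeSet;

/// Hypothetical harvested method: just enough to spell an IRI.
struct Method {
    class: &'static str,
    name: &'static str,
    params: &'static [&'static str],
    is_const: bool,
}

/// cv-blind IRI: `Class::name(params)`. Overloads that differ only in `const` collide.
fn iri_cv_blind(m: &Method) -> String {
    format!("{}::{}({})", m.class, m.name, m.params.join(","))
}

/// cv-aware IRI: append the ` const` qualifier as C++ spells it.
fn iri_cv_aware(m: &Method) -> String {
    let base = iri_cv_blind(m);
    if m.is_const {
        format!("{base} const")
    } else {
        base
    }
}

/// Simulate expand's (s,p,o) dedup: identical triples collapse in a set.
fn distinct_triples(methods: &[Method], iri: fn(&Method) -> String) -> usize {
    methods
        .iter()
        .map(|m| (m.class.to_string(), "has_method", iri(m)))
        .collect::<BTreeSet<_>>()
        .len()
}

fn main() {
    let overloads = [
        Method { class: "Buffer", name: "at", params: &["int"], is_const: false },
        Method { class: "Buffer", name: "at", params: &["int"], is_const: true },
    ];
    println!("cv-blind: {}", distinct_triples(&overloads, iri_cv_blind)); // 1
    println!("cv-aware: {}", distinct_triples(&overloads, iri_cv_aware)); // 2
}
```

With the cv-blind IRI the two overloads become one triple before any round trip runs, which is the fixed point described above; the cv-aware IRI keeps them apart at the source.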
…nst assumption
Records D's outcome: the cv-aware method IRI plus two round-trip-metric fixes (cpp_projection dedup; is_const in the methods sort key) closed the entire collision tail (48/67 -> 67/67, now a hard gate). The falsifier overturned the council's "19/67 = const overloads" inference: only 3 were const; 13 were benign duplicate template_instantiates, 2 were duplicate-harvested methods, and 1 was a sort-order artifact. GAP-CONST-OVERLOAD resolved. Next: emitter scaffold (ruff_cpp_codegen), then the MethodSig EXTEND-CORE.
Co-Authored-By: Claude <noreply@anthropic.com>
https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
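The sort-order artifact is the same tie seen from the metric side: if methods are sorted only by name and parameters, a const/non-const pair keeps its harvest order, so two otherwise-equal method lists can compare unequal. A sketch with a hypothetical `M` record; `sort_by_key` is a stable sort, which is exactly what preserves the input order on a tie.

```rust
/// Hypothetical method record for the comparison metric.
#[derive(Clone, Debug, PartialEq)]
struct M {
    name: &'static str,
    params: &'static str,
    is_const: bool,
}

/// Key without cv-qualification: const/non-const twins tie, and the stable
/// sort leaves them in harvest order.
fn sorted_without_const(mut v: Vec<M>) -> Vec<M> {
    v.sort_by_key(|m| (m.name, m.params));
    v
}

/// Canonical key: `is_const` breaks the tie, so the result no longer depends on input order.
fn sorted_with_const(mut v: Vec<M>) -> Vec<M> {
    v.sort_by_key(|m| (m.name, m.params, m.is_const));
    v
}

fn main() {
    let c = M { name: "at", params: "int", is_const: true };
    let n = M { name: "at", params: "int", is_const: false };
    let harvest_a = vec![c.clone(), n.clone()];
    let harvest_b = vec![n, c];
    // Same methods, different harvest order.
    println!("{}", sorted_without_const(harvest_a.clone()) == sorted_without_const(harvest_b.clone())); // false
    println!("{}", sorted_with_const(harvest_a) == sorted_with_const(harvest_b)); // true
}
```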
…real corpus
The C-FIRST step-2 emitter shipped in ruff: project(ModelGraph)->MethodSig manifest + render to lance-graph-naming Rust text (emit-text-only, no lance-graph edge), gated by a decompile==expand signature-plane round-trip with teeth (a dropped-method test proves it can fail). CPP-CODEGEN-RT on real ccutil: 67 classes, 857 methods -> 124 KB MethodSig manifest; round-trip holds; PARITY markers + Frankenstein deny-list present. Next (operator-gated, additive canon growth): the MethodSig EXTEND-CORE in lance-graph-contract.
Co-Authored-By: Claude <noreply@anthropic.com>
https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
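A sketch of the emit-text-only step and its signature-plane gate, assuming a hypothetical `Sig` projection record and hypothetical `render` / `signature_plane_matches` helpers; the real `ruff_cpp_codegen::render` is not shown in this PR. The `overrides` field is left out because its type is not stated here. Debug-formatting a string yields a quoted, escaped form that, for ordinary identifiers and type names, is a valid Rust string literal.

```rust
/// Hypothetical projected signature (stand-in for the manifest's MethodSig fields).
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Sig {
    name: String,
    params: Vec<String>,
    ret: String,
    is_const: bool,
    is_static: bool,
}

/// Render one signature as `MethodSig { .. }` Rust text. This only builds a
/// string; nothing here links against the crate that defines `MethodSig`.
fn render(sig: &Sig) -> String {
    // `{:?}` quotes and escapes each parameter type.
    let params: Vec<String> = sig.params.iter().map(|p| format!("{p:?}")).collect();
    format!(
        "MethodSig {{ name: {:?}, params: &[{}], ret: {:?}, is_const: {}, is_static: {} }}",
        sig.name,
        params.join(", "),
        sig.ret,
        sig.is_const,
        sig.is_static
    )
}

/// Signature-plane check: the manifest must carry exactly the harvested set.
fn signature_plane_matches(harvest: &[Sig], manifest: &[Sig]) -> bool {
    let mut a = harvest.to_vec();
    let mut b = manifest.to_vec();
    a.sort();
    b.sort();
    a == b
}

fn main() {
    let lookup = Sig {
        name: "unichar_to_id".into(),
        params: vec!["const char*".into()],
        ret: "UNICHAR_ID".into(),
        is_const: true,
        is_static: false,
    };
    let size = Sig { name: "size".into(), params: vec![], ret: "int".into(), is_const: true, is_static: false };
    println!("{}", render(&lookup));
    let harvest = vec![lookup.clone(), size.clone()];
    println!("{}", signature_plane_matches(&harvest, &[size.clone(), lookup.clone()])); // true
    println!("{}", signature_plane_matches(&harvest, &[lookup])); // false: a method was dropped
}
```

Dropping a method from the manifest makes the sorted comparison fail, which is the dropped-method test in miniature.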
…ompile target
The MethodSig EXTEND-CORE (C-FIRST step 2's deferred-runtime-registry piece).
A new codegen_manifest module:
- MethodSig: the dispatch-relevant C++ method signature in a const-constructible
shape (all fields &'static: name, params: &'static [&'static str], ret,
is_const, is_static, overrides). It is the exact literal ruff_cpp_codegen::render
emits, so the generated text now has a real compile target. The &'static shape
is load-bearing: class_view::FieldRef is String-backed and cannot appear in a
const; MethodSig is the method-axis sibling that can.
- ClassMethods{classid, methods} + methods_for(registry, classid): the
registry-entry type + pure zero-fallback lookup (unregistered classid -> empty
slice). classid is bound OGAR-side, never minted here; the runtime
classid->methods registry DATA is generated downstream (consumer repo), NOT
stored here (honest-tester's "defer the runtime registry").
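A minimal sketch of the two types and the zero-fallback lookup as described. Field names follow the commit text; `overrides: bool`, `classid: u32`, the sample registry entry and classid `7` are assumptions for illustration, and the crate's real signatures may differ.

```rust
/// Sketch of a const-constructible method signature.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MethodSig {
    pub name: &'static str,
    pub params: &'static [&'static str],
    pub ret: &'static str,
    pub is_const: bool,
    pub is_static: bool,
    pub overrides: bool, // assumption: the field's type is not stated in the commit
}

/// Registry entry: one class id and its methods.
#[derive(Debug, Clone, Copy)]
pub struct ClassMethods {
    pub classid: u32, // assumption: the real classid type may differ
    pub methods: &'static [MethodSig],
}

/// Zero-fallback lookup: an unregistered classid yields an empty slice, never a panic.
pub fn methods_for(registry: &'static [ClassMethods], classid: u32) -> &'static [MethodSig] {
    registry
        .iter()
        .find(|entry| entry.classid == classid)
        .map(|entry| entry.methods)
        .unwrap_or(&[])
}

// Hypothetical generated data: everything is `&'static` or `Copy`, so it fits in a `const`.
const UNICHARSET_METHODS: &[MethodSig] = &[MethodSig {
    name: "unichar_to_id",
    params: &["const char*"],
    ret: "UNICHAR_ID",
    is_const: true,
    is_static: false,
    overrides: false,
}];

const REGISTRY: &[ClassMethods] = &[ClassMethods { classid: 7, methods: UNICHARSET_METHODS }];

fn main() {
    println!("{}", methods_for(REGISTRY, 7).len()); // 1
    println!("{}", methods_for(REGISTRY, 999).len()); // 0: unregistered classid
}
```

Because every field is `&'static` or `Copy`, the whole registry can be a `const`; a `String`-backed type such as `FieldRef` could not appear there.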
Additive (container-architect ADDITIVE-CONFIRMED): a sibling module, zero
NodeRow/ValueTenant/ValueSchema/stride/ENVELOPE_LAYOUT_VERSION impact. Body-shaping
flags (pure-virtual/constexpr/noexcept/operator/requires) are out of scope.
Board hygiene: LATEST_STATE Contract Inventory updated in the same commit
(D-CPP-CODEGEN-1). +2 tests (const-constructibility proof + zero-fallback);
640 contract lib green; clippy -D warnings clean.
C-FIRST: D + emitter scaffold + MethodSig EXTEND-CORE all landed; the in-env arc
is complete. Remaining is operator-gated (tesseract-rs wiring + byte-parity).
Co-Authored-By: Claude <noreply@anthropic.com>
https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
…e's Rust side
The deferred Option A content-store tier, built (operator's "keep building here" + the leptonica-is-an-install-not-a-transcode epiphany).

New unicharset module: UniCharSet (deepnsm::Vocabulary-shaped: reverse id->unichar + lookup unichar->id); load_from_str/load_from_file, parsing the .unicharset text format (line 1 = count; first whitespace token per line = unichar; id = position; property columns ignored, i.e. the old_style_included_ plain-table scope); id_to_unichar/unichar_to_id (the two adapter leaves); and dump(), rendering the <id>\t<unichar> table in the C++ oracle's format. This is the Rust side of PROBE-OGAR-ADAPTER-UNICHARSET.

The unicharset path is pure text parsing with ZERO leptonica (it never touches Pix), so it builds and unit-tests in-env with no C deps. leptonica is only an *install* (a link dep of the C++ oracle harness), never a transcode and never in the Rust path.

Byte parity is now one `diff`: run combine_tessdata to get eng.unicharset, have a ~10-line libtesseract harness dump id_to_unichar, dump the Rust side with `cargo run --example unicharset_dump`, and diff the two. Byte-identical => CONJECTURE -> FINDING.

Additive (sibling content-store module, zero NodeRow/tenant impact). Board: LATEST_STATE Contract Inventory (D-UNICHARSET-1). +4 tests + the unicharset_dump example; 644 contract lib green; clippy -D warnings + fmt clean. The classid->&UniCharSet LazyLock resolver (OGAR wiring) remains the follow-up.
Co-Authored-By: Claude <noreply@anthropic.com>
https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
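A minimal sketch of the parsing rules listed above (count on line 1, first whitespace token is the unichar, id is the position, property columns ignored) and the `<id>\t<unichar>` dump. It is not the crate's code: the struct layout, the `String` error type and the first-occurrence-wins `lookup` policy are assumptions. Like the parser at this commit, it does not yet remap the `NULL` token.

```rust
use std::collections::HashMap;

/// Sketch of the UniCharSet shape: `reverse` maps id -> unichar, `lookup` maps unichar -> id.
#[derive(Debug, Default)]
struct UniCharSet {
    reverse: Vec<String>,
    lookup: HashMap<String, usize>,
}

impl UniCharSet {
    /// Line 1 is the entry count; on each following line the first
    /// whitespace-separated token is the unichar and its position is the id.
    fn load_from_str(text: &str) -> Result<Self, String> {
        let mut lines = text.lines();
        let count: usize = lines
            .next()
            .ok_or("empty input")?
            .trim()
            .parse()
            .map_err(|e| format!("bad count line: {e}"))?;
        let mut set = UniCharSet::default();
        for (id, line) in lines.take(count).enumerate() {
            let unichar = line
                .split_whitespace()
                .next() // property columns after the first token are ignored
                .ok_or_else(|| format!("line {} has no token", id + 2))?;
            // Assumption: on a duplicate unichar the first id wins.
            set.lookup.entry(unichar.to_string()).or_insert(id);
            set.reverse.push(unichar.to_string());
        }
        if set.reverse.len() != count {
            return Err(format!("expected {count} entries, found {}", set.reverse.len()));
        }
        Ok(set)
    }

    fn id_to_unichar(&self, id: usize) -> Option<&str> {
        self.reverse.get(id).map(String::as_str)
    }

    fn unichar_to_id(&self, unichar: &str) -> Option<usize> {
        self.lookup.get(unichar).copied()
    }

    /// One `<id>\t<unichar>` line per entry.
    fn dump(&self) -> String {
        self.reverse
            .iter()
            .enumerate()
            .map(|(id, u)| format!("{id}\t{u}\n"))
            .collect()
    }
}

fn main() {
    // Toy three-entry table; the property columns are illustrative.
    let text = "3\nNULL 0 Common 0\na 3 Latin 1\nb 3 Latin 2\n";
    let set = UniCharSet::load_from_str(text).expect("valid table");
    print!("{}", set.dump());
    println!("{:?} {:?}", set.id_to_unichar(1), set.unichar_to_id("b")); // Some("a") Some(2)
}
```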
…ICHARSET FINDING
leptonica installed in-env (apt-get: an install, not a transcode), so the byte-parity probe RAN and passed. UniCharSet dump vs a C++ UNICHARSET FFI oracle on the real eng.lstm-unicharset: 112/112 byte-identical.

The falsifier did its job. The documented-format parser matched 111/112, and the oracle named the one real convention it missed: the NULL file token IS the space unichar (unicharset.cpp:882 remaps "NULL" -> " "). One-line fix (load_from_str maps "NULL" -> " "), re-diff, 0 differences. NOT a Core gap.

CONJECTURE -> FINDING for the unicharset adapter: the variable-length bijection rides the content-store tier with no Core gap and is byte-exact with libtesseract. Doctrine flipped (core-first-transcode-doctrine.md falsifier RESULT); EPIPHANIES E-CPP-PARITY-1; plan BYTE-PARITY ACHIEVED. The classid->ClassView->UnifiedStep dispatch wiring is the mechanical remainder; the lookups themselves are now proven. +1 test (null_token_maps_to_space); contract lib green; clippy + fmt clean.
Co-Authored-By: Claude <noreply@anthropic.com>
https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
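The fix and the probe can be sketched together: a token remap for `NULL` (a space cannot be written as a whitespace-delimited token, which is presumably why the file spells it out) plus a line-by-line parity count over two dumps. `token_to_unichar` and `parity` are hypothetical helpers, not the crate's API, and the three-line dumps are toy data rather than the real eng.lstm-unicharset.

```rust
/// File token -> unichar, with the one convention the documented format omits:
/// the literal token "NULL" stands for the space unichar.
fn token_to_unichar(token: &str) -> &str {
    if token == "NULL" {
        " "
    } else {
        token
    }
}

/// Line-by-line parity between two `<id>\t<unichar>` dumps.
/// Returns (identical lines, total lines, first differing line, 1-based).
fn parity(rust_dump: &str, oracle_dump: &str) -> (usize, usize, Option<usize>) {
    let a: Vec<&str> = rust_dump.lines().collect();
    let b: Vec<&str> = oracle_dump.lines().collect();
    let total = a.len().max(b.len());
    let mut identical = 0;
    let mut first_diff = None;
    for i in 0..total {
        if a.get(i) == b.get(i) {
            identical += 1;
        } else if first_diff.is_none() {
            first_diff = Some(i + 1);
        }
    }
    (identical, total, first_diff)
}

fn main() {
    let oracle = "0\t \n1\ta\n2\tb\n";
    let before_fix = "0\tNULL\n1\ta\n2\tb\n";
    let after_fix = format!("0\t{}\n1\ta\n2\tb\n", token_to_unichar("NULL"));
    println!("{:?}", parity(before_fix, oracle)); // (2, 3, Some(1))
    println!("{:?}", parity(&after_fix, oracle)); // (3, 3, None)
}
```

On the toy dumps the unfixed table differs only on its first line, and the remap brings it to full agreement, the same shape as the 111/112 -> 112/112 result reported above on real data.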
Branch updated from fbb7b11 to dce9961.
Records the merged #521 (lance-graph-contract C++ codegen target MethodSig + UniCharSet content store) per the Mandatory Board-Hygiene Rule's post-merge step. PR_ARC_INVENTORY prepend (Added/Locked/Deferred/Docs/Confidence) + LATEST_STATE narrative entry + "Recently Shipped PRs" table row. Captures the PROBE-OGAR-ADAPTER-UNICHARSET FINDING: the full transcode pipeline (ruff ruff_cpp_spo harvest -> reassemble -> ruff_cpp_codegen -> these contract types) produces a UniCharSet byte-identical 112/112 to the libtesseract oracle on real eng data, proving the core-first transcode doctrine end-to-end. Pairs with ruff #20. Merge commit 620bd8e. Co-Authored-By: Claude <noreply@anthropic.com> https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
The Core side of the Tesseract C++→Rust transcode: the types ruff's `ruff_cpp_codegen` targets, plus the Rust side of the byte-parity probe. All additive to `lance-graph-contract`: zero `NodeRow`/`ValueTenant`/`ValueSchema`/stride/`ENVELOPE_LAYOUT_VERSION` impact (container-architect ADDITIVE-CONFIRMED).

- `codegen_manifest`
  - `MethodSig`: the `&'static`-backed, `const`-constructible method-signature type the generated text names (the method-axis sibling of `ClassView`'s field projection; `FieldRef` is `String`-backed and can't appear in a `const`, which is exactly why this is a new type).
  - `ClassMethods` + `methods_for`: the registry entry + zero-fallback lookup. The classid is bound OGAR-side and the data is generated downstream (no runtime registry is stored here).
- `unicharset`
  - `UniCharSet` (`deepnsm::Vocabulary`-shaped: `reverse` id→unichar + `lookup` unichar→id), `.unicharset` parser + `id_to_unichar`/`unichar_to_id` + `dump()`. This is the Rust side of `PROBE-OGAR-ADAPTER-UNICHARSET`: pure text parsing, zero leptonica (the unicharset path never touches `Pix`), so it builds and unit-tests with no C deps. The `unicharset_dump` example renders the oracle-shaped table, so byte parity is a single `diff`.

Board hygiene in-commit: LATEST_STATE Contract Inventory (D-CPP-CODEGEN-1, D-UNICHARSET-1). Plan `transcode-extend-core-probe-v1.md` carries the full 5-consolidate + 3-brutal council record and the C-FIRST D → emitter → EXTEND-CORE arc. 644 contract lib tests green; clippy `-D warnings` + fmt clean.

Pairs with the ruff PR (the harvester/codegen that produces what these types consume).
🤖 Generated with Claude Code
https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1