
contract + ontology: complete the classid → ClassView resolution (UNICHAR adapter, keystone, OGAR→ontology wiring) #534

Merged
AdaWorldAPI merged 5 commits into main from claude/happy-hamilton-0azlw4
Jun 18, 2026
@AdaWorldAPI

Owner

Summary

Completes the classid → ClassView resolution surface of the OGAR Core. After this PR, all three "classid → X" axes resolve from a canonical-node GUID:

| Axis | Resolver | Status |
|---|---|---|
| read-mode / value-schema | classid_read_mode (via ocr.rs) | already wired |
| method composition | codegen_manifest::methods_for (the harvested has_function manifest) | this PR (keystone) |
| ontology shape (fields/labels/template/DOLCE) | OntologyRegistry::class_id_for_guid → RegistryClassView | this PR (wiring) |

This dovetails with the just-merged #533 (virtually_overrides as a computed ClassView relation) and the E-ODOO-CORE-FIRST-STRUCTURAL direction (#530): all Core-side resolution, no new flat-ndjson predicates.

What's in it (4 commits)

1. UNICHAR UTF-8 codec — second byte-parity adapter (lance-graph-contract::unichar)
Transcode of Tesseract's ccutil/unichar.cpp: utf8_step (const-fn 256-entry lead-byte table) + utf8_to_utf32. 268/268 byte-identical to a libtesseract oracle (256 exhaustive utf8_step values + 12 decode rows). Proves the transcode generalizes to a 2nd class, and shows why a faithful transcode is mandatory: Tesseract maps 0xC0/0xC1 to step 2 and decodes the overlong NUL C0 80 to [0], both of which core::str::from_utf8 rejects, so a native-UTF-8 shortcut would silently diverge (pinned by from_utf8_rejects_what_tesseract_accepts). +8 tests.
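
The divergence described above can be reproduced in a few lines. This is a minimal sketch, not the code in `unichar.rs`: the exact range boundaries in `utf8_step` are an assumption chosen to agree with the behaviour stated in this PR (1/2/3/4 for legal leads, 0 for continuation bytes and for 0xF8 and above, 0xC0/0xC1 mapped to 2).

```rust
/// Sketch of a lead-byte step function. The range boundaries are an
/// assumption consistent with the PR text, not a copy of the real table.
const fn utf8_step(lead: u8) -> u8 {
    match lead {
        0x00..=0x7F => 1, // single-byte (ASCII)
        0x80..=0xBF => 0, // continuation byte, never a legal lead
        0xC0..=0xDF => 2, // includes 0xC0/0xC1, which strict UTF-8 forbids
        0xE0..=0xEF => 3,
        0xF0..=0xF7 => 4,
        0xF8..=0xFF => 0, // illegal lead
    }
}

fn main() {
    // The overlong encoding of NUL.
    let overlong_nul = [0xC0u8, 0x80];

    // A lead-byte-only table accepts 0xC0 as the start of a 2-byte sequence...
    assert_eq!(utf8_step(overlong_nul[0]), 2);

    // ...while the standard library's strict validator rejects the sequence.
    assert!(std::str::from_utf8(&overlong_nul).is_err());

    println!("lead-byte table and core::str disagree on C0 80");
}
```

Any adapter that validated input with `from_utf8` first would therefore return an error on bytes for which the oracle returns a value.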

2. UniCharSet keystone — classid → ClassView → adapter (lance-graph-contract::unicharset_adapter)
Composes the proven UniCharSet adapter through the OGAR Core's three movable parts — steps 2–3 of PROBE-OGAR-ADAPTER-UNICHARSET. invoke_unicharset(registry, store, classid, call): (1) ClassView composition gate via methods_for, (2) classid-keyed content-store tier (UniCharSetStore; the adapter holds no state), (3) the proven leaf. Byte-parity is inherited; the keystone proves the dispatch path is faithful and there is no Core gap (the doctrine's iron guard holds). +5 tests. Flips the core-first doctrine to proven end-to-end.
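
The three-step dispatch can be sketched as follows. Everything here is a hypothetical stand-in written for illustration: `Registry`, `MapStore`, the `u16` width of `ClassId`, the `NoContent` and `NotFound` variants, and the shapes of `UniCharCall` and `UniCharOut` are assumptions. Only the names `invoke_unicharset`, `methods_for`, `UniCharSetStore`, `MethodNotComposed`, `id_to_unichar` and `unichar_to_id` come from this PR.

```rust
use std::collections::HashMap;

type ClassId = u16; // assumed width, for illustration only

#[derive(Debug, PartialEq)]
enum DispatchError {
    MethodNotComposed, // named in the PR
    NoContent,         // hypothetical: the store has nothing for this classid
    NotFound,          // hypothetical: the leaf lookup missed
}

/// Hypothetical stand-in for the adapter leaf.
struct UniCharSet(Vec<String>);

impl UniCharSet {
    fn id_to_unichar(&self, id: usize) -> Option<&str> {
        self.0.get(id).map(String::as_str)
    }
    fn unichar_to_id(&self, s: &str) -> Option<usize> {
        self.0.iter().position(|u| u.as_str() == s)
    }
}

/// Classid-keyed content-store tier; the adapter itself holds no state.
trait UniCharSetStore {
    fn unicharset(&self, classid: ClassId) -> Option<&UniCharSet>;
}

struct MapStore(HashMap<ClassId, UniCharSet>);

impl UniCharSetStore for MapStore {
    fn unicharset(&self, classid: ClassId) -> Option<&UniCharSet> {
        self.0.get(&classid)
    }
}

#[derive(Clone, Copy)]
enum UniCharCall<'a> {
    IdToUnichar(usize),
    UnicharToId(&'a str),
}

#[derive(Debug, PartialEq)]
enum UniCharOut<'a> {
    Unichar(&'a str), // borrowed from the store, not copied
    Id(usize),
}

/// Hypothetical registry: the method names composed for each classid.
struct Registry(HashMap<ClassId, Vec<&'static str>>);

fn methods_for(registry: &Registry, classid: ClassId) -> &[&'static str] {
    registry.0.get(&classid).map(Vec::as_slice).unwrap_or(&[])
}

fn invoke_unicharset<'s, S: UniCharSetStore>(
    registry: &Registry,
    store: &'s S,
    classid: ClassId,
    call: UniCharCall<'_>,
) -> Result<UniCharOut<'s>, DispatchError> {
    // 1. Composition gate: an unconfigured classid composes nothing.
    let method = match call {
        UniCharCall::IdToUnichar(_) => "id_to_unichar",
        UniCharCall::UnicharToId(_) => "unichar_to_id",
    };
    if !methods_for(registry, classid).contains(&method) {
        return Err(DispatchError::MethodNotComposed);
    }
    // 2. Content-store tier, keyed by classid.
    let set = store.unicharset(classid).ok_or(DispatchError::NoContent)?;
    // 3. Adapter leaf.
    match call {
        UniCharCall::IdToUnichar(id) => {
            set.id_to_unichar(id).map(UniCharOut::Unichar).ok_or(DispatchError::NotFound)
        }
        UniCharCall::UnicharToId(s) => {
            set.unichar_to_id(s).map(UniCharOut::Id).ok_or(DispatchError::NotFound)
        }
    }
}

fn demo_fixture() -> (Registry, MapStore) {
    let set = UniCharSet(vec![" ".to_string(), "a".to_string(), "b".to_string()]);
    let registry = Registry(HashMap::from([(7, vec!["id_to_unichar", "unichar_to_id"])]));
    (registry, MapStore(HashMap::from([(7, set)])))
}

fn main() {
    let (registry, store) = demo_fixture();
    let hit = invoke_unicharset(&registry, &store, 7, UniCharCall::IdToUnichar(1));
    assert_eq!(hit, Ok(UniCharOut::Unichar("a")));
    let back = invoke_unicharset(&registry, &store, 7, UniCharCall::UnicharToId("b"));
    assert_eq!(back, Ok(UniCharOut::Id(2)));
    // Zero-fallback: classid 9 was never configured, so the gate refuses it.
    let miss = invoke_unicharset(&registry, &store, 9, UniCharCall::IdToUnichar(1));
    assert_eq!(miss, Err(DispatchError::MethodNotComposed));
}
```

Putting the gate before the store lookup means an unconfigured classid fails in the same way whether or not content happens to exist for it.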

3. OGAR → lance-graph-ontology wiring (OntologyRegistry::class_id_for_guid)
Closes a gap an audit this session found: NiblePath::from_guid_prefix (canon GUID→NiblePath fold) and the registry's entity_type ↔ NiblePath bijection were both built with zero callers. class_id_for_guid(&NodeGuid) -> Option<ClassId> lays the join (from_guid_prefix(guid)? → entity_type_of(path)), so a node carrying a classid resolves its ontology class → RegistryClassView. Round-trip test pins the classid_lo ↔ entity_type consistency; zero-fallback + lossy-fold refusal hold. +1 test.
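
As a rough illustration of the join and its three pinned properties (round-trip, zero-fallback, lossy-fold refusal), here is a self-contained sketch. The type layouts are assumptions made for the example: `NodeGuid` is reduced to a single 32-bit classid, `NiblePath` to a `u16` newtype, and `ClassId` to `u16`. The real types in lance-graph-ontology may look quite different.

```rust
use std::collections::HashMap;

type ClassId = u16; // assumed representation of an entity_type

/// Hypothetical GUID: only the classid prefix is modelled here.
#[derive(Clone, Copy)]
struct NodeGuid {
    classid: u32,
}

/// Hypothetical path type; the real one is presumably richer.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct NiblePath(u16);

impl NiblePath {
    /// Fold the GUID prefix into a path. Refuses (None) when the high
    /// u16 of the classid is non-zero, since the fold would lose bits.
    fn from_guid_prefix(guid: &NodeGuid) -> Option<NiblePath> {
        if guid.classid >> 16 != 0 {
            return None;
        }
        Some(NiblePath(guid.classid as u16))
    }
}

#[derive(Default)]
struct OntologyRegistry {
    by_path: HashMap<NiblePath, ClassId>,
    by_type: HashMap<ClassId, NiblePath>,
}

impl OntologyRegistry {
    /// Records both directions of the entity_type <-> path bijection.
    fn register_class_path(&mut self, t: ClassId, path: NiblePath) {
        self.by_path.insert(path, t);
        self.by_type.insert(t, path);
    }

    fn entity_type_of(&self, path: NiblePath) -> Option<ClassId> {
        self.by_path.get(&path).copied()
    }

    /// The join: GUID -> path -> entity type, composing two existing surfaces.
    fn class_id_for_guid(&self, guid: &NodeGuid) -> Option<ClassId> {
        let path = NiblePath::from_guid_prefix(guid)?;
        self.entity_type_of(path)
    }
}

fn main() {
    let mut reg = OntologyRegistry::default();
    let g = NodeGuid { classid: 0x0000_002A };
    let path = NiblePath::from_guid_prefix(&g).expect("low classid folds cleanly");
    reg.register_class_path(5, path);

    // Round-trip: registering the folded path makes the GUID resolve.
    assert_eq!(reg.class_id_for_guid(&g), Some(5));
    assert_eq!(reg.by_type.get(&5), Some(&path));
    // Zero-fallback: an unbound GUID resolves to nothing.
    assert_eq!(reg.class_id_for_guid(&NodeGuid { classid: 0x2B }), None);
    // Lossy-fold refusal: a non-zero high u16 is rejected before lookup.
    assert_eq!(reg.class_id_for_guid(&NodeGuid { classid: 0x0001_002A }), None);
}
```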

4. Board post-merge hygiene for #521 (PR_ARC + LATEST_STATE)
The post-merge governance entry owed for #521 (the contract MethodSig + UniCharSet PR).

Tests / gates

  • lance-graph-contract: 658 lib tests green; clippy --all-targets -D warnings clean; fmt clean.
  • lance-graph-ontology: 16 class/bijection/wiring tests green; registry.rs clippy-clean + fmt clean.
  • Byte-parity (in-env): UNICHAR 268/268, UniCharSet 112/112 vs a libtesseract oracle.

Notes

🤖 Generated with Claude Code
https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1



claude added 4 commits June 18, 2026 07:34
Records the merged #521 (lance-graph-contract C++ codegen target MethodSig +
UniCharSet content store) per the Mandatory Board-Hygiene Rule's post-merge
step. PR_ARC_INVENTORY prepend (Added/Locked/Deferred/Docs/Confidence) +
LATEST_STATE narrative entry + "Recently Shipped PRs" table row.

Captures the PROBE-OGAR-ADAPTER-UNICHARSET FINDING: the full transcode
pipeline (ruff ruff_cpp_spo harvest -> reassemble -> ruff_cpp_codegen -> these
contract types) produces a UniCharSet byte-identical 112/112 to the libtesseract
oracle on real eng data, proving the core-first transcode doctrine end-to-end.
Pairs with ruff #20. Merge commit 620bd8e.

Co-Authored-By: Claude <noreply@anthropic.com>
https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
Transcode of Tesseract's `ccutil/unichar.cpp` (the UTF-8 layer UNICHARSET
sits on top of) as a pure-Rust, zero-leptonica adapter — the second leaf
through the harvest -> reassemble -> codegen pipeline after UniCharSet,
proving PROBE-OGAR-ADAPTER-UNICHARSET generalizes beyond one class.

New `unichar` module:
- `utf8_step(lead) -> u8`: const-fn transcription of the 256-entry lead-byte
  table (unichar.cpp:143). 1/2/3/4 for legal leads, 0 for continuation bytes
  and for leads 0xF8 and above.
- `utf8_to_utf32(bytes) -> Option<Vec<i32>>`: mirrors UNICHAR::UTF8ToUTF32
  (unichar.cpp:220) — lead-byte validation only, None on illegal lead, the
  offset-decode of first_uni (unichar.cpp:105) inlined.

Byte-parity: `examples/unichar_dump.rs` vs a libtesseract UNICHAR oracle is
268/268 identical — all 256 utf8_step values (EXHAUSTIVE) + 12 utf8_to_utf32
corpus rows.

Why a faithful transcode and not core::str: Tesseract maps 0xC0/0xC1 to step 2
and decodes the overlong NUL `C0 80` to [0]; core::str::from_utf8 rejects both.
A native-UTF-8 shortcut would silently diverge from the oracle. The
`from_utf8_rejects_what_tesseract_accepts` test pins the gap.

Additive, zero-dep, pure text. +8 tests; 653 contract lib green; clippy
--all-targets -D warnings + fmt clean. Board: LATEST_STATE D-UNICHAR-1 +
EPIPHANIES E-CPP-PARITY-2.

Co-Authored-By: Claude <noreply@anthropic.com>
https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
…dapter

Wires the proven UniCharSet adapter (E-CPP-PARITY-1, byte-parity 112/112)
through the OGAR Core's three movable parts — steps 2-3 of
PROBE-OGAR-ADAPTER-UNICHARSET, which prior work left as "mechanical wiring,
conjectured." This proves the core-first transcode doctrine END-TO-END for the
unicharset class, not just for one leaf's bytes.

New `unicharset_adapter` module:
- `UniCharSetStore` trait: the classid-keyed content-store tier (consumer-
  provided, dependency-inverted like ClassView). The adapter holds NO state;
  the variable-length bijection rides this tier (I-VSA-IDENTITIES).
- `UniCharCall` (DO-in) / `UniCharOut` (DO-out, zero-copy borrow) / `DispatchError`.
- `invoke_unicharset(registry, store, classid, call)` — the keystone:
    1. ClassView composition gate: codegen_manifest::methods_for(registry,
       classid) must list the method (the harvested has_function manifest),
       else MethodNotComposed (zero-fallback: unconfigured classid composes
       nothing).
    2. content-store tier: UniCharSetStore::unicharset(classid).
    3. adapter leaf: UniCharSet::{id_to_unichar, unichar_to_id}.

Byte-parity is inherited from UniCharSet; the keystone proves the dispatch path
is faithful (the NULL->space edge survives it), the gate works, and there is NO
Core gap (the doctrine's iron guard holds with zero strain). Not routed through
the heavy OrchestrationBridge cross-subsystem router; this is the adapter-
invocation primitive a UnifiedStep calls.

Additive, zero-dep. +5 tests; clippy --all-targets -D warnings + fmt clean.
Board: LATEST_STATE D-UNICHARSET-KEYSTONE; EPIPHANIES E-CPP-KEYSTONE-1;
core-first-transcode-doctrine.md steps 2-3 marked wired.

Co-Authored-By: Claude <noreply@anthropic.com>
https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
Closes the OGAR -> lance-graph-ontology gap an audit this session found:
NiblePath::from_guid_prefix (the canon GUID->NiblePath fold) and the registry's
entity_type <-> NiblePath bijection were BOTH built with ZERO callers -- the two
halves of the bridge forged but never chained.

OntologyRegistry::class_id_for_guid(&NodeGuid) -> Option<ClassId> lays the join:
from_guid_prefix(guid)? -> entity_type_of(path). A node row carrying a classid
now resolves its ontology class (entity_type / ClassId), which RegistryClassView
already turns into the class shape (fields/labels/template/DOLCE). No new
predicate, no new type -- a method composing two existing surfaces (aligns with
E-ODOO-CORE-FIRST-STRUCTURAL: Core-side resolution, not an SPO bolt-on).

Round-trip test pins the classid_lo <-> entity_type consistency the audit
flagged: register_class_path(t, from_guid_prefix(g)) => class_id_for_guid(g) ==
Some(t); zero-fallback (unbound GUID -> None); lossy-fold refusal (high classid
u16 -> None). 16 ontology tests green; registry.rs clippy-clean + fmt clean.

Board: LATEST_STATE wiring entry; EPIPHANIES E-OGAR-ONTOLOGY-WIRED-1; TECH_DEBT
TD-ONTOLOGY-LINT (pre-existing crate clippy debt, present on main, un-CI-gated).

Co-Authored-By: Claude <noreply@anthropic.com>
https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Warning

Review limit reached

@AdaWorldAPI, we couldn't start this review because you've reached your PR review rate limit.

📥 Commits

Reviewing files that changed from the base of the PR and between e55a8cf and 5ed5a1f.

📒 Files selected for processing (10)
  • .claude/board/EPIPHANIES.md
  • .claude/board/LATEST_STATE.md
  • .claude/board/PR_ARC_INVENTORY.md
  • .claude/board/TECH_DEBT.md
  • .claude/knowledge/core-first-transcode-doctrine.md
  • crates/lance-graph-contract/examples/unichar_dump.rs
  • crates/lance-graph-contract/src/lib.rs
  • crates/lance-graph-contract/src/unichar.rs
  • crates/lance-graph-contract/src/unicharset_adapter.rs
  • crates/lance-graph-ontology/src/registry.rs

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7834828a72

    if step == 0 {
        return None; // illegal lead
    }
    out.push(first_uni(&bytes[i..]));

P2: Reject truncated UTF-8 sequences

step only proves that the lead byte is not illegal, but this path still calls first_uni when the slice does not contain all bytes for that character. For malformed OCR input ending in a multibyte lead such as [0xC3], first_uni consumes the one available byte, subtracts the 2-byte offset, and pushes a negative/fabricated codepoint before advancing past the end. Since this public byte-parity decoder accepts length-delimited slices, add a remaining-length check (and return None or otherwise mirror the oracle) before decoding.
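
One way the suggested remaining-length check could look is sketched below. This is not the code in `unichar.rs`. The range boundaries in `utf8_step` are an assumption consistent with the behaviour described in this PR, and the offset constants in `first_uni` are the standard UTF-8 lead/continuation marker sums (for example 0x3080 = (0xC0 << 6) + 0x80), taken here as an assumption about what the original offset-decode subtracts.

```rust
/// Lead-byte step sketch; range boundaries are assumed, not copied.
const fn utf8_step(lead: u8) -> u8 {
    match lead {
        0x00..=0x7F => 1,
        0xC0..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF7 => 4,
        _ => 0, // continuation bytes and 0xF8 and above
    }
}

/// Offset-decode of one complete sequence: accumulate the raw bytes with
/// 6-bit shifts, then subtract the summed marker bits for that length.
/// The caller guarantees `seq.len()` equals the step of its lead byte.
fn first_uni(seq: &[u8]) -> i32 {
    const OFFSETS: [i32; 5] = [0, 0, 0x3080, 0xE2080, 0x3C82080];
    let mut uni: i32 = 0;
    for &b in seq {
        uni = (uni << 6) + i32::from(b);
    }
    uni - OFFSETS[seq.len()]
}

fn utf8_to_utf32(bytes: &[u8]) -> Option<Vec<i32>> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let step = usize::from(utf8_step(bytes[i]));
        if step == 0 {
            return None; // illegal lead
        }
        if i + step > bytes.len() {
            return None; // truncated sequence: refuse rather than fabricate
        }
        out.push(first_uni(&bytes[i..i + step]));
        i += step;
    }
    Some(out)
}

fn main() {
    // Complete sequences decode as usual.
    assert_eq!(utf8_to_utf32(b"A"), Some(vec![0x41]));
    assert_eq!(utf8_to_utf32(&[0xC3, 0xA9]), Some(vec![0xE9]));
    assert_eq!(utf8_to_utf32(&[0xE2, 0x82, 0xAC]), Some(vec![0x20AC]));
    // Lead-byte-only validation still accepts the overlong NUL.
    assert_eq!(utf8_to_utf32(&[0xC0, 0x80]), Some(vec![0]));
    // Truncated trailing sequences are rejected.
    assert_eq!(utf8_to_utf32(&[0xC3]), None);
    assert_eq!(utf8_to_utf32(&[0x41, 0xE2, 0x82]), None);
}
```

Slicing to exactly `i + step` after the check also means `first_uni` never sees a short slice, so the guard and the decode cannot drift apart.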

)

utf8_to_utf32 only checked the lead byte was legal; for a truncated trailing
multibyte lead (e.g. [0xC3], or a 3-byte lead with 2 bytes present) it still
called first_uni on the short slice, where take(len) decodes from the partial
bytes and the offset subtraction fabricates a codepoint ([0xC3] -> Some([64])).
The C++ UTF8ToUTF32 reads past its buffer here (UB on length-delimited input);
this length-delimited decoder now rejects it (i + step > len -> None) instead of
fabricating.

Byte-parity unchanged (268/268 vs the libtesseract oracle — the corpus has no
truncated cases). +1 test (truncated_trailing_multibyte_is_rejected); docs on
utf8_to_utf32 + first_uni updated. clippy --all-targets -D warnings + fmt clean.

Co-Authored-By: Claude <noreply@anthropic.com>
https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
@AdaWorldAPI AdaWorldAPI merged commit 393e580 into main Jun 18, 2026
6 checks passed
AdaWorldAPI pushed a commit that referenced this pull request Jun 18, 2026
PR_ARC_INVENTORY #534 entry + LATEST_STATE "Recently Shipped PRs" row for the
merged #534 (classid → ClassView resolution surface: UNICHAR adapter, keystone,
OGAR→ontology wiring). Per the Mandatory Board-Hygiene Rule's post-merge step.

Co-Authored-By: Claude <noreply@anthropic.com>
https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1