docs+probe: the agnostic lazy world-spine — addressing vision, locality probe (PASS), markov_soa→AriGraph, EW64-as-AriGraph by AdaWorldAPI · Pull Request #444 · AdaWorldAPI/lance-graph

AdaWorldAPI · 2026-05-31T15:56:54Z

Summary

The consolidated output of a long design session on how to compress Wikidata into a lazy-loading, foveated, address-unified world-spine — converged to a single addressing thesis, validated by a real measurement, with one SoC code-move and an empirical probe shipped. Mostly docs + one runnable probe + one small refactor; no behavioral change to shipped runtime.

The one idea (the docs)

A card stores the surprise; the deck stores the expectation. Meaning = deck ⊗ delta — the free-energy framing, applied to both the key (address) and the value (content). The endgame is inherited nothingness: identity (which one) is 27 bits irreducible; description (what it is) is ~0 bits for the modal class member (inherited from the frozen OGIT deck). A typical entity stores nothing.

knowledge/delta-card-addressing-integration-map.md — the converged map (partition-as-address, 27-bit floor with ~0-bit row, sparse radix range-delegation, I/P/B-frames over Lance versioning, RISC compose-not-materialize, frozen-ISA).
knowledge/agnostic-lazy-world-spine.md — the tiered substrate (cold Lance ◄─NiblePath─► hot mailbox-SoA ◄── OGIT/DOLCE cache).
knowledge/owl-dolce-hhtl-compartments-aerial-fed.md, knowledge/splat-codebook-aerial-wikidata-compression.md — the domain/aerial seams.
plans/wikidata-lazy-spine-hydration-v1.md — the 9 D-LWS deliverables for the one missing runtime piece (the NiblePath-keyed tiered hydration manager).
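
A minimal sketch of the deck/delta idea described above, for readers who want it concrete. Everything here is hypothetical: `Description`, `Deck`, `Card` and their fields are illustrative names, and modelling `⊗` as "card field overrides deck field" is an assumption, not the composition the docs define.

```rust
/// Hypothetical description fields; the real deck schema is not shown in this PR.
#[derive(Clone, Debug, PartialEq)]
struct Description {
    kind: &'static str,
    habitat: &'static str,
    legs: u8,
}

/// The deck: the expectation every member of a class inherits.
struct Deck {
    expected: Description,
}

/// The card: only the surprise. `None` means "inherit from the deck".
#[derive(Default)]
struct Card {
    kind: Option<&'static str>,
    habitat: Option<&'static str>,
    legs: Option<u8>,
}

impl Deck {
    /// Compose deck and delta: every field the card leaves empty is inherited.
    fn resolve(&self, card: &Card) -> Description {
        Description {
            kind: card.kind.unwrap_or(self.expected.kind),
            habitat: card.habitat.unwrap_or(self.expected.habitat),
            legs: card.legs.unwrap_or(self.expected.legs),
        }
    }
}

impl Card {
    /// Number of fields this card actually stores (its "surprise").
    fn stored_fields(&self) -> usize {
        self.kind.is_some() as usize
            + self.habitat.is_some() as usize
            + self.legs.is_some() as usize
    }
}

fn main() {
    let deck = Deck {
        expected: Description { kind: "bird", habitat: "forest", legs: 2 },
    };
    // The modal class member stores nothing and inherits everything.
    let modal = Card::default();
    assert_eq!(modal.stored_fields(), 0);
    assert_eq!(deck.resolve(&modal), deck.expected);

    // Only the surprising member pays, and only for the field that differs.
    let penguin = Card { habitat: Some("antarctic"), ..Card::default() };
    assert_eq!(penguin.stored_fields(), 1);
    println!("{:?}", deck.resolve(&penguin));
}
```

The identity bits (which card) are deliberately absent from the sketch; it only illustrates the description side, where the modal member carries zero stored fields.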

The measurement — probe #1 PASS (the payoff)

crates/jc/examples/ontology_locality_probe.rs (zero-dep, hand-rolled TTL scan, reuses the splat_louvain_modularity machinery) RUN on the real on-disk ontologies (DOLCE-Ultralite, schema.org, Odoo, PROV-O, QUDT, OWL-Time):

LOCALITY    = 98.61%  (1207/1224 subClassOf edges intra-basin)   [claim was "~90%"]
FAN-OUT max = 3       (≤16 ✓; 1121 classes have exactly 1 parent-basin)
MODULARITY  = 0.3246  (>0.3 = clear community structure)
VERDICT     = PASS

⇒ On real frozen-ISA ontology structure, the 16-bit local references + the ≤16 family frontier are real — the "inherited nothingness" addressing is not a hope. Honest caveat (in the probe's own verdict): real ontologies (~10³ classes), NOT Wikidata (~10⁸) — the Wikidata P279 run remains the open probe. Conjecture → FINDING on real ontologies.
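
For orientation, the two headline metrics can be sketched as follows. This is not the probe's code: the function names and the flat `basin` slice are assumptions. The definitions follow the review thread on this PR, which states that fan-out counts the distinct top-basins among a class's direct subClassOf parents and that PASS requires locality >= 0.90 and max fan-out <= 16.

```rust
use std::collections::{HashMap, HashSet};

/// Fraction of (child, parent) edges whose endpoints share a basin.
fn locality(edges: &[(usize, usize)], basin: &[usize]) -> f64 {
    if edges.is_empty() {
        return 0.0;
    }
    let intra = edges.iter().filter(|&&(c, p)| basin[c] == basin[p]).count();
    intra as f64 / edges.len() as f64
}

/// Largest number of distinct parent basins seen by any single class.
fn max_fan_out(edges: &[(usize, usize)], basin: &[usize]) -> usize {
    let mut parent_basins: HashMap<usize, HashSet<usize>> = HashMap::new();
    for &(child, parent) in edges {
        parent_basins.entry(child).or_default().insert(basin[parent]);
    }
    parent_basins.values().map(|s| s.len()).max().unwrap_or(0)
}

/// PASS thresholds as quoted in the review thread.
fn passes(locality: f64, max_fan_out: usize) -> bool {
    locality >= 0.90 && max_fan_out <= 16
}

fn main() {
    // Toy graph: classes 0..=2 sit in basin 0, classes 3..=4 in basin 1.
    let basin = [0, 0, 0, 1, 1];
    // Class 2 has one parent in each basin, so one of four edges is cross-basin.
    let edges = [(1, 0), (2, 0), (4, 3), (2, 3)];
    let loc = locality(&edges, &basin);
    let fan = max_fan_out(&edges, &basin);
    println!("toy: locality={loc:.2} fan-out={fan} pass={}", passes(loc, fan));

    // The figures reported above: 1207 of 1224 edges intra-basin, fan-out max 3.
    let reported = 1207.0 / 1224.0;
    println!("reported: {:.2}% pass={}", reported * 100.0, passes(reported, 3));
}
```

The toy graph fails the gate on purpose (three of four edges are local); the reported figures clear both thresholds.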

The code move — markov_soa → AriGraph, vocabulary-agnostic (SoC)

markov_soa (the Markov wave — a windowed SoA projection) was first authored in deepnsm, which made a core runtime concern depend on a linguistics sensor (layer inversion). Moved to lance-graph::graph::arigraph::markov_soa and made vocabulary-agnostic:

SpoRanks { s, p, o: u16 } are opaque — the SoA row carries no language; vocabulary is a late-resolved class property.
match takes an injected Fn(u16,u16)->u8 = AriGraph's own cam_pq::DistanceTables — not a language table. The language layer (DeepNSM/COCA/grammar) stays strictly upstream (it emits SPO into AriGraph; injecting COCA into the hot graph would be the "GoBD-with-Rumi" error).
markov_soa IS AriGraph — the cold-path Markov chain promoted to the hot-path SoA (the wave); EW64/the CE64 W-slot→witness arc is the particle.
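
A rough sketch of what "opaque ranks plus an injected distance" means in practice. Only the `SpoRanks { s, p, o: u16 }` shape and the `Fn(u16,u16)->u8` injection come from this PR; `triplet_distance`, `best_guess_sketch`, `toy_distance`, the mean-of-roles scoring, and treating the `u8` as a distance (lower is closer) are assumptions made for illustration, not the module's actual `best_guess_match`.

```rust
/// Three opaque ranks; the row carries no vocabulary.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SpoRanks {
    s: u16,
    p: u16,
    o: u16,
}

/// Per-triplet distance: mean of the three per-role distances (assumption).
fn triplet_distance<D: Fn(u16, u16) -> u8>(a: SpoRanks, b: SpoRanks, dist: &D) -> u32 {
    (dist(a.s, b.s) as u32 + dist(a.p, b.p) as u32 + dist(a.o, b.o) as u32) / 3
}

/// For each query triplet take the nearest candidate, then average.
/// Returns `None` when either side is empty.
fn best_guess_sketch<D: Fn(u16, u16) -> u8>(
    query: &[SpoRanks],
    candidates: &[SpoRanks],
    dist: D,
) -> Option<u32> {
    if query.is_empty() {
        return None;
    }
    let mut total = 0u32;
    for &q in query {
        let nearest = candidates
            .iter()
            .map(|&c| triplet_distance(q, c, &dist))
            .min()?;
        total += nearest;
    }
    Some(total / query.len() as u32)
}

/// Stand-in for a real distance table: clamped absolute rank difference.
fn toy_distance(a: u16, b: u16) -> u8 {
    a.abs_diff(b).min(255) as u8
}

fn main() {
    let candidates = [
        SpoRanks { s: 1, p: 2, o: 3 },
        SpoRanks { s: 10, p: 20, o: 30 },
    ];
    let query = [
        SpoRanks { s: 1, p: 2, o: 3 },
        SpoRanks { s: 10, p: 20, o: 36 },
    ];
    // The caller decides which metric to inject; the projector never sees a vocabulary.
    println!("{:?}", best_guess_sketch(&query, &candidates, toy_distance));
}
```

Because the metric is a parameter, the core module needs no dependency on whichever crate owns the table; the caller wires it in.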

contract::soa_view::MailboxSoaView gains a doc note: EpisodicWitness64 = AriGraph in the mailbox SoA view (deferred accessor, qualia-pattern; EW64 is a queued design, not yet a code symbol).

Findings on the board (the durable record)

The three Markovs: #1 context-chain (CE64→EW64 arc, deterministic substrate), #2 hybrid+ autocomplete (markov_soa, leashed dark-horse), #3 sink-in-and-pray (deprecated VSA-substrate), plus the P1→P2→P3 gate-before-grail ordering.
VSA substrate decision — explicit 32k SPO-W is the substrate; VSA16k is a strictly-fuzzy proposer (cognitive priming), never cosine, never truth.
EW64 reactive seam — Lance-update = witness-pointer = SurrealDB-kanban-subscription; every link shipped, the chain open at the joints (the wiring task, gated).

Verification

cargo test --manifest-path crates/jc/Cargo.toml → 60/60 (probe file clippy-clean; pre-existing jc lints in other files untouched, out of scope).
cargo test -p lance-graph-contract → 503/503 (the EW64 doc note + soa_view).
cargo test --manifest-path crates/deepnsm/Cargo.toml → green after the markov_soa removal.
⚠️ lance-graph::graph::arigraph::markov_soa is unverified-offline — lance-graph core's lance/datafusion/arrow deps don't fetch in the sandbox; the module + its 4 tests are authored against the grounded MailboxSoaView surface but need a full-checkout compile-verify. Flagged STATUS: provisional in the module header. Its truly-correct final home is inside the EW64-in-SoA seam (P1/P2).

Honest scope

This is vision + measurement + one disciplined refactor, not a feature. The hydration manager (D-LWS), the EW64 type (P1/P2), the Wikidata-scale probe, and the markov_soa core-verify are all queued/gated, clearly labelled CONJECTURE-with-probe where they're unproven. The shipped, verified pieces are: the locality probe (PASS), the contract doc note, the deepnsm cleanup, and the board record.

https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7

Generated by Claude Code

Summary by CodeRabbit

Documentation
- Added extensive architecture and design docs for a lazy “world-spine” hydration, delta-card addressing, plans, epiphanies and status updates describing tiered cold/hot/semantic layers and next steps.
New Features
- Added an ontology locality probe example to measure partition locality on real ontologies.
- Added vocabulary-agnostic wave-based similarity projections for AriGraph to support SoA-window comparisons and provenance-aware matching.

…tiered substrate
Capstone north-star: one NiblePath address unifies ontology position = memory arena = (leaf) spatial coordinate. Tiering: COLD Lance columnar ◄─NiblePath─► HOT mailbox-SoA (agnostic bytes) ◄── SEMANTIC OGIT/DOLCE cache (C2 resolve-not-store).
Reframings:
(1) the cold path SPLITS: DataFusion rows/cols joins are SLOW, business-SQL ground-truth ONLY, off the hot path; HHTL hydration is address-based (NiblePath → CAM/palette/blasgraph, O(1)), not join-based.
(2) DOLCE continuant/occurrent = a 1-bit permanent/temporary residence policy.
(3) AriGraph SPO + labels → agnostic SoA + late labels (C2 wholesale).
Markov = the CausalEdge64 W-slot → WitnessTable/EpisodicWitness64 arc (NOT the 16384 VSA bundle, which is retired legacy / discovery-layer only). Reasoning = traversing the CE64→EW64 arc + SPO, no embedding/forward-pass. Reading a text = accumulating SPO mailboxes + their causal-edge/witness arc; ambiguity resolved by counterfactual testing (recipe_kernels world⊗factual⊗counterfactual, popcount). A 250-page book ≈ 4-5k sentences ≈ ~4096 SPO mailboxes = one per-cohort WitnessTable<64> cohort. The resident agnostic row ~4096 bits (address carries class+label inheritance). Address: byte-aligned 256^4 = 2^32 ~ 4.3B; the 4-byte CAM-PQ code IS the address = class+label key = palette-distance key.
Built vs new vs conjecture mapped; invariants (CAM-exact, similarity-only-in-discovery, SoA stays agnostic) recorded. The one missing runtime piece: a NiblePath-keyed tiered hydration manager.
- knowledge/agnostic-lazy-world-spine.md (the north-star)
- EPIPHANIES: the world-spine FINDING
https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7

…rise, deck=expectation)
Consolidates the 8-turn addressing design from the end (cookbook/delta-card) back through the full chain. The one idea: a card stores the surprise, the deck stores the expectation; meaning = deck ⊗ delta, the free-energy framing (prior + prediction-error), applied to BOTH the key (address) and the value (content).
- Cookbook (value side): recipe = inherited(region×season×persona) + 8-16 delta bits; boundary = generator-vs-derivable.
- Addressing (key side): partition-as-address / schema-as-deck (Quartettkarten); 27-bit truthful floor with ~0-bit row; sparse radix range-delegation (no 256^4 files); frozen ISA = compiled perfect hash, no rebalance, version-gated upgrade.
- Frame model (x264/265): I=frozen radix+compacted base fragment, P=appended+CLAM-clustered delta, B=RISC compose-cache, GOP compaction = amortized upgrade = where similarity freezes to structure. IS Lance fragment-versioning.
- RISC compose-not-materialize: store generators, derive <=7-hop closure via ComposeTable/mxm; dissolves the hub problem; per-predicate composability flag.
- Two trees: frozen ontology radix = address (exact); CLAM/CHESS = proposes the partition (similarity, discovery-only). Adaptive proposes, frozen ships.
- Scale identities: 6-bit cohort ⊂ 16-bit book ⊂ 18-bit hot envelope(256K) ⊂ 32-bit world(cold). Reasoning = CE64→EW64 arc, not the 16384 VSA bundle.
- 3 probes (Louvain/CLAM locality; delta-card residual; compose hit-rate).
New: knowledge/delta-card-addressing-integration-map.md; EPIPHANIES capstone; cross-link from agnostic-lazy-world-spine.md (which it supersedes for addressing).
https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7

The thesis the whole map reaches for: split identity (which one = 27 bits irreducible, radix trie, path-compressed) from description (what it is = ~0 bits for the modal class member, inherited whole from the frozen OGIT deck). A typical entity stores nothing — it inherits everything; only the surprising one pays. The spine's price is paid ONCE by the frozen ontology, amortized to nothing per entity. Absence is not missing data; absence IS the inheritance. https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7

Module doc states honest scope: real ontology subClassOf graphs from data/ontologies, NOT full Wikidata. Parser tracks current subject + predicate, strips string literals/comments, skips blank-node OWL restrictions, emits (child,parent) named-IRI edges only. https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7

NiblePath-keyed tiered hydration plan W1. Verified-symbols table + EpisodicWitness64/Lance-fragment risk flags + D-LWS-1..9 index. https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7

…dix register https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7

…mpose-cache https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7

… (centerpiece) https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7

… + board-hygiene Completes wikidata-lazy-spine-hydration-v1: prefetch cascade, DOLCE-1bit eviction, probe harness (produces P1/P2/P3 gates), deferred 115M load, per-crate firewall contract, 7-risk register (R1 EpisodicWitness64 absent, R2 Lance fragment APIs not wired, R3 CLAM is probe-not-clusterer). https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7

… into the SoA per ractor-mailbox
Two corrections (user, mid-wave; W1 drift-audit also flagged the symbol):
- There is NO VSA in this design. Drop the '16384-bit VSA bundle (retired legacy)' framing entirely: reasoning is a native CE64 W-slot → EpisodicWitness arc + SPO graph walk, no fingerprint bundling. The discovery layer (aerial/splat) uses a transient palette256/CAM-PQ distance, never a bundle.
- EpisodicWitness64 is NOT a phantom and NOT shipped-as-named: it is the NEW AriGraph, migrated INTO the SoA per ractor-mailbox (cohort-local episodic memory as a SoA column). Shipped seed = WitnessTable<64> + WitnessEntry (6-bit W-slot); EpisodicWitness64's 64-bit layout (incl. the 16-bit book tier) is the design surface to settle. Relabelled NEW build target throughout + Status note.
https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7

…s, prefetch=Meta)
The shock named: every link shipped, the chain open at the joints.
- Layering corrected: Markov (CE64 W-slot → EW64 arc) is the BASIS; predictive-prefetch is the META on top. The prefetch IS the wiring IS the learning (Hebbian: aerial 'fire together' offline → EW64 'wire together' online).
- Reactive spine (keystone): Lance update = witness pointer = SurrealDB kanban subscription trigger, one event propagating through the storage layer as the prefetch signal (why EW64 shares CE64 low-40, why kanban is in contract).
- Diagnosis: island-archipelago. EpisodicWitness64/SpoWitness64 (pr-ce64-mb-4) = 0 code symbols; HotWitness = todo!() scaffold; Lance→Surreal→kanban subscription unwired. EW64 is the SEAM, not a type. Invisible in green suites.
- Queued (second wave, post-probe-consolidation): one whole-seam spec.
https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7

…OCA+CAM-PQ, no cosine)
The 'meet halfway' on VSA: turn the black-box bundle into an explicit, deterministic projection of the mailbox-SoA window into its COCA-rank SPO triplets + full provenance (which rows, at what proximity). The triplets stay ADDRESSABLE: no superposition destroys the register.
Match is DeepNSM's OWN machinery, NOT float cosine: COCA-4096 vocabulary + the CAM-PQ 4096² u8 word-distance matrix via SimilarityTable::lookup_u8 + proximity prior. best_guess_match = nearest-triplet CAM-PQ similarity, averaged. Strictly a fuzzy proposer (cognitive priming): proposes where-to-look / what-it-resembles ('feels like a Sicilian'), never asserts; exact 32k SPO-W confirms.
Consumes contract::soa_view::MailboxSoaView through the EXISTING hard dep: zero new dependency, firewall preserved (no dep on the heavy cognitive-shader-driver that implements the view).
Verified: 5 markov_soa tests green (incl. best_guess_match_uses_cam_pq_not_cosine, determinism, edge-clamp, skip-untripled, empty=0); full deepnsm suite 94/4/8/1 no regressions; clippy clean in markov_soa (pre-existing lints in other files untouched, out of scope).
https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7

…e; VSA = fuzzy proposer (priming), not cosine
Supersedes my two earlier mis-framings in-place (board hygiene: don't leave wrong findings standing):
- (a) 'VSA = per-cycle experience/soul-print vector': wrong scope.
- (b) 'keep DeepNSM as a parallel universe': DeepNSM migrates too.
Converged finding: the explicit 32k SPO-W is the substrate (addressable, lossless, reasoning-capable, provenance-bearing; categorically > any bundle; ~32-item recovery capacity vs 32k = 1000x over). VSA16k's legitimate role = a strictly-fuzzy proposer / cognitive priming, firewall-gated to discovery; match via COCA + CAM-PQ SimilarityTable, NOT cosine.
Records the markov_soa.rs artifact (e0a5049), the aerial within/cross-cohort synergy + the queued CodebookDistance adapter D-id, and the CLAUDE.md reconciliation note. Also: crates/deepnsm/Cargo.lock regen from the markov_soa build (benign).
https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7

One word, three ranked uses; the deterministic CE64→EW64 chain is the line: 1) context-chain building = mailbox chaining through the CE64 W-slot → EpisodicWitness64 arc (deterministic, exact, addressable = THE substrate). 2) hybrid+ autocomplete = #1's chain + a fuzzy accumulated witness-bundle as speculative autocomplete, leashed to the chain that confirms it (= markov_soa + the grail-fold experiment). Invariant: unleashed, #2 degrades into #3. 3) sink-in-and-pray = old VSA-bundle-as-Markov, ceiling-bound, ungrounded — the black box (deprecated; the 'every GGUF would already be VSA' disproof). The line: #1 is the chain; #2 is the chain plus a guess it must confirm; #3 is the guess without a chain. Gate before grail: P1 AriGraph→SoA (HotWitness D-ATOM-5 todo!()s) → P2 EW64 in MailboxSoaView (qualia-pattern accessor) → P3 the grail-fold experiment (CONJECTURE, gated, Jirak-baselined, downstream of the EW64 seam — no scope creep). https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
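
The "leash" in use #2 can be pictured with a few lines of Rust. This is a sketch under assumptions: `Spo` as a tuple of three ranks, an in-memory `HashSet` standing in for the exact substrate, and `confirm` as a hypothetical name. It only shows the invariant that a fuzzy proposal counts for nothing until the exact chain confirms it.

```rust
use std::collections::HashSet;

/// A triplet of opaque ranks (subject, predicate, object).
type Spo = (u16, u16, u16);

/// Keep only the proposals that the exact store confirms.
/// A proposer that skips this step is use #3: a guess without a chain.
fn confirm(proposals: &[Spo], exact: &HashSet<Spo>) -> Vec<Spo> {
    proposals
        .iter()
        .copied()
        .filter(|t| exact.contains(t))
        .collect()
}

fn main() {
    // Stand-in for the exact, addressable substrate.
    let exact: HashSet<Spo> = [(1, 2, 3), (4, 5, 6)].into_iter().collect();
    // Stand-in for a fuzzy proposer's output: one hit, one hallucination.
    let proposals = [(4, 5, 6), (7, 8, 9)];

    let confirmed = confirm(&proposals, &exact);
    assert_eq!(confirmed, vec![(4, 5, 6)]);
    println!("confirmed: {confirmed:?}");
}
```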

…c (delete deepnsm copy) markov_soa is the Markov WAVE; EW64/the CE64 W-slot→witness arc is the PARTICLE. Complementary → same home. It was wrongly in deepnsm (core concern depending on a linguistics sensor = layer inversion). Moved to crates/lance-graph/src/graph/arigraph/markov_soa.rs. SoC deeper step: the SoA SPO row is three OPAQUE u16 ranks — vocabulary is a late-resolved CLASS property, never a SoA fact (C2 / I-VSA-IDENTITIES, applied to the triplet encoding). SPO CAN be COCA (good for input parsing) but the SoA/AriGraph mailbox-view must NOT be forced into COCA. The projector takes an injected Fn(u16,u16)->u8 distance — caller supplies AriGraph's cam_pq DistanceTables OR DeepNSM's COCA table. Reuse-by-injection; core has 0 deepnsm dep (the dep graph enforces agnosticism). - AriGraph: SpoRanks{s,p,o:u16} opaque + SoaWavePrimer + WaveProjection (4 tests). - Deleted crates/deepnsm/src/markov_soa.rs (sole ref was its own mod decl); deepnsm still 89/4/8/1 green after removal. - STATUS: AriGraph version unverified-offline (lance-graph core's lance/datafusion/arrow deps don't fetch in the sandbox) — verify on full checkout. - EPIPHANIES: the SoC + vocabulary-agnostic finding. https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7

…); language stays upstream in DeepNSM markov_soa is NOT a generic projector that takes a COCA lens — it IS AriGraph, the cold-path Markov chain promoted to the hot-path SoA. AriGraph is agnostic and NOT necessarily English (holds business/GoBD/Wikidata/text SPO). The match metric is AriGraph's OWN cam_pq::DistanceTables, NOT a language table. The language layer (DeepNSM/COCA-4096/grammar templates) stays STRICTLY upstream: it scans flat data (usually English), parses, and EMITS SPO into AriGraph — and MUST stay English (grammar templates get messy otherwise). Injecting a COCA distance into the hot graph would be the GoBD-with-Rumi error (a language lens over an agnostic graph). Removed the wrong 'or DeepNSM COCA table' injection alternative from both the module doc and the EPIPHANIES finding. (also captures the EPIPHANIES SoC finding that the prior commit's bad pathspec dropped from the index.) https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7

Trailing index sync — the pub mod removal from the move (9a5f54c) re-surfaced after a linter touch. deepnsm no longer declares markov_soa; it lives in AriGraph. https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7


…view Add a deferred-accessor note on MailboxSoaView (beside the qualia one): the future EpisodicWitness64 column IS AriGraph promoted into the mailbox SoA view — the cold-path episodic Markov chain (arigraph::{episodic,witness_corpus}) as a hot-path per-row column = the CausalEdge64 W-slot → witness arc (Markov #1, the particle; markov_soa is the wave). EW64 is not yet a code symbol (queued, see E-EW64-IS-PREDICTIVE-PREFETCH); shipped seeds are the W-slot + WitnessTable<64> + arigraph episodic. Stays agnostic — language (DeepNSM/COCA) stays upstream. https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7

… Q=0.325) The empirical falsifier for the delta-card / inherited-nothingness addressing claim (probe #1 of the integration map), harvested from the W2 wave worker and RUN on the real on-disk ontologies (DOLCE-Ultralite, schema.org, Odoo, PROV-O, QUDT, OWL-Time). Measured (1170 classes, 1224 subClassOf edges, 33 top-basins): - LOCALITY = 98.61% intra-basin (the '~90% local' claim survives + exceeds) - FAN-OUT max = 3 (<=16 ✓; 1121 classes have exactly 1 parent-basin) - MODULARITY Q = 0.3246 (>0.3 = clear community structure) VERDICT: PASS — on REAL frozen-ISA ontology structure, 16-bit local references + the <=16 family frontier are real. HONEST CAVEAT (in the probe verdict): real ontologies ~10^3 classes, NOT Wikidata ~10^8; the Wikidata P279 run stays the open probe. Conjecture → FINDING on real ontologies. zero-dep jc (hand-rolled TTL scan, reuses splat_louvain_modularity machinery); 60/60 jc tests green; probe file clippy-clean (pre-existing jc lints in other files untouched). EPIPHANIES: the measured-result FINDING. https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
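
For readers unfamiliar with the Q figure, here is the standard Newman modularity for a fixed partition, written as a small sketch. Treating each subClassOf edge as one undirected edge, and the function name and signature, are assumptions; the probe's own implementation is not reproduced here.

```rust
/// Newman modularity Q = sum over communities of
/// (intra_edges / m) - (degree_sum / 2m)^2, for an undirected edge list.
fn modularity_q(edges: &[(usize, usize)], community: &[usize]) -> f64 {
    let m = edges.len() as f64;
    if m == 0.0 {
        return 0.0;
    }
    // Number of communities, assuming labels are 0..k.
    let k = community.iter().copied().max().map_or(0, |c| c + 1);
    let mut intra = vec![0.0_f64; k];
    let mut degree = vec![0.0_f64; k];
    for &(u, v) in edges {
        degree[community[u]] += 1.0;
        degree[community[v]] += 1.0;
        if community[u] == community[v] {
            intra[community[u]] += 1.0;
        }
    }
    (0..k)
        .map(|c| intra[c] / m - (degree[c] / (2.0 * m)).powi(2))
        .sum()
}

fn main() {
    // Two triangles joined by a single bridge edge (2, 3).
    let community = [0, 0, 0, 1, 1, 1];
    let edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)];
    let q = modularity_q(&edges, &community);
    // Each community: 3 of 7 edges inside, degree sum 7 of 14, so Q = 2 * (3/7 - 1/4) = 5/14.
    assert!((q - 5.0 / 14.0).abs() < 1e-12);
    println!("Q = {q:.4}");
}
```

A single community covering the whole graph gives Q = 0, which is why a value above 0.3 is read as evidence of real community structure.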

…wave entry
STATUS_BOARD: the 9 D-LWS hydration-manager rows (D-LWS-8 probe-1 SHIPPED: locality 98.6%/fan-out 3/Q=0.325 PASS), + D-MKV-SOA + D-EW64-NOTE rows.
AGENT_LOG: the world-spine vision + W1/W2 wave + markov_soa SoC + EW64-as-AriGraph + probe-result session entry.
https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7

coderabbitai · 2026-05-31T15:57:06Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 9056ef47-af8d-4bad-aff1-a534ac9a4008

📥 Commits

Reviewing files that changed from the base of the PR and between 3e860b0 and 5c652f4.

📒 Files selected for processing (1)

crates/lance-graph/src/graph/arigraph/markov_soa.rs

🚧 Files skipped from review as they are similar to previous changes (1)

crates/lance-graph/src/graph/arigraph/markov_soa.rs

📝 Walkthrough

Walkthrough

This PR documents a lazy world-spine hydration design, adds an empirical ontology-locality probe example, and implements vocabulary-agnostic SoA wave projections for AriGraph with supporting docs and tests.

Changes

Lazy world-spine architecture with empirical validation and runtime support

| Layer / File(s) | Summary |
|---|---|
| Architecture vision and design decisions: `.claude/board/AGENT_LOG.md`, `.claude/board/EPIPHANIES.md`, `.claude/board/STATUS_BOARD.md`, `.claude/knowledge/agnostic-lazy-world-spine.md`, `.claude/knowledge/delta-card-addressing-integration-map.md`, `.claude/plans/wikidata-lazy-spine-hydration-v1.md` | Board findings and epiphanies (2026-05-31) document the lazy world-spine vision: unified `NiblePath` addressing, tiered hydration (cold/hot/semantic), delta-card framing, three-Markovs taxonomy, substrate decisions (~32k SPO-W truth path vs VSA16k proposer), EW64 witness seam notes, and a D-LWS implementation plan with probes/gates. |
| Ontology locality probe, empirical validation: `crates/jc/Cargo.toml`, `crates/jc/examples/ontology_locality_probe.rs` | New Cargo example and Rust program that parses TTL `subClassOf` edges, interns class IRIs (`ClassGraph`), assigns deterministic basins, computes edge locality %, per-class fan-out, and Newman modularity Q, and emits a Pass/Marginal/Fail verdict. Includes recursive loader and unit tests for parsing, metrics, verdicts, and cycles. |
| AriGraph markov_soa module, runtime wave projections: `crates/lance-graph/src/graph/arigraph/markov_soa.rs`, `crates/lance-graph/src/graph/arigraph/mod.rs`, `crates/lance-graph-contract/src/soa_view.rs` | Adds vocabulary-agnostic SoA-window wave projection types and operations: `SpoRanks`, `RowContribution`, `BundleProvenance`, `WaveProjection::best_guess_match` (injectable per-role distance), and `SoaWavePrimer::project` (±radius MailboxSoaView folding). Includes unit tests and a MailboxSoaView doc comment describing a future `EpisodicWitness64` accessor. |

Sequence Diagram(s)

sequenceDiagram
  participant TTL as TTL Input
  participant Parser as parse_subclass_edges
  participant Graph as ClassGraph
  participant Basin as assign_basins
  participant Metrics as locality/fan_out/modularity_q
  participant Verdict as verdict
  TTL->>Parser: feed lines (strip strings/comments)
  Parser->>Graph: emit (child,parent) edges
  Graph->>Basin: assign deterministic basin roots
  Basin->>Metrics: compute edge locality, fan-out histogram, Q
  Metrics->>Verdict: evaluate thresholds -> Pass/Marginal/Fail

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

AdaWorldAPI/lance-graph#437: Adds a deferred EpisodicWitness64-related MailboxSoaView doc comment that relates to the MailboxSoaView contract referenced here.
AdaWorldAPI/lance-graph#434: Overlaps on board/epiphany docs and Markov/witness substrate discussions; this PR extends with code for markov_soa and the locality probe.

Poem

A rabbit scrawls in margin light,
Basin roots hum through day and night,
Probes and waves in tiny hops,
Markov beats and mailbox stops—
Build the spine, then let truth bite. 🐇✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the three main components of the changeset: documentation of an agnostic lazy world-spine architecture, a locality probe demonstrating it passes validation criteria, and a refactoring that moves markov_soa into AriGraph as vocabulary-agnostic code with EpisodicWitness64 framing.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3e860b06ae


chatgpt-codex-connector · 2026-05-31T16:00:06Z

+        if c == '#' {
+            // Rest of line is a comment.
+            break;


Preserve fragments inside angle-bracket IRIs

For valid Turtle that uses full IRIs such as <http://example.org#Child> rdfs:subClassOf <http://example.org#Parent> ., this treats the # inside the IRI as the start of a comment before tokenization. The parser then truncates both class names (and normalize_iri still accepts tokens beginning with <), so different fragment IRIs in the same namespace can collapse into the same malformed class key and corrupt the locality/fan-out measurements.
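
One way to address this, sketched under assumptions: track whether the scanner is inside an angle-bracket IRI or a double-quoted literal, and only treat `#` as a comment start outside both. `strip_comment` is a hypothetical helper, not the probe's function, and it ignores single-quoted and triple-quoted (multi-line) literals.

```rust
/// Return the part of a Turtle line that precedes a real comment.
/// A `#` inside `<...>` or inside a double-quoted string is kept.
fn strip_comment(line: &str) -> &str {
    let mut in_iri = false;
    let mut in_string = false;
    let mut escaped = false;
    for (i, c) in line.char_indices() {
        if in_string {
            // Inside a literal: only an unescaped quote ends it.
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '<' if !in_iri => in_iri = true,
            '>' if in_iri => in_iri = false,
            '"' if !in_iri => in_string = true,
            // Only an un-bracketed, unquoted `#` starts a comment.
            '#' if !in_iri => return &line[..i],
            _ => {}
        }
    }
    line
}

fn main() {
    let line = "<http://example.org#Child> rdfs:subClassOf <http://example.org#Parent> . # note";
    let kept = strip_comment(line);
    assert_eq!(
        kept,
        "<http://example.org#Child> rdfs:subClassOf <http://example.org#Parent> . "
    );
    println!("{kept}");
}
```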


chatgpt-codex-connector · 2026-05-31T16:00:06Z

+            if predicate_is_subclass {
+                if let (Some(child), Some(parent)) =
+                    (current_subject.clone(), normalize_iri(tok))
+                {
+                    if child != parent {
+                        edges.push((child, parent));


Stop carrying subClassOf past delimited objects

When a valid Turtle object is written with attached punctuation, e.g. rdfs:subClassOf ex:Parent; or ex:Parent ., normalize_iri(tok) strips the delimiter and emits the parent, but predicate_is_subclass is left true. The next predicate token on the same line or on an indented continuation line can then be parsed as another superclass, adding bogus edges and skewing the probe's verdict; the object delimiter needs to reset/end the active predicate after the edge is emitted.
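
A simplified sketch of the suggested behaviour: split a trailing `.`, `;` or `,` off each token and let the delimiter drive the parser state, so that `;` and `.` end the active predicate while `,` keeps it. The `Expect` enum and `subclass_edges` are hypothetical names, and the sketch assumes comments are already stripped and every object is a single whitespace-delimited token (no blank nodes, no literals containing spaces).

```rust
/// What the scanner expects the next token to be.
#[derive(Clone, Copy, PartialEq)]
enum Expect {
    Subject,
    Predicate,
    Object,
}

fn subclass_edges(src: &str) -> Vec<(String, String)> {
    let mut edges = Vec::new();
    let mut subject = String::new();
    let mut in_subclass = false; // true only while reading objects of rdfs:subClassOf
    let mut expect = Expect::Subject;

    for raw in src.split_whitespace() {
        // Separate an attached or free-standing delimiter from the token text.
        let (tok, delim) = match raw.chars().last() {
            Some(d) if matches!(d, '.' | ';' | ',') => (&raw[..raw.len() - 1], Some(d)),
            _ => (raw, None),
        };
        if !tok.is_empty() {
            match expect {
                Expect::Subject => {
                    subject = tok.to_string();
                    expect = Expect::Predicate;
                }
                Expect::Predicate => {
                    in_subclass = tok == "rdfs:subClassOf";
                    expect = Expect::Object;
                }
                Expect::Object => {
                    if in_subclass && tok != subject {
                        edges.push((subject.clone(), tok.to_string()));
                    }
                }
            }
        }
        match delim {
            // `,` continues the object list of the same predicate.
            Some(',') => expect = Expect::Object,
            // `;` ends the predicate; `.` ends the whole statement.
            Some(';') => {
                in_subclass = false;
                expect = Expect::Predicate;
            }
            Some('.') => {
                in_subclass = false;
                expect = Expect::Subject;
            }
            _ => {}
        }
    }
    edges
}

fn main() {
    let src = "ex:A rdfs:subClassOf ex:B; rdfs:seeAlso ex:C .\nex:D rdfs:subClassOf ex:E, ex:F .";
    for (child, parent) in subclass_edges(src) {
        println!("{child} -> {parent}");
    }
}
```

With the reset in place, `ex:C` in the first statement is read as the object of `rdfs:seeAlso` and produces no edge.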


coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.claude/plans/wikidata-lazy-spine-hydration-v1.md:
- Around line 135-147: The comment clarifies that the P1 "fan-out max=3" metric
in ontology_locality_probe.rs measures the number of DISTINCT top-basins among a
class’s direct subClassOf parents (i.e., distinct parent-basin count), not the
designed branching factor; update the code/comments so the computed fan-out
variable and any accompanying log/message (e.g., the fan-out computation in
ontology_locality_probe.rs and the variables locality and max_fanout) explicitly
state they count distinct top-basins of direct parents, and ensure PASS logic
checks both locality >= 0.90 AND max_fanout <= 16; also adjust any text that
conflates this metric with the architectural branching factor to avoid
confusion.

In `@crates/jc/examples/ontology_locality_probe.rs`:
- Around line 383-388: The current collection of subclass edges pushes every
(child,parent) pair into edges (built via intern, names, id_of) and allows
duplicates which skews locality/modularity; before using edges for metric
computation, deduplicate the Vec<(usize,usize)> (or switch to a HashSet) so only
unique (ci,pi) pairs are retained; update the same deduplication logic in the
other blocks that build edges around the later sections (the ones starting near
the other ranges) so repeated triples are eliminated prior to computing metrics.
- Around line 209-654: The file exposes core probe logic as free functions
(parse_subclass_edges, locality, fan_out, modularity_q, verdict, load_dir) which
violates the carrier-pattern rule; refactor by introducing a carrier struct
(e.g., Probe or OntologyProbe) that holds probe state (edges, files, maybe
config) and convert those free functions into inherent methods (e.g.,
Probe::parse_subclass_edges, Probe::locality, Probe::fan_out,
Probe::modularity_q, Probe::verdict, Probe::load_dir) updating any call sites to
use method calls on the carrier instance and moving any related state (edges,
files, basin, graph) into the struct so methods operate on &self / &mut self
rather than standalone parameters. Ensure signatures, visibility, and tests are
adjusted accordingly and that load_dir populates the carrier's fields instead of
returning raw tuples.

In `@crates/lance-graph/src/graph/arigraph/markov_soa.rs`:
- Around line 178-182: In SoaWavePrimer::project the window math currently casts
self.radius (u32) and focal_row (usize) to i32 via "as", which can truncate;
replace those lossy casts by using checked/widened conversions (e.g., use
i64::try_from(self.radius) or isize::try_from(focal_row) / i32::try_from where
appropriate) and propagate or handle conversion errors (return early or clamp)
before computing r and row_i; update the loop that uses r, row_i and bounds
checks with the new safe types and keep references to class_ids = soa.class_id()
and n for bounds logic.
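
Two of the findings above are small enough to sketch. Both helpers are hypothetical (`dedup_edges`, `window`) and are not taken from the PR; they show one possible shape for order-preserving edge deduplication and for a ±radius row window computed without lossy `as` casts.

```rust
use std::collections::HashSet;
use std::convert::TryFrom;

/// Keep the first occurrence of each (child, parent) pair, preserving order.
fn dedup_edges(edges: Vec<(usize, usize)>) -> Vec<(usize, usize)> {
    let mut seen = HashSet::new();
    edges.into_iter().filter(|e| seen.insert(*e)).collect()
}

/// Inclusive row window [lo, hi] of ±radius around `focal_row`, clamped to 0..n.
/// Returns `None` when `focal_row` is out of bounds (including n == 0).
fn window(focal_row: usize, radius: u32, n: usize) -> Option<(usize, usize)> {
    if focal_row >= n {
        return None;
    }
    // Checked conversion instead of `as`; saturate if usize is narrower than u32.
    let r = usize::try_from(radius).unwrap_or(usize::MAX);
    let lo = focal_row.saturating_sub(r);
    let hi = focal_row.saturating_add(r).min(n - 1);
    Some((lo, hi))
}

fn main() {
    // Repeated triples collapse to one edge before any metric is computed.
    assert_eq!(dedup_edges(vec![(1, 0), (1, 0), (2, 0)]), vec![(1, 0), (2, 0)]);

    // The window clamps at both ends and never wraps, even for a huge radius.
    assert_eq!(window(2, 5, 10), Some((0, 7)));
    assert_eq!(window(9, 3, 10), Some((6, 9)));
    assert_eq!(window(0, u32::MAX, 4), Some((0, 3)));
    assert_eq!(window(10, 1, 10), None);
    println!("ok");
}
```

Staying in `usize` with saturating arithmetic avoids the signed intermediate entirely, which is one way to sidestep the truncation the review describes.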


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: dda7d8cd-b3f0-4111-9bed-f94764e8b4db

📥 Commits

Reviewing files that changed from the base of the PR and between 67534a3 and 3e860b0.

⛔ Files ignored due to path filters (1)

crates/deepnsm/Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (11)

.claude/board/AGENT_LOG.md
.claude/board/EPIPHANIES.md
.claude/board/STATUS_BOARD.md
.claude/knowledge/agnostic-lazy-world-spine.md
.claude/knowledge/delta-card-addressing-integration-map.md
.claude/plans/wikidata-lazy-spine-hydration-v1.md
crates/jc/Cargo.toml
crates/jc/examples/ontology_locality_probe.rs
crates/lance-graph-contract/src/soa_view.rs
crates/lance-graph/src/graph/arigraph/markov_soa.rs
crates/lance-graph/src/graph/arigraph/mod.rs

coderabbitai · 2026-05-31T16:02:36Z

+### Gate P1 — Partition locality (CONJECTURE → must measure)
+- **Driver:** `jc/examples/splat_louvain_modularity.rs` (Louvain modularity =
+  popcount-AND over `contract::splat::AwarenessPlane16K` planes) +
+  `neighborhood::clam::measure_cluster_radii` on the real P279/subClassOf +
+  edge graph derived from `data/ontologies/*.ttl` (e.g. the FIBO or
+  schema.org subtree; biology subtree once Wikidata lands).
+- **Pass:** high modularity ⇒ ≥~90% of edges are intra-cohort ⇒ 16-bit
+  intra-cohort references + the family frontier are real, and the natural
+  fan-out (the 4/12/16 split) is observed, not assumed.
+- **Gates:** D-LWS-1 fan-out choice; D-LWS-4 GOP P-frame placement; D-LWS-5
+  cohort residency.
+- **Honest status:** `clam.rs` header literally says the radii-coincide-with-
+  ontology-boundaries claim "is a TEST, not a fact." Treat as **CONJECTURE**.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Check what the locality probe actually measures as "fan-out"
# Look for the probe implementation to see what metrics it computes
rg -n -A5 -B5 "fan.?out|FAN.?OUT" crates/jc/examples/ontology_locality_probe.rs
# Also check if there's any documentation of the probe's output format
rg -n "max.*fan|modularity|locality.*percent" crates/jc/examples/ontology_locality_probe.rs

Repository: AdaWorldAPI/lance-graph

Length of output: 9054

Clarify that P1’s “fan-out max=3” measures distinct parent-basin count per class, not the designed branching factor.

In ontology_locality_probe.rs, fan-out is computed as the number of DISTINCT top-basins among each class’s direct subClassOf parents, and PASS is locality >= 0.90 and max_fanout <= 16.

So FAN-OUT max = 3 is consistent with the “16-frontier” cap; it just indicates the observed worst-case distinct parent-basin count is 3 (and doesn’t contradict the architectural 4/12/16 vs 16-way split unless prose conflates the metrics).
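
To make the distinction concrete, here is a minimal sketch (not code from the PR) that contrasts the probe's fan-out metric with a structural branching factor. The helper names `distinct_parent_basins` and `child_count` are hypothetical; the first mirrors what the `fan_out` function in the hunk computes per class, the second is only an illustration of "how many children does a node have", which the probe does not measure.

```rust
use std::collections::BTreeSet;

/// Probe-style fan-out for ONE class: the number of DISTINCT top-basins
/// among its direct subClassOf parents.
/// `parents` holds parent class ids; `basin[id]` is that class's top-basin.
fn distinct_parent_basins(parents: &[usize], basin: &[usize]) -> usize {
    parents
        .iter()
        .map(|&p| basin[p])
        .collect::<BTreeSet<usize>>()
        .len()
}

/// Illustrative branching factor (hypothetical, not measured by the probe):
/// how many classes list `class` as a direct parent.
fn child_count(class: usize, all_parents: &[Vec<usize>]) -> usize {
    all_parents.iter().filter(|ps| ps.contains(&class)).count()
}
```

With `all_parents = [[], [0], [], [2], [0, 1, 3], [0], [0]]` and `basin = [0, 0, 2, 2, 0, 0, 0]`, class 4 has three parents but only two distinct parent-basins, while class 0 has four direct children. The two numbers answer different questions, which is why a small observed fan-out says nothing by itself about the designed 4/12/16 split.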

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.claude/plans/wikidata-lazy-spine-hydration-v1.md around lines 135 - 147, The comment clarifies that the P1 "fan-out max=3" metric in ontology_locality_probe.rs measures the number of DISTINCT top-basins among a class’s direct subClassOf parents (i.e., distinct parent-basin count), not the designed branching factor; update the code/comments so the computed fan-out variable and any accompanying log/message (e.g., the fan-out computation in ontology_locality_probe.rs and the variables locality and max_fanout) explicitly state they count distinct top-basins of direct parents, and ensure PASS logic checks both locality >= 0.90 AND max_fanout <= 16; also adjust any text that conflates this metric with the architectural branching factor to avoid confusion.

coderabbitai · 2026-05-31T16:02:36Z

+pub fn parse_subclass_edges(ttl: &str) -> Vec<(String, String)> {
+    const SUBCLASS: &str = "subClassOf"; // matches rdfs:subClassOf AND bare subClassOf
+    let mut edges: Vec<(String, String)> = Vec::new();
+    let mut current_subject: Option<String> = None;
+    let mut predicate_is_subclass = false;
+    let mut in_long_string = false;
+    // Depth of nested `[ ... ]` blank-node restrictions. While > 0 we are
+    // INSIDE an anonymous OWL restriction and emit no edges; the restriction
+    // spans multiple physical lines, so this persists across the line loop.
+    let mut bracket_depth: i32 = 0;
+
+    for raw_line in ttl.lines() {
+        let line = strip_strings_and_comments(raw_line, &mut in_long_string);
+        let leading_ws = raw_line.starts_with(char::is_whitespace);
+
+        // Split into whitespace tokens (Turtle is whitespace-delimited at this
+        // granularity; we already stripped strings/comments).
+        let toks: Vec<&str> = line.split_whitespace().collect();
+        if toks.is_empty() {
+            // A blank physical line does not by itself end a statement.
+            continue;
+        }
+
+        let mut idx = 0;
+
+        // A statement that begins flush-left (no leading whitespace) and whose
+        // first token is a named IRI / blank starts a NEW subject — UNLESS the
+        // line is a pure object-list continuation beginning with ',' (handled
+        // below) or a directive (@prefix / @base / PREFIX / BASE).
+        let first = toks[0];
+        let is_directive = first.starts_with('@')
+            || first.eq_ignore_ascii_case("prefix")
+            || first.eq_ignore_ascii_case("base");
+        if is_directive {
+            // Directives don't carry subjects or edges; but a directive still
+            // can be terminated by '.', which must not clobber subject state of
+            // a real statement (directives are always flush-left & self
+            // contained), so just skip the whole line.
+            continue;
+        }
+
+        if bracket_depth == 0
+            && !leading_ws
+            && first != ","
+            && first != ";"
+            && !first.starts_with('[')
+        {
+            // New subject candidate (only when not inside a blank node).
+            if let Some(subj) = normalize_iri(first) {
+                current_subject = Some(subj);
+            } else {
+                current_subject = None;
+            }
+            predicate_is_subclass = false;
+            idx = 1;
+        }
+
+        // Walk remaining tokens, tracking predicate switches and emitting
+        // edges while the active predicate is subClassOf AND we are at
+        // bracket depth 0 (outside any anonymous restriction).
+        while idx < toks.len() {
+            let tok = toks[idx];
+
+            // Update bracket depth from any '[' / ']' characters in the token,
+            // then move on if the token is pure bracket punctuation. A '['
+            // opening means the CURRENT subClassOf object is an anonymous
+            // restriction; we suppress emission until the matching ']' but
+            // stay in subClassOf predicate mode so a following ',' continues
+            // the OUTER object list.
+            let opens = tok.matches('[').count() as i32;
+            let closes = tok.matches(']').count() as i32;
+            if opens > 0 || closes > 0 {
+                bracket_depth += opens - closes;
+                if bracket_depth < 0 {
+                    bracket_depth = 0;
+                }
+                // If the token is only brackets (possibly with ',' / ';'),
+                // there is nothing else to interpret on it.
+                let stripped: String = tok
+                    .chars()
+                    .filter(|&c| c != '[' && c != ']' && c != ',' && c != ';')
+                    .collect();
+                if stripped.is_empty() {
+                    idx += 1;
+                    continue;
+                }
+            }
+
+            // Anything inside a blank node is ignored entirely.
+            if bracket_depth > 0 {
+                idx += 1;
+                continue;
+            }
+
+            // Object-list continuation: ',' keeps the current predicate.
+            if tok == "," {
+                idx += 1;
+                continue;
+            }
+            // ';' ends the current predicate's object list (a new predicate
+            // follows on this or a later line).
+            if tok == ";" {
+                predicate_is_subclass = false;
+                idx += 1;
+                continue;
+            }
+            // '.' terminates the whole statement → no active subject.
+            if tok.starts_with('.') && tok.len() == 1 {
+                current_subject = None;
+                predicate_is_subclass = false;
+                idx += 1;
+                continue;
+            }
+
+            // Predicate detection: rdfs:subClassOf or bare subClassOf.
+            let bare = tok.trim_end_matches([';', ',']);
+            if bare == SUBCLASS || bare.ends_with(":subClassOf") || bare == "rdfs:subClassOf" {
+                predicate_is_subclass = true;
+                idx += 1;
+                continue;
+            }
+            // In subClassOf object position: emit a named-IRI edge.
+            if predicate_is_subclass {
+                if let (Some(child), Some(parent)) =
+                    (current_subject.clone(), normalize_iri(tok))
+                {
+                    if child != parent {
+                        edges.push((child, parent));
+                    }
+                }
+                idx += 1;
+                continue;
+            }
+
+            // Not in subClassOf mode: a token like `a`, `rdf:type`,
+            // `owl:disjointWith`, `rdfs:label` is a (non-subclass) predicate;
+            // it just resets predicate state. We do not need its objects.
+            if bare == "a" || bare.contains(':') {
+                predicate_is_subclass = false;
+            }
+            idx += 1;
+        }
+    }
+    edges
+}
+
+// ── class graph: intern IRIs, build parent adjacency, assign top-basins ─────
+
+/// Interned subClassOf DAG over class IRIs.
+pub struct ClassGraph {
+    /// id -> IRI key (for printing).
+    pub names: Vec<String>,
+    /// Direct parents of each class (deduplicated, sorted).
+    pub parents: Vec<Vec<usize>>,
+    /// All edges as interned (child, parent) id pairs.
+    pub edges: Vec<(usize, usize)>,
+}
+
+impl ClassGraph {
+    /// Build from `(child, parent)` IRI-key edges. Every IRI appearing in any
+    /// position becomes a node (a parent that is never a child is a root).
+    pub fn from_edges(iri_edges: &[(String, String)]) -> Self {
+        let mut id_of: BTreeMap<String, usize> = BTreeMap::new();
+        let mut names: Vec<String> = Vec::new();
+        let intern = |s: &str, names: &mut Vec<String>, id_of: &mut BTreeMap<String, usize>| {
+            if let Some(&id) = id_of.get(s) {
+                id
+            } else {
+                let id = names.len();
+                names.push(s.to_string());
+                id_of.insert(s.to_string(), id);
+                id
+            }
+        };
+        let mut edges: Vec<(usize, usize)> = Vec::new();
+        for (c, p) in iri_edges {
+            let ci = intern(c, &mut names, &mut id_of);
+            let pi = intern(p, &mut names, &mut id_of);
+            edges.push((ci, pi));
+        }
+        let n = names.len();
+        let mut parents: Vec<Vec<usize>> = vec![Vec::new(); n];
+        for &(c, p) in &edges {
+            parents[c].push(p);
+        }
+        for ps in parents.iter_mut() {
+            ps.sort_unstable();
+            ps.dedup();
+        }
+        Self { names, parents, edges }
+    }
+
+    pub fn n_classes(&self) -> usize {
+        self.names.len()
+    }
+
+    /// Assign each class to its top-basin = the root ancestor reached by
+    /// walking parents upward. Multi-parent: follow the parent with the
+    /// SMALLEST interned id (deterministic representative). Cycles: broken by
+    /// a visited-set; the entry node of a cycle becomes its own basin.
+    /// Returns `basin[id] = root_id`.
+    pub fn assign_basins(&self) -> Vec<usize> {
+        let n = self.n_classes();
+        let mut basin = vec![usize::MAX; n];
+        for start in 0..n {
+            if basin[start] != usize::MAX {
+                continue;
+            }
+            // Walk up to a root, recording the path; memoize on the way back.
+            let mut path: Vec<usize> = Vec::new();
+            let mut visiting: BTreeSet<usize> = BTreeSet::new();
+            let mut cur = start;
+            let root;
+            loop {
+                if let Some(&memo) = basin.get(cur) {
+                    if memo != usize::MAX {
+                        root = memo;
+                        break;
+                    }
+                }
+                if visiting.contains(&cur) {
+                    // Cycle: treat `cur` as the basin root for this SCC entry.
+                    root = cur;
+                    break;
+                }
+                visiting.insert(cur);
+                path.push(cur);
+                // Pick the smallest-id parent (deterministic). No parent → root.
+                match self.parents[cur].iter().min() {
+                    Some(&p) => cur = p,
+                    None => {
+                        root = cur;
+                        break;
+                    }
+                }
+            }
+            for id in path {
+                basin[id] = root;
+            }
+            if basin[start] == usize::MAX {
+                basin[start] = root;
+            }
+        }
+        basin
+    }
+}
+
+// ── metric 1: locality ──────────────────────────────────────────────────────
+
+/// Fraction of edges whose child and parent share a top-basin.
+/// Returns (local_edges, total_edges, fraction). Empty graph → fraction 0.
+pub fn locality(edges: &[(usize, usize)], basin: &[usize]) -> (usize, usize, f64) {
+    let total = edges.len();
+    if total == 0 {
+        return (0, 0, 0.0);
+    }
+    let local = edges
+        .iter()
+        .filter(|&&(c, p)| basin[c] == basin[p])
+        .count();
+    (local, total, local as f64 / total as f64)
+}
+
+// ── metric 2: fan-out (distinct parent-basins per class) ────────────────────
+
+/// Per-class count of DISTINCT parent-basins among its direct subClassOf
+/// parents. Returns (max_fanout, histogram) where histogram[k] = #classes
+/// whose fan-out == k. Classes with no parents contribute fan-out 0.
+pub fn fan_out(graph: &ClassGraph, basin: &[usize]) -> (usize, BTreeMap<usize, usize>) {
+    let mut hist: BTreeMap<usize, usize> = BTreeMap::new();
+    let mut max_fo = 0usize;
+    for c in 0..graph.n_classes() {
+        let distinct: BTreeSet<usize> = graph.parents[c].iter().map(|&p| basin[p]).collect();
+        let fo = distinct.len();
+        max_fo = max_fo.max(fo);
+        *hist.entry(fo).or_insert(0) += 1;
+    }
+    (max_fo, hist)
+}
+
+// ── metric 3: modularity Q of the basin partition ──────────────────────────
+//
+// Newman modularity on the UNDIRECTED subClassOf graph (each subClassOf edge
+// contributes one undirected link between child and parent):
+//
+//     Q = Σ_c [ e_c / m  -  (a_c / 2m)^2 ]
+//
+// where m = |E|, e_c = number of edges fully inside basin c, a_c = sum of
+// degrees of nodes in basin c. We reuse the `splat_louvain_modularity.rs`
+// idea — the within-community edge mass is a popcount-AND between a node's
+// neighbour bitset and the basin-membership bitset — but with dynamically
+// sized `Vec<u64>` planes so the probe handles ontologies with thousands of
+// classes (the contract's fixed 16,384-bit `AwarenessPlane16K` is too small
+// for schema.org). Self-loops are excluded by construction (the parser drops
+// `X subClassOf X`).
+
+/// A dynamically sized bitset (the standalone analogue of `AwarenessPlane16K`).
+struct BitPlane(Vec<u64>);
+
+impl BitPlane {
+    fn zero(n_bits: usize) -> Self {
+        BitPlane(vec![0u64; n_bits.div_ceil(64)])
+    }
+    #[inline]
+    fn set(&mut self, idx: usize) {
+        self.0[idx / 64] |= 1u64 << (idx % 64);
+    }
+    #[inline]
+    fn and_popcount(&self, other: &BitPlane) -> u32 {
+        self.0
+            .iter()
+            .zip(other.0.iter())
+            .map(|(a, b)| (a & b).count_ones())
+            .sum()
+    }
+}
+
+/// Compute Newman modularity Q of the basin partition. Returns Q in
+/// [-0.5, 1.0]. Empty graph → 0.0.
+pub fn modularity_q(graph: &ClassGraph, basin: &[usize]) -> f64 {
+    let n = graph.n_classes();
+    let m = graph.edges.len();
+    if m == 0 || n == 0 {
+        return 0.0;
+    }
+    let two_m = 2.0 * m as f64;
+
+    // Undirected neighbour bitset per node (both directions of each edge).
+    let mut neigh: Vec<BitPlane> = (0..n).map(|_| BitPlane::zero(n)).collect();
+    let mut degree = vec![0u32; n];
+    for &(c, p) in &graph.edges {
+        neigh[c].set(p);
+        neigh[p].set(c);
+        degree[c] += 1;
+        degree[p] += 1;
+    }
+
+    // Group node ids by basin; build a membership bitset per basin.
+    let mut members: BTreeMap<usize, Vec<usize>> = BTreeMap::new();
+    for (id, &b) in basin.iter().enumerate() {
+        members.entry(b).or_default().push(id);
+    }
+
+    let mut q = 0.0;
+    for ids in members.values() {
+        let mut plane = BitPlane::zero(n);
+        for &id in ids {
+            plane.set(id);
+        }
+        // e_c counted twice (once per endpoint) via Σ_u popcount(neigh[u] AND plane).
+        let mut e_c_times_two = 0u32;
+        let mut a_c = 0.0;
+        for &id in ids {
+            e_c_times_two += neigh[id].and_popcount(&plane);
+            a_c += degree[id] as f64;
+        }
+        let e_c = e_c_times_two as f64 / 2.0;
+        q += (e_c / m as f64) - (a_c / two_m).powi(2);
+    }
+    q
+}
+
+// ── verdict ──────────────────────────────────────────────────────────────────
+
+/// Verdict tier for the locality hypothesis.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum Verdict {
+    /// High locality AND fan-out fits the family frontier.
+    Pass,
+    /// Locality decent but borderline, or fan-out near the cap.
+    Marginal,
+    /// Locality low — local-pointer assumption does not hold.
+    Fail,
+}
+
+impl Verdict {
+    pub fn as_str(self) -> &'static str {
+        match self {
+            Verdict::Pass => "PASS",
+            Verdict::Marginal => "MARGINAL",
+            Verdict::Fail => "FAIL",
+        }
+    }
+}
+
+/// Decide the verdict from the measured numbers.
+///
+/// Thresholds (stated, not hand-waved):
+///   * locality ≥ 0.90 AND max_fanout ≤ 16          → PASS  (the map's claim)
+///   * locality ≥ 0.75 (or max_fanout in 17..=32)   → MARGINAL
+///   * otherwise                                     → FAIL
+///
+/// The "16" frontier is the design's pencilled cap; max_fanout > 16 means a
+/// single class needs more than 16 distinct family pointers, breaking the
+/// 4/12/16 split as stated (though a wider frontier byte would still work).
+pub fn verdict(locality_frac: f64, max_fanout: usize) -> Verdict {
+    if locality_frac >= 0.90 && max_fanout <= 16 {
+        Verdict::Pass
+    } else if locality_frac >= 0.75 || (max_fanout > 16 && max_fanout <= 32) {
+        Verdict::Marginal
+    } else {
+        Verdict::Fail
+    }
+}
+
+// ── load real ontology TTLs from a directory ────────────────────────────────
+
+/// All parsed `(child, parent)` IRI edges plus the sorted list of TTL files
+/// they came from.
+type LoadedOntology = (Vec<(String, String)>, Vec<PathBuf>);
+
+/// Recursively collect `*.ttl` files under `dir`, parse subClassOf edges from
+/// each, and return (all_edges, sorted_file_list). I/O errors on individual
+/// files are skipped with a note to stderr (the probe is best-effort over
+/// whatever real ontologies are present).
+fn load_dir(dir: &Path) -> std::io::Result<LoadedOntology> {
+    let mut edges: Vec<(String, String)> = Vec::new();
+    let mut files: Vec<PathBuf> = Vec::new();
+    let mut stack = vec![dir.to_path_buf()];
+    while let Some(d) = stack.pop() {
+        let rd = match std::fs::read_dir(&d) {
+            Ok(rd) => rd,
+            Err(e) => {
+                eprintln!("  (skip dir {}: {})", d.display(), e);
+                continue;
+            }
+        };
+        for entry in rd.flatten() {
+            let path = entry.path();
+            if path.is_dir() {
+                stack.push(path);
+            } else if path.extension().map(|e| e == "ttl").unwrap_or(false) {
+                match std::fs::read_to_string(&path) {
+                    Ok(text) => {
+                        let mut e = parse_subclass_edges(&text);
+                        edges.append(&mut e);
+                        files.push(path);
+                    }
+                    Err(e) => eprintln!("  (skip {}: {})", path.display(), e),
+                }
+            }
+        }
+    }
+    files.sort();
+    Ok((edges, files))
+}
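
The modularity formula in the hunk above, Q = Σ_c [ e_c / m - (a_c / 2m)^2 ], can be checked by hand on a tiny graph. The sketch below is an assumption-laden cross-check, not the probe's implementation: it counts edges directly instead of using popcount-AND over bitsets, and it expects dense community labels `0..k` rather than basin root ids. The function name `modularity_by_counting` is hypothetical.

```rust
/// Newman modularity by direct counting on an undirected edge list.
/// `community[node]` must be a dense label in `0..k` (an assumption of this
/// sketch; the probe keys communities by basin root id instead).
fn modularity_by_counting(edges: &[(usize, usize)], community: &[usize]) -> f64 {
    if edges.is_empty() {
        return 0.0;
    }
    let m = edges.len() as f64;
    let k = community.iter().copied().max().map_or(0, |c| c + 1);
    // inside[c] = e_c (edges with both endpoints in c);
    // degree_sum[c] = a_c (sum of degrees of the nodes in c).
    let mut inside = vec![0.0f64; k];
    let mut degree_sum = vec![0.0f64; k];
    for &(u, v) in edges {
        degree_sum[community[u]] += 1.0;
        degree_sum[community[v]] += 1.0;
        if community[u] == community[v] {
            inside[community[u]] += 1.0;
        }
    }
    (0..k)
        .map(|c| inside[c] / m - (degree_sum[c] / (2.0 * m)).powi(2))
        .sum()
}
```

For two disconnected edges `(0,1)` and `(2,3)` with communities `[0, 0, 1, 1]`, each community has e_c = 1 and a_c = 2 with m = 2, so Q = 2 × (1/2 − 1/4) = 0.5. Adding the cross edge `(1,2)` gives m = 3 and a_c = 3, so Q = 2 × (1/3 − 1/4) = 1/6.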


🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Move probe operations onto a carrier struct instead of free functions.

Core probe logic is implemented as free functions (parse_subclass_edges, locality, fan_out, modularity_q, verdict, load_dir) rather than methods on a state carrier, which breaks the repository’s Rust carrier-pattern rule.

As per coding guidelines, "**/*.rs: Use only method calls on the carrier struct that holds the state, never free functions. Carrier pattern: trajectory.resolve() instead of resolve(trajectory, config, awareness)"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/jc/examples/ontology_locality_probe.rs` around lines 209 - 654, The file exposes core probe logic as free functions (parse_subclass_edges, locality, fan_out, modularity_q, verdict, load_dir) which violates the carrier-pattern rule; refactor by introducing a carrier struct (e.g., Probe or OntologyProbe) that holds probe state (edges, files, maybe config) and convert those free functions into inherent methods (e.g., Probe::parse_subclass_edges, Probe::locality, Probe::fan_out, Probe::modularity_q, Probe::verdict, Probe::load_dir) updating any call sites to use method calls on the carrier instance and moving any related state (edges, files, basin, graph) into the struct so methods operate on &self / &mut self rather than standalone parameters. Ensure signatures, visibility, and tests are adjusted accordingly and that load_dir populates the carrier's fields instead of returning raw tuples.
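
As a rough illustration of the refactor being requested, a carrier could own the interned edges and the basin assignment and expose the metrics as methods. This is a minimal sketch under assumptions: the struct name `OntologyProbe`, its fields, and the `passes` method are hypothetical and do not appear in the PR; only the PASS thresholds (locality >= 0.90 and max fan-out <= 16) are taken from the hunk.

```rust
use std::collections::BTreeSet;

/// Hypothetical carrier: holds the state that the free functions currently
/// receive as parameters.
pub struct OntologyProbe {
    /// Interned (child, parent) id pairs.
    edges: Vec<(usize, usize)>,
    /// basin[id] = top-basin root of class `id`.
    basin: Vec<usize>,
}

impl OntologyProbe {
    pub fn new(edges: Vec<(usize, usize)>, basin: Vec<usize>) -> Self {
        Self { edges, basin }
    }

    /// Fraction of edges whose endpoints share a basin (0.0 when empty).
    pub fn locality(&self) -> f64 {
        if self.edges.is_empty() {
            return 0.0;
        }
        let local = self
            .edges
            .iter()
            .filter(|&&(c, p)| self.basin[c] == self.basin[p])
            .count();
        local as f64 / self.edges.len() as f64
    }

    /// Largest count of distinct parent-basins on any single class.
    pub fn max_fanout(&self) -> usize {
        (0..self.basin.len())
            .map(|c| {
                self.edges
                    .iter()
                    .filter(|&&(child, _)| child == c)
                    .map(|&(_, p)| self.basin[p])
                    .collect::<BTreeSet<usize>>()
                    .len()
            })
            .max()
            .unwrap_or(0)
    }

    /// PASS condition from the hunk: both thresholds must hold.
    pub fn passes(&self) -> bool {
        self.locality() >= 0.90 && self.max_fanout() <= 16
    }
}
```

Call sites would then read `probe.locality()` and `probe.passes()` instead of threading `edges` and `basin` through each call. The quadratic scan in `max_fanout` is chosen for brevity; the PR's per-class `parents` adjacency would avoid it.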

coderabbitai · 2026-05-31T16:02:36Z

+        let mut edges: Vec<(usize, usize)> = Vec::new();
+        for (c, p) in iri_edges {
+            let ci = intern(c, &mut names, &mut id_of);
+            let pi = intern(p, &mut names, &mut id_of);
+            edges.push((ci, pi));
+        }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Deduplicate subclass edges before metric computation.

Repeated (child,parent) edges currently count multiple times in locality and modularity, which can skew the reported verdict when ontologies overlap or repeat triples.

💡 Suggested fix

 pub fn from_edges(iri_edges: &[(String, String)]) -> Self {
@@
-        let mut edges: Vec<(usize, usize)> = Vec::new();
+        let mut edges: Vec<(usize, usize)> = Vec::new();
@@
         for (c, p) in iri_edges {
             let ci = intern(c, &mut names, &mut id_of);
             let pi = intern(p, &mut names, &mut id_of);
             edges.push((ci, pi));
         }
+        edges.sort_unstable();
+        edges.dedup();

Also applies to: 460-470, 528-569

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/jc/examples/ontology_locality_probe.rs` around lines 383 - 388, The current collection of subclass edges pushes every (child,parent) pair into edges (built via intern, names, id_of) and allows duplicates which skews locality/modularity; before using edges for metric computation, deduplicate the Vec<(usize,usize)> (or switch to a HashSet) so only unique (ci,pi) pairs are retained; update the same deduplication logic in the other blocks that build edges around the later sections (the ones starting near the other ranges) so repeated triples are eliminated prior to computing metrics.
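
The skew is easy to reproduce on a toy input. The sketch below is illustrative only: `locality_fraction` restates the locality metric from the hunk, and `dedup_edges` is a hypothetical helper that applies the suggested `sort_unstable` + `dedup` step.

```rust
/// Locality = intra-basin edges / all edges (0.0 for an empty edge list).
fn locality_fraction(edges: &[(usize, usize)], basin: &[usize]) -> f64 {
    if edges.is_empty() {
        return 0.0;
    }
    let local = edges
        .iter()
        .filter(|&&(c, p)| basin[c] == basin[p])
        .count();
    local as f64 / edges.len() as f64
}

/// Sort, then drop adjacent duplicates, so each (child, parent) pair is
/// counted exactly once.
fn dedup_edges(mut edges: Vec<(usize, usize)>) -> Vec<(usize, usize)> {
    edges.sort_unstable();
    edges.dedup();
    edges
}
```

With `basin = [0, 0, 2, 0]` and raw edges `[(1,0), (1,0), (1,0), (3,0), (3,2)]`, the repeated intra-basin triple lifts locality to 4/5 = 0.8. After deduplication the edge list is `[(1,0), (3,0), (3,2)]` and locality drops to 2/3, which is the figure the verdict should be based on.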

coderabbitai · 2026-05-31T16:02:36Z

+        let r = self.radius as i32;
+        let class_ids = soa.class_id();
+        for d in -r..=r {
+            let row_i = focal_row as i32 + d;
+            if row_i < 0 || row_i as usize >= n {


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify all potentially lossy i32 casts in this module.
rg -n -C2 'as i32' crates/lance-graph/src/graph/arigraph/markov_soa.rs

Repository: AdaWorldAPI/lance-graph

Length of output: 397

🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="crates/lance-graph/src/graph/arigraph/markov_soa.rs"
wc -l "$FILE"
echo "---- snippet around lines 160-210 ----"
sed -n '150,210p' "$FILE" | nl -ba
echo "---- locate radius field/type ----"
rg -n "radius" "$FILE"
echo "---- locate focal_row type/assignment ----"
rg -n "focal_row" "$FILE"

Repository: AdaWorldAPI/lance-graph

Length of output: 223

🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="crates/lance-graph/src/graph/arigraph/markov_soa.rs"
wc -l "$FILE"
echo "---- snippet around lines 160-210 (with line numbers) ----"
sed -n '150,210p' "$FILE" | cat -n
echo "---- locate struct/definitions mentioning radius ----"
rg -n "radius" "$FILE"
echo "---- locate focal_row usage/definition ----"
rg -n "focal_row" "$FILE"

Repository: AdaWorldAPI/lance-graph

Length of output: 3062

🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="crates/lance-graph/src/graph/arigraph/markov_soa.rs"
echo "---- other as i32 in file ----"
rg -n "as i32" "$FILE" || true
echo "---- usages of SoaWavePrimer::new ----"
rg -n "SoaWavePrimer::new\\(" crates/lance-graph/src || true
echo "---- usages of radius field ----"
rg -n "\\.radius" crates/lance-graph/src/graph/arigraph || true

Repository: AdaWorldAPI/lance-graph

Length of output: 790

🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "SoaWavePrimer" crates/lance-graph/src || true
rg -n "\\.project\\(" crates/lance-graph/src/graph/arigraph/markov_soa.rs || true
# Search whole crate for SoaWavePrimer::project call sites
rg -n "SoaWavePrimer::.*project\\(" crates/lance-graph/src || true
rg -n "SoaWavePrimer.*\\.project\\(" crates/lance-graph/src || true

Repository: AdaWorldAPI/lance-graph

Length of output: 1656

Avoid lossy as i32 casts in SoaWavePrimer::project window math (lines ~178-182).

self.radius: u32 and focal_row: usize are downcast to i32 via as, so large values truncate (and can even flip signs), corrupting the ±window traversal. Use checked/widened conversions (e.g., i32::try_from(...), i64/isize) and handle overflow (early return or clamp).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/lance-graph/src/graph/arigraph/markov_soa.rs` around lines 178 - 182, In SoaWavePrimer::project the window math currently casts self.radius (u32) and focal_row (usize) to i32 via "as", which can truncate; replace those lossy casts by using checked/widened conversions (e.g., use i64::try_from(self.radius) or isize::try_from(focal_row) / i32::try_from where appropriate) and propagate or handle conversion errors (return early or clamp) before computing r and row_i; update the loop that uses r, row_i and bounds checks with the new safe types and keep references to class_ids = soa.class_id() and n for bounds logic.
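
One way to avoid the casts altogether is to stay in `usize` and saturate at the edges. The following is a sketch under assumptions, not the fix that landed: the function name `window_rows` is hypothetical, and it assumes the loop body (not shown in the hunk) simply skips rows that fall outside `0..n`, so that a clamped ascending range visits the same rows.

```rust
use std::ops::Range;

/// In-bounds rows within `radius` of `focal_row`, computed without any
/// signed or narrowing cast. Returns an empty range when no row qualifies.
fn window_rows(focal_row: usize, radius: u32, n: usize) -> Range<usize> {
    // If `radius` does not fit in usize it exceeds any valid row span, so
    // treating it as usize::MAX is equivalent after clamping.
    let r = usize::try_from(radius).unwrap_or(usize::MAX);
    // Exclusive upper bound: focal_row + r + 1, clamped to n, never wrapping.
    let end = focal_row.saturating_add(r).saturating_add(1).min(n);
    // Inclusive lower bound: focal_row - r, clamped to 0 and to `end`.
    let start = focal_row.saturating_sub(r).min(end);
    start..end
}
```

For example, `window_rows(5, 2, 10)` yields `3..8`, and `window_rows(2, u32::MAX, 4)` yields `0..4` instead of a truncated or sign-flipped window. If the loop needs the signed offset `d`, it can be recovered per row by comparing `row` with `focal_row` rather than by casting.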

The `format` CI job runs:

  cargo fmt --manifest-path crates/lance-graph/Cargo.toml -- --check

markov_soa.rs had one-line struct literals + asserts that rustfmt 1.95.0 expands to multi-line. Apply canonical formatting (no logic change); the exact CI command now passes clean. Other failing-check noise was a local --all artifact; CI only formats the lance-graph crate.

https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7

Full-breadth integration spec wiring D-MBX kanban contract through witness commit (D-ATOM-5), surreal LIVE -> Rubicon kanban flip, ExecTarget backends, head2head two-view superposition in the shader driver, EW64-Markov Hebbian prefetch, language->SPO landing (D-LWS), and BindSpace decommission. Grounded against current main (#437/#439/#444/#445) + two recon passes; flags the two hard blockers (lance-7 witness API, surreal fork dep / OQ-11.6) and the stale-branch caveat.

https://claude.ai/code/session_01PLf95mURCY96TvKBFvSWEQ

…EW64-1, D-VIEW-1) + plan v1 AriGraph episodic edges, RISC-encoded (the corrected EW64, replacing the earlier "CE64 lens"/"16-bit pointer" framings): - episodic_edges::{EpisodicEdges64(u64), EdgeRef} — 4x[4-bit family | 12-bit local]. family 0 = intra-basin (inherited from HHTL/class_id, ~98.6% per #444); 1..=15 = cross-family index into the OGIT-class-inherited palette (~1.4%). Identities inherited, never on the edge (I-VSA-IDENTITIES); a CAM_PQ facet code. - view_angle::ViewAngle — 4-bit view-schema selector; the class presence bitmask doubles as the attention mask (inherited view-schema, never per-instance semantics). 527 contract lib tests (+11); both files clippy pedantic+nursery clean. Plan: .claude/plans/episodic-risc-spine-v1.md (3 lifecycle-separated structures: CAM/OGIT identity, Lance-version pseudo-radix index, CLAM ephemeral KV; bounded-horizon compression). Finding: EPIPHANIES E-EPISODIC-CLOSURE. CI-gated next (no protoc offline): D-EW64-2 SoA columns, D-STORY-1 CLAM clusterer, D-STORY-2 session index, D-STORY-3 palette256/4096 archetypes, D-HORIZON-1 stopping rule. Board: INTEGRATION_PLANS + LATEST_STATE + STATUS_BOARD + EPIPHANIES + AGENT_LOG. https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7

claude added 21 commits May 31, 2026 12:08

plan(lws): chunk 1 — header, scope, verified primitives, D-id index

b349b42

NiblePath-keyed tiered hydration plan W1. Verified-symbols table + EpisodicWitness64/Lance-fragment risk flags + D-LWS-1..9 index. https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7

plan(lws): chunk 2 — gates (P1/P2/P3 + D-ARM-7) and D-LWS-1 sparse ra…

76b742a

…dix register https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7

plan(lws): chunk 3 — D-LWS-2 delta-card value model + D-LWS-3 RISC co…

1b33d65

…mpose-cache https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7

plan(lws): chunk 4 — D-LWS-4 I/P/B frames + D-LWS-5 hydration manager…

51360ca

… (centerpiece) https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7

chore(deepnsm): drop markov_soa mod decl (moved to AriGraph)

7247af2

Trailing index sync — the pub mod removal from the move (9a5f54c) re-surfaced after a linter touch. deepnsm no longer declares markov_soa; it lives in AriGraph. https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7

chore(arigraph): register markov_soa module (trailing index sync)

4ad5b1f

https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7

chatgpt-codex-connector Bot reviewed May 31, 2026


coderabbitai Bot reviewed May 31, 2026


AdaWorldAPI merged commit 3c95f32 into main May 31, 2026
7 checks passed

AdaWorldAPI mentioned this pull request Jun 3, 2026

docs: cluster asymmetry — capacity-forced vs availability-chosen clustering #453

Merged

AdaWorldAPI merged 22 commits into main from claude/jolly-cori-clnf9-worldspine
