docs+probe: the agnostic lazy world-spine — addressing vision, locality probe (PASS), markov_soa→AriGraph, EW64-as-AriGraph#444

Merged
AdaWorldAPI merged 22 commits into main from claude/jolly-cori-clnf9-worldspine on May 31, 2026

@AdaWorldAPI AdaWorldAPI commented May 31, 2026

Owner

Summary

The consolidated output of a long design session on how to compress Wikidata into a lazy-loading, foveated, address-unified world-spine — converged to a single addressing thesis, validated by a real measurement, with one SoC code-move and an empirical probe shipped. Mostly docs + one runnable probe + one small refactor; no behavioral change to shipped runtime.

The one idea (the docs)

A card stores the surprise; the deck stores the expectation. Meaning = deck ⊗ delta — the free-energy framing, applied to both the key (address) and the value (content). The endgame is inherited nothingness: identity (which one) is 27 bits irreducible; description (what it is) is ~0 bits for the modal class member (inherited from the frozen OGIT deck). A typical entity stores nothing.

  • knowledge/delta-card-addressing-integration-map.md — the converged map (partition-as-address, 27-bit floor with ~0-bit row, sparse radix range-delegation, I/P/B-frames over Lance versioning, RISC compose-not-materialize, frozen-ISA).
  • knowledge/agnostic-lazy-world-spine.md — the tiered substrate (cold Lance ◄─NiblePath─► hot mailbox-SoA ◄── OGIT/DOLCE cache).
  • knowledge/owl-dolce-hhtl-compartments-aerial-fed.md, knowledge/splat-codebook-aerial-wikidata-compression.md — the domain/aerial seams.
  • plans/wikidata-lazy-spine-hydration-v1.md — the 9 D-LWS deliverables for the one missing runtime piece (the NiblePath-keyed tiered hydration manager).
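The deck/delta split described above can be sketched in a few lines. This is an illustrative model only: `Deck`, `Card`, `Slot` and `Value` are hypothetical names, and "deck ⊗ delta" is modelled here as a plain override of inherited defaults, which is an assumption rather than the encoding used in the docs.

```rust
use std::collections::BTreeMap;

/// Hypothetical property slot and value types, for illustration only.
type Slot = u8;
type Value = u32;

/// The "deck": the frozen per-class expectation every member inherits.
struct Deck {
    defaults: BTreeMap<Slot, Value>,
}

/// The "card": only the slots where this entity differs from its class.
/// A modal class member has an empty delta, i.e. it stores nothing.
#[derive(Default)]
struct Card {
    delta: BTreeMap<Slot, Value>,
}

impl Deck {
    /// Start from the inherited defaults, then let the card's surprises
    /// override them (assumed meaning of "deck combined with delta").
    fn resolve(&self, card: &Card) -> BTreeMap<Slot, Value> {
        let mut out = self.defaults.clone();
        for (slot, value) in &card.delta {
            out.insert(*slot, *value);
        }
        out
    }
}

impl Card {
    /// Build a card by keeping only the slots that deviate from the deck.
    fn from_full(deck: &Deck, full: &BTreeMap<Slot, Value>) -> Card {
        let delta = full
            .iter()
            .filter(|(slot, value)| deck.defaults.get(*slot) != Some(*value))
            .map(|(slot, value)| (*slot, *value))
            .collect();
        Card { delta }
    }
}
```

An entity that matches its class exactly yields an empty `Card`, which is the "typical entity stores nothing" case; only deviating slots cost storage.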

The measurement — probe #1 PASS (the payoff)

crates/jc/examples/ontology_locality_probe.rs (zero-dep, hand-rolled TTL scan, reuses the splat_louvain_modularity machinery) RUN on the real on-disk ontologies (DOLCE-Ultralite, schema.org, Odoo, PROV-O, QUDT, OWL-Time):

LOCALITY    = 98.61%  (1207/1224 subClassOf edges intra-basin)   [claim was "~90%"]
FAN-OUT max = 3       (≤16 ✓; 1121 classes have exactly 1 parent-basin)
MODULARITY  = 0.3246  (>0.3 = clear community structure)
VERDICT     = PASS

⇒ On real frozen-ISA ontology structure, the 16-bit local references + the ≤16 family frontier are real — the "inherited nothingness" addressing is not a hope. Honest caveat (in the probe's own verdict): real ontologies (~10³ classes), NOT Wikidata (~10⁸) — the Wikidata P279 run remains the open probe. Conjecture → FINDING on real ontologies.
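For readers who want to see what the two headline numbers mean, here is a minimal sketch of the metrics, assuming classes are interned to `usize` ids and `basin[i]` holds the top-basin of class `i`. The function signatures are hypothetical and are not taken from ontology_locality_probe.rs; the PASS thresholds (locality >= 0.90, max fan-out <= 16) are the ones quoted in the review discussion of this PR.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Fraction of (child, parent) edges whose endpoints share a basin.
fn locality(edges: &[(usize, usize)], basin: &[usize]) -> f64 {
    if edges.is_empty() {
        return 0.0;
    }
    let intra = edges
        .iter()
        .filter(|&&(child, parent)| basin[child] == basin[parent])
        .count();
    intra as f64 / edges.len() as f64
}

/// Largest number of DISTINCT parent basins seen for any single class.
/// This is a per-class parent-basin count, not a tree branching factor.
fn max_fan_out(edges: &[(usize, usize)], basin: &[usize]) -> usize {
    let mut parent_basins: BTreeMap<usize, BTreeSet<usize>> = BTreeMap::new();
    for &(child, parent) in edges {
        parent_basins.entry(child).or_default().insert(basin[parent]);
    }
    parent_basins.values().map(|s| s.len()).max().unwrap_or(0)
}

/// PASS rule as quoted in the review: both thresholds must hold.
fn passes(locality: f64, max_fan_out: usize) -> bool {
    locality >= 0.90 && max_fan_out <= 16
}
```

With the reported counts, 1207 intra-basin edges out of 1224 gives 1207 / 1224 ≈ 0.9861, which together with a max fan-out of 3 satisfies both thresholds.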

The code move — markov_soa → AriGraph, vocabulary-agnostic (SoC)

markov_soa (the Markov wave — a windowed SoA projection) was first authored in deepnsm, which made a core runtime concern depend on a linguistics sensor (layer inversion). Moved to lance-graph::graph::arigraph::markov_soa and made vocabulary-agnostic:

  • SpoRanks { s, p, o: u16 } are opaque — the SoA row carries no language; vocabulary is a late-resolved class property.
  • match takes an injected Fn(u16,u16)->u8 = AriGraph's own cam_pq::DistanceTables, not a language table. The language layer (DeepNSM/COCA/grammar) stays strictly upstream (it emits SPO into AriGraph; injecting COCA into the hot graph would be the "GoBD-with-Rumi" error).
  • markov_soa IS AriGraph — the cold-path Markov chain promoted to the hot-path SoA (the wave); EW64/the CE64 W-slot→witness arc is the particle.
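The injection pattern in the bullets above can be illustrated as follows. Only the `SpoRanks { s, p, o: u16 }` shape and the `Fn(u16,u16)->u8` metric type come from this PR; `triplet_distance` and `mean_nearest_distance` are hypothetical helpers, and treating the `u8` as a distance (smaller is closer) that is summed per role and averaged over nearest triplets is an assumption about the semantics, not the module's actual code.

```rust
/// Opaque subject/predicate/object ranks; the row carries no vocabulary.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SpoRanks {
    s: u16,
    p: u16,
    o: u16,
}

/// Sum of per-role distances between two triplets under an injected metric.
fn triplet_distance<D>(a: SpoRanks, b: SpoRanks, dist: &D) -> u32
where
    D: Fn(u16, u16) -> u8,
{
    u32::from(dist(a.s, b.s)) + u32::from(dist(a.p, b.p)) + u32::from(dist(a.o, b.o))
}

/// For each query triplet take the nearest window triplet, then average
/// those nearest distances. Returns None when either side is empty.
fn mean_nearest_distance<D>(window: &[SpoRanks], query: &[SpoRanks], dist: D) -> Option<f64>
where
    D: Fn(u16, u16) -> u8,
{
    if window.is_empty() || query.is_empty() {
        return None;
    }
    let total: u32 = query
        .iter()
        .map(|q| {
            window
                .iter()
                .map(|w| triplet_distance(*w, *q, &dist))
                .min()
                .unwrap_or(0)
        })
        .sum();
    Some(f64::from(total) / query.len() as f64)
}
```

Because the metric arrives as a closure, the graph-side code never names a vocabulary; a caller could pass any `Fn(u16, u16) -> u8`, which is the property the bullets rely on to keep the language layer upstream.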

contract::soa_view::MailboxSoaView gains a doc note: EpisodicWitness64 = AriGraph in the mailbox SoA view (deferred accessor, qualia-pattern; EW64 is a queued design, not yet a code symbol).

Findings on the board (the durable record)

Verification

  • cargo test --manifest-path crates/jc/Cargo.toml → 60/60 (probe file clippy-clean; pre-existing jc lints in other files untouched, out of scope).
  • cargo test -p lance-graph-contract → 503/503 (the EW64 doc note + soa_view).
  • cargo test --manifest-path crates/deepnsm/Cargo.toml → green after the markov_soa removal.
  • ⚠️ lance-graph::graph::arigraph::markov_soa is unverified-offline: lance-graph core's lance/datafusion/arrow deps don't fetch in the sandbox; the module + its 4 tests are authored against the grounded MailboxSoaView surface but need a full-checkout compile-verify. Flagged STATUS: provisional in the module header. Its truly-correct final home is inside the EW64-in-SoA seam (P1/P2).

Honest scope

This is vision + measurement + one disciplined refactor, not a feature. The hydration manager (D-LWS), the EW64 type (P1/P2), the Wikidata-scale probe, and the markov_soa core-verify are all queued/gated, clearly labelled CONJECTURE-with-probe where they're unproven. The shipped, verified pieces are: the locality probe (PASS), the contract doc note, the deepnsm cleanup, and the board record.

https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7


Generated by Claude Code

Summary by CodeRabbit

  • Documentation
    • Added extensive architecture and design docs for a lazy “world-spine” hydration, delta-card addressing, plans, epiphanies and status updates describing tiered cold/hot/semantic layers and next steps.
  • New Features
    • Added an ontology locality probe example to measure partition locality on real ontologies.
    • Added vocabulary-agnostic wave-based similarity projections for AriGraph to support SoA-window comparisons and provenance-aware matching.

claude added 21 commits May 31, 2026 12:08
…tiered substrate

Capstone north-star: one NiblePath address unifies ontology position = memory
arena = (leaf) spatial coordinate. Tiering — COLD Lance columnar ◄─NiblePath─►
HOT mailbox-SoA (agnostic bytes) ◄── SEMANTIC OGIT/DOLCE cache (C2 resolve-not-
store). Reframings: (1) the cold path SPLITS — DataFusion rows/cols joins are
SLOW, business-SQL ground-truth ONLY, off the hot path; HHTL hydration is
address-based (NiblePath → CAM/palette/blasgraph, O(1)), not join-based.
(2) DOLCE continuant/occurrent = a 1-bit permanent/temporary residence policy.
(3) AriGraph SPO + labels → agnostic SoA + late labels (C2 wholesale).

Markov = the CausalEdge64 W-slot → WitnessTable/EpisodicWitness64 arc (NOT the
16384 VSA bundle, which is retired legacy / discovery-layer only). Reasoning =
traversing the CE64→EW64 arc + SPO, no embedding/forward-pass. Reading a text =
accumulating SPO mailboxes + their causal-edge/witness arc; ambiguity resolved by
counterfactual testing (recipe_kernels world⊗factual⊗counterfactual, popcount). A
250-page book ≈ 4-5k sentences ≈ ~4096 SPO mailboxes = one per-cohort
WitnessTable<64> cohort. The resident agnostic row ~4096 bits (address carries
class+label inheritance). Address: byte-aligned 256^4 = 2^32 ~ 4.3B — the 4-byte
CAM-PQ code IS the address = class+label key = palette-distance key.

Built vs new vs conjecture mapped; invariants (CAM-exact, similarity-only-in-
discovery, SoA stays agnostic) recorded. The one missing runtime piece: a
NiblePath-keyed tiered hydration manager.

- knowledge/agnostic-lazy-world-spine.md (the north-star)
- EPIPHANIES: the world-spine FINDING

https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
…rise, deck=expectation)

Consolidates the 8-turn addressing design from the end (cookbook/delta-card)
back through the full chain. The one idea: a card stores the surprise, the deck
stores the expectation; meaning = deck ⊗ delta — the free-energy framing (prior
+ prediction-error), applied to BOTH the key (address) and the value (content).

- Cookbook (value side): recipe = inherited(region×season×persona) + 8-16 delta
  bits; boundary = generator-vs-derivable.
- Addressing (key side): partition-as-address / schema-as-deck (Quartettkarten);
  27-bit truthful floor with ~0-bit row; sparse radix range-delegation (no 256^4
  files); frozen ISA = compiled perfect hash, no rebalance, version-gated upgrade.
- Frame model (x264/265): I=frozen radix+compacted base fragment, P=appended+
  CLAM-clustered delta, B=RISC compose-cache, GOP compaction = amortized upgrade
  = where similarity freezes to structure. IS Lance fragment-versioning.
- RISC compose-not-materialize: store generators, derive <=7-hop closure via
  ComposeTable/mxm; dissolves the hub problem; per-predicate composability flag.
- Two trees: frozen ontology radix = address (exact); CLAM/CHESS = proposes the
  partition (similarity, discovery-only). Adaptive proposes, frozen ships.
- Scale identities: 6-bit cohort ⊂ 16-bit book ⊂ 18-bit hot envelope(256K) ⊂
  32-bit world(cold). Reasoning = CE64→EW64 arc, not the 16384 VSA bundle.
- 3 probes (Louvain/CLAM locality; delta-card residual; compose hit-rate).

New: knowledge/delta-card-addressing-integration-map.md; EPIPHANIES capstone;
cross-link from agnostic-lazy-world-spine.md (which it supersedes for addressing).

https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
The thesis the whole map reaches for: split identity (which one = 27 bits
irreducible, radix trie, path-compressed) from description (what it is = ~0 bits
for the modal class member, inherited whole from the frozen OGIT deck). A typical
entity stores nothing — it inherits everything; only the surprising one pays. The
spine's price is paid ONCE by the frozen ontology, amortized to nothing per
entity. Absence is not missing data; absence IS the inheritance.

https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
Module doc states honest scope: real ontology subClassOf graphs from
data/ontologies, NOT full Wikidata. Parser tracks current subject +
predicate, strips string literals/comments, skips blank-node OWL
restrictions, emits (child,parent) named-IRI edges only.

https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
NiblePath-keyed tiered hydration plan W1. Verified-symbols table +
EpisodicWitness64/Lance-fragment risk flags + D-LWS-1..9 index.

https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
… + board-hygiene

Completes wikidata-lazy-spine-hydration-v1: prefetch cascade, DOLCE-1bit
eviction, probe harness (produces P1/P2/P3 gates), deferred 115M load,
per-crate firewall contract, 7-risk register (R1 EpisodicWitness64 absent,
R2 Lance fragment APIs not wired, R3 CLAM is probe-not-clusterer).

https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
… into the SoA per ractor-mailbox

Two corrections (user, mid-wave; W1 drift-audit also flagged the symbol):
- There is NO VSA in this design. Drop the '16384-bit VSA bundle (retired
  legacy)' framing entirely — reasoning is a native CE64 W-slot → EpisodicWitness
  arc + SPO graph walk, no fingerprint bundling. The discovery layer (aerial/
  splat) uses a transient palette256/CAM-PQ distance, never a bundle.
- EpisodicWitness64 is NOT a phantom and NOT shipped-as-named: it is the NEW
  AriGraph, migrated INTO the SoA per ractor-mailbox (cohort-local episodic
  memory as a SoA column). Shipped seed = WitnessTable<64> + WitnessEntry (6-bit
  W-slot); EpisodicWitness64's 64-bit layout (incl. the 16-bit book tier) is the
  design surface to settle. Relabelled NEW build target throughout + Status note.

https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
…s, prefetch=Meta)

The shock named: every link shipped, the chain open at the joints.
- Layering corrected: Markov (CE64 W-slot → EW64 arc) is the BASIS; predictive-
  prefetch is the META on top — the prefetch IS the wiring IS the learning
  (Hebbian: aerial 'fire together' offline → EW64 'wire together' online).
- Reactive spine (keystone): Lance update = witness pointer = SurrealDB kanban
  subscription trigger — one event propagating through the storage layer as the
  prefetch signal (why EW64 shares CE64 low-40, why kanban is in contract).
- Diagnosis: island-archipelago — EpisodicWitness64/SpoWitness64 (pr-ce64-mb-4)
  = 0 code symbols; HotWitness = todo!() scaffold; Lance→Surreal→kanban
  subscription unwired. EW64 is the SEAM, not a type. Invisible in green suites.
- Queued (second wave, post-probe-consolidation): one whole-seam spec.

https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
…OCA+CAM-PQ, no cosine)

The 'meet halfway' on VSA: turn the black-box bundle into an explicit,
deterministic projection of the mailbox-SoA window into its COCA-rank SPO
triplets + full provenance (which rows, at what proximity). The triplets stay
ADDRESSABLE — no superposition destroys the register.

Match is DeepNSM's OWN machinery, NOT float cosine: COCA-4096 vocabulary +
the CAM-PQ 4096² u8 word-distance matrix via SimilarityTable::lookup_u8 +
proximity prior. best_guess_match = nearest-triplet CAM-PQ similarity, averaged.
Strictly a fuzzy proposer (cognitive priming): proposes where-to-look / what-it-
resembles ('feels like a Sicilian'), never asserts; exact 32k SPO-W confirms.

Consumes contract::soa_view::MailboxSoaView through the EXISTING hard dep — zero
new dependency, firewall preserved (no dep on the heavy cognitive-shader-driver
that implements the view).

Verified: 5 markov_soa tests green (incl. best_guess_match_uses_cam_pq_not_cosine,
determinism, edge-clamp, skip-untripled, empty=0); full deepnsm suite 94/4/8/1
no regressions; clippy clean in markov_soa (pre-existing lints in other files
untouched, out of scope).

https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
…e; VSA = fuzzy proposer (priming), not cosine

Supersedes my two earlier mis-framings in-place (board hygiene: don't leave wrong
findings standing):
- (a) 'VSA = per-cycle experience/soul-print vector' — wrong scope.
- (b) 'keep DeepNSM as a parallel universe' — DeepNSM migrates too.

Converged finding: the explicit 32k SPO-W is the substrate (addressable, lossless,
reasoning-capable, provenance-bearing — categorically > any bundle; ~32-item
recovery capacity vs 32k = 1000x over). VSA16k's legitimate role = a strictly-
fuzzy proposer / cognitive priming, firewall-gated to discovery; match via COCA +
CAM-PQ SimilarityTable, NOT cosine. Records the markov_soa.rs artifact (e0a5049),
the aerial within/cross-cohort synergy + the queued CodebookDistance adapter D-id,
and the CLAUDE.md reconciliation note.

Also: crates/deepnsm/Cargo.lock regen from the markov_soa build (benign).

https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
One word, three ranked uses; the deterministic CE64→EW64 chain is the line:
1) context-chain building = mailbox chaining through the CE64 W-slot →
   EpisodicWitness64 arc (deterministic, exact, addressable = THE substrate).
2) hybrid+ autocomplete = #1's chain + a fuzzy accumulated witness-bundle as
   speculative autocomplete, leashed to the chain that confirms it (= markov_soa
   + the grail-fold experiment). Invariant: unleashed, #2 degrades into #3.
3) sink-in-and-pray = old VSA-bundle-as-Markov, ceiling-bound, ungrounded — the
   black box (deprecated; the 'every GGUF would already be VSA' disproof).

The line: #1 is the chain; #2 is the chain plus a guess it must confirm; #3 is
the guess without a chain. Gate before grail: P1 AriGraph→SoA (HotWitness
D-ATOM-5 todo!()s) → P2 EW64 in MailboxSoaView (qualia-pattern accessor) → P3
the grail-fold experiment (CONJECTURE, gated, Jirak-baselined, downstream of the
EW64 seam — no scope creep).

https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
…c (delete deepnsm copy)

markov_soa is the Markov WAVE; EW64/the CE64 W-slot→witness arc is the PARTICLE.
Complementary → same home. It was wrongly in deepnsm (core concern depending on a
linguistics sensor = layer inversion). Moved to
crates/lance-graph/src/graph/arigraph/markov_soa.rs.

SoC deeper step: the SoA SPO row is three OPAQUE u16 ranks — vocabulary is a
late-resolved CLASS property, never a SoA fact (C2 / I-VSA-IDENTITIES, applied to
the triplet encoding). SPO CAN be COCA (good for input parsing) but the
SoA/AriGraph mailbox-view must NOT be forced into COCA. The projector takes an
injected Fn(u16,u16)->u8 distance — caller supplies AriGraph's cam_pq
DistanceTables OR DeepNSM's COCA table. Reuse-by-injection; core has 0 deepnsm
dep (the dep graph enforces agnosticism).

- AriGraph: SpoRanks{s,p,o:u16} opaque + SoaWavePrimer + WaveProjection (4 tests).
- Deleted crates/deepnsm/src/markov_soa.rs (sole ref was its own mod decl);
  deepnsm still 89/4/8/1 green after removal.
- STATUS: AriGraph version unverified-offline (lance-graph core's
  lance/datafusion/arrow deps don't fetch in the sandbox) — verify on full checkout.
- EPIPHANIES: the SoC + vocabulary-agnostic finding.

https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
…); language stays upstream in DeepNSM

markov_soa is NOT a generic projector that takes a COCA lens — it IS AriGraph,
the cold-path Markov chain promoted to the hot-path SoA. AriGraph is agnostic and
NOT necessarily English (holds business/GoBD/Wikidata/text SPO).

The match metric is AriGraph's OWN cam_pq::DistanceTables, NOT a language table.
The language layer (DeepNSM/COCA-4096/grammar templates) stays STRICTLY upstream:
it scans flat data (usually English), parses, and EMITS SPO into AriGraph — and
MUST stay English (grammar templates get messy otherwise). Injecting a COCA
distance into the hot graph would be the GoBD-with-Rumi error (a language lens
over an agnostic graph). Removed the wrong 'or DeepNSM COCA table' injection
alternative from both the module doc and the EPIPHANIES finding.

(also captures the EPIPHANIES SoC finding that the prior commit's bad pathspec
dropped from the index.)

https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
Trailing index sync — the pub mod removal from the move (9a5f54c) re-surfaced
after a linter touch. deepnsm no longer declares markov_soa; it lives in AriGraph.

https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
…view

Add a deferred-accessor note on MailboxSoaView (beside the qualia one): the
future EpisodicWitness64 column IS AriGraph promoted into the mailbox SoA view —
the cold-path episodic Markov chain (arigraph::{episodic,witness_corpus}) as a
hot-path per-row column = the CausalEdge64 W-slot → witness arc (Markov #1, the
particle; markov_soa is the wave). EW64 is not yet a code symbol (queued, see
E-EW64-IS-PREDICTIVE-PREFETCH); shipped seeds are the W-slot + WitnessTable<64> +
arigraph episodic. Stays agnostic — language (DeepNSM/COCA) stays upstream.

https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
… Q=0.325)

The empirical falsifier for the delta-card / inherited-nothingness addressing
claim (probe #1 of the integration map), harvested from the W2 wave worker and
RUN on the real on-disk ontologies (DOLCE-Ultralite, schema.org, Odoo, PROV-O,
QUDT, OWL-Time).

Measured (1170 classes, 1224 subClassOf edges, 33 top-basins):
- LOCALITY = 98.61% intra-basin (the '~90% local' claim survives + exceeds)
- FAN-OUT max = 3 (<=16 ✓; 1121 classes have exactly 1 parent-basin)
- MODULARITY Q = 0.3246 (>0.3 = clear community structure)

VERDICT: PASS — on REAL frozen-ISA ontology structure, 16-bit local references +
the <=16 family frontier are real. HONEST CAVEAT (in the probe verdict): real
ontologies ~10^3 classes, NOT Wikidata ~10^8; the Wikidata P279 run stays the
open probe. Conjecture → FINDING on real ontologies.

zero-dep jc (hand-rolled TTL scan, reuses splat_louvain_modularity machinery);
60/60 jc tests green; probe file clippy-clean (pre-existing jc lints in other
files untouched). EPIPHANIES: the measured-result FINDING.

https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
…wave entry

STATUS_BOARD: the 9 D-LWS hydration-manager rows (D-LWS-8 probe-1 SHIPPED:
locality 98.6%/fan-out 3/Q=0.325 PASS), + D-MKV-SOA + D-EW64-NOTE rows.
AGENT_LOG: the world-spine vision + W1/W2 wave + markov_soa SoC + EW64-as-
AriGraph + probe-result session entry.

https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
@coderabbitai

coderabbitai Bot commented May 31, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 9056ef47-af8d-4bad-aff1-a534ac9a4008

📥 Commits

Reviewing files that changed from the base of the PR and between 3e860b0 and 5c652f4.

📒 Files selected for processing (1)
  • crates/lance-graph/src/graph/arigraph/markov_soa.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/lance-graph/src/graph/arigraph/markov_soa.rs

📝 Walkthrough


This PR documents a lazy world-spine hydration design, adds an empirical ontology-locality probe example, and implements vocabulary-agnostic SoA wave projections for AriGraph with supporting docs and tests.

Changes

Lazy world-spine architecture with empirical validation and runtime support

| Layer | File(s) | Summary |
|---|---|---|
| Architecture vision and design decisions | .claude/board/AGENT_LOG.md, .claude/board/EPIPHANIES.md, .claude/board/STATUS_BOARD.md, .claude/knowledge/agnostic-lazy-world-spine.md, .claude/knowledge/delta-card-addressing-integration-map.md, .claude/plans/wikidata-lazy-spine-hydration-v1.md | Board findings and epiphanies (2026-05-31) document the lazy world-spine vision: unified NiblePath addressing, tiered hydration (cold/hot/semantic), delta-card framing, three-Markovs taxonomy, substrate decisions (~32k SPO-W truth path vs VSA16k proposer), EW64 witness seam notes, and a D-LWS implementation plan with probes/gates. |
| Ontology locality probe (empirical validation) | crates/jc/Cargo.toml, crates/jc/examples/ontology_locality_probe.rs | New Cargo example and Rust program that parses TTL subClassOf edges, interns class IRIs (ClassGraph), assigns deterministic basins, computes edge locality %, per-class fan-out, and Newman modularity Q, and emits a Pass/Marginal/Fail verdict. Includes recursive loader and unit tests for parsing, metrics, verdicts, and cycles. |
| AriGraph markov_soa module (runtime wave projections) | crates/lance-graph/src/graph/arigraph/markov_soa.rs, crates/lance-graph/src/graph/arigraph/mod.rs, crates/lance-graph-contract/src/soa_view.rs | Adds vocabulary-agnostic SoA-window wave projection types and operations: SpoRanks, RowContribution, BundleProvenance, WaveProjection::best_guess_match (injectable per-role distance), and SoaWavePrimer::project (±radius MailboxSoaView folding). Includes unit tests and a MailboxSoaView doc comment describing a future EpisodicWitness64 accessor. |

Sequence Diagram(s)

sequenceDiagram
  participant TTL as TTL Input
  participant Parser as parse_subclass_edges
  participant Graph as ClassGraph
  participant Basin as assign_basins
  participant Metrics as locality/fan_out/modularity_q
  participant Verdict as verdict
  TTL->>Parser: feed lines (strip strings/comments)
  Parser->>Graph: emit (child,parent) edges
  Graph->>Basin: assign deterministic basin roots
  Basin->>Metrics: compute edge locality, fan-out histogram, Q
  Metrics->>Verdict: evaluate thresholds -> Pass/Marginal/Fail

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • AdaWorldAPI/lance-graph#437: Adds a deferred EpisodicWitness64-related MailboxSoaView doc comment that relates to the MailboxSoaView contract referenced here.
  • AdaWorldAPI/lance-graph#434: Overlaps on board/epiphany docs and Markov/witness substrate discussions; this PR extends with code for markov_soa and the locality probe.

Poem

A rabbit scrawls in margin light,
Basin roots hum through day and night,
Probes and waves in tiny hops,
Markov beats and mailbox stops—
Build the spine, then let truth bite. 🐇✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the three main components of the changeset: documentation of an agnostic lazy world-spine architecture, a locality probe demonstrating it passes validation criteria, and a refactoring that moves markov_soa into AriGraph as vocabulary-agnostic code with EpisodicWitness64 framing. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3e860b06ae


Comment on lines +153 to +155
if c == '#' {
    // Rest of line is a comment.
    break;


P2: Preserve fragments inside angle-bracket IRIs

For valid Turtle that uses full IRIs such as <http://example.org#Child> rdfs:subClassOf <http://example.org#Parent> ., this treats the # inside the IRI as the start of a comment before tokenization. The parser then truncates both class names (and normalize_iri still accepts tokens beginning with <), so different fragment IRIs in the same namespace can collapse into the same malformed class key and corrupt the locality/fan-out measurements.
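One possible shape for the fix, offered as a sketch rather than the probe's actual code: track whether the scanner is inside a `<...>` IRI or a `"..."` literal and treat `#` as a comment start only outside both. `strip_comment` is a hypothetical helper name; single-quoted and triple-quoted literals are deliberately not handled here.

```rust
/// Return the part of a Turtle line before any comment.
/// A '#' starts a comment only outside <...> IRIs and "..." string literals.
/// Simplification: single-quoted and multi-line literals are not handled.
fn strip_comment(line: &str) -> &str {
    let mut in_iri = false;
    let mut in_string = false;
    let mut escaped = false;
    for (i, c) in line.char_indices() {
        if in_string {
            if escaped {
                escaped = false; // the escaped character is consumed as-is
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
        } else if in_iri {
            if c == '>' {
                in_iri = false;
            }
        } else {
            match c {
                '<' => in_iri = true,
                '"' => in_string = true,
                '#' => return &line[..i],
                _ => {}
            }
        }
    }
    line
}
```

With this, `<http://example.org#Child>` keeps its fragment while a trailing `# note` is still dropped.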


Comment on lines +331 to +336
if predicate_is_subclass {
    if let (Some(child), Some(parent)) =
        (current_subject.clone(), normalize_iri(tok))
    {
        if child != parent {
            edges.push((child, parent));


P2: Stop carrying subClassOf past delimited objects

When a valid Turtle object is written with attached punctuation, e.g. rdfs:subClassOf ex:Parent; or ex:Parent ., normalize_iri(tok) strips the delimiter and emits the parent, but predicate_is_subclass is left true. The next predicate token on the same line or on an indented continuation line can then be parsed as another superclass, adding bogus edges and skewing the probe's verdict; the object delimiter needs to reset/end the active predicate after the edge is emitted.
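A minimal sketch of the delimiter handling this comment asks for, assuming whitespace-split tokens that follow an already-known subject. `split_delimiter` and `subclass_parents` are hypothetical helpers, not functions from the probe; the rule modelled is that `;` and `.` end the active predicate while `,` keeps it for the next object. Literals containing spaces would need a real tokenizer.

```rust
/// Split a trailing Turtle delimiter (';', ',' or '.') off a token.
fn split_delimiter(tok: &str) -> (&str, Option<char>) {
    match tok.chars().last() {
        Some(d @ (';' | ',' | '.')) => (&tok[..tok.len() - 1], Some(d)),
        _ => (tok, None),
    }
}

/// Collect the objects of rdfs:subClassOf from the tokens after a subject.
fn subclass_parents(tokens_after_subject: &[&str]) -> Vec<String> {
    let mut parents = Vec::new();
    let mut predicate: Option<&str> = None;
    for &raw in tokens_after_subject {
        let (tok, delim) = split_delimiter(raw);
        if !tok.is_empty() {
            if let Some(p) = predicate {
                // An object of the active predicate.
                if p == "rdfs:subClassOf" {
                    parents.push(tok.to_string());
                }
            } else {
                // First token after a reset is the next predicate.
                predicate = Some(tok);
            }
        }
        // ';' and '.' close the predicate; ',' keeps it for another object.
        if matches!(delim, Some(';') | Some('.')) {
            predicate = None;
        }
    }
    parents
}
```

For `rdfs:subClassOf ex:Parent; rdfs:label "x" .` this yields only `ex:Parent`, because the `;` attached to the object resets the predicate before `rdfs:label` is read.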


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.claude/plans/wikidata-lazy-spine-hydration-v1.md:
- Around line 135-147: The comment clarifies that the P1 "fan-out max=3" metric
in ontology_locality_probe.rs measures the number of DISTINCT top-basins among a
class’s direct subClassOf parents (i.e., distinct parent-basin count), not the
designed branching factor; update the code/comments so the computed fan-out
variable and any accompanying log/message (e.g., the fan-out computation in
ontology_locality_probe.rs and the variables locality and max_fanout) explicitly
state they count distinct top-basins of direct parents, and ensure PASS logic
checks both locality >= 0.90 AND max_fanout <= 16; also adjust any text that
conflates this metric with the architectural branching factor to avoid
confusion.

In `@crates/jc/examples/ontology_locality_probe.rs`:
- Around line 383-388: The current collection of subclass edges pushes every
(child,parent) pair into edges (built via intern, names, id_of) and allows
duplicates which skews locality/modularity; before using edges for metric
computation, deduplicate the Vec<(usize,usize)> (or switch to a HashSet) so only
unique (ci,pi) pairs are retained; update the same deduplication logic in the
other blocks that build edges around the later sections (the ones starting near
the other ranges) so repeated triples are eliminated prior to computing metrics.
- Around line 209-654: The file exposes core probe logic as free functions
(parse_subclass_edges, locality, fan_out, modularity_q, verdict, load_dir) which
violates the carrier-pattern rule; refactor by introducing a carrier struct
(e.g., Probe or OntologyProbe) that holds probe state (edges, files, maybe
config) and convert those free functions into inherent methods (e.g.,
Probe::parse_subclass_edges, Probe::locality, Probe::fan_out,
Probe::modularity_q, Probe::verdict, Probe::load_dir) updating any call sites to
use method calls on the carrier instance and moving any related state (edges,
files, basin, graph) into the struct so methods operate on &self / &mut self
rather than standalone parameters. Ensure signatures, visibility, and tests are
adjusted accordingly and that load_dir populates the carrier's fields instead of
returning raw tuples.
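The edge deduplication requested in the first item above could look like either of the following. Both helper names are hypothetical; the first reorders the edges, the second preserves first-seen order in case later code depends on it.

```rust
use std::collections::HashSet;

/// Remove repeated (child, parent) pairs; the result is sorted.
fn dedup_edges(mut edges: Vec<(usize, usize)>) -> Vec<(usize, usize)> {
    edges.sort_unstable();
    edges.dedup();
    edges
}

/// Remove repeated (child, parent) pairs, keeping first-seen order.
fn dedup_edges_stable(edges: &[(usize, usize)]) -> Vec<(usize, usize)> {
    let mut seen = HashSet::new();
    // HashSet::insert returns false for a pair that was already present.
    edges.iter().copied().filter(|e| seen.insert(*e)).collect()
}
```

Either variant should run before the locality and modularity computations, so a triple repeated across ontology files is counted once.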

In `@crates/lance-graph/src/graph/arigraph/markov_soa.rs`:
- Around line 178-182: In SoaWavePrimer::project the window math currently casts
self.radius (u32) and focal_row (usize) to i32 via "as", which can truncate;
replace those lossy casts by using checked/widened conversions (e.g., use
i64::try_from(self.radius) or isize::try_from(focal_row) / i32::try_from where
appropriate) and propagate or handle conversion errors (return early or clamp)
before computing r and row_i; update the loop that uses r, row_i and bounds
checks with the new safe types and keep references to class_ids = soa.class_id()
and n for bounds logic.
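One way to avoid the lossy casts is to stay in `usize` and clamp with saturating arithmetic. The sketch below assumes the window is `focal_row ± radius` clipped to the `n` available rows; `window_bounds` is a hypothetical helper and does not reproduce the actual body of SoaWavePrimer::project.

```rust
use std::ops::Range;

/// Clamp the window [focal_row - radius, focal_row + radius] to 0..n
/// without any lossy `as` cast. Returns None when focal_row is out of bounds.
fn window_bounds(focal_row: usize, radius: u32, n: usize) -> Option<Range<usize>> {
    if focal_row >= n {
        return None;
    }
    // Where usize is narrower than u32, saturate rather than truncate.
    let r = usize::try_from(radius).unwrap_or(usize::MAX);
    let start = focal_row.saturating_sub(r);
    let end = focal_row.saturating_add(r).saturating_add(1).min(n);
    Some(start..end)
}
```

The caller can then iterate `for row in start..end` and index `class_ids` directly, with no signed intermediate to overflow.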

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: dda7d8cd-b3f0-4111-9bed-f94764e8b4db

📥 Commits

Reviewing files that changed from the base of the PR and between 67534a3 and 3e860b0.

⛔ Files ignored due to path filters (1)
  • crates/deepnsm/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (11)
  • .claude/board/AGENT_LOG.md
  • .claude/board/EPIPHANIES.md
  • .claude/board/STATUS_BOARD.md
  • .claude/knowledge/agnostic-lazy-world-spine.md
  • .claude/knowledge/delta-card-addressing-integration-map.md
  • .claude/plans/wikidata-lazy-spine-hydration-v1.md
  • crates/jc/Cargo.toml
  • crates/jc/examples/ontology_locality_probe.rs
  • crates/lance-graph-contract/src/soa_view.rs
  • crates/lance-graph/src/graph/arigraph/markov_soa.rs
  • crates/lance-graph/src/graph/arigraph/mod.rs

Comment on lines +135 to +147
### Gate P1 — Partition locality (CONJECTURE → must measure)
- **Driver:** `jc/examples/splat_louvain_modularity.rs` (Louvain modularity =
  popcount-AND over `contract::splat::AwarenessPlane16K` planes) +
  `neighborhood::clam::measure_cluster_radii` on the real P279/subClassOf
  edge graph derived from `data/ontologies/*.ttl` (e.g. the FIBO or
  schema.org subtree; biology subtree once Wikidata lands).
- **Pass:** high modularity ⇒ ≥~90% of edges are intra-cohort ⇒ 16-bit
  intra-cohort references + the family frontier are real, and the natural
  fan-out (the 4/12/16 split) is observed, not assumed.
- **Gates:** D-LWS-1 fan-out choice; D-LWS-4 GOP P-frame placement; D-LWS-5
  cohort residency.
- **Honest status:** `clam.rs` header literally says the radii-coincide-with-
  ontology-boundaries claim "is a TEST, not a fact." Treat as **CONJECTURE**.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Check what the locality probe actually measures as "fan-out"

# Look for the probe implementation to see what metrics it computes
rg -n -A5 -B5 "fan.?out|FAN.?OUT" crates/jc/examples/ontology_locality_probe.rs

# Also check if there's any documentation of the probe's output format
rg -n "max.*fan|modularity|locality.*percent" crates/jc/examples/ontology_locality_probe.rs

Repository: AdaWorldAPI/lance-graph

Length of output: 9054


Clarify that P1’s “fan-out max=3” measures distinct parent-basin count per class, not the designed branching factor.

  • In ontology_locality_probe.rs, fan-out is computed as the number of DISTINCT top-basins among each class’s direct subClassOf parents, and PASS is locality >= 0.90 and max_fanout <= 16.
  • So FAN-OUT max = 3 is consistent with the “16-frontier” cap: the worst observed class has direct parents in 3 distinct top-basins. It does not contradict the architectural 4/12/16 vs 16-way split unless the prose conflates the two metrics.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.claude/plans/wikidata-lazy-spine-hydration-v1.md around lines 135 - 147,
The comment clarifies that the P1 "fan-out max=3" metric in
ontology_locality_probe.rs measures the number of DISTINCT top-basins among a
class’s direct subClassOf parents (i.e., distinct parent-basin count), not the
designed branching factor; update the code/comments so the computed fan-out
variable and any accompanying log/message (e.g., the fan-out computation in
ontology_locality_probe.rs and the variables locality and max_fanout) explicitly
state they count distinct top-basins of direct parents, and ensure PASS logic
checks both locality >= 0.90 AND max_fanout <= 16; also adjust any text that
conflates this metric with the architectural branching factor to avoid
confusion.
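
To make the distinction concrete, here is a minimal sketch of the metric as the review describes it: fan-out is the number of distinct top-basins among one class's direct parents, and PASS requires both thresholds. The helper names `distinct_parent_basins` and `is_pass` and the toy basin table are hypothetical, chosen for illustration; they are not the probe's actual API.

```rust
use std::collections::BTreeSet;

/// Distinct top-basins among one class's direct subClassOf parents.
/// This is the quantity the probe reports as "fan-out"; it is not the
/// designed branching factor of the address layout.
fn distinct_parent_basins(parents: &[usize], basin: &[usize]) -> usize {
    parents
        .iter()
        .map(|&p| basin[p])
        .collect::<BTreeSet<usize>>()
        .len()
}

/// PASS needs both conditions, using the thresholds quoted in the review.
fn is_pass(locality: f64, max_fanout: usize) -> bool {
    locality >= 0.90 && max_fanout <= 16
}

fn main() {
    // Hypothetical toy data: basin[id] = root id of class `id`.
    // Classes 1 and 2 sit in basin 0; class 4 sits in basin 3.
    let basin = vec![0, 0, 0, 3, 3];

    // A class with three direct parents (1, 2, 4) touches only two basins,
    // so its fan-out is 2 even though it has three parents.
    let fo = distinct_parent_basins(&[1, 2, 4], &basin);
    println!("fan-out = {fo}");
    println!("pass = {}", is_pass(0.95, fo));
}
```

A class can have many parents and still report a small fan-out if those parents share a basin, which is why an observed maximum of 3 says nothing by itself about the 4/12/16 layout.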

Comment on lines +209 to +654
pub fn parse_subclass_edges(ttl: &str) -> Vec<(String, String)> {
const SUBCLASS: &str = "subClassOf"; // matches rdfs:subClassOf AND bare subClassOf
let mut edges: Vec<(String, String)> = Vec::new();
let mut current_subject: Option<String> = None;
let mut predicate_is_subclass = false;
let mut in_long_string = false;
// Depth of nested `[ ... ]` blank-node restrictions. While > 0 we are
// INSIDE an anonymous OWL restriction and emit no edges; the restriction
// spans multiple physical lines, so this persists across the line loop.
let mut bracket_depth: i32 = 0;

for raw_line in ttl.lines() {
let line = strip_strings_and_comments(raw_line, &mut in_long_string);
let leading_ws = raw_line.starts_with(char::is_whitespace);

// Split into whitespace tokens (Turtle is whitespace-delimited at this
// granularity; we already stripped strings/comments).
let toks: Vec<&str> = line.split_whitespace().collect();
if toks.is_empty() {
// A blank physical line does not by itself end a statement.
continue;
}

let mut idx = 0;

// A statement that begins flush-left (no leading whitespace) and whose
// first token is a named IRI / blank starts a NEW subject — UNLESS the
// line is a pure object-list continuation beginning with ',' (handled
// below) or a directive (@prefix / @base / PREFIX / BASE).
let first = toks[0];
let is_directive = first.starts_with('@')
|| first.eq_ignore_ascii_case("prefix")
|| first.eq_ignore_ascii_case("base");
if is_directive {
// Directives don't carry subjects or edges; but a directive still
// can be terminated by '.', which must not clobber subject state of
// a real statement (directives are always flush-left & self
// contained), so just skip the whole line.
continue;
}

if bracket_depth == 0
&& !leading_ws
&& first != ","
&& first != ";"
&& !first.starts_with('[')
{
// New subject candidate (only when not inside a blank node).
if let Some(subj) = normalize_iri(first) {
current_subject = Some(subj);
} else {
current_subject = None;
}
predicate_is_subclass = false;
idx = 1;
}

// Walk remaining tokens, tracking predicate switches and emitting
// edges while the active predicate is subClassOf AND we are at
// bracket depth 0 (outside any anonymous restriction).
while idx < toks.len() {
let tok = toks[idx];

// Update bracket depth from any '[' / ']' characters in the token,
// then move on if the token is pure bracket punctuation. A '['
// opening means the CURRENT subClassOf object is an anonymous
// restriction; we suppress emission until the matching ']' but
// stay in subClassOf predicate mode so a following ',' continues
// the OUTER object list.
let opens = tok.matches('[').count() as i32;
let closes = tok.matches(']').count() as i32;
if opens > 0 || closes > 0 {
bracket_depth += opens - closes;
if bracket_depth < 0 {
bracket_depth = 0;
}
// If the token is only brackets (possibly with ',' / ';'),
// there is nothing else to interpret on it.
let stripped: String = tok
.chars()
.filter(|&c| c != '[' && c != ']' && c != ',' && c != ';')
.collect();
if stripped.is_empty() {
idx += 1;
continue;
}
}

// Anything inside a blank node is ignored entirely.
if bracket_depth > 0 {
idx += 1;
continue;
}

// Object-list continuation: ',' keeps the current predicate.
if tok == "," {
idx += 1;
continue;
}
// ';' ends the current predicate's object list (a new predicate
// follows on this or a later line).
if tok == ";" {
predicate_is_subclass = false;
idx += 1;
continue;
}
// '.' terminates the whole statement → no active subject.
if tok.starts_with('.') && tok.len() == 1 {
current_subject = None;
predicate_is_subclass = false;
idx += 1;
continue;
}

// Predicate detection: rdfs:subClassOf or bare subClassOf.
let bare = tok.trim_end_matches([';', ',']);
if bare == SUBCLASS || bare.ends_with(":subClassOf") || bare == "rdfs:subClassOf" {
predicate_is_subclass = true;
idx += 1;
continue;
}
// In subClassOf object position: emit a named-IRI edge.
if predicate_is_subclass {
if let (Some(child), Some(parent)) =
(current_subject.clone(), normalize_iri(tok))
{
if child != parent {
edges.push((child, parent));
}
}
idx += 1;
continue;
}

// Not in subClassOf mode: a token like `a`, `rdf:type`,
// `owl:disjointWith`, `rdfs:label` is a (non-subclass) predicate;
// it just resets predicate state. We do not need its objects.
if bare == "a" || bare.contains(':') {
predicate_is_subclass = false;
}
idx += 1;
}
}
edges
}

// ── class graph: intern IRIs, build parent adjacency, assign top-basins ─────

/// Interned subClassOf DAG over class IRIs.
pub struct ClassGraph {
/// id -> IRI key (for printing).
pub names: Vec<String>,
/// Direct parents of each class (deduplicated, sorted).
pub parents: Vec<Vec<usize>>,
/// All edges as interned (child, parent) id pairs.
pub edges: Vec<(usize, usize)>,
}

impl ClassGraph {
/// Build from `(child, parent)` IRI-key edges. Every IRI appearing in any
/// position becomes a node (a parent that is never a child is a root).
pub fn from_edges(iri_edges: &[(String, String)]) -> Self {
let mut id_of: BTreeMap<String, usize> = BTreeMap::new();
let mut names: Vec<String> = Vec::new();
let intern = |s: &str, names: &mut Vec<String>, id_of: &mut BTreeMap<String, usize>| {
if let Some(&id) = id_of.get(s) {
id
} else {
let id = names.len();
names.push(s.to_string());
id_of.insert(s.to_string(), id);
id
}
};
let mut edges: Vec<(usize, usize)> = Vec::new();
for (c, p) in iri_edges {
let ci = intern(c, &mut names, &mut id_of);
let pi = intern(p, &mut names, &mut id_of);
edges.push((ci, pi));
}
let n = names.len();
let mut parents: Vec<Vec<usize>> = vec![Vec::new(); n];
for &(c, p) in &edges {
parents[c].push(p);
}
for ps in parents.iter_mut() {
ps.sort_unstable();
ps.dedup();
}
Self { names, parents, edges }
}

pub fn n_classes(&self) -> usize {
self.names.len()
}

/// Assign each class to its top-basin = the root ancestor reached by
/// walking parents upward. Multi-parent: follow the parent with the
/// SMALLEST interned id (deterministic representative). Cycles: broken by
/// a visited-set; the entry node of a cycle becomes its own basin.
/// Returns `basin[id] = root_id`.
pub fn assign_basins(&self) -> Vec<usize> {
let n = self.n_classes();
let mut basin = vec![usize::MAX; n];
for start in 0..n {
if basin[start] != usize::MAX {
continue;
}
// Walk up to a root, recording the path; memoize on the way back.
let mut path: Vec<usize> = Vec::new();
let mut visiting: BTreeSet<usize> = BTreeSet::new();
let mut cur = start;
let root;
loop {
if let Some(&memo) = basin.get(cur) {
if memo != usize::MAX {
root = memo;
break;
}
}
if visiting.contains(&cur) {
// Cycle: treat `cur` as the basin root for this SCC entry.
root = cur;
break;
}
visiting.insert(cur);
path.push(cur);
// Pick the smallest-id parent (deterministic). No parent → root.
match self.parents[cur].iter().min() {
Some(&p) => cur = p,
None => {
root = cur;
break;
}
}
}
for id in path {
basin[id] = root;
}
if basin[start] == usize::MAX {
basin[start] = root;
}
}
basin
}
}

// ── metric 1: locality ──────────────────────────────────────────────────────

/// Fraction of edges whose child and parent share a top-basin.
/// Returns (local_edges, total_edges, fraction). Empty graph → fraction 0.
pub fn locality(edges: &[(usize, usize)], basin: &[usize]) -> (usize, usize, f64) {
let total = edges.len();
if total == 0 {
return (0, 0, 0.0);
}
let local = edges
.iter()
.filter(|&&(c, p)| basin[c] == basin[p])
.count();
(local, total, local as f64 / total as f64)
}

// ── metric 2: fan-out (distinct parent-basins per class) ────────────────────

/// Per-class count of DISTINCT parent-basins among its direct subClassOf
/// parents. Returns (max_fanout, histogram) where histogram[k] = #classes
/// whose fan-out == k. Classes with no parents contribute fan-out 0.
pub fn fan_out(graph: &ClassGraph, basin: &[usize]) -> (usize, BTreeMap<usize, usize>) {
let mut hist: BTreeMap<usize, usize> = BTreeMap::new();
let mut max_fo = 0usize;
for c in 0..graph.n_classes() {
let distinct: BTreeSet<usize> = graph.parents[c].iter().map(|&p| basin[p]).collect();
let fo = distinct.len();
max_fo = max_fo.max(fo);
*hist.entry(fo).or_insert(0) += 1;
}
(max_fo, hist)
}

// ── metric 3: modularity Q of the basin partition ──────────────────────────
//
// Newman modularity on the UNDIRECTED subClassOf graph (each subClassOf edge
// contributes one undirected link between child and parent):
//
// Q = Σ_c [ e_c / m - (a_c / 2m)^2 ]
//
// where m = |E|, e_c = number of edges fully inside basin c, a_c = sum of
// degrees of nodes in basin c. We reuse the `splat_louvain_modularity.rs`
// idea — the within-community edge mass is a popcount-AND between a node's
// neighbour bitset and the basin-membership bitset — but with dynamically
// sized `Vec<u64>` planes so the probe handles ontologies with thousands of
// classes (the contract's fixed 16,384-bit `AwarenessPlane16K` is too small
// for schema.org). Self-loops are excluded by construction (the parser drops
// `X subClassOf X`).

/// A dynamically sized bitset (the standalone analogue of `AwarenessPlane16K`).
struct BitPlane(Vec<u64>);

impl BitPlane {
fn zero(n_bits: usize) -> Self {
BitPlane(vec![0u64; n_bits.div_ceil(64)])
}
#[inline]
fn set(&mut self, idx: usize) {
self.0[idx / 64] |= 1u64 << (idx % 64);
}
#[inline]
fn and_popcount(&self, other: &BitPlane) -> u32 {
self.0
.iter()
.zip(other.0.iter())
.map(|(a, b)| (a & b).count_ones())
.sum()
}
}

/// Compute Newman modularity Q of the basin partition. Returns Q in
/// [-0.5, 1.0]. Empty graph → 0.0.
pub fn modularity_q(graph: &ClassGraph, basin: &[usize]) -> f64 {
let n = graph.n_classes();
let m = graph.edges.len();
if m == 0 || n == 0 {
return 0.0;
}
let two_m = 2.0 * m as f64;

// Undirected neighbour bitset per node (both directions of each edge).
let mut neigh: Vec<BitPlane> = (0..n).map(|_| BitPlane::zero(n)).collect();
let mut degree = vec![0u32; n];
for &(c, p) in &graph.edges {
neigh[c].set(p);
neigh[p].set(c);
degree[c] += 1;
degree[p] += 1;
}

// Group node ids by basin; build a membership bitset per basin.
let mut members: BTreeMap<usize, Vec<usize>> = BTreeMap::new();
for (id, &b) in basin.iter().enumerate() {
members.entry(b).or_default().push(id);
}

let mut q = 0.0;
for ids in members.values() {
let mut plane = BitPlane::zero(n);
for &id in ids {
plane.set(id);
}
// e_c counted twice (once per endpoint) via Σ_u popcount(neigh[u] AND plane).
let mut e_c_times_two = 0u32;
let mut a_c = 0.0;
for &id in ids {
e_c_times_two += neigh[id].and_popcount(&plane);
a_c += degree[id] as f64;
}
let e_c = e_c_times_two as f64 / 2.0;
q += (e_c / m as f64) - (a_c / two_m).powi(2);
}
q
}

// ── verdict ──────────────────────────────────────────────────────────────────

/// Verdict tier for the locality hypothesis.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
/// High locality AND fan-out fits the family frontier.
Pass,
/// Locality decent but borderline, or fan-out near the cap.
Marginal,
/// Locality low — local-pointer assumption does not hold.
Fail,
}

impl Verdict {
pub fn as_str(self) -> &'static str {
match self {
Verdict::Pass => "PASS",
Verdict::Marginal => "MARGINAL",
Verdict::Fail => "FAIL",
}
}
}

/// Decide the verdict from the measured numbers.
///
/// Thresholds (stated, not hand-waved):
/// * locality ≥ 0.90 AND max_fanout ≤ 16 → PASS (the map's claim)
/// * locality ≥ 0.75 (or max_fanout in 17..=32) → MARGINAL
/// * otherwise → FAIL
///
/// The "16" frontier is the design's pencilled cap; max_fanout > 16 means a
/// single class needs more than 16 distinct family pointers, breaking the
/// 4/12/16 split as stated (though a wider frontier byte would still work).
pub fn verdict(locality_frac: f64, max_fanout: usize) -> Verdict {
if locality_frac >= 0.90 && max_fanout <= 16 {
Verdict::Pass
} else if locality_frac >= 0.75 || (max_fanout > 16 && max_fanout <= 32) {
Verdict::Marginal
} else {
Verdict::Fail
}
}

// ── load real ontology TTLs from a directory ────────────────────────────────

/// All parsed `(child, parent)` IRI edges plus the sorted list of TTL files
/// they came from.
type LoadedOntology = (Vec<(String, String)>, Vec<PathBuf>);

/// Recursively collect `*.ttl` files under `dir`, parse subClassOf edges from
/// each, and return (all_edges, sorted_file_list). I/O errors on individual
/// files are skipped with a note to stderr (the probe is best-effort over
/// whatever real ontologies are present).
fn load_dir(dir: &Path) -> std::io::Result<LoadedOntology> {
let mut edges: Vec<(String, String)> = Vec::new();
let mut files: Vec<PathBuf> = Vec::new();
let mut stack = vec![dir.to_path_buf()];
while let Some(d) = stack.pop() {
let rd = match std::fs::read_dir(&d) {
Ok(rd) => rd,
Err(e) => {
eprintln!(" (skip dir {}: {})", d.display(), e);
continue;
}
};
for entry in rd.flatten() {
let path = entry.path();
if path.is_dir() {
stack.push(path);
} else if path.extension().map(|e| e == "ttl").unwrap_or(false) {
match std::fs::read_to_string(&path) {
Ok(text) => {
let mut e = parse_subclass_edges(&text);
edges.append(&mut e);
files.push(path);
}
Err(e) => eprintln!(" (skip {}: {})", path.display(), e),
}
}
}
}
files.sort();
Ok((edges, files))
}

🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Move probe operations onto a carrier struct instead of free functions.

Core probe logic is implemented as free functions (parse_subclass_edges, locality, fan_out, modularity_q, verdict, load_dir) rather than methods on a state carrier, which breaks the repository’s Rust carrier-pattern rule.

As per coding guidelines, "**/*.rs: Use only method calls on the carrier struct that holds the state, never free functions. Carrier pattern: trajectory.resolve() instead of resolve(trajectory, config, awareness)"

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/jc/examples/ontology_locality_probe.rs` around lines 209 - 654, The
file exposes core probe logic as free functions (parse_subclass_edges, locality,
fan_out, modularity_q, verdict, load_dir) which violates the carrier-pattern
rule; refactor by introducing a carrier struct (e.g., Probe or OntologyProbe)
that holds probe state (edges, files, maybe config) and convert those free
functions into inherent methods (e.g., Probe::parse_subclass_edges,
Probe::locality, Probe::fan_out, Probe::modularity_q, Probe::verdict,
Probe::load_dir) updating any call sites to use method calls on the carrier
instance and moving any related state (edges, files, basin, graph) into the
struct so methods operate on &self / &mut self rather than standalone
parameters. Ensure signatures, visibility, and tests are adjusted accordingly
and that load_dir populates the carrier's fields instead of returning raw
tuples.
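
One possible shape for that refactor, as a minimal sketch only: the struct name `OntologyProbe`, its fields, and the method names are hypothetical, and the sketch takes already-interned edges and a precomputed basin table rather than reproducing the parser or `assign_basins`.

```rust
use std::collections::BTreeSet;

/// Hypothetical carrier: owns the interned edges, parent adjacency, and
/// basin assignment, so metrics become methods instead of free functions.
pub struct OntologyProbe {
    edges: Vec<(usize, usize)>,
    parents: Vec<Vec<usize>>,
    basin: Vec<usize>,
}

impl OntologyProbe {
    /// `basin` must have one entry per class id in `0..n_classes`.
    pub fn new(n_classes: usize, edges: Vec<(usize, usize)>, basin: Vec<usize>) -> Self {
        let mut parents: Vec<Vec<usize>> = vec![Vec::new(); n_classes];
        for &(child, parent) in &edges {
            parents[child].push(parent);
        }
        Self { edges, parents, basin }
    }

    /// Fraction of edges whose endpoints share a basin; empty graph gives 0.0.
    pub fn locality(&self) -> f64 {
        if self.edges.is_empty() {
            return 0.0;
        }
        let local = self
            .edges
            .iter()
            .filter(|&&(c, p)| self.basin[c] == self.basin[p])
            .count();
        local as f64 / self.edges.len() as f64
    }

    /// Largest number of distinct parent basins seen on any single class.
    pub fn max_fan_out(&self) -> usize {
        self.parents
            .iter()
            .map(|ps| ps.iter().map(|&p| self.basin[p]).collect::<BTreeSet<_>>().len())
            .max()
            .unwrap_or(0)
    }

    /// PASS thresholds as stated in the probe's doc comment.
    pub fn passes(&self) -> bool {
        self.locality() >= 0.90 && self.max_fan_out() <= 16
    }
}

fn main() {
    // Toy graph: classes 1 and 2 are children of 0; class 3 is a child of 2.
    let probe = OntologyProbe::new(4, vec![(1, 0), (2, 0), (3, 2)], vec![0, 0, 0, 0]);
    println!(
        "locality={} fan-out={} pass={}",
        probe.locality(),
        probe.max_fan_out(),
        probe.passes()
    );
}
```

Call sites then read `probe.locality()` rather than `locality(&graph.edges, &basin)`, which is the form the quoted guideline asks for.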

Comment on lines +383 to +388
        let mut edges: Vec<(usize, usize)> = Vec::new();
        for (c, p) in iri_edges {
            let ci = intern(c, &mut names, &mut id_of);
            let pi = intern(p, &mut names, &mut id_of);
            edges.push((ci, pi));
        }

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Deduplicate subclass edges before metric computation.

Repeated (child,parent) edges currently count multiple times in locality and modularity, which can skew the reported verdict when ontologies overlap or repeat triples.

💡 Suggested fix
 pub fn from_edges(iri_edges: &[(String, String)]) -> Self {
@@
-        let mut edges: Vec<(usize, usize)> = Vec::new();
+        let mut edges: Vec<(usize, usize)> = Vec::new();
@@
         for (c, p) in iri_edges {
             let ci = intern(c, &mut names, &mut id_of);
             let pi = intern(p, &mut names, &mut id_of);
             edges.push((ci, pi));
         }
+        edges.sort_unstable();
+        edges.dedup();

Also applies to: 460-470, 528-569

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/jc/examples/ontology_locality_probe.rs` around lines 383 - 388, The
current collection of subclass edges pushes every (child,parent) pair into edges
(built via intern, names, id_of) and allows duplicates which skews
locality/modularity; before using edges for metric computation, deduplicate the
Vec<(usize,usize)> (or switch to a HashSet) so only unique (ci,pi) pairs are
retained; update the same deduplication logic in the other blocks that build
edges around the later sections (the ones starting near the other ranges) so
repeated triples are eliminated prior to computing metrics.
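
A small sketch of why this matters, using made-up data: when the same cross-basin triple appears in several files, it drags the measured locality down until the edge list is deduplicated. The `locality` function mirrors the probe's definition; `dedup_edges` and the toy graph are hypothetical.

```rust
/// Sort and deduplicate interned (child, parent) pairs in place.
fn dedup_edges(edges: &mut Vec<(usize, usize)>) {
    edges.sort_unstable();
    edges.dedup();
}

/// Fraction of edges whose child and parent share a basin; empty gives 0.0.
fn locality(edges: &[(usize, usize)], basin: &[usize]) -> f64 {
    if edges.is_empty() {
        return 0.0;
    }
    let local = edges.iter().filter(|&&(c, p)| basin[c] == basin[p]).count();
    local as f64 / edges.len() as f64
}

fn main() {
    // Toy basins: classes 0 and 2 are in basin 0; classes 1 and 3 in basin 1.
    let basin = vec![0, 1, 0, 1];

    // The cross-basin edge (2, 1) is repeated three times, as could happen
    // when overlapping ontology files restate the same triple.
    let mut edges = vec![(2, 0), (2, 1), (2, 1), (2, 1), (3, 1)];
    let before = locality(&edges, &basin); // 2 local edges out of 5

    dedup_edges(&mut edges);
    let after = locality(&edges, &basin); // 2 local edges out of 3

    println!("locality before dedup = {before:.3}, after = {after:.3}");
}
```

The skew can go either way: repeated local edges would inflate the figure instead, so deduplicating before measuring is the safer default.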

Comment on lines +178 to +182
        let r = self.radius as i32;
        let class_ids = soa.class_id();
        for d in -r..=r {
            let row_i = focal_row as i32 + d;
            if row_i < 0 || row_i as usize >= n {

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify all potentially lossy i32 casts in this module.
rg -n -C2 'as i32' crates/lance-graph/src/graph/arigraph/markov_soa.rs

Repository: AdaWorldAPI/lance-graph

Length of output: 397


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="crates/lance-graph/src/graph/arigraph/markov_soa.rs"
wc -l "$FILE"

echo "---- snippet around lines 160-210 ----"
sed -n '150,210p' "$FILE" | nl -ba

echo "---- locate radius field/type ----"
rg -n "radius" "$FILE"

echo "---- locate focal_row type/assignment ----"
rg -n "focal_row" "$FILE"

Repository: AdaWorldAPI/lance-graph

Length of output: 223


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="crates/lance-graph/src/graph/arigraph/markov_soa.rs"
wc -l "$FILE"

echo "---- snippet around lines 160-210 (with line numbers) ----"
sed -n '150,210p' "$FILE" | cat -n

echo "---- locate struct/definitions mentioning radius ----"
rg -n "radius" "$FILE"

echo "---- locate focal_row usage/definition ----"
rg -n "focal_row" "$FILE"

Repository: AdaWorldAPI/lance-graph

Length of output: 3062


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="crates/lance-graph/src/graph/arigraph/markov_soa.rs"

echo "---- other as i32 in file ----"
rg -n "as i32" "$FILE" || true

echo "---- usages of SoaWavePrimer::new ----"
rg -n "SoaWavePrimer::new\\(" crates/lance-graph/src || true

echo "---- usages of radius field ----"
rg -n "\\.radius" crates/lance-graph/src/graph/arigraph || true

Repository: AdaWorldAPI/lance-graph

Length of output: 790


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n "SoaWavePrimer" crates/lance-graph/src || true
rg -n "\\.project\\(" crates/lance-graph/src/graph/arigraph/markov_soa.rs || true

# Search whole crate for SoaWavePrimer::project call sites
rg -n "SoaWavePrimer::.*project\\(" crates/lance-graph/src || true
rg -n "SoaWavePrimer.*\\.project\\(" crates/lance-graph/src || true

Repository: AdaWorldAPI/lance-graph

Length of output: 1656


Avoid lossy as i32 casts in SoaWavePrimer::project window math (lines ~178-182).

self.radius: u32 and focal_row: usize are downcast to i32 via as, so large values truncate (and can even flip signs), corrupting the ±window traversal. Use checked/widened conversions (e.g., i32::try_from(...), i64/isize) and handle overflow (early return or clamp).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/lance-graph/src/graph/arigraph/markov_soa.rs` around lines 178 - 182,
In SoaWavePrimer::project the window math currently casts self.radius (u32) and
focal_row (usize) to i32 via "as", which can truncate; replace those lossy casts
by using checked/widened conversions (e.g., use i64::try_from(self.radius) or
isize::try_from(focal_row) / i32::try_from where appropriate) and propagate or
handle conversion errors (return early or clamp) before computing r and row_i;
update the loop that uses r, row_i and bounds checks with the new safe types and
keep references to class_ids = soa.class_id() and n for bounds logic.
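
One way to avoid the casts altogether, sketched under assumptions: instead of iterating a signed offset `d`, compute the clamped row range directly in `usize` with saturating arithmetic. The function name `window_rows` is hypothetical and this is not the code that was merged; it only illustrates the checked/clamped approach the comment suggests.

```rust
use std::ops::Range;

/// Rows in `[focal_row - radius, focal_row + radius]`, clamped to `0..n`.
/// No signed casts: everything stays in `usize` and saturates at the ends.
fn window_rows(focal_row: usize, radius: u32, n: usize) -> Range<usize> {
    // u32 -> usize is lossless on 32- and 64-bit targets; on a narrower
    // target we fall back to "as wide as possible" rather than truncating.
    let r = usize::try_from(radius).unwrap_or(usize::MAX);
    let lo = focal_row.saturating_sub(r);
    let hi_exclusive = focal_row.saturating_add(r).saturating_add(1).min(n);
    if lo >= hi_exclusive {
        0..0 // window lies entirely outside the table
    } else {
        lo..hi_exclusive
    }
}

fn main() {
    // Interior window: rows 3, 4, 5, 6, 7.
    println!("{:?}", window_rows(5, 2, 10));
    // A huge radius simply covers the whole table instead of wrapping.
    println!("{:?}", window_rows(0, u32::MAX, 4));
    // If the signed distance is needed, derive it without casting.
    for row in window_rows(5, 1, 10) {
        println!("row {row}, distance {}", row.abs_diff(5));
    }
}
```

If the loop body depends on the sign of the offset, comparing `row` with `focal_row` recovers it without reintroducing an `as i32` conversion.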

The `format` CI job runs:
  cargo fmt --manifest-path crates/lance-graph/Cargo.toml -- --check
markov_soa.rs had one-line struct literals + asserts that rustfmt 1.95.0
expands to multi-line. Apply canonical formatting (no logic change);
the exact CI command now passes clean. Other failing-check noise was a
local --all artifact — CI only formats the lance-graph crate.

https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
@AdaWorldAPI AdaWorldAPI merged commit 3c95f32 into main May 31, 2026
7 checks passed
AdaWorldAPI pushed a commit that referenced this pull request May 31, 2026
Full-breadth integration spec wiring D-MBX kanban contract through
witness commit (D-ATOM-5), surreal LIVE -> Rubicon kanban flip,
ExecTarget backends, head2head two-view superposition in the shader
driver, EW64-Markov Hebbian prefetch, language->SPO landing (D-LWS),
and BindSpace decommission. Grounded against current main (#437/#439/
#444/#445) + two recon passes; flags the two hard blockers (lance-7
witness API, surreal fork dep / OQ-11.6) and the stale-branch caveat.

https://claude.ai/code/session_01PLf95mURCY96TvKBFvSWEQ
AdaWorldAPI pushed a commit that referenced this pull request May 31, 2026
…EW64-1, D-VIEW-1) + plan v1

AriGraph episodic edges, RISC-encoded (the corrected EW64, replacing the earlier
"CE64 lens"/"16-bit pointer" framings):
- episodic_edges::{EpisodicEdges64(u64), EdgeRef} — 4x[4-bit family | 12-bit local].
  family 0 = intra-basin (inherited from HHTL/class_id, ~98.6% per #444);
  1..=15 = cross-family index into the OGIT-class-inherited palette (~1.4%).
  Identities inherited, never on the edge (I-VSA-IDENTITIES); a CAM_PQ facet code.
- view_angle::ViewAngle — 4-bit view-schema selector; the class presence bitmask
  doubles as the attention mask (inherited view-schema, never per-instance semantics).

527 contract lib tests (+11); both files clippy pedantic+nursery clean.

Plan: .claude/plans/episodic-risc-spine-v1.md (3 lifecycle-separated structures:
CAM/OGIT identity, Lance-version pseudo-radix index, CLAM ephemeral KV; bounded-horizon
compression). Finding: EPIPHANIES E-EPISODIC-CLOSURE. CI-gated next (no protoc offline):
D-EW64-2 SoA columns, D-STORY-1 CLAM clusterer, D-STORY-2 session index,
D-STORY-3 palette256/4096 archetypes, D-HORIZON-1 stopping rule.

Board: INTEGRATION_PLANS + LATEST_STATE + STATUS_BOARD + EPIPHANIES + AGENT_LOG.

https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
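
The `4x[4-bit family | 12-bit local]` layout named in the commit message above can be illustrated with a short sketch. Everything here is an assumption made for illustration: the type `EdgeRefSketch`, the choice to put the family in the high nibble of each 16-bit slot, and the slot order within the `u64`. The real `EpisodicEdges64` and `EdgeRef` may differ.

```rust
/// Hypothetical stand-in for one edge reference:
/// a 4-bit family plus a 12-bit local index, 16 bits in total.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct EdgeRefSketch {
    family: u8,
    local: u16,
}

impl EdgeRefSketch {
    /// Returns `None` if either field does not fit its bit budget.
    fn new(family: u8, local: u16) -> Option<Self> {
        (family < 16 && local < 4096).then_some(Self { family, local })
    }

    /// Assumed bit order: family in the top 4 bits, local in the low 12.
    fn to_bits(self) -> u16 {
        (u16::from(self.family) << 12) | self.local
    }

    fn from_bits(bits: u16) -> Self {
        Self { family: (bits >> 12) as u8, local: bits & 0x0FFF }
    }

    /// Per the commit message, family 0 means an intra-basin reference.
    fn is_intra_basin(self) -> bool {
        self.family == 0
    }
}

/// Pack four 16-bit slots into one u64 (slot 0 in the lowest bits).
fn pack(slots: [EdgeRefSketch; 4]) -> u64 {
    slots
        .iter()
        .enumerate()
        .fold(0u64, |acc, (i, s)| acc | (u64::from(s.to_bits()) << (16 * i)))
}

/// Inverse of `pack`; the `as u16` cast deliberately keeps only one slot.
fn unpack(word: u64) -> [EdgeRefSketch; 4] {
    std::array::from_fn(|i| EdgeRefSketch::from_bits((word >> (16 * i)) as u16))
}

fn main() {
    let slots = [
        EdgeRefSketch::new(0, 17).unwrap(),   // intra-basin
        EdgeRefSketch::new(0, 4095).unwrap(), // intra-basin, max local index
        EdgeRefSketch::new(3, 42).unwrap(),   // cross-family, palette index 3
        EdgeRefSketch::new(15, 0).unwrap(),   // cross-family, last palette slot
    ];
    let word = pack(slots);
    assert_eq!(unpack(word), slots);
    println!("packed = {word:#018x}, slot 0 intra-basin = {}", slots[0].is_intra_basin());
}
```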