docs+probe: the agnostic lazy world-spine — addressing vision, locality probe (PASS), markov_soa→AriGraph, EW64-as-AriGraph #444
…tiered substrate
Capstone north-star: one NiblePath address unifies ontology position = memory arena = (leaf) spatial coordinate.
Tiering — COLD Lance columnar ◄─NiblePath─► HOT mailbox-SoA (agnostic bytes) ◄── SEMANTIC OGIT/DOLCE cache (C2 resolve-not-store).
Reframings:
(1) the cold path SPLITS — DataFusion rows/cols joins are SLOW, business-SQL ground-truth ONLY, off the hot path; HHTL hydration is address-based (NiblePath → CAM/palette/blasgraph, O(1)), not join-based.
(2) DOLCE continuant/occurrent = a 1-bit permanent/temporary residence policy.
(3) AriGraph SPO + labels → agnostic SoA + late labels (C2 wholesale).
Markov = the CausalEdge64 W-slot → WitnessTable/EpisodicWitness64 arc (NOT the 16384 VSA bundle, which is retired legacy / discovery-layer only). Reasoning = traversing the CE64→EW64 arc + SPO, no embedding/forward-pass. Reading a text = accumulating SPO mailboxes + their causal-edge/witness arc; ambiguity resolved by counterfactual testing (recipe_kernels world⊗factual⊗counterfactual, popcount).
A 250-page book ≈ 4-5k sentences ≈ ~4096 SPO mailboxes = one per-cohort WitnessTable<64> cohort. The resident agnostic row is ~4096 bits (address carries class+label inheritance). Address: byte-aligned 256^4 = 2^32 ~ 4.3B — the 4-byte CAM-PQ code IS the address = class+label key = palette-distance key.
Built vs new vs conjecture mapped; invariants (CAM-exact, similarity-only-in-discovery, SoA stays agnostic) recorded. The one missing runtime piece: a NiblePath-keyed tiered hydration manager.
- knowledge/agnostic-lazy-world-spine.md (the north-star)
- EPIPHANIES: the world-spine FINDING
https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
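The byte-aligned 256^4 = 2^32 address can be pictured as four bytes packed into one `u32`, which can also be read as eight 4-bit steps. The following is a minimal sketch under stated assumptions: `pack_address` and `nibbles` are hypothetical names, and the big-endian, most-significant-nibble-first ordering is an illustrative choice, not something the commit message specifies.

```rust
/// Hypothetical helper: pack four byte-aligned levels into one 32-bit address.
/// Four bytes give 256^4 = 2^32 distinct addresses.
pub fn pack_address(levels: [u8; 4]) -> u32 {
    // Assumption: the first level is the most significant byte.
    u32::from_be_bytes(levels)
}

/// Hypothetical helper: view the same address as eight 4-bit steps,
/// most significant nibble first.
pub fn nibbles(addr: u32) -> [u8; 8] {
    let mut out = [0u8; 8];
    for (i, slot) in out.iter_mut().enumerate() {
        let shift = 28 - 4 * i as u32;
        *slot = ((addr >> shift) & 0xF) as u8;
    }
    out
}
```

Because the packing is a pure bit layout, the address needs no lookup to decode, which is the property the O(1) address-based hydration claim relies on.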
…rise, deck=expectation)
Consolidates the 8-turn addressing design from the end (cookbook/delta-card) back through the full chain. The one idea: a card stores the surprise, the deck stores the expectation; meaning = deck ⊗ delta — the free-energy framing (prior + prediction-error), applied to BOTH the key (address) and the value (content).
- Cookbook (value side): recipe = inherited(region×season×persona) + 8-16 delta bits; boundary = generator-vs-derivable.
- Addressing (key side): partition-as-address / schema-as-deck (Quartettkarten); 27-bit truthful floor with ~0-bit row; sparse radix range-delegation (no 256^4 files); frozen ISA = compiled perfect hash, no rebalance, version-gated upgrade.
- Frame model (x264/265): I=frozen radix+compacted base fragment, P=appended + CLAM-clustered delta, B=RISC compose-cache, GOP compaction = amortized upgrade = where similarity freezes to structure. IS Lance fragment-versioning.
- RISC compose-not-materialize: store generators, derive <=7-hop closure via ComposeTable/mxm; dissolves the hub problem; per-predicate composability flag.
- Two trees: frozen ontology radix = address (exact); CLAM/CHESS = proposes the partition (similarity, discovery-only). Adaptive proposes, frozen ships.
- Scale identities: 6-bit cohort ⊂ 16-bit book ⊂ 18-bit hot envelope(256K) ⊂ 32-bit world(cold). Reasoning = CE64→EW64 arc, not the 16384 VSA bundle.
- 3 probes (Louvain/CLAM locality; delta-card residual; compose hit-rate).
New: knowledge/delta-card-addressing-integration-map.md; EPIPHANIES capstone; cross-link from agnostic-lazy-world-spine.md (which it supersedes for addressing).
https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
The thesis the whole map reaches for: split identity (which one = 27 bits irreducible, radix trie, path-compressed) from description (what it is = ~0 bits for the modal class member, inherited whole from the frozen OGIT deck). A typical entity stores nothing — it inherits everything; only the surprising one pays. The spine's price is paid ONCE by the frozen ontology, amortized to nothing per entity. Absence is not missing data; absence IS the inheritance. https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
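The identity/description split can be illustrated with a toy resolver: a typical entity stores only its identity and class, and its description comes from the frozen deck; only a surprising entity carries a delta. This is a sketch under stated assumptions, not the repo's types: `Deck`, `Card`, and `describe` are hypothetical names, and "deck ⊗ delta" is simplified here to "delta overrides the inherited default".

```rust
use std::collections::BTreeMap;

/// Hypothetical frozen deck: class id -> inherited default description.
pub struct Deck {
    defaults: BTreeMap<u16, &'static str>,
}

/// Hypothetical card: identity plus an optional surprise.
pub struct Card {
    /// Identity; this sketch only assumes it fits the 27-bit floor.
    pub id: u32,
    pub class: u16,
    /// `None` means nothing is stored: the card inherits everything.
    pub delta: Option<&'static str>,
}

impl Deck {
    pub fn new(defaults: BTreeMap<u16, &'static str>) -> Self {
        Self { defaults }
    }

    /// Resolve a description: the card's delta if present, else the deck default.
    pub fn describe(&self, card: &Card) -> Option<&'static str> {
        card.delta
            .or_else(|| self.defaults.get(&card.class).copied())
    }
}
```

In this toy model the modal class member costs zero description bits, and absence of a delta is read as inheritance rather than as missing data.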
Module doc states honest scope: real ontology subClassOf graphs from data/ontologies, NOT full Wikidata. Parser tracks current subject + predicate, strips string literals/comments, skips blank-node OWL restrictions, emits (child,parent) named-IRI edges only. https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
NiblePath-keyed tiered hydration plan W1. Verified-symbols table + EpisodicWitness64/Lance-fragment risk flags + D-LWS-1..9 index. https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
… + board-hygiene Completes wikidata-lazy-spine-hydration-v1: prefetch cascade, DOLCE-1bit eviction, probe harness (produces P1/P2/P3 gates), deferred 115M load, per-crate firewall contract, 7-risk register (R1 EpisodicWitness64 absent, R2 Lance fragment APIs not wired, R3 CLAM is probe-not-clusterer). https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
… into the SoA per ractor-mailbox
Two corrections (user, mid-wave; W1 drift-audit also flagged the symbol):
- There is NO VSA in this design. Drop the '16384-bit VSA bundle (retired legacy)' framing entirely — reasoning is a native CE64 W-slot → EpisodicWitness arc + SPO graph walk, no fingerprint bundling. The discovery layer (aerial/splat) uses a transient palette256/CAM-PQ distance, never a bundle.
- EpisodicWitness64 is NOT a phantom and NOT shipped-as-named: it is the NEW AriGraph, migrated INTO the SoA per ractor-mailbox (cohort-local episodic memory as a SoA column). Shipped seed = WitnessTable<64> + WitnessEntry (6-bit W-slot); EpisodicWitness64's 64-bit layout (incl. the 16-bit book tier) is the design surface to settle.
Relabelled NEW build target throughout + Status note.
https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
…s, prefetch=Meta)
The shock named: every link shipped, the chain open at the joints.
- Layering corrected: Markov (CE64 W-slot → EW64 arc) is the BASIS; predictive-prefetch is the META on top — the prefetch IS the wiring IS the learning (Hebbian: aerial 'fire together' offline → EW64 'wire together' online).
- Reactive spine (keystone): Lance update = witness pointer = SurrealDB kanban subscription trigger — one event propagating through the storage layer as the prefetch signal (why EW64 shares CE64 low-40, why kanban is in contract).
- Diagnosis: island-archipelago — EpisodicWitness64/SpoWitness64 (pr-ce64-mb-4) = 0 code symbols; HotWitness = todo!() scaffold; Lance→Surreal→kanban subscription unwired. EW64 is the SEAM, not a type. Invisible in green suites.
- Queued (second wave, post-probe-consolidation): one whole-seam spec.
https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
…OCA+CAM-PQ, no cosine)
The 'meet halfway' on VSA: turn the black-box bundle into an explicit,
deterministic projection of the mailbox-SoA window into its COCA-rank SPO
triplets + full provenance (which rows, at what proximity). The triplets stay
ADDRESSABLE — no superposition destroys the register.
Match is DeepNSM's OWN machinery, NOT float cosine: COCA-4096 vocabulary +
the CAM-PQ 4096² u8 word-distance matrix via SimilarityTable::lookup_u8 +
proximity prior. best_guess_match = nearest-triplet CAM-PQ similarity, averaged.
Strictly a fuzzy proposer (cognitive priming): proposes where-to-look /
what-it-resembles ('feels like a Sicilian'), never asserts; exact 32k SPO-W confirms.
Consumes contract::soa_view::MailboxSoaView through the EXISTING hard dep — zero
new dependency, firewall preserved (no dep on the heavy cognitive-shader-driver
that implements the view).
Verified: 5 markov_soa tests green (incl. best_guess_match_uses_cam_pq_not_cosine,
determinism, edge-clamp, skip-untripled, empty=0); full deepnsm suite 94/4/8/1
no regressions; clippy clean in markov_soa (pre-existing lints in other files
untouched, out of scope).
https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
…e; VSA = fuzzy proposer (priming), not cosine
Supersedes my two earlier mis-framings in-place (board hygiene: don't leave wrong findings standing):
- (a) 'VSA = per-cycle experience/soul-print vector' — wrong scope.
- (b) 'keep DeepNSM as a parallel universe' — DeepNSM migrates too.
Converged finding: the explicit 32k SPO-W is the substrate (addressable, lossless, reasoning-capable, provenance-bearing — categorically > any bundle; ~32-item recovery capacity vs 32k = 1000x over). VSA16k's legitimate role = a strictly-fuzzy proposer / cognitive priming, firewall-gated to discovery; match via COCA + CAM-PQ SimilarityTable, NOT cosine.
Records the markov_soa.rs artifact (e0a5049), the aerial within/cross-cohort synergy + the queued CodebookDistance adapter D-id, and the CLAUDE.md reconciliation note.
Also: crates/deepnsm/Cargo.lock regen from the markov_soa build (benign).
https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
One word, three ranked uses; the deterministic CE64→EW64 chain is the line:
1) context-chain building = mailbox chaining through the CE64 W-slot → EpisodicWitness64 arc (deterministic, exact, addressable = THE substrate).
2) hybrid+ autocomplete = #1's chain + a fuzzy accumulated witness-bundle as speculative autocomplete, leashed to the chain that confirms it (= markov_soa + the grail-fold experiment). Invariant: unleashed, #2 degrades into #3.
3) sink-in-and-pray = old VSA-bundle-as-Markov, ceiling-bound, ungrounded — the black box (deprecated; the 'every GGUF would already be VSA' disproof).
The line: #1 is the chain; #2 is the chain plus a guess it must confirm; #3 is the guess without a chain.
Gate before grail: P1 AriGraph→SoA (HotWitness D-ATOM-5 todo!()s) → P2 EW64 in MailboxSoaView (qualia-pattern accessor) → P3 the grail-fold experiment (CONJECTURE, gated, Jirak-baselined, downstream of the EW64 seam — no scope creep).
https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
…c (delete deepnsm copy)
markov_soa is the Markov WAVE; EW64/the CE64 W-slot→witness arc is the PARTICLE.
Complementary → same home. It was wrongly in deepnsm (core concern depending on a
linguistics sensor = layer inversion). Moved to
crates/lance-graph/src/graph/arigraph/markov_soa.rs.
SoC deeper step: the SoA SPO row is three OPAQUE u16 ranks — vocabulary is a
late-resolved CLASS property, never a SoA fact (C2 / I-VSA-IDENTITIES, applied to
the triplet encoding). SPO CAN be COCA (good for input parsing) but the
SoA/AriGraph mailbox-view must NOT be forced into COCA. The projector takes an
injected Fn(u16,u16)->u8 distance — caller supplies AriGraph's cam_pq
DistanceTables OR DeepNSM's COCA table. Reuse-by-injection; core has 0 deepnsm
dep (the dep graph enforces agnosticism).
- AriGraph: SpoRanks{s,p,o:u16} opaque + SoaWavePrimer + WaveProjection (4 tests).
- Deleted crates/deepnsm/src/markov_soa.rs (sole ref was its own mod decl);
deepnsm still 89/4/8/1 green after removal.
- STATUS: AriGraph version unverified-offline (lance-graph core's
lance/datafusion/arrow deps don't fetch in the sandbox) — verify on full checkout.
- EPIPHANIES: the SoC + vocabulary-agnostic finding.
https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
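The "opaque ranks plus injected distance" idea from this commit can be sketched as follows. `SpoRanks` with `u16` fields and the `Fn(u16,u16)->u8` distance shape come from the commit message; `triplet_distance` and `nearest` are hypothetical names, and summing the three component distances is an assumption made for illustration (the commit only says the nearest-triplet similarity is averaged).

```rust
/// Opaque SPO ranks: the fields carry no vocabulary, only rank numbers.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SpoRanks {
    pub s: u16,
    pub p: u16,
    pub o: u16,
}

/// Hypothetical: distance between two triplets as the sum of the three
/// component distances returned by the injected table.
pub fn triplet_distance<D: Fn(u16, u16) -> u8>(a: SpoRanks, b: SpoRanks, dist: &D) -> u32 {
    u32::from(dist(a.s, b.s)) + u32::from(dist(a.p, b.p)) + u32::from(dist(a.o, b.o))
}

/// Hypothetical: index and distance of the window triplet nearest to `probe`.
/// Returns `None` for an empty window; ties resolve to the earliest index.
pub fn nearest<D: Fn(u16, u16) -> u8>(
    probe: SpoRanks,
    window: &[SpoRanks],
    dist: &D,
) -> Option<(usize, u32)> {
    window
        .iter()
        .enumerate()
        .map(|(i, &t)| (i, triplet_distance(probe, t, dist)))
        .min_by_key(|&(_, d)| d)
}
```

Since the core only sees a closure, it never names a vocabulary crate; whichever distance table the caller owns is supplied at the call site.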
…); language stays upstream in DeepNSM
markov_soa is NOT a generic projector that takes a COCA lens — it IS AriGraph, the cold-path Markov chain promoted to the hot-path SoA. AriGraph is agnostic and NOT necessarily English (holds business/GoBD/Wikidata/text SPO). The match metric is AriGraph's OWN cam_pq::DistanceTables, NOT a language table.
The language layer (DeepNSM/COCA-4096/grammar templates) stays STRICTLY upstream: it scans flat data (usually English), parses, and EMITS SPO into AriGraph — and MUST stay English (grammar templates get messy otherwise). Injecting a COCA distance into the hot graph would be the GoBD-with-Rumi error (a language lens over an agnostic graph).
Removed the wrong 'or DeepNSM COCA table' injection alternative from both the module doc and the EPIPHANIES finding. (Also captures the EPIPHANIES SoC finding that the prior commit's bad pathspec dropped from the index.)
https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
Trailing index sync — the pub mod removal from the move (9a5f54c) re-surfaced after a linter touch. deepnsm no longer declares markov_soa; it lives in AriGraph. https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
…view
Add a deferred-accessor note on MailboxSoaView (beside the qualia one): the
future EpisodicWitness64 column IS AriGraph promoted into the mailbox SoA view —
the cold-path episodic Markov chain (arigraph::{episodic,witness_corpus}) as a
hot-path per-row column = the CausalEdge64 W-slot → witness arc (Markov #1, the
particle; markov_soa is the wave). EW64 is not yet a code symbol (queued, see
E-EW64-IS-PREDICTIVE-PREFETCH); shipped seeds are the W-slot + WitnessTable<64> +
arigraph episodic. Stays agnostic — language (DeepNSM/COCA) stays upstream.
https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
… Q=0.325)
The empirical falsifier for the delta-card / inherited-nothingness addressing claim (probe #1 of the integration map), harvested from the W2 wave worker and RUN on the real on-disk ontologies (DOLCE-Ultralite, schema.org, Odoo, PROV-O, QUDT, OWL-Time).
Measured (1170 classes, 1224 subClassOf edges, 33 top-basins):
- LOCALITY = 98.61% intra-basin (the '~90% local' claim survives + exceeds)
- FAN-OUT max = 3 (<=16 ✓; 1121 classes have exactly 1 parent-basin)
- MODULARITY Q = 0.3246 (>0.3 = clear community structure)
VERDICT: PASS — on REAL frozen-ISA ontology structure, 16-bit local references + the <=16 family frontier are real.
HONEST CAVEAT (in the probe verdict): real ontologies ~10^3 classes, NOT Wikidata ~10^8; the Wikidata P279 run stays the open probe. Conjecture → FINDING on real ontologies.
zero-dep jc (hand-rolled TTL scan, reuses splat_louvain_modularity machinery); 60/60 jc tests green; probe file clippy-clean (pre-existing jc lints in other files untouched).
EPIPHANIES: the measured-result FINDING.
https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
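For readers unfamiliar with the Q figure, here is a sketch of the standard Newman modularity for an undirected edge list and a fixed partition, Q = Σ_c [ L_c/m − (d_c/2m)² ]. This is the textbook formula written out directly, not the probe's implementation (the commit says the probe reuses the `splat_louvain_modularity` machinery); the function name and signature below are assumptions.

```rust
use std::collections::BTreeMap;

/// Sketch: Newman modularity of a partition over an undirected edge list.
/// `community[v]` is the community (basin) id of node `v`.
/// Returns 0.0 for an empty edge list.
pub fn modularity_q(edges: &[(usize, usize)], community: &[usize]) -> f64 {
    if edges.is_empty() {
        return 0.0;
    }
    let m = edges.len() as f64;
    // L_c: number of edges with both endpoints in community c.
    let mut intra: BTreeMap<usize, f64> = BTreeMap::new();
    // d_c: total degree of the nodes in community c.
    let mut degree: BTreeMap<usize, f64> = BTreeMap::new();
    for &(a, b) in edges {
        *degree.entry(community[a]).or_insert(0.0) += 1.0;
        *degree.entry(community[b]).or_insert(0.0) += 1.0;
        if community[a] == community[b] {
            *intra.entry(community[a]).or_insert(0.0) += 1.0;
        }
    }
    degree
        .iter()
        .map(|(c, &d)| {
            let l = intra.get(c).copied().unwrap_or(0.0);
            l / m - (d / (2.0 * m)).powi(2)
        })
        .sum()
}
```

As a worked check, two triangles joined by one bridge edge and split into their two triangles give Q = 2·(3/7 − (7/14)²) = 5/14 ≈ 0.357, while putting every node in one community gives Q = 0.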
…wave entry
STATUS_BOARD: the 9 D-LWS hydration-manager rows (D-LWS-8 probe-1 SHIPPED: locality 98.6%/fan-out 3/Q=0.325 PASS), + D-MKV-SOA + D-EW64-NOTE rows.
AGENT_LOG: the world-spine vision + W1/W2 wave + markov_soa SoC + EW64-as-AriGraph + probe-result session entry.
https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
📝 Walkthrough
This PR documents a lazy world-spine hydration design, adds an empirical ontology-locality probe example, and implements vocabulary-agnostic SoA wave projections for AriGraph with supporting docs and tests.
Changes
Lazy world-spine architecture with empirical validation and runtime support
Sequence Diagram(s)
sequenceDiagram
participant TTL as TTL Input
participant Parser as parse_subclass_edges
participant Graph as ClassGraph
participant Basin as assign_basins
participant Metrics as locality/fan_out/modularity_q
participant Verdict as verdict
TTL->>Parser: feed lines (strip strings/comments)
Parser->>Graph: emit (child,parent) edges
Graph->>Basin: assign deterministic basin roots
Basin->>Metrics: compute edge locality, fan-out histogram, Q
Metrics->>Verdict: evaluate thresholds -> Pass/Marginal/Fail
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3e860b06ae
if c == '#' {
    // Rest of line is a comment.
    break;
Preserve fragments inside angle-bracket IRIs
For valid Turtle that uses full IRIs such as <http://example.org#Child> rdfs:subClassOf <http://example.org#Parent> ., this treats the # inside the IRI as the start of a comment before tokenization. The parser then truncates both class names (and normalize_iri still accepts tokens beginning with <), so different fragment IRIs in the same namespace can collapse into the same malformed class key and corrupt the locality/fan-out measurements.
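One way to address this finding is to track whether the scanner is inside an `<...>` IRI and only treat `#` as a comment start outside of it. The sketch below is illustrative only: `strip_comment` is a hypothetical helper, not the probe's `strip_strings_and_comments`, and it assumes string literals are handled separately.

```rust
/// Hypothetical helper: cut a Turtle line at the first `#` that lies
/// outside an angle-bracket IRI. String literals are NOT handled here;
/// this sketch assumes they are stripped by a separate pass.
pub fn strip_comment(line: &str) -> &str {
    let mut in_iri = false;
    for (i, c) in line.char_indices() {
        match c {
            '<' if !in_iri => in_iri = true,
            '>' if in_iri => in_iri = false,
            // `#` inside `<...>` is a fragment separator, not a comment.
            '#' if !in_iri => return &line[..i],
            _ => {}
        }
    }
    line
}
```

With this rule `<http://example.org#Child>` and `<http://example.org#Parent>` stay distinct keys, so fragment IRIs in one namespace no longer collapse into a single class.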
if predicate_is_subclass {
    if let (Some(child), Some(parent)) =
        (current_subject.clone(), normalize_iri(tok))
    {
        if child != parent {
            edges.push((child, parent));
Stop carrying subClassOf past delimited objects
When a valid Turtle object is written with attached punctuation, e.g. rdfs:subClassOf ex:Parent; or ex:Parent ., normalize_iri(tok) strips the delimiter and emits the parent, but predicate_is_subclass is left true. The next predicate token on the same line or on an indented continuation line can then be parsed as another superclass, adding bogus edges and skewing the probe's verdict; the object delimiter needs to reset/end the active predicate after the edge is emitted.
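A possible shape for the fix is to split any attached delimiter off the object token first, emit the edge from the body, and then decide from the delimiter whether the subClassOf predicate stays active. Both helpers below are hypothetical names introduced for illustration; how they would be wired into `parse_subclass_edges` is left open.

```rust
/// Hypothetical helper: split a token into its body and an attached
/// trailing `;`, `,` or `.`. A bare delimiter token is returned unchanged.
pub fn split_trailing_delim(tok: &str) -> (&str, Option<char>) {
    match tok.chars().last() {
        // The delimiters are ASCII, so slicing off one byte is safe.
        Some(c @ (';' | ',' | '.')) if tok.len() > 1 => (&tok[..tok.len() - 1], Some(c)),
        _ => (tok, None),
    }
}

/// Hypothetical helper: does the current predicate remain active after an
/// object that carried this delimiter? `,` continues the object list;
/// `;` and `.` end it; no delimiter leaves the decision to the next token.
pub fn predicate_stays_active(delim: Option<char>) -> bool {
    matches!(delim, None | Some(','))
}
```

Under this rule `rdfs:subClassOf ex:Parent;` emits one edge and then clears the predicate, so a following `rdfs:label` is not misread as another superclass.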
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.claude/plans/wikidata-lazy-spine-hydration-v1.md:
- Around line 135-147: The comment clarifies that the P1 "fan-out max=3" metric
in ontology_locality_probe.rs measures the number of DISTINCT top-basins among a
class’s direct subClassOf parents (i.e., distinct parent-basin count), not the
designed branching factor; update the code/comments so the computed fan-out
variable and any accompanying log/message (e.g., the fan-out computation in
ontology_locality_probe.rs and the variables locality and max_fanout) explicitly
state they count distinct top-basins of direct parents, and ensure PASS logic
checks both locality >= 0.90 AND max_fanout <= 16; also adjust any text that
conflates this metric with the architectural branching factor to avoid
confusion.
In `@crates/jc/examples/ontology_locality_probe.rs`:
- Around line 383-388: The current collection of subclass edges pushes every
(child,parent) pair into edges (built via intern, names, id_of) and allows
duplicates which skews locality/modularity; before using edges for metric
computation, deduplicate the Vec<(usize,usize)> (or switch to a HashSet) so only
unique (ci,pi) pairs are retained; update the same deduplication logic in the
other blocks that build edges around the later sections (the ones starting near
the other ranges) so repeated triples are eliminated prior to computing metrics.
- Around line 209-654: The file exposes core probe logic as free functions
(parse_subclass_edges, locality, fan_out, modularity_q, verdict, load_dir) which
violates the carrier-pattern rule; refactor by introducing a carrier struct
(e.g., Probe or OntologyProbe) that holds probe state (edges, files, maybe
config) and convert those free functions into inherent methods (e.g.,
Probe::parse_subclass_edges, Probe::locality, Probe::fan_out,
Probe::modularity_q, Probe::verdict, Probe::load_dir) updating any call sites to
use method calls on the carrier instance and moving any related state (edges,
files, basin, graph) into the struct so methods operate on &self / &mut self
rather than standalone parameters. Ensure signatures, visibility, and tests are
adjusted accordingly and that load_dir populates the carrier's fields instead of
returning raw tuples.
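The two jc findings above (deduplicate edges before computing metrics, and hang the probe logic on a carrier struct) could be combined roughly as follows. `OntologyProbe` is one of the names the reviewer suggests, but the fields and methods here are assumptions for illustration, and only the locality metric is carried over.

```rust
/// Hypothetical carrier for probe state, per the review suggestion.
pub struct OntologyProbe {
    /// Unique interned (child, parent) edges, sorted.
    edges: Vec<(usize, usize)>,
}

impl OntologyProbe {
    /// Build the carrier, dropping repeated (child, parent) pairs so that
    /// duplicated triples cannot skew locality or modularity.
    pub fn from_edges(mut edges: Vec<(usize, usize)>) -> Self {
        edges.sort_unstable();
        edges.dedup();
        Self { edges }
    }

    pub fn edges(&self) -> &[(usize, usize)] {
        &self.edges
    }

    /// Fraction of edges whose child and parent share a top-basin.
    /// Returns (local_edges, total_edges, fraction); empty graph gives 0.0.
    pub fn locality(&self, basin: &[usize]) -> (usize, usize, f64) {
        let total = self.edges.len();
        if total == 0 {
            return (0, 0, 0.0);
        }
        let local = self
            .edges
            .iter()
            .filter(|&&(c, p)| basin[c] == basin[p])
            .count();
        (local, total, local as f64 / total as f64)
    }
}
```

Sorting changes the edge order, which is harmless here because the metrics are order-independent.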
In `@crates/lance-graph/src/graph/arigraph/markov_soa.rs`:
- Around line 178-182: In SoaWavePrimer::project the window math currently casts
self.radius (u32) and focal_row (usize) to i32 via "as", which can truncate;
replace those lossy casts by using checked/widened conversions (e.g., use
i64::try_from(self.radius) or isize::try_from(focal_row) / i32::try_from where
appropriate) and propagate or handle conversion errors (return early or clamp)
before computing r and row_i; update the loop that uses r, row_i and bounds
checks with the new safe types and keep references to class_ids = soa.class_id()
and n for bounds logic.
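The window math flagged above can avoid signed casts entirely by staying in `usize` and saturating at the edges. A minimal sketch, assuming the window is the inclusive row range `[focal_row - radius, focal_row + radius]` clamped to `0..n`; the free function `window` is hypothetical and is not the body of `SoaWavePrimer::project`.

```rust
use std::convert::TryFrom;
use std::ops::Range;

/// Hypothetical helper: clamped row window around `focal_row`, computed
/// without lossy `as` casts. Returns an empty range when there are no rows
/// or the focal row is out of bounds.
pub fn window(focal_row: usize, radius: u32, n: usize) -> Range<usize> {
    if n == 0 || focal_row >= n {
        return 0..0;
    }
    // On targets where u32 does not fit in usize, fall back to "everything".
    let r = usize::try_from(radius).unwrap_or(usize::MAX);
    let lo = focal_row.saturating_sub(r);
    let hi = focal_row.saturating_add(r).saturating_add(1).min(n);
    lo..hi
}
```

Clamping rather than returning an error keeps the call site simple; a caller that prefers to reject oversized radii could return a `Result` instead.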
⛔ Files ignored due to path filters (1)
- crates/deepnsm/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (11)
- .claude/board/AGENT_LOG.md
- .claude/board/EPIPHANIES.md
- .claude/board/STATUS_BOARD.md
- .claude/knowledge/agnostic-lazy-world-spine.md
- .claude/knowledge/delta-card-addressing-integration-map.md
- .claude/plans/wikidata-lazy-spine-hydration-v1.md
- crates/jc/Cargo.toml
- crates/jc/examples/ontology_locality_probe.rs
- crates/lance-graph-contract/src/soa_view.rs
- crates/lance-graph/src/graph/arigraph/markov_soa.rs
- crates/lance-graph/src/graph/arigraph/mod.rs
### Gate P1 — Partition locality (CONJECTURE → must measure)
- **Driver:** `jc/examples/splat_louvain_modularity.rs` (Louvain modularity =
  popcount-AND over `contract::splat::AwarenessPlane16K` planes) +
  `neighborhood::clam::measure_cluster_radii` on the real P279/subClassOf +
  edge graph derived from `data/ontologies/*.ttl` (e.g. the FIBO or
  schema.org subtree; biology subtree once Wikidata lands).
- **Pass:** high modularity ⇒ ≥~90% of edges are intra-cohort ⇒ 16-bit
  intra-cohort references + the family frontier are real, and the natural
  fan-out (the 4/12/16 split) is observed, not assumed.
- **Gates:** D-LWS-1 fan-out choice; D-LWS-4 GOP P-frame placement; D-LWS-5
  cohort residency.
- **Honest status:** `clam.rs` header literally says the radii-coincide-with-
  ontology-boundaries claim "is a TEST, not a fact." Treat as **CONJECTURE**.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Check what the locality probe actually measures as "fan-out"
# Look for the probe implementation to see what metrics it computes
rg -n -A5 -B5 "fan.?out|FAN.?OUT" crates/jc/examples/ontology_locality_probe.rs
# Also check if there's any documentation of the probe's output format
rg -n "max.*fan|modularity|locality.*percent" crates/jc/examples/ontology_locality_probe.rs
Repository: AdaWorldAPI/lance-graph
Length of output: 9054
Clarify that P1’s “fan-out max=3” measures distinct parent-basin count per class, not the designed branching factor.
- In `ontology_locality_probe.rs`, `fan-out` is computed as the number of DISTINCT top-basins among each class’s direct `subClassOf` parents, and PASS is `locality >= 0.90` and `max_fanout <= 16`.
- So `FAN-OUT max = 3` is consistent with the “16-frontier” cap; it just indicates the observed worst-case distinct parent-basin count is 3 (and doesn’t contradict the architectural 4/12/16 vs 16-way split unless prose conflates the metrics).
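The metric as described in this comment can be sketched directly: for each class, count the distinct basins among its direct parents, then histogram those counts. The probe's own `fan_out` body is not visible in this excerpt, so the signature below is an assumption; `passes` encodes only the two PASS thresholds quoted above and is likewise a hypothetical helper.

```rust
use std::collections::BTreeSet;

/// Sketch: per-class count of DISTINCT top-basins among direct parents.
/// Returns (max_fanout, histogram) with histogram[k] = number of classes
/// whose parents span exactly k basins. This is not a branching factor.
pub fn fan_out(parents: &[Vec<usize>], basin: &[usize]) -> (usize, Vec<usize>) {
    let counts: Vec<usize> = parents
        .iter()
        .map(|ps| ps.iter().map(|&p| basin[p]).collect::<BTreeSet<_>>().len())
        .collect();
    let max = counts.iter().copied().max().unwrap_or(0);
    let mut hist = vec![0usize; max + 1];
    for k in counts {
        hist[k] += 1;
    }
    (max, hist)
}

/// Hypothetical helper: the PASS condition quoted in the review comment.
pub fn passes(locality: f64, max_fanout: usize) -> bool {
    locality >= 0.90 && max_fanout <= 16
}
```

With the figures reported for this PR, `passes(0.9861, 3)` holds: three distinct parent-basins is well under the 16-way cap.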
pub fn parse_subclass_edges(ttl: &str) -> Vec<(String, String)> {
    const SUBCLASS: &str = "subClassOf"; // matches rdfs:subClassOf AND bare subClassOf
    let mut edges: Vec<(String, String)> = Vec::new();
    let mut current_subject: Option<String> = None;
    let mut predicate_is_subclass = false;
    let mut in_long_string = false;
    // Depth of nested `[ ... ]` blank-node restrictions. While > 0 we are
    // INSIDE an anonymous OWL restriction and emit no edges; the restriction
    // spans multiple physical lines, so this persists across the line loop.
    let mut bracket_depth: i32 = 0;

    for raw_line in ttl.lines() {
        let line = strip_strings_and_comments(raw_line, &mut in_long_string);
        let leading_ws = raw_line.starts_with(char::is_whitespace);

        // Split into whitespace tokens (Turtle is whitespace-delimited at this
        // granularity; we already stripped strings/comments).
        let toks: Vec<&str> = line.split_whitespace().collect();
        if toks.is_empty() {
            // A blank physical line does not by itself end a statement.
            continue;
        }

        let mut idx = 0;

        // A statement that begins flush-left (no leading whitespace) and whose
        // first token is a named IRI / blank starts a NEW subject — UNLESS the
        // line is a pure object-list continuation beginning with ',' (handled
        // below) or a directive (@prefix / @base / PREFIX / BASE).
        let first = toks[0];
        let is_directive = first.starts_with('@')
            || first.eq_ignore_ascii_case("prefix")
            || first.eq_ignore_ascii_case("base");
        if is_directive {
            // Directives don't carry subjects or edges; but a directive still
            // can be terminated by '.', which must not clobber subject state of
            // a real statement (directives are always flush-left & self
            // contained), so just skip the whole line.
            continue;
        }

        if bracket_depth == 0
            && !leading_ws
            && first != ","
            && first != ";"
            && !first.starts_with('[')
        {
            // New subject candidate (only when not inside a blank node).
            if let Some(subj) = normalize_iri(first) {
                current_subject = Some(subj);
            } else {
                current_subject = None;
            }
            predicate_is_subclass = false;
            idx = 1;
        }

        // Walk remaining tokens, tracking predicate switches and emitting
        // edges while the active predicate is subClassOf AND we are at
        // bracket depth 0 (outside any anonymous restriction).
        while idx < toks.len() {
            let tok = toks[idx];

            // Update bracket depth from any '[' / ']' characters in the token,
            // then move on if the token is pure bracket punctuation. A '['
            // opening means the CURRENT subClassOf object is an anonymous
            // restriction; we suppress emission until the matching ']' but
            // stay in subClassOf predicate mode so a following ',' continues
            // the OUTER object list.
            let opens = tok.matches('[').count() as i32;
            let closes = tok.matches(']').count() as i32;
            if opens > 0 || closes > 0 {
                bracket_depth += opens - closes;
                if bracket_depth < 0 {
                    bracket_depth = 0;
                }
                // If the token is only brackets (possibly with ',' / ';'),
                // there is nothing else to interpret on it.
                let stripped: String = tok
                    .chars()
                    .filter(|&c| c != '[' && c != ']' && c != ',' && c != ';')
                    .collect();
                if stripped.is_empty() {
                    idx += 1;
                    continue;
                }
            }

            // Anything inside a blank node is ignored entirely.
            if bracket_depth > 0 {
                idx += 1;
                continue;
            }

            // Object-list continuation: ',' keeps the current predicate.
            if tok == "," {
                idx += 1;
                continue;
            }
            // ';' ends the current predicate's object list (a new predicate
            // follows on this or a later line).
            if tok == ";" {
                predicate_is_subclass = false;
                idx += 1;
                continue;
            }
            // '.' terminates the whole statement → no active subject.
            if tok.starts_with('.') && tok.len() == 1 {
                current_subject = None;
                predicate_is_subclass = false;
                idx += 1;
                continue;
            }

            // Predicate detection: rdfs:subClassOf or bare subClassOf.
            let bare = tok.trim_end_matches([';', ',']);
            if bare == SUBCLASS || bare.ends_with(":subClassOf") || bare == "rdfs:subClassOf" {
                predicate_is_subclass = true;
                idx += 1;
                continue;
            }
            // In subClassOf object position: emit a named-IRI edge.
            if predicate_is_subclass {
                if let (Some(child), Some(parent)) =
                    (current_subject.clone(), normalize_iri(tok))
                {
                    if child != parent {
                        edges.push((child, parent));
                    }
                }
                idx += 1;
                continue;
            }

            // Not in subClassOf mode: a token like `a`, `rdf:type`,
            // `owl:disjointWith`, `rdfs:label` is a (non-subclass) predicate;
            // it just resets predicate state. We do not need its objects.
            if bare == "a" || bare.contains(':') {
                predicate_is_subclass = false;
            }
            idx += 1;
        }
    }
    edges
}

// ── class graph: intern IRIs, build parent adjacency, assign top-basins ─────

/// Interned subClassOf DAG over class IRIs.
pub struct ClassGraph {
    /// id -> IRI key (for printing).
    pub names: Vec<String>,
    /// Direct parents of each class (deduplicated, sorted).
    pub parents: Vec<Vec<usize>>,
    /// All edges as interned (child, parent) id pairs.
    pub edges: Vec<(usize, usize)>,
}

impl ClassGraph {
    /// Build from `(child, parent)` IRI-key edges. Every IRI appearing in any
    /// position becomes a node (a parent that is never a child is a root).
    pub fn from_edges(iri_edges: &[(String, String)]) -> Self {
        let mut id_of: BTreeMap<String, usize> = BTreeMap::new();
        let mut names: Vec<String> = Vec::new();
        let intern = |s: &str, names: &mut Vec<String>, id_of: &mut BTreeMap<String, usize>| {
            if let Some(&id) = id_of.get(s) {
                id
            } else {
                let id = names.len();
                names.push(s.to_string());
                id_of.insert(s.to_string(), id);
                id
            }
        };
        let mut edges: Vec<(usize, usize)> = Vec::new();
        for (c, p) in iri_edges {
            let ci = intern(c, &mut names, &mut id_of);
            let pi = intern(p, &mut names, &mut id_of);
            edges.push((ci, pi));
        }
        let n = names.len();
        let mut parents: Vec<Vec<usize>> = vec![Vec::new(); n];
        for &(c, p) in &edges {
            parents[c].push(p);
        }
        for ps in parents.iter_mut() {
            ps.sort_unstable();
            ps.dedup();
        }
        Self { names, parents, edges }
    }

    pub fn n_classes(&self) -> usize {
        self.names.len()
    }

    /// Assign each class to its top-basin = the root ancestor reached by
    /// walking parents upward. Multi-parent: follow the parent with the
    /// SMALLEST interned id (deterministic representative). Cycles: broken by
    /// a visited-set; the entry node of a cycle becomes its own basin.
    /// Returns `basin[id] = root_id`.
    pub fn assign_basins(&self) -> Vec<usize> {
        let n = self.n_classes();
        let mut basin = vec![usize::MAX; n];
        for start in 0..n {
            if basin[start] != usize::MAX {
                continue;
            }
            // Walk up to a root, recording the path; memoize on the way back.
            let mut path: Vec<usize> = Vec::new();
            let mut visiting: BTreeSet<usize> = BTreeSet::new();
            let mut cur = start;
            let root;
            loop {
                if let Some(&memo) = basin.get(cur) {
                    if memo != usize::MAX {
                        root = memo;
                        break;
                    }
                }
                if visiting.contains(&cur) {
                    // Cycle: treat `cur` as the basin root for this SCC entry.
                    root = cur;
                    break;
                }
                visiting.insert(cur);
                path.push(cur);
                // Pick the smallest-id parent (deterministic). No parent → root.
                match self.parents[cur].iter().min() {
                    Some(&p) => cur = p,
                    None => {
                        root = cur;
                        break;
                    }
                }
            }
            for id in path {
                basin[id] = root;
            }
            if basin[start] == usize::MAX {
                basin[start] = root;
            }
        }
        basin
    }
}
| // ── metric 1: locality ────────────────────────────────────────────────────── | ||
|
|
||
| /// Fraction of edges whose child and parent share a top-basin. | ||
| /// Returns (local_edges, total_edges, fraction). Empty graph → fraction 0. | ||
| pub fn locality(edges: &[(usize, usize)], basin: &[usize]) -> (usize, usize, f64) { | ||
| let total = edges.len(); | ||
| if total == 0 { | ||
| return (0, 0, 0.0); | ||
| } | ||
| let local = edges | ||
| .iter() | ||
| .filter(|&&(c, p)| basin[c] == basin[p]) | ||
| .count(); | ||
| (local, total, local as f64 / total as f64) | ||
| } | ||
|
|
||
| // ── metric 2: fan-out (distinct parent-basins per class) ──────────────────── | ||
|
|
||
| /// Per-class count of DISTINCT parent-basins among its direct subClassOf | ||
| /// parents. Returns (max_fanout, histogram) where histogram[k] = #classes | ||
| /// whose fan-out == k. Classes with no parents contribute fan-out 0. | ||
| pub fn fan_out(graph: &ClassGraph, basin: &[usize]) -> (usize, BTreeMap<usize, usize>) { | ||
| let mut hist: BTreeMap<usize, usize> = BTreeMap::new(); | ||
| let mut max_fo = 0usize; | ||
| for c in 0..graph.n_classes() { | ||
| let distinct: BTreeSet<usize> = graph.parents[c].iter().map(|&p| basin[p]).collect(); | ||
| let fo = distinct.len(); | ||
| max_fo = max_fo.max(fo); | ||
| *hist.entry(fo).or_insert(0) += 1; | ||
| } | ||
| (max_fo, hist) | ||
| } | ||
|
|
||
| // ── metric 3: modularity Q of the basin partition ────────────────────────── | ||
| // | ||
| // Newman modularity on the UNDIRECTED subClassOf graph (each subClassOf edge | ||
| // contributes one undirected link between child and parent): | ||
| // | ||
| // Q = Σ_c [ e_c / m - (a_c / 2m)^2 ] | ||
| // | ||
| // where m = |E|, e_c = number of edges fully inside basin c, a_c = sum of | ||
| // degrees of nodes in basin c. We reuse the `splat_louvain_modularity.rs` | ||
| // idea — the within-community edge mass is a popcount-AND between a node's | ||
| // neighbour bitset and the basin-membership bitset — but with dynamically | ||
| // sized `Vec<u64>` planes so the probe handles ontologies with thousands of | ||
| // classes (the contract's fixed 16,384-bit `AwarenessPlane16K` is too small | ||
| // for schema.org). Self-loops are excluded by construction (the parser drops | ||
| // `X subClassOf X`). | ||
|
|
||
| /// A dynamically sized bitset (the standalone analogue of `AwarenessPlane16K`). | ||
| struct BitPlane(Vec<u64>); | ||
|
|
||
| impl BitPlane { | ||
| fn zero(n_bits: usize) -> Self { | ||
| BitPlane(vec![0u64; n_bits.div_ceil(64)]) | ||
| } | ||
| #[inline] | ||
| fn set(&mut self, idx: usize) { | ||
| self.0[idx / 64] |= 1u64 << (idx % 64); | ||
| } | ||
| #[inline] | ||
| fn and_popcount(&self, other: &BitPlane) -> u32 { | ||
| self.0 | ||
| .iter() | ||
| .zip(other.0.iter()) | ||
| .map(|(a, b)| (a & b).count_ones()) | ||
| .sum() | ||
| } | ||
| } | ||
|
|
||
| /// Compute Newman modularity Q of the basin partition. Returns Q in | ||
| /// [-0.5, 1.0]. Empty graph → 0.0. | ||
| pub fn modularity_q(graph: &ClassGraph, basin: &[usize]) -> f64 { | ||
| let n = graph.n_classes(); | ||
| let m = graph.edges.len(); | ||
| if m == 0 || n == 0 { | ||
| return 0.0; | ||
| } | ||
| let two_m = 2.0 * m as f64; | ||
|
|
||
| // Undirected neighbour bitset per node (both directions of each edge). | ||
| let mut neigh: Vec<BitPlane> = (0..n).map(|_| BitPlane::zero(n)).collect(); | ||
| let mut degree = vec![0u32; n]; | ||
| for &(c, p) in &graph.edges { | ||
| neigh[c].set(p); | ||
| neigh[p].set(c); | ||
| degree[c] += 1; | ||
| degree[p] += 1; | ||
| } | ||
|
|
||
| // Group node ids by basin; build a membership bitset per basin. | ||
| let mut members: BTreeMap<usize, Vec<usize>> = BTreeMap::new(); | ||
| for (id, &b) in basin.iter().enumerate() { | ||
| members.entry(b).or_default().push(id); | ||
| } | ||
|
|
||
| let mut q = 0.0; | ||
| for ids in members.values() { | ||
| let mut plane = BitPlane::zero(n); | ||
| for &id in ids { | ||
| plane.set(id); | ||
| } | ||
| // e_c counted twice (once per endpoint) via Σ_u popcount(neigh[u] AND plane). | ||
| let mut e_c_times_two = 0u32; | ||
| let mut a_c = 0.0; | ||
| for &id in ids { | ||
| e_c_times_two += neigh[id].and_popcount(&plane); | ||
| a_c += degree[id] as f64; | ||
| } | ||
| let e_c = e_c_times_two as f64 / 2.0; | ||
| q += (e_c / m as f64) - (a_c / two_m).powi(2); | ||
| } | ||
| q | ||
| } | ||
|
|
||
| // ── verdict ────────────────────────────────────────────────────────────────── | ||
|
|
||
| /// Verdict tier for the locality hypothesis. | ||
| #[derive(Debug, Clone, Copy, PartialEq, Eq)] | ||
| pub enum Verdict { | ||
| /// High locality AND fan-out fits the family frontier. | ||
| Pass, | ||
| /// Locality decent but borderline, or fan-out near the cap. | ||
| Marginal, | ||
| /// Locality low — local-pointer assumption does not hold. | ||
| Fail, | ||
| } | ||
|
|
||
| impl Verdict { | ||
| pub fn as_str(self) -> &'static str { | ||
| match self { | ||
| Verdict::Pass => "PASS", | ||
| Verdict::Marginal => "MARGINAL", | ||
| Verdict::Fail => "FAIL", | ||
| } | ||
| } | ||
| } | ||
|
|
||
| /// Decide the verdict from the measured numbers. | ||
| /// | ||
| /// Thresholds (stated, not hand-waved): | ||
| /// * locality ≥ 0.90 AND max_fanout ≤ 16 → PASS (the map's claim) | ||
| /// * locality ≥ 0.75 (or max_fanout in 17..=32) → MARGINAL | ||
| /// * otherwise → FAIL | ||
| /// | ||
| /// The "16" frontier is the design's pencilled cap; max_fanout > 16 means a | ||
| /// single class needs more than 16 distinct family pointers, breaking the | ||
| /// 4/12/16 split as stated (though a wider frontier byte would still work). | ||
| pub fn verdict(locality_frac: f64, max_fanout: usize) -> Verdict { | ||
| if locality_frac >= 0.90 && max_fanout <= 16 { | ||
| Verdict::Pass | ||
| } else if locality_frac >= 0.75 || (max_fanout > 16 && max_fanout <= 32) { | ||
| Verdict::Marginal | ||
| } else { | ||
| Verdict::Fail | ||
| } | ||
| } | ||
|
|
||
| // ── load real ontology TTLs from a directory ──────────────────────────────── | ||
|
|
||
| /// All parsed `(child, parent)` IRI edges plus the sorted list of TTL files | ||
| /// they came from. | ||
| type LoadedOntology = (Vec<(String, String)>, Vec<PathBuf>); | ||
|
|
||
| /// Recursively collect `*.ttl` files under `dir`, parse subClassOf edges from | ||
| /// each, and return (all_edges, sorted_file_list). I/O errors on individual | ||
| /// files are skipped with a note to stderr (the probe is best-effort over | ||
| /// whatever real ontologies are present). | ||
| fn load_dir(dir: &Path) -> std::io::Result<LoadedOntology> { | ||
| let mut edges: Vec<(String, String)> = Vec::new(); | ||
| let mut files: Vec<PathBuf> = Vec::new(); | ||
| let mut stack = vec![dir.to_path_buf()]; | ||
| while let Some(d) = stack.pop() { | ||
| let rd = match std::fs::read_dir(&d) { | ||
| Ok(rd) => rd, | ||
| Err(e) => { | ||
| eprintln!(" (skip dir {}: {})", d.display(), e); | ||
| continue; | ||
| } | ||
| }; | ||
| for entry in rd.flatten() { | ||
| let path = entry.path(); | ||
| if path.is_dir() { | ||
| stack.push(path); | ||
| } else if path.extension().map(|e| e == "ttl").unwrap_or(false) { | ||
| match std::fs::read_to_string(&path) { | ||
| Ok(text) => { | ||
| let mut e = parse_subclass_edges(&text); | ||
| edges.append(&mut e); | ||
| files.push(path); | ||
| } | ||
| Err(e) => eprintln!(" (skip {}: {})", path.display(), e), | ||
| } | ||
| } | ||
| } | ||
| } | ||
| files.sort(); | ||
| Ok((edges, files)) | ||
| } |
🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift
Move probe operations onto a carrier struct instead of free functions.
Core probe logic is implemented as free functions (parse_subclass_edges, locality, fan_out, modularity_q, verdict, load_dir) rather than methods on a state carrier, which breaks the repository’s Rust carrier-pattern rule.
As per coding guidelines, "**/*.rs: Use only method calls on the carrier struct that holds the state, never free functions. Carrier pattern: trajectory.resolve() instead of resolve(trajectory, config, awareness)"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/jc/examples/ontology_locality_probe.rs` around lines 209 - 654, The
file exposes core probe logic as free functions (parse_subclass_edges, locality,
fan_out, modularity_q, verdict, load_dir) which violates the carrier-pattern
rule; refactor by introducing a carrier struct (e.g., Probe or OntologyProbe)
that holds probe state (edges, files, maybe config) and convert those free
functions into inherent methods (e.g., Probe::parse_subclass_edges,
Probe::locality, Probe::fan_out, Probe::modularity_q, Probe::verdict,
Probe::load_dir) updating any call sites to use method calls on the carrier
instance and moving any related state (edges, files, basin, graph) into the
struct so methods operate on &self / &mut self rather than standalone
parameters. Ensure signatures, visibility, and tests are adjusted accordingly
and that load_dir populates the carrier's fields instead of returning raw
tuples.
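A minimal sketch of what the suggested carrier could look like, assuming a hypothetical `OntologyProbe` struct that owns the interned edges and the basin assignment. The struct name, its fields, and `max_fan_out` are illustrative and not taken from the PR; only the locality and fan-out definitions follow the probe's own doc comments.

```rust
use std::collections::BTreeSet;

/// Hypothetical carrier: owns the interned edges and the basin assignment,
/// so the metrics become methods instead of free functions.
pub struct OntologyProbe {
    edges: Vec<(usize, usize)>,
    basin: Vec<usize>,
}

impl OntologyProbe {
    pub fn new(edges: Vec<(usize, usize)>, basin: Vec<usize>) -> Self {
        Self { edges, basin }
    }

    /// Fraction of edges whose child and parent share a basin (0.0 if no edges).
    pub fn locality(&self) -> f64 {
        if self.edges.is_empty() {
            return 0.0;
        }
        let local = self
            .edges
            .iter()
            .filter(|&&(c, p)| self.basin[c] == self.basin[p])
            .count();
        local as f64 / self.edges.len() as f64
    }

    /// Largest number of distinct parent-basins seen for any single child.
    pub fn max_fan_out(&self) -> usize {
        let n = self.basin.len();
        let mut per_child: Vec<BTreeSet<usize>> = vec![BTreeSet::new(); n];
        for &(c, p) in &self.edges {
            per_child[c].insert(self.basin[p]);
        }
        per_child.iter().map(BTreeSet::len).max().unwrap_or(0)
    }
}

fn main() {
    // Nodes 0..=3; nodes 0, 1, 2 sit in basin 0 and node 3 is its own basin.
    let probe = OntologyProbe::new(vec![(1, 0), (2, 0), (2, 3)], vec![0, 0, 0, 3]);
    println!("locality = {:.3}", probe.locality());
    println!("max fan-out = {}", probe.max_fan_out());
}
```

Call sites then read `probe.locality()` rather than `locality(&edges, &basin)`, which is the shape the guideline quoted above asks for.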
| let mut edges: Vec<(usize, usize)> = Vec::new(); | ||
| for (c, p) in iri_edges { | ||
| let ci = intern(c, &mut names, &mut id_of); | ||
| let pi = intern(p, &mut names, &mut id_of); | ||
| edges.push((ci, pi)); | ||
| } |
Deduplicate subclass edges before metric computation.
Repeated (child,parent) edges currently count multiple times in locality and modularity, which can skew the reported verdict when ontologies overlap or repeat triples.
💡 Suggested fix
pub fn from_edges(iri_edges: &[(String, String)]) -> Self {
@@
- let mut edges: Vec<(usize, usize)> = Vec::new();
+ let mut edges: Vec<(usize, usize)> = Vec::new();
@@
for (c, p) in iri_edges {
let ci = intern(c, &mut names, &mut id_of);
let pi = intern(p, &mut names, &mut id_of);
edges.push((ci, pi));
}
+ edges.sort_unstable();
+ edges.dedup();
Also applies to: 460-470, 528-569
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/jc/examples/ontology_locality_probe.rs` around lines 383 - 388, The
current collection of subclass edges pushes every (child,parent) pair into edges
(built via intern, names, id_of) and allows duplicates which skews
locality/modularity; before using edges for metric computation, deduplicate the
Vec<(usize,usize)> (or switch to a HashSet) so only unique (ci,pi) pairs are
retained; update the same deduplication logic in the other blocks that build
edges around the later sections (the ones starting near the other ranges) so
repeated triples are eliminated prior to computing metrics.
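To make the skew concrete, here is a small standalone sketch, assuming simplified stand-ins for the probe's locality metric and a hypothetical `dedup_edges` helper (neither is the PR's code). It shows how a repeated local edge inflates the locality fraction until the edge list is sorted and deduplicated.

```rust
/// Fraction of edges whose endpoints share a basin (0.0 for an empty list).
fn locality(edges: &[(usize, usize)], basin: &[usize]) -> f64 {
    if edges.is_empty() {
        return 0.0;
    }
    let local = edges
        .iter()
        .filter(|&&(c, p)| basin[c] == basin[p])
        .count();
    local as f64 / edges.len() as f64
}

/// Hypothetical helper: sort so equal pairs become adjacent, then drop repeats.
fn dedup_edges(mut edges: Vec<(usize, usize)>) -> Vec<(usize, usize)> {
    edges.sort_unstable();
    edges.dedup();
    edges
}

fn main() {
    // Nodes 0 and 1 share basin 0; node 2 is its own basin.
    let basin = vec![0, 0, 2];
    // The local edge (1, 0) appears three times; the cross-basin edge once.
    let raw = vec![(1, 0), (1, 0), (1, 0), (1, 2)];

    let before = locality(&raw, &basin);
    let unique = dedup_edges(raw);
    let after = locality(&unique, &basin);

    println!("raw locality     = {before:.2}");
    println!("deduped locality = {after:.2}");
}
```

`Vec::dedup` only removes consecutive duplicates, which is why the sort comes first. In this toy input the duplicates move the fraction from 0.50 to 0.75, enough to cross the probe's MARGINAL threshold.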
| let r = self.radius as i32; | ||
| let class_ids = soa.class_id(); | ||
| for d in -r..=r { | ||
| let row_i = focal_row as i32 + d; | ||
| if row_i < 0 || row_i as usize >= n { |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify all potentially lossy i32 casts in this module.
rg -n -C2 'as i32' crates/lance-graph/src/graph/arigraph/markov_soa.rs
Repository: AdaWorldAPI/lance-graph
Length of output: 397
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="crates/lance-graph/src/graph/arigraph/markov_soa.rs"
wc -l "$FILE"
echo "---- snippet around lines 160-210 ----"
sed -n '150,210p' "$FILE" | nl -ba
echo "---- locate radius field/type ----"
rg -n "radius" "$FILE"
echo "---- locate focal_row type/assignment ----"
rg -n "focal_row" "$FILE"
Repository: AdaWorldAPI/lance-graph
Length of output: 223
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="crates/lance-graph/src/graph/arigraph/markov_soa.rs"
wc -l "$FILE"
echo "---- snippet around lines 160-210 (with line numbers) ----"
sed -n '150,210p' "$FILE" | cat -n
echo "---- locate struct/definitions mentioning radius ----"
rg -n "radius" "$FILE"
echo "---- locate focal_row usage/definition ----"
rg -n "focal_row" "$FILE"
Repository: AdaWorldAPI/lance-graph
Length of output: 3062
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="crates/lance-graph/src/graph/arigraph/markov_soa.rs"
echo "---- other as i32 in file ----"
rg -n "as i32" "$FILE" || true
echo "---- usages of SoaWavePrimer::new ----"
rg -n "SoaWavePrimer::new\\(" crates/lance-graph/src || true
echo "---- usages of radius field ----"
rg -n "\\.radius" crates/lance-graph/src/graph/arigraph || true
Repository: AdaWorldAPI/lance-graph
Length of output: 790
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "SoaWavePrimer" crates/lance-graph/src || true
rg -n "\\.project\\(" crates/lance-graph/src/graph/arigraph/markov_soa.rs || true
# Search whole crate for SoaWavePrimer::project call sites
rg -n "SoaWavePrimer::.*project\\(" crates/lance-graph/src || true
rg -n "SoaWavePrimer.*\\.project\\(" crates/lance-graph/src || true
Repository: AdaWorldAPI/lance-graph
Length of output: 1656
Avoid lossy `as i32` casts in `SoaWavePrimer::project` window math (lines ~178-182).
`self.radius: u32` and `focal_row: usize` are downcast to `i32` via `as`, so large values truncate (and can even flip signs), corrupting the ±window traversal. Use checked/widened conversions (e.g., `i32::try_from(...)`, `i64`/`isize`) and handle overflow (early return or clamp).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/lance-graph/src/graph/arigraph/markov_soa.rs` around lines 178 - 182,
In SoaWavePrimer::project the window math currently casts self.radius (u32) and
focal_row (usize) to i32 via "as", which can truncate; replace those lossy casts
by using checked/widened conversions (e.g., use i64::try_from(self.radius) or
isize::try_from(focal_row) / i32::try_from where appropriate) and propagate or
handle conversion errors (return early or clamp) before computing r and row_i;
update the loop that uses r, row_i and bounds checks with the new safe types and
keep references to class_ids = soa.class_id() and n for bounds logic.
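One way the window bounds could be computed without any signed cast is sketched below. `window_rows` is a hypothetical helper, not a function in `markov_soa.rs`; it assumes the loop only needs the in-bounds row indices and clamps with saturating arithmetic instead of returning an error.

```rust
use std::ops::Range;

/// Hypothetical helper: the rows of the ±radius window around `focal_row`
/// that fall inside `0..n`, computed entirely in `usize`.
fn window_rows(focal_row: usize, radius: u32, n: usize) -> Range<usize> {
    // u32 -> usize can only fail on targets narrower than 32 bits; clamp there.
    let r = usize::try_from(radius).unwrap_or(usize::MAX);
    // Exclusive upper bound, clamped to the table length.
    let hi = focal_row.saturating_add(r).saturating_add(1).min(n);
    // Lower bound, clamped at 0 and never above `hi` (empty range if it would be).
    let lo = focal_row.saturating_sub(r).min(hi);
    lo..hi
}

fn main() {
    // Interior window: rows 1, 2, 3.
    println!("{:?}", window_rows(2, 1, 10));
    // Window clipped at the start of the table.
    println!("{:?}", window_rows(0, 3, 10));
    // Extreme inputs that would truncate under `as i32` stay well-defined.
    println!("{:?}", window_rows(3, u32::MAX, 5));
    println!("{:?}", window_rows(usize::MAX, u32::MAX, 10).is_empty());
}
```

If the projection also needs the distance from the focal row, `row.abs_diff(focal_row)` yields it as a `usize` without a signed intermediate.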
The `format` CI job runs:
cargo fmt --manifest-path crates/lance-graph/Cargo.toml -- --check
markov_soa.rs had one-line struct literals + asserts that rustfmt 1.95.0 expands to multi-line. Apply canonical formatting (no logic change); the exact CI command now passes clean. Other failing-check noise was a local --all artifact: CI only formats the lance-graph crate.
https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
Full-breadth integration spec wiring D-MBX kanban contract through witness commit (D-ATOM-5), surreal LIVE -> Rubicon kanban flip, ExecTarget backends, head2head two-view superposition in the shader driver, EW64-Markov Hebbian prefetch, language->SPO landing (D-LWS), and BindSpace decommission. Grounded against current main (#437/#439/#444/#445) + two recon passes; flags the two hard blockers (lance-7 witness API, surreal fork dep / OQ-11.6) and the stale-branch caveat.
https://claude.ai/code/session_01PLf95mURCY96TvKBFvSWEQ
…EW64-1, D-VIEW-1) + plan v1
AriGraph episodic edges, RISC-encoded (the corrected EW64, replacing the earlier
"CE64 lens"/"16-bit pointer" framings):
- episodic_edges::{EpisodicEdges64(u64), EdgeRef} — 4x[4-bit family | 12-bit local].
family 0 = intra-basin (inherited from HHTL/class_id, ~98.6% per #444);
1..=15 = cross-family index into the OGIT-class-inherited palette (~1.4%).
Identities inherited, never on the edge (I-VSA-IDENTITIES); a CAM_PQ facet code.
- view_angle::ViewAngle — 4-bit view-schema selector; the class presence bitmask
doubles as the attention mask (inherited view-schema, never per-instance semantics).
527 contract lib tests (+11); both files clippy pedantic+nursery clean.
Plan: .claude/plans/episodic-risc-spine-v1.md (3 lifecycle-separated structures:
CAM/OGIT identity, Lance-version pseudo-radix index, CLAM ephemeral KV; bounded-horizon
compression). Finding: EPIPHANIES E-EPISODIC-CLOSURE. CI-gated next (no protoc offline):
D-EW64-2 SoA columns, D-STORY-1 CLAM clusterer, D-STORY-2 session index,
D-STORY-3 palette256/4096 archetypes, D-HORIZON-1 stopping rule.
Board: INTEGRATION_PLANS + LATEST_STATE + STATUS_BOARD + EPIPHANIES + AGENT_LOG.
https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
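The `4x[4-bit family | 12-bit local]` layout described in this commit message can be sketched as a plain bit-packing exercise. The types below (`EdgeSlot`, `PackedEdges`) are hypothetical stand-ins, not the contract's `EpisodicEdges64` or `EdgeRef`, and the slot order (slot 0 in the low 16 bits, family in the top nibble of each slot) is an assumption.

```rust
/// One edge slot: a 4-bit family plus a 12-bit local index (16 bits total).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EdgeSlot {
    pub family: u8,
    pub local: u16,
}

/// Four 16-bit slots packed into one u64; slot 0 occupies the low 16 bits
/// (assumed ordering, chosen for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PackedEdges(pub u64);

impl PackedEdges {
    /// Pack four slots, rejecting values that do not fit their bit budget.
    pub fn pack(slots: [EdgeSlot; 4]) -> Option<Self> {
        let mut word = 0u64;
        for (i, s) in slots.iter().enumerate() {
            if s.family > 0x0F || s.local > 0x0FFF {
                return None;
            }
            let half = (u64::from(s.family) << 12) | u64::from(s.local);
            word |= half << (16 * i);
        }
        Some(Self(word))
    }

    /// Read slot `i` back out (None when `i` is not in 0..4).
    pub fn slot(&self, i: usize) -> Option<EdgeSlot> {
        if i >= 4 {
            return None;
        }
        let half = (self.0 >> (16 * i)) & 0xFFFF;
        Some(EdgeSlot {
            family: (half >> 12) as u8,   // at most 0xF after the mask
            local: (half & 0x0FFF) as u16, // at most 0xFFF after the mask
        })
    }
}

fn main() {
    let empty = EdgeSlot { family: 0, local: 0 };
    // Slot 0: intra-basin (family 0); slot 1: a cross-family reference.
    let packed = PackedEdges::pack([
        EdgeSlot { family: 0, local: 5 },
        EdgeSlot { family: 3, local: 0xABC },
        empty,
        empty,
    ])
    .expect("all fields fit their bit budgets");
    println!("{:#018x}", packed.0);
    println!("{:?}", packed.slot(1));
}
```

Family 0 stays reserved for the intra-basin case in this sketch, matching the commit's description that families 1..=15 index a cross-family palette.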
Summary
The consolidated output of a long design session on how to compress Wikidata into a lazy-loading, foveated, address-unified world-spine — converged to a single addressing thesis, validated by a real measurement, with one SoC code-move and an empirical probe shipped. Mostly docs + one runnable probe + one small refactor; no behavioral change to shipped runtime.
The one idea (the docs)
A card stores the surprise; the deck stores the expectation. Meaning = deck ⊗ delta — the free-energy framing, applied to both the key (address) and the value (content). The endgame is inherited nothingness: identity (which one) is 27 bits irreducible; description (what it is) is ~0 bits for the modal class member (inherited from the frozen OGIT deck). A typical entity stores nothing.
- `knowledge/delta-card-addressing-integration-map.md`: the converged map (partition-as-address, 27-bit floor with ~0-bit row, sparse radix range-delegation, I/P/B-frames over Lance versioning, RISC compose-not-materialize, frozen-ISA).
- `knowledge/agnostic-lazy-world-spine.md`: the tiered substrate (cold Lance ◄─NiblePath─► hot mailbox-SoA ◄── OGIT/DOLCE cache).
- `knowledge/owl-dolce-hhtl-compartments-aerial-fed.md`, `knowledge/splat-codebook-aerial-wikidata-compression.md`: the domain/aerial seams.
- `plans/wikidata-lazy-spine-hydration-v1.md`: the 9 D-LWS deliverables for the one missing runtime piece (the `NiblePath`-keyed tiered hydration manager).
The measurement: probe #1 PASS (the payoff)
`crates/jc/examples/ontology_locality_probe.rs` (zero-dep, hand-rolled TTL scan, reuses the `splat_louvain_modularity` machinery) RUN on the real on-disk ontologies (DOLCE-Ultralite, schema.org, Odoo, PROV-O, QUDT, OWL-Time):
⇒ On real frozen-ISA ontology structure, the 16-bit local references + the ≤16 family frontier are real: the "inherited nothingness" addressing is not a hope. Honest caveat (in the probe's own verdict): real ontologies (~10³ classes), NOT Wikidata (~10⁸); the Wikidata P279 run remains the open probe. Conjecture → FINDING on real ontologies.
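For readers who want to sanity-check the modularity number the probe reports, the Newman formula from the probe's comments, Q = Σ_c [ e_c / m - (a_c / 2m)² ], can be worked on a toy graph. The sketch below uses plain edge counting rather than the probe's popcount bitsets, and the graph (two triangles joined by one bridge) is invented for illustration.

```rust
use std::collections::BTreeMap;

/// Newman modularity of a partition on an undirected simple graph:
/// Q = sum over communities c of [ e_c / m - (a_c / 2m)^2 ],
/// where e_c = edges fully inside c and a_c = sum of degrees in c.
fn modularity_q(edges: &[(usize, usize)], community: &[usize]) -> f64 {
    if edges.is_empty() {
        return 0.0;
    }
    let mut inside: BTreeMap<usize, f64> = BTreeMap::new();
    let mut degree_sum: BTreeMap<usize, f64> = BTreeMap::new();
    for &(u, v) in edges {
        // Each edge adds one unit of degree to each endpoint's community.
        *degree_sum.entry(community[u]).or_insert(0.0) += 1.0;
        *degree_sum.entry(community[v]).or_insert(0.0) += 1.0;
        if community[u] == community[v] {
            *inside.entry(community[u]).or_insert(0.0) += 1.0;
        }
    }
    let m = edges.len() as f64;
    degree_sum
        .iter()
        .map(|(c, &a_c)| {
            let e_c = inside.get(c).copied().unwrap_or(0.0);
            e_c / m - (a_c / (2.0 * m)).powi(2)
        })
        .sum()
}

fn main() {
    // Two triangles {0,1,2} and {3,4,5} joined by the bridge (2,3); m = 7.
    let edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)];
    let community = [0, 0, 0, 3, 3, 3];
    // Per community: e_c = 3, a_c = 7, so Q = 2 * (3/7 - (7/14)^2) = 5/14.
    println!("Q = {:.4}", modularity_q(&edges, &community));
}
```

Like the probe, this version assumes the edge list is free of duplicates and self-loops; repeated edges would inflate both `m` and the degree sums.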
The code move — markov_soa → AriGraph, vocabulary-agnostic (SoC)
`markov_soa` (the Markov wave: a windowed SoA projection) was first authored in `deepnsm`, which made a core runtime concern depend on a linguistics sensor (layer inversion). Moved to `lance-graph::graph::arigraph::markov_soa` and made vocabulary-agnostic:
- `SpoRanks { s, p, o: u16 }` are opaque: the SoA row carries no language; vocabulary is a late-resolved class property.
- `Fn(u16,u16)->u8` = AriGraph's own `cam_pq::DistanceTables`, not a language table. The language layer (DeepNSM/COCA/grammar) stays strictly upstream (it emits SPO into AriGraph; injecting COCA into the hot graph would be the "GoBD-with-Rumi" error).
- `markov_soa` IS AriGraph: the cold-path Markov chain promoted to the hot-path SoA (the wave); EW64/the CE64 W-slot→witness arc is the particle.
- `contract::soa_view::MailboxSoaView` gains a doc note: `EpisodicWitness64` = AriGraph in the mailbox SoA view (deferred accessor, qualia-pattern; EW64 is a queued design, not yet a code symbol).
Findings on the board (the durable record)
`markov_soa`, leashed dark-horse), #3 sink-in-and-pray (deprecated VSA-substrate), plus the P1→P2→P3 gate-before-grail ordering.
Verification
- `cargo test --manifest-path crates/jc/Cargo.toml` → 60/60 (probe file clippy-clean; pre-existing jc lints in other files untouched, out of scope).
- `cargo test -p lance-graph-contract` → 503/503 (the EW64 doc note + soa_view).
- `cargo test --manifest-path crates/deepnsm/Cargo.toml` → green after the `markov_soa` removal.
- `lance-graph::graph::arigraph::markov_soa` is unverified-offline: `lance-graph` core's `lance`/`datafusion`/`arrow` deps don't fetch in the sandbox; the module + its 4 tests are authored against the grounded `MailboxSoaView` surface but need a full-checkout compile-verify. Flagged `STATUS: provisional` in the module header. Its truly-correct final home is inside the EW64-in-SoA seam (P1/P2).
Honest scope
This is vision + measurement + one disciplined refactor, not a feature. The hydration manager (D-LWS), the EW64 type (P1/P2), the Wikidata-scale probe, and the markov_soa core-verify are all queued/gated, clearly labelled CONJECTURE-with-probe where they're unproven. The shipped, verified pieces are: the locality probe (PASS), the contract doc note, the deepnsm cleanup, and the board record.
https://claude.ai/code/session_012SorR8UbtEvYmbX8cXftj7
Generated by Claude Code
Summary by CodeRabbit