OSINT family-adapter edges + Callcenter DataFusion/Gremlin + aiwar POC (codex roll-up)#560

Merged
AdaWorldAPI merged 6 commits into main from claude/jirak-math-theorems-harvest-rfii13 on Jun 20, 2026

AdaWorldAPI (Owner) commented Jun 20, 2026

What

Follow-up to merged #557. Rolls in both codex P1 review findings, the operator's 16×8-bit family-node-adapter edge model, the Callcenter DataFusion/Gremlin slice, and an aiwar OSINT POC on the real graph. q2 wires the resulting GraphSnapshot to the Quadro-2 visual.

codex P1 fixes (rolled in)

  • #1: classid is the class (exact). project_snapshot / nearest_anchor now include only rows where classid == domain.classid, so a mixed-class SoA board can't leak FMA/default nodes (or cross-domain edges) into an OSINT projection. New test mixed_class_board_excludes_other_domains.
  • #2: ambiguous one-byte edge targets. Resolved structurally via the family-adapter model below: the canonical 16-byte EdgeBlock is read as 16 family-node adapters, and each non-zero byte resolves to a FAMILY via family & 0xFF (collision-aware: a low byte shared by two families is skipped, never mis-routed). Member-by-identity resolution is gone, so the >255-member aliasing dissolves (resolution is only ever family-level). New test ambiguous_family_low_byte_is_skipped_not_misrouted.
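The two fixes above can be sketched as a classid pre-filter plus a low-byte index in which any byte claimed by two families is poisoned. This is a minimal illustration, not the crate's code: `Row`, `domain_rows`, `low_byte_index`, and `resolve_adapters` are hypothetical names, the field widths are assumptions, and all 16 adapter bytes are treated uniformly (the in-family/out-of-family slot split is not modelled).

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a SoA row (field widths are assumptions).
pub struct Row {
    pub classid: u16,
    pub family: u32,     // u24 family id from the v1 tail
    pub edges: [u8; 16], // 16-byte EdgeBlock, one family adapter per byte
}

/// Fix #1: only rows whose classid matches the projected domain survive.
pub fn domain_rows(rows: &[Row], classid: u16) -> Vec<&Row> {
    rows.iter().filter(|r| r.classid == classid).collect()
}

/// Map each low byte to its owning family; a byte shared by two
/// different families becomes None so it can never be mis-routed.
pub fn low_byte_index(families: &[u32]) -> HashMap<u8, Option<u32>> {
    let mut idx: HashMap<u8, Option<u32>> = HashMap::new();
    for &fam in families {
        let low = (fam & 0xFF) as u8;
        idx.entry(low)
            .and_modify(|slot| {
                if *slot != Some(fam) {
                    *slot = None; // collision: mark ambiguous
                }
            })
            .or_insert(Some(fam));
    }
    idx
}

/// Fix #2: resolve adapter bytes to families, skipping empty (0),
/// unknown, ambiguous, and self-targeting bytes.
pub fn resolve_adapters(row: &Row, idx: &HashMap<u8, Option<u32>>) -> Vec<u32> {
    row.edges
        .iter()
        .filter(|&&b| b != 0)
        .filter_map(|b| idx.get(b).copied().flatten())
        .filter(|&fam| fam != row.family)
        .collect()
}
```

With families 0x0001 and 0x0101 both owning low byte 0x01, an adapter byte of 0x01 is dropped rather than guessed.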

E-FAMILY-ADAPTER-EDGES-ARE-RENDER-STABLE: routing edges to families trades a mixin dependency (a referenced family must exist) for extreme render stability (family hubs are fixed anchors, so the layout doesn't churn) and flexibility (a node can mix in up to 16 family adjacencies). Richer flavors (8×16-bit, 32×4-residue, member→member second hop) are deferred; helix-basin-anchor (CLAM ⇄ Louvain turbovec residue) is the operator's named later enhancement.

Callcenter slice (lance-graph-callcenter)

  • graph_table (query-lite): GraphSnapshot → nodes + edges Arrow MemTable TableProviders + register_graph(SessionContext). This is the DataFusion / SQL / Cypher→SQL path (mirrors transcode::ontology_table). Live SQL roundtrip test: SELECT target FROM edges WHERE source=… AND label='references'.
  • graph_gremlin (always-on, pure contract types) — g(&snap).v(&[…]).out()/.in_()/.out_e(label)/.values_kind()/.count(). The basic Gremlin POC, doubling as the SurrealQL ->edge-> traversal kernel.
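The Gremlin-style kernel described above amounts to a set of current vertex ids that each step rewrites by following edges. The sketch below is a hypothetical, std-only reduction of that idea (`Snap`, `Source`, and the tuple edge shape are assumptions, not the crate's types); it also applies the review suggestion that `v(ids)` seeds only with ids that exist.

```rust
use std::collections::HashSet;

/// Hypothetical snapshot: node ids plus (source, target, label) edges.
pub struct Snap {
    pub nodes: Vec<String>,
    pub edges: Vec<(String, String, String)>,
}

pub struct Source<'a> {
    snap: &'a Snap,
}

pub struct Traversal<'a> {
    snap: &'a Snap,
    current: Vec<String>,
}

pub fn g(snap: &Snap) -> Source<'_> {
    Source { snap }
}

impl<'a> Source<'a> {
    /// Seed with every node if `ids` is empty, else with the ids that exist.
    pub fn v(&self, ids: &[&str]) -> Traversal<'a> {
        let known: HashSet<&str> = self.snap.nodes.iter().map(String::as_str).collect();
        let current = if ids.is_empty() {
            self.snap.nodes.clone()
        } else {
            ids.iter()
                .copied()
                .filter(|id| known.contains(id))
                .map(str::to_string)
                .collect()
        };
        Traversal { snap: self.snap, current }
    }
}

impl<'a> Traversal<'a> {
    /// One hop along edges (optionally label-filtered) in one direction.
    fn step(self, label: Option<&str>, outgoing: bool) -> Self {
        let here: HashSet<&str> = self.current.iter().map(String::as_str).collect();
        let mut next = Vec::new();
        for (src, dst, lbl) in &self.snap.edges {
            if label.map_or(true, |l| l == lbl.as_str()) {
                let (from, to) = if outgoing { (src, dst) } else { (dst, src) };
                if here.contains(from.as_str()) {
                    next.push(to.clone());
                }
            }
        }
        Traversal { snap: self.snap, current: next }
    }

    pub fn out(self) -> Self { self.step(None, true) }
    pub fn in_(self) -> Self { self.step(None, false) }
    /// Follow only edges carrying `label`, landing on their targets.
    pub fn out_e(self, label: &str) -> Self { self.step(Some(label), true) }
    pub fn count(&self) -> usize { self.current.len() }
    pub fn to_vec(self) -> Vec<String> { self.current }
}
```

Chaining `.out().out()` is the multi-hop case; a SurrealQL `->edge->` hop would map onto a single `out_e(label)` step under this sketch's assumptions.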

aiwar OSINT POC (contract::aiwar + example)

  • AiwarClassView (entity category ⇒ family id) + aiwar_node_rows ingest the real aiwar-neo4j-harvest/data/aiwar_graph.json (via the existing literal_graph::ingest_aiwar_json) into OSINT NodeRows → project_snapshot gives a Gotham graph whose family nodes ARE the categories.
  • examples/aiwar_family_poc.rs on the real 174 KB graph: 221 entities / 326 edges → 281 nodes (221 members + 60 family hubs) + 481 edges.
  • Honest note: 60 families because the class view keys off the raw fine-grained type field (FacialRecognition, ComputerVision vs …; … split); coarse N_*-bucket grouping is a one-line granularity knob q2 can choose — the mechanism is correct either way.
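A deterministic category → family-id view, and the resulting node count (members plus one hub per family, as in 221 + 60 = 281), can be sketched as below. Sorting the distinct labels with a `BTreeSet` and starting ids at 1 are assumptions chosen for determinism (id 1-based mirrors the `(i as u32) + 1` quoted in review); `class_view` and `projected_node_count` are hypothetical names, not `AiwarClassView`'s API.

```rust
use std::collections::{BTreeSet, HashMap};

/// Distinct category labels in sorted order, numbered from 1.
pub fn class_view<'a, I>(categories: I) -> HashMap<String, u32>
where
    I: IntoIterator<Item = &'a str>,
{
    let labels: BTreeSet<&str> = categories.into_iter().collect();
    labels
        .into_iter()
        .enumerate()
        .map(|(i, l)| (l.to_string(), i as u32 + 1))
        .collect()
}

/// Members plus one family hub per distinct category.
pub fn projected_node_count(member_categories: &[&str]) -> usize {
    member_categories.len() + class_view(member_categories.iter().copied()).len()
}
```

Swapping the raw `type` label for a coarser bucket before calling `class_view` is the kind of one-line granularity knob the note describes.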

Tests

  • cargo test -p lance-graph-contract --lib → 703 (+5: aiwar ×3, soa_graph ambiguity + mixed-class ×2). cargo clippy -p lance-graph-contract --all-targets -- -D warnings: clean.
  • cargo test -p lance-graph-callcenter --features query → 10 graph tests (incl. live SQL). The default (no-feature) build compiles graph_gremlin.
  • New callcenter files (graph_table.rs, graph_gremlin.rs) are clippy-clean (verified zero errors originate in them). The crate's --features query -D warnings clippy fails on pre-existing oxrdf-deprecation/doc debt in unrelated modules — logged as TD-CALLCENTER-QUERY-CLIPPY, out of scope here.

Board updated in-commit (LATEST_STATE, AGENT_LOG, EPIPHANIES, TECH_DEBT, plan).

🤖 Generated with Claude Code


Summary by CodeRabbit

Release Notes

  • New Features

    • Added graph traversal API with Gremlin-style operations (traverse edges, filter vertices, project properties)
    • Enabled SQL query support for graph data via DataFusion integration
    • Added OSINT graph ingestion with automatic family node projection
    • Introduced optional GUID v2 layout feature (feature-gated)
  • Documentation

    • Updated integration plans and technical tracking for upcoming releases

…iwar POC

Follow-up to merged #557; rolls in both codex P1 review findings + the
operator's 16-family-adapter edge model + the Callcenter slice + an aiwar
OSINT POC on the real graph.

soa_graph (codex P1 #1 + #2, operator model):
- classid IS the class (exact): project_snapshot / nearest_anchor include
  only rows where classid == domain.classid — a mixed-class board can't
  leak one domain's nodes/edges into another's view.
- 16 x 8-bit family-node adapters: the canonical EdgeBlock is read as 16
  family adapters (12 in-family + 4 out-of-family), each non-zero byte ->
  a FAMILY node by `family & 0xFF`, collision-aware (ambiguous low byte
  skipped, never mis-routed). Member-by-identity resolution removed -> the
  >255-member aliasing dissolves (resolution is family-level only).
  Trade: mixin dependency for extreme render stability + flexibility
  (E-FAMILY-ADAPTER-EDGES-ARE-RENDER-STABLE).

lance-graph-callcenter (the slice):
- graph_table (query-lite): GraphSnapshot -> `nodes` + `edges` arrow
  MemTable TableProviders + register_graph(SessionContext). The
  DataFusion / SQL / Cypher->SQL path (mirrors transcode::ontology_table).
- graph_gremlin (always-on, pure contract types): g(&snap).v().out()/
  .in_()/.out_e(label)/.values_kind() — the Gremlin POC = SurrealQL
  `->edge->` traversal kernel.

contract::aiwar + example (the POC the operator asked for):
- AiwarClassView (entity category ⇒ family id) + aiwar_node_rows ingest the
  real AdaWorldAPI/aiwar-neo4j-harvest/data/aiwar_graph.json into OSINT
  NodeRows; project_snapshot gives a Gotham graph whose family nodes ARE
  the categories. Example run: 221 entities/326 edges -> 281 nodes (221
  members + 60 family hubs) + 481 edges. q2 wires it to the Quadro-2 visual.

Tests: contract 703 lib (+5), clippy --all-targets -D warnings clean.
callcenter 10 graph tests (--features query, incl. live SQL roundtrip),
default build compiles graph_gremlin; new files clippy-clean (pre-existing
callcenter query-clippy debt logged TD-CALLCENTER-QUERY-CLIPPY). Board
updated (LATEST_STATE, AGENT_LOG, EPIPHANIES, TECH_DEBT, plan).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CcpLeEC3XK8Eye53GKBVvi

coderabbitai Bot commented Jun 20, 2026

Warning

Review limit reached

@AdaWorldAPI, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 46 minutes and 59 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 89acabf5-2ba6-470f-a2c6-1f4ed20f7d89

📥 Commits

Reviewing files that changed from the base of the PR and between 5dd1645 and 5e10dbb.

📒 Files selected for processing (5)
  • .claude/board/AGENT_LOG.md
  • .claude/board/EPIPHANIES.md
  • .claude/plans/guid-v2-tail-per-family-codebook-v1.md
  • crates/lance-graph-callcenter/src/graph_gremlin.rs
  • crates/lance-graph-contract/src/aiwar.rs
📝 Walkthrough

Walkthrough

The PR reworks the SoA→Gotham graph projection to use collision-aware 16×8-bit family-node adapter resolution with classid filtering, adds a feature-gated GUID v2 tail layout (3×u16) with NiblePath lowering, introduces callcenter Gremlin traversal and DataFusion table providers over GraphSnapshot, adds an aiwar OSINT family-node projection module, and updates board/planning documentation.

Changes

Family-node adapter edges, GUID v2 tail, callcenter surfaces, aiwar OSINT POC

Layer / File(s) Summary
soa_graph: classid filtering + collision-aware family-node adapter edges
crates/lance-graph-contract/src/soa_graph.rs
project_snapshot now pre-filters rows by domain.classid, replaces member-identity adjacency with a collision-aware family_low_byte resolver that maps EdgeBlock adapter bytes to unambiguous family nodes (skipping ambiguous and self-targeting bytes), updates family-node creation to use member-count maps, and scopes nearest_anchor to domain rows. Tests are rewritten to assert family: namespace edge targets and ambiguous-byte skipping.
GUID v2 tail: NodeGuid v2 API, GuidPartsV2, NiblePath lowering
crates/lance-graph-contract/Cargo.toml, crates/lance-graph-contract/src/canonical_node.rs, crates/lance-graph-contract/src/hhtl.rs
Adds the guid-v2-tail Cargo feature. Under that flag, NodeGuid gains new_v2, leaf/family_v2/identity_v2/local_key_v2 accessors, decode_v2 returning GuidPartsV2, and to_hex_v2. NiblePath gains from_guid_prefix_v2 composing HEEL/HIP/TWIG/leaf into a full-depth routing path. Feature-gated tests validate field isolation, v1/v2 prefix parity, and routing path depth.
aiwar OSINT family-node projection
crates/lance-graph-contract/src/lib.rs, crates/lance-graph-contract/src/aiwar.rs, crates/lance-graph-contract/examples/aiwar_family_poc.rs
Adds AiwarClassView (deterministic category→family-id mapping from a LiteralGraph) and aiwar_node_rows (converts graph entities to NodeRows with EdgeBlock adapter slots encoding out-of-family target-category bytes). The aiwar_family_poc example binary ingests an aiwar JSON graph, projects an OSINT GraphSnapshot, and prints per-category family member counts.
callcenter: Gremlin traversal kernel and DataFusion table providers
crates/lance-graph-callcenter/src/lib.rs, crates/lance-graph-callcenter/src/graph_gremlin.rs, crates/lance-graph-callcenter/src/graph_table.rs
Adds graph_gremlin (GraphTraversalSource, Traversal with out/in_/out_e/in_e steps and values_kind/to_vec/count terminals) and graph_table (Arrow schemas, nodes_batch/edges_batch, MemTable providers, register_graph under query feature). Both are wired into lib.rs. Unit and async SQL tests cover traversal directionality, label filtering, multi-hop chaining, and SQL over registered tables.
Planning and board documentation
.claude/board/*, .claude/plans/guid-v2-tail-per-family-codebook-v1.md, .claude/plans/unified-soa-rubikon-integration-v1.md
AGENT_LOG, EPIPHANIES, INTEGRATION_PLANS, LATEST_STATE, and TECH_DEBT board files are updated with entries for this PR's work. A new guid-v2-tail-per-family-codebook-v1.md plan file is added describing the proposed 3×u16 repartitioning, per-family codebook scoping, capacity gates, and D-GV2-1..5 deliverables.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as aiwar_family_poc
    participant Contract as lance-graph-contract
    participant SOA as soa_graph::project_snapshot
    participant Callcenter as lance-graph-callcenter

    CLI->>Contract: ingest_aiwar_json(path) → LiteralGraph
    CLI->>Contract: AiwarClassView::from_graph(&graph)
    Contract-->>CLI: category→family_id map
    CLI->>Contract: aiwar_node_rows(&graph) → Vec<NodeRow>
    Note over Contract: EdgeBlock slots = out-of-family target-category bytes
    CLI->>SOA: project_snapshot(OSINT_GOTHAM, rows)
    Note over SOA: filter by classid, collision-aware family-node adapter resolution
    SOA-->>CLI: GraphSnapshot (family + member nodes, family: edges)
    CLI->>Callcenter: g(&snap).v(&[ids]).out_e(label).to_vec()
    Callcenter-->>CLI: reached vertex ids
    CLI->>Callcenter: register_graph(&ctx, &snap)
    Callcenter-->>CLI: nodes/edges MemTable in SessionContext

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • AdaWorldAPI/lance-graph#557: This PR's soa_graph.rs changes (domain filtering, EdgeBlock in/out-family adapter resolution, nearest_anchor scoping) directly evolve the contract::soa_graph projector introduced in #557.
  • AdaWorldAPI/lance-graph#489: This PR's guid-v2-tail additions to canonical_node.rs layer directly on top of the NodeGuid/EdgeBlock/NodeRow foundation established in #489.
  • AdaWorldAPI/lance-graph#507: Both PRs extend NiblePath in hhtl.rs with GUID-prefix routing helper constructors (from_guid_prefix vs from_guid_prefix_v2).

Poem

🐇 Hoppity-hop through the family tree,
Each adapter byte resolves with glee,
No more aliasing when counts exceed,
Collision-aware skips plant a new seed,
GUID v2 tail stretches three tiers wide,
And Gremlin hops along for the ride! 🌿

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description check: ✅ Passed. Check skipped: CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title directly addresses three major components shipped in this PR (OSINT family-adapter edges, Callcenter DataFusion/Gremlin integration, and the aiwar POC) plus the codex roll-up of P1 fixes. It accurately summarizes the primary changes.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, above the required 80.00% threshold.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1a4b76ac01

Comment thread crates/lance-graph-callcenter/src/graph_gremlin.rs Outdated
Comment thread crates/lance-graph-contract/src/aiwar.rs Outdated
claude added 4 commits June 20, 2026 18:32
…ation plan

Capture the operator's what-if before it dilutes: repartition the 48-bit
basin tail family(u24)|identity(u24) -> leaf(u16)|family(u16)|identity(u16)
(uniform 8×u16 key) + scope codebooks per family (family -> Codebook,
finer sibling of classid -> ClassView).
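The repartition can be illustrated on the 48-bit tail alone, held in the low bits of a `u64`. This is a hypothetical sketch (`pack_v1`, `pack_v2`, `unpack_v2` are not crate functions, and the in-memory packing order is assumed): v1 splits the tail 24|24, v2 splits it 16|16|16, which is where the plan's capacity gate of at most 65536 identities per (leaf, family) comes from.

```rust
/// v1 tail: family (u24) | identity (u24). Returns None instead of masking.
pub fn pack_v1(family: u32, identity: u32) -> Option<u64> {
    if family > 0xFF_FFFF || identity > 0xFF_FFFF {
        return None; // value would not fit a u24 field
    }
    Some(((family as u64) << 24) | identity as u64)
}

/// v2 tail: leaf | family | identity, three uniform u16 fields.
pub fn pack_v2(leaf: u16, family: u16, identity: u16) -> u64 {
    ((leaf as u64) << 32) | ((family as u64) << 16) | identity as u64
}

/// Inverse of pack_v2 for a tail that fits in 48 bits.
pub fn unpack_v2(tail: u64) -> (u16, u16, u16) {
    ((tail >> 32) as u16, (tail >> 16) as u16, tail as u16)
}
```

Both layouts occupy the same 48 bits; v2 simply gives up 8 bits each of family and identity range to make room for a 16-bit leaf.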

Blast radius measured (grep): CONTAINED in lance-graph. q2/smb-office-rs/
medcare-rs = 0 (no downstream crate touches NodeGuid); routing prefix
(from_guid_prefix / mailbox_scan / NiblePath<->entity_type) is
tail-agnostic; ~3 layout files (canonical_node, soa_graph, aiwar) + ~35
mostly-test NodeGuid::new call sites. Unrelated 0xFFFF.. hits are cycle
counters / fingerprint masks, not the GUID tail.

PROPOSED — gated on operator sign-off (canon version bump per
I-LEGACY-API-FEATURE-GATED) + two capacity numbers (<=65536 identities
per (leaf,family); <=256 codebook entries per family before split). Ships
feature-gated guid-v2-tail (default OFF) with field-isolation matrix +
version gate. INTEGRATION_PLANS prepended; D-GV2-1..5.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CcpLeEC3XK8Eye53GKBVvi
… mixin

Operator design lock (2026-06-20) refining the guid-v2-tail plan:
- leaf is the 4th HHTL routing tier ("a natural HHTL"): the cascade is
  HEEL·HIP·TWIG·leaf = 4 tiers × 4 nibbles = 16 nibbles = a full u64
  NiblePath. from_guid_prefix_v2 = HEEL·HIP·TWIG·leaf (classid is the
  separate codebook prefix).
- family = the basin / episodic hub; identity = instance. basin-local
  key = family·identity (4 bytes, was 6). u16 offsets: leaf 10..12,
  family 12..14, identity 14..16.
- Family node = episodic basin: connections accumulated on the basin ARE
  the supporting edges of every member. Mixin = O(1) address reference
  (a byte), shared state stored once on the basin, distance = HHTL hop
  arithmetic on the address — never O(n) edge materialization or BFS.

EPIPHANY E-MIXIN-IS-AN-ADDRESS-REFERENCE-NOT-A-COPY captures it; the
open leaf/family decision is RESOLVED (leaf=routing, family=basin).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CcpLeEC3XK8Eye53GKBVvi
…feature-gated

Operator greenlit the guid-v2-tail plan. Ships D-GV2-1: the v2 basin tail
behind feature `guid-v2-tail` (default OFF), ADDITIVE + NON-breaking (v1
new/family()/identity() untouched).

canonical_node (under #[cfg(feature="guid-v2-tail")]):
- new_v2(classid,heel,hip,twig,leaf,family,identity) — leaf/family/identity
  as full u16 at fixed offsets 10..12 / 12..14 / 14..16 (no u24).
- leaf() (4th HHTL tier), family_v2() (basin/episodic hub), identity_v2()
  (instance), local_key_v2() (family++identity, 4B), decode_v2/GuidPartsV2,
  to_hex_v2 (uniform 4-hex Display), GUID_TAIL_LAYOUT_VERSION_V2 = 2.
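The fixed v2 offsets (leaf 10..12, family 12..14, identity 14..16) and the field-isolation property can be sketched on a raw 16-byte key. The `Guid` wrapper, the `with_tail` builder, and big-endian byte order are assumptions for illustration; only the offsets come from the commit message.

```rust
/// Hypothetical 16-byte key; bytes 0..10 (classid, HEEL, HIP, TWIG) are opaque here.
pub struct Guid(pub [u8; 16]);

impl Guid {
    fn u16_at(&self, off: usize) -> u16 {
        u16::from_be_bytes([self.0[off], self.0[off + 1]]) // big-endian assumed
    }

    /// Write leaf / family / identity at their v2 offsets.
    pub fn with_tail(mut self, leaf: u16, family: u16, identity: u16) -> Self {
        self.0[10..12].copy_from_slice(&leaf.to_be_bytes());
        self.0[12..14].copy_from_slice(&family.to_be_bytes());
        self.0[14..16].copy_from_slice(&identity.to_be_bytes());
        self
    }

    pub fn leaf(&self) -> u16 { self.u16_at(10) }
    pub fn family_v2(&self) -> u16 { self.u16_at(12) }
    pub fn identity_v2(&self) -> u16 { self.u16_at(14) }

    /// family ++ identity: the 4-byte basin-local key.
    pub fn local_key_v2(&self) -> [u8; 4] {
        let mut k = [0u8; 4];
        k.copy_from_slice(&self.0[12..16]);
        k
    }
}
```

A field-isolation check then varies one tier and asserts that only the matching accessor changes.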

hhtl: from_guid_prefix_v2 = HEEL·HIP·TWIG·leaf (16 nibbles). leaf IS in the
routing path; family/identity are the basin tail (NOT in the path); classid
is the separate codebook prefix.
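Because every routing tier is a u16, HEEL·HIP·TWIG·leaf packs into exactly 16 nibbles of one `u64`. The sketch below assumes HEEL occupies the most significant nibbles; `path_from_prefix_v2` and `nibble` are hypothetical helpers, not `NiblePath`'s API.

```rust
/// Compose the four routing tiers into one 16-nibble path, root tier first.
/// family / identity are deliberately absent: they are not routing tiers.
pub fn path_from_prefix_v2(heel: u16, hip: u16, twig: u16, leaf: u16) -> u64 {
    ((heel as u64) << 48) | ((hip as u64) << 32) | ((twig as u64) << 16) | leaf as u64
}

/// Nibble `i` (0..=15) counted from the root; i = 0 is HEEL's top nibble.
pub fn nibble(path: u64, i: u32) -> u8 {
    assert!(i < 16, "a u64 path has 16 nibbles");
    ((path >> (60 - 4 * i)) & 0xF) as u8
}
```

Under this layout the leaf tier is nibbles 12..=15, the deepest level of the routing cascade.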

Per I-LEGACY-API-FEATURE-GATED: distinct v2 names (no function silently
changes semantics under the flag), field-isolation matrix test (vary one
tier → only that accessor changes), v1/v2 coexistence test, leaf-in-path
test, version-gate const. Cutover (rename v2→canonical, deprecate v1,
ENVELOPE_LAYOUT_VERSION bump) = D-GV2-5, after D-GV2-2/3/4 adopt the v2
accessors.

Verified BOTH configs: default lib tests 703 (unchanged, non-breaking);
--features guid-v2-tail 706 (+3 v2 tests); clippy -D warnings clean both.
Plan D-GV2-1 marked SHIPPED; AGENT_LOG updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CcpLeEC3XK8Eye53GKBVvi
…on tile pyramid

Operator design lock: making every tier the same size (u16) makes the KEY,
the per-family CODEBOOK, the VALUE tile, and the PERTURBATION pyramid all the
SAME 2bit×2bit 4×4 Morton-tile primitive.

- 1 nibble = 2bit×2bit = a 4×4 Morton tile = FAN_OUT-16 (one HHTL level;
  the same morton4 the domino AMX kernel runs on the value side).
- 1 u16 tier = 4 nibbles = a 256×256 Morton tile (256 = 4⁴, the OGAR
  centroid tile per tier; nibble-interleave = alternating-axis refinement).
- 8 u16 tiers = one stacked pyramid.

One kernel (Morton + AMX 4×4 BF16 GEMM sweeps tiers/codebooks/values
uniformly), one distance (Morton common-prefix = HHTL hop =
family_hop_count), one codebook shape (256×256 per tier). The 24+24 tail
broke this (u24 ≠ clean tile); 16+16+16 restores it. EPIPHANY
E-UNIFORM-MORTON-TILE-PYRAMID; plan section added.
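The "one nibble = 2bit×2bit = 4×4 tile" and "one u16 = 256×256 tile" claims follow from bit interleaving: a 16-bit Morton code holds 8 bits per axis, and each nibble carries 2 bits of each axis. The sketch assumes x in even bits and y in odd bits, and models the shared-ancestor distance as "4 minus common leading nibbles"; that hop formula is an illustrative assumption, not the crate's `family_hop_count`.

```rust
/// Interleave two 8-bit axes into a 16-bit Morton code (x: even bits, y: odd bits).
pub fn morton16(x: u8, y: u8) -> u16 {
    let mut code = 0u16;
    for bit in 0..8u32 {
        code |= (((x >> bit) & 1) as u16) << (2 * bit);
        code |= (((y >> bit) & 1) as u16) << (2 * bit + 1);
    }
    code
}

/// Number of leading nibbles two codes share = depth of their common 4x4 ancestor tile.
pub fn common_nibble_prefix(a: u16, b: u16) -> u32 {
    let diff = a ^ b;
    if diff == 0 { 4 } else { diff.leading_zeros() / 4 }
}

/// Tile levels to climb from either code to reach that common ancestor.
pub fn hop_count(a: u16, b: u16) -> u32 {
    4 - common_nibble_prefix(a, b)
}
```

Because the most significant nibble is the coarsest tile, a longer shared prefix means the two points sit in the same tile for more levels of the pyramid.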

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CcpLeEC3XK8Eye53GKBVvi

coderabbitai Bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (1)
crates/lance-graph-callcenter/src/graph_gremlin.rs (1)

110-120: ⚡ Quick win

Avoid repeated linear scans in values_kind().

values_kind() does a full node scan per current ID. Building an ID→kind index once per call removes the O(|current|×|nodes|) behavior.

⚡ Proposed refactor
-use std::collections::HashSet;
+use std::collections::{HashMap, HashSet};
...
 pub fn values_kind(&self) -> Vec<String> {
+    let kinds_by_id: HashMap<&str, &str> = self
+        .snap
+        .nodes
+        .iter()
+        .map(|n| (n.id.as_str(), n.kind.as_str()))
+        .collect();
     self.current
         .iter()
-        .filter_map(|id| {
-            self.snap
-                .nodes
-                .iter()
-                .find(|n| &n.id == id)
-                .map(|n| n.kind.clone())
-        })
+        .filter_map(|id| kinds_by_id.get(id.as_str()).map(|k| (*k).to_string()))
         .collect()
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/lance-graph-callcenter/src/graph_gremlin.rs` around lines 110 - 120,
The `values_kind()` method currently performs a full linear scan of
`self.snap.nodes` for each ID in `self.current`, resulting in
O(|current|×|nodes|) complexity. Instead, build a HashMap index that maps node
IDs to their kinds once at the start of the method, then iterate through
`self.current` and use the HashMap to look up each kind in constant time. This
will change the complexity to O(|nodes| + |current|) by trading space for time.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/lance-graph-callcenter/src/graph_gremlin.rs`:
- Around line 36-41: The v method currently accepts arbitrary vertex IDs without
validating they exist in the graph, allowing operations like count() to report
non-existent vertices. In the else branch of the v method where custom IDs are
provided, filter the input IDs to only include those that actually exist in
self.snap.nodes. Modify the mapping logic to check each ID against the available
nodes in the graph before adding it to the current vector.

In `@crates/lance-graph-callcenter/src/graph_table.rs`:
- Around line 197-204: The test currently only verifies that the query result
has one row but does not validate the actual aggregate count value returned by
the COUNT(*) query. Extract the count value from the first batch (batches[0])
and add an assertion to verify it equals the expected node count of 4 (as
mentioned in the comment: 2 OSINT members plus 2 family nodes). This ensures the
test fails if the wrong total is returned, not just if the result shape is
incorrect.

In `@crates/lance-graph-contract/src/aiwar.rs`:
- Around line 39-43: The enumerate index cast to u32 in the families mapping
lacks bounds validation, which allows silent truncation via the masking
operation at lines 106-107 that could produce duplicate NodeGuids. Add explicit
validation to ensure the enumerated index does not exceed the 24-bit limit
(0xFF_FFFF) before the cast to u32, and raise an error if this limit is breached
instead of silently wrapping or masking the value. Apply the same validation fix
to the other location mentioned at lines 101-107 where similar index-to-guid
conversion occurs.

In `@crates/lance-graph-contract/src/soa_graph.rs`:
- Around line 392-397: The ambiguity test filter on line 395 uses an incorrect
source-id suffix in the ends_with call. Change the suffix from "010000000001" to
"000100000001" which is the canonical tail for family=0x0100 and identity=1.
This ensures the filter correctly identifies the intended node with the
ambiguous low byte and prevents the test from passing vacuously.

---

Nitpick comments:
In `@crates/lance-graph-callcenter/src/graph_gremlin.rs`:
- Around line 110-120: The `values_kind()` method currently performs a full
linear scan of `self.snap.nodes` for each ID in `self.current`, resulting in
O(|current|×|nodes|) complexity. Instead, build a HashMap index that maps node
IDs to their kinds once at the start of the method, then iterate through
`self.current` and use the HashMap to look up each kind in constant time. This
will change the complexity to O(|nodes| + |current|) by trading space for time.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 365a4a8b-f78e-4469-bc7d-04f184b812ce

📥 Commits

Reviewing files that changed from the base of the PR and between d865560 and 5dd1645.

📒 Files selected for processing (17)
  • .claude/board/AGENT_LOG.md
  • .claude/board/EPIPHANIES.md
  • .claude/board/INTEGRATION_PLANS.md
  • .claude/board/LATEST_STATE.md
  • .claude/board/TECH_DEBT.md
  • .claude/plans/guid-v2-tail-per-family-codebook-v1.md
  • .claude/plans/unified-soa-rubikon-integration-v1.md
  • crates/lance-graph-callcenter/src/graph_gremlin.rs
  • crates/lance-graph-callcenter/src/graph_table.rs
  • crates/lance-graph-callcenter/src/lib.rs
  • crates/lance-graph-contract/Cargo.toml
  • crates/lance-graph-contract/examples/aiwar_family_poc.rs
  • crates/lance-graph-contract/src/aiwar.rs
  • crates/lance-graph-contract/src/canonical_node.rs
  • crates/lance-graph-contract/src/hhtl.rs
  • crates/lance-graph-contract/src/lib.rs
  • crates/lance-graph-contract/src/soa_graph.rs

Comment on lines +36 to +41
pub fn v(&self, ids: &[&str]) -> Traversal<'a> {
    let current: Vec<String> = if ids.is_empty() {
        self.snap.nodes.iter().map(|n| n.id.clone()).collect()
    } else {
        ids.iter().map(|s| s.to_string()).collect()
    };

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Filter v(ids) seeds to vertices that actually exist.

Right now Line 40 accepts arbitrary IDs, so count()/to_vec() can report non-existent vertices (e.g., v(&["missing"]).count() == 1).

💡 Proposed fix
 pub fn v(&self, ids: &[&str]) -> Traversal<'a> {
+    let node_ids: HashSet<&str> = self.snap.nodes.iter().map(|n| n.id.as_str()).collect();
     let current: Vec<String> = if ids.is_empty() {
         self.snap.nodes.iter().map(|n| n.id.clone()).collect()
     } else {
-        ids.iter().map(|s| s.to_string()).collect()
+        ids.iter()
+            .copied()
+            .filter(|id| node_ids.contains(*id))
+            .map(str::to_string)
+            .collect()
     };
     Traversal {
         snap: self.snap,
         current,
     }
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/lance-graph-callcenter/src/graph_gremlin.rs` around lines 36 - 41, The
v method currently accepts arbitrary vertex IDs without validating they exist in
the graph, allowing operations like count() to report non-existent vertices. In
the else branch of the v method where custom IDs are provided, filter the input
IDs to only include those that actually exist in self.snap.nodes. Modify the
mapping logic to check each ID against the available nodes in the graph before
adding it to the current vector.

Comment on lines +197 to +204
        // GROUP BY over node kinds: 2 OSINT members + 2 family nodes.
        let df = ctx
            .sql("SELECT count(*) AS n FROM nodes")
            .await
            .unwrap();
        let batches = df.collect().await.unwrap();
        assert_eq!(batches[0].num_rows(), 1);
    }

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Assert the aggregate value, not only result shape.

This currently verifies one output row, but not that count(*) is the expected node count. A wrong total could still pass.

✅ Tighten test assertion
-        let df = ctx
-            .sql("SELECT count(*) AS n FROM nodes")
+        let df = ctx
+            .sql("SELECT count(*) AS n FROM nodes HAVING count(*) = 4")
             .await
             .unwrap();
         let batches = df.collect().await.unwrap();
-        assert_eq!(batches[0].num_rows(), 1);
+        let rows: usize = batches.iter().map(|b| b.num_rows()).sum();
+        assert_eq!(rows, 1, "expected exactly 4 projected nodes");
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/lance-graph-callcenter/src/graph_table.rs` around lines 197 - 204, The
test currently only verifies that the query result has one row but does not
validate the actual aggregate count value returned by the COUNT(*) query.
Extract the count value from the first batch (batches[0]) and add an assertion
to verify it equals the expected node count of 4 (as mentioned in the comment: 2
OSINT members plus 2 family nodes). This ensures the test fails if the wrong
total is returned, not just if the result shape is incorrect.

Comment on lines +39 to +43
        let families = labels
            .into_iter()
            .enumerate()
            .map(|(i, l)| (l, (i as u32) + 1))
            .collect();

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid silent u24 truncation when minting GUID family/identity.

(i as u32) plus masking at Lines 106-107 (& 0x00FF_FFFF) can silently wrap large inputs, producing duplicate NodeGuids and mislinked snapshots instead of failing fast.

Suggested patch
-        let families = labels
+        let families = labels
             .into_iter()
             .enumerate()
-            .map(|(i, l)| (l, (i as u32) + 1))
+            .map(|(i, l)| {
+                let fam = u32::try_from(i + 1).expect("category index exceeds u32");
+                assert!(fam <= 0x00FF_FFFF, "family id exceeds u24");
+                (l, fam)
+            })
             .collect();
@@
-            NodeRow {
+            let identity = u32::try_from(i).expect("node index exceeds u32");
+            assert!(identity <= 0x00FF_FFFF, "identity exceeds u24");
+            NodeRow {
                 key: NodeGuid::new(
                     NodeGuid::CLASSID_OSINT,
                     0,
                     0,
                     0,
-                    fam & 0x00FF_FFFF,
-                    (i as u32) & 0x00FF_FFFF,
+                    fam,
+                    identity,
                 ),

Also applies to: 101-107

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/lance-graph-contract/src/aiwar.rs` around lines 39 - 43, The enumerate
index cast to u32 in the families mapping lacks bounds validation, which allows
silent truncation via the masking operation at lines 106-107 that could produce
duplicate NodeGuids. Add explicit validation to ensure the enumerated index does
not exceed the 24-bit limit (0xFF_FFFF) before the cast to u32, and raise an
error if this limit is breached instead of silently wrapping or masking the
value. Apply the same validation fix to the other location mentioned at lines
101-107 where similar index-to-guid conversion occurs.
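A minimal sketch of the fail-fast conversion the finding asks for, assuming both the family and identity fields are u24 and that family ids stay 1-based as in the original `(i as u32) + 1` mapping. `to_u24` and `family_id` are hypothetical helper names, not existing functions in `aiwar.rs`; they return `Err` where the current code would mask.

```rust
/// Largest value representable in a 24-bit family/identity field.
const U24_MAX: u32 = 0x00FF_FFFF;

/// Convert an enumerate index into a u24 field, refusing to wrap.
/// Hypothetical helper: returns Err instead of masking with `& 0x00FF_FFFF`.
fn to_u24(index: usize) -> Result<u32, String> {
    let v = u32::try_from(index).map_err(|_| format!("index {index} exceeds u32"))?;
    if v > U24_MAX {
        return Err(format!("index {index} exceeds u24 (max {U24_MAX:#x})"));
    }
    Ok(v)
}

/// Family ids are 1-based in the aiwar mapping, so validate `index + 1`.
fn family_id(index: usize) -> Result<u32, String> {
    let next = index
        .checked_add(1)
        .ok_or_else(|| "index overflow".to_string())?;
    to_u24(next)
}
```

The masking variant maps `0x100_0001` and `1` to the same identity; the checked variant rejects the first, so a duplicate `NodeGuid` cannot be minted silently.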

Comment on lines +392 to +397
let from_0100 = snap
.edges
.iter()
.filter(|e| e.label == "linked" && e.source.ends_with("010000000001"))
.count();
assert_eq!(from_0100, 0, "ambiguous low byte must be skipped");

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the source-id suffix in the ambiguity test filter.

On line 395, `ends_with("010000000001")` does not match the canonical tail for family=0x0100, identity=1 (`000100000001`), so this check can miss the intended node and pass vacuously.

Suggested patch
-            .filter(|e| e.label == "linked" && e.source.ends_with("010000000001"))
+            .filter(|e| e.label == "linked" && e.source.ends_with("000100000001"))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/lance-graph-contract/src/soa_graph.rs` around lines 392 - 397, The
ambiguity test filter on line 395 uses an incorrect source-id suffix in the
ends_with call. Change the suffix from "010000000001" to "000100000001" which is
the canonical tail for family=0x0100 and identity=1. This ensures the filter
correctly identifies the intended node with the ambiguous low byte and prevents
the test from passing vacuously.
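A small sketch of why the corrected suffix is right, assuming (from the review's own example) that the last 12 hex digits of a node id are the u24 family followed by the u24 identity, each zero-padded to 6 digits. `guid_tail` and `is_source` are hypothetical helpers, not part of `soa_graph.rs`.

```rust
/// Hypothetical helper: render the 12-hex-digit tail of a node id, assuming
/// the layout is `family` (u24, 6 hex digits) then `identity` (u24, 6 hex digits).
fn guid_tail(family: u32, identity: u32) -> String {
    debug_assert!(family <= 0x00FF_FFFF && identity <= 0x00FF_FFFF);
    format!("{family:06x}{identity:06x}")
}

/// True if `source` ends with the tail for (family, identity).
fn is_source(source: &str, family: u32, identity: u32) -> bool {
    source.ends_with(guid_tail(family, identity).as_str())
}
```

Under this assumed layout, the old suffix `010000000001` names family 0x01_0000 rather than 0x0100, which is why the filter could match nothing and still pass.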

…mily edges

(1) graph_gremlin: step() silently deduped reached targets via a `seen` set,
breaking Gremlin bag/multiset semantics — g.v(["A","C"]).out().count() gave
1 not 2 when both reach B. Rewrote to per-traverser emission (duplicates
preserved); set semantics is now the explicit `dedup()` step. +test
out_preserves_bag_multiplicity.

(2) aiwar: aiwar_node_rows wrote cross-category adapter bytes into the first
12 in_family slots (labeled `linked` by project_snapshot), so `references`
queries missed them and the label flipped with sorted fan-out count. aiwar
edges are ALL cross-family (built from tf != fam) → write them to the 4
out_family slots (`references`), cap 4. Test asserts `references` present and
no `linked`.
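Point (1) above replaces a shared `seen` set with per-traverser emission. A std-only sketch of the difference, using a toy adjacency map rather than the actual `graph_gremlin` types (the `Adj` alias and both functions are hypothetical): `out` keeps one traverser per reached edge, and `dedup` is the explicit opt-in to set semantics.

```rust
use std::collections::{HashMap, HashSet};

/// Toy adjacency list: vertex id -> out-neighbours (hypothetical stand-in).
type Adj = HashMap<&'static str, Vec<&'static str>>;

/// Per-traverser `out()`: each input traverser emits one traverser per
/// out-edge, so duplicates survive (bag / multiset semantics).
fn out(adj: &Adj, traversers: &[&'static str]) -> Vec<&'static str> {
    traversers
        .iter()
        .flat_map(|v| adj.get(v).into_iter().flatten().copied())
        .collect()
}

/// Explicit `dedup()`: set semantics, keeping first-seen order.
fn dedup(traversers: &[&'static str]) -> Vec<&'static str> {
    let mut seen = HashSet::new();
    traversers.iter().copied().filter(|v| seen.insert(*v)).collect()
}
```

With `A -> B` and `C -> B`, `out` from `["A", "C"]` yields two `B` traversers (count 2, matching Gremlin), while `dedup` on that result yields one.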

contract aiwar 3/3; callcenter gremlin 8/8 (+1 bag test); clippy clean (new
files; pre-existing TD-CALLCENTER-QUERY-CLIPPY untouched).
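Point (2) above moves cross-family aiwar edges into the out-family slots. A sketch of that slot discipline, assuming a `[u8; 16]` block whose slots 0..12 are in-family (`linked`) and 12..16 are out-family (`references`), with each target stored by its low byte (`family & 0xFF`). The constants and functions are illustrative names, not the contract crate's definitions.

```rust
use std::ops::Range;

/// Assumed layout from the commit text: 12 in-family + 4 out-family slots.
const IN_FAMILY: Range<usize> = 0..12;
const OUT_FAMILY: Range<usize> = 12..16;

type EdgeBlock = [u8; 16];

/// Write cross-family targets into the out-family slots only, capped at 4.
/// Each target family is stored by its low byte.
fn write_cross_family(block: &mut EdgeBlock, own_family: u16, targets: &[u16]) {
    let cross = targets.iter().filter(|&&tf| tf != own_family);
    // zip stops after the 4 out-family slots, which enforces the cap.
    for (slot, &tf) in OUT_FAMILY.zip(cross) {
        block[slot] = (tf & 0xFF) as u8;
    }
}

/// Label a slot as the projection is described: in-family -> "linked".
fn slot_label(slot: usize) -> &'static str {
    if IN_FAMILY.contains(&slot) { "linked" } else { "references" }
}
```

Because placement depends only on slot range, the label no longer flips with sorted fan-out count.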

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CcpLeEC3XK8Eye53GKBVvi
AdaWorldAPI merged commit c05394f into main Jun 20, 2026
6 checks passed
AdaWorldAPI pushed a commit that referenced this pull request Jun 20, 2026
…-tail)

Continues the v2 arc after #560 merged. Lands the type + in-memory registry
tier of the family→Codebook scoping (the Lance-backed/OntologyRegistry tier is
deferred). Zero-dep, feature-gated (default OFF).

contract::codebook:
- Codebook — insertion-ordered index↔label interning, 1-byte index,
  CODEBOOK_CAP=256; intern() returns None on overflow (the split-the-family
  signal, never widen the byte).
- FamilyCodebookRegistry — family(u16) → Codebook; intern(family,label),
  resolve(family,index) for cross-family decode. Per-family scoping: the SAME
  label gets INDEPENDENT indices in different families (no global codebook
  contamination — this is what dissolves the aiwar "60 noisy families").
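The two types listed above can be sketched in std-only Rust, assuming a `Vec` for insertion order plus a `HashMap` for lookup. Field names and method bodies are illustrative, not the actual `contract::codebook` source; only the names `Codebook`, `FamilyCodebookRegistry`, `CODEBOOK_CAP`, `intern` and `resolve` come from the commit text.

```rust
use std::collections::HashMap;

/// At most 256 labels, so every index fits in one byte.
pub const CODEBOOK_CAP: usize = 256;

/// Insertion-ordered label interning with a 1-byte index (sketch).
#[derive(Default)]
pub struct Codebook {
    labels: Vec<String>,
    index: HashMap<String, u8>,
}

impl Codebook {
    /// Existing index for `label`, or the next free one.
    /// `None` on overflow: the signal to split the family, never widen the byte.
    pub fn intern(&mut self, label: &str) -> Option<u8> {
        if let Some(&i) = self.index.get(label) {
            return Some(i);
        }
        if self.labels.len() >= CODEBOOK_CAP {
            return None;
        }
        let i = self.labels.len() as u8; // < 256 by the check above
        self.labels.push(label.to_owned());
        self.index.insert(label.to_owned(), i);
        Some(i)
    }

    pub fn resolve(&self, index: u8) -> Option<&str> {
        self.labels.get(index as usize).map(String::as_str)
    }
}

/// family (u16) -> its own Codebook; no shared global index space.
#[derive(Default)]
pub struct FamilyCodebookRegistry {
    books: HashMap<u16, Codebook>,
}

impl FamilyCodebookRegistry {
    pub fn intern(&mut self, family: u16, label: &str) -> Option<u8> {
        self.books.entry(family).or_default().intern(label)
    }

    pub fn resolve(&self, family: u16, index: u8) -> Option<&str> {
        self.books.get(&family)?.resolve(index)
    }
}
```

Interning `"x"` first in family 7 and second in family 9 gives index 0 and 1 respectively, which is the per-family scoping the commit describes.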

The finer sibling of classid→ClassView; the family node's episodic-basin
content (E-MIXIN-IS-AN-ADDRESS-REFERENCE-NOT-A-COPY); the 256×256 Morton tile
(≤256 leaves for the 1-byte in-family index, E-UNIFORM-MORTON-TILE-PYRAMID).

3 tests (dedup/sequential, overflow-split, per-family scoping). --features
guid-v2-tail green; default build clean (codebook absent); clippy -D warnings
clean on both. Plan D-GV2-2 marked PARTIAL; AGENT_LOG updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CcpLeEC3XK8Eye53GKBVvi
AdaWorldAPI pushed a commit that referenced this pull request Jun 20, 2026
…+ canon conflict

Operator asked to DO the ontology-schema migration documentation. Grounded
in OGAR crates/ogar-vocab/src/lib.rs (the real codebook): it already defines
CODEBOOK (domain-encoded 0xDDCC), ConceptDomain + canonical_concept_domain,
source_domain_concept(project|erp), canonical_concept_id, and LabelDTO — and
its own note says LabelDTO "long-term belongs in lance-graph-contract;
codebook id == NodeGuid.classid low u16."

Surfaces a canon CONFLICT: merged CLASSID_OSINT=0x0007 routes to OGAR's
Reserved domain (OSINT is 0x07XX); CLASSID_FMA=0x0008 sits in OGAR's OCR
block (FMA/anatomy is clinical → Health 0x09XX). Root cause: 0x0007 was minted
from the early "OSINT is 0x0007" guess before ogar-vocab's 0xDDCC layout was
consulted.
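A minimal sketch of the 0xDDCC reading behind the conflict, assuming only what the note states: the high byte is the domain (OSINT = 0x07XX) and the low byte is the concept. `split_ddcc`, `in_domain` and `OSINT_DOMAIN` are hypothetical names; the sketch does not model how OGAR names domain 0x00 or its sub-blocks.

```rust
/// Split a u16 codebook id into (domain DD, concept CC): high byte = domain.
fn split_ddcc(id: u16) -> (u8, u8) {
    ((id >> 8) as u8, (id & 0xFF) as u8)
}

/// Hypothetical check: does `classid` fall in the block 0xDD00..=0xDDFF?
fn in_domain(classid: u16, domain: u8) -> bool {
    split_ddcc(classid).0 == domain
}

/// "OSINT is 0x07XX".
const OSINT_DOMAIN: u8 = 0x07;
```

Read this way, the merged `CLASSID_OSINT = 0x0007` decodes to domain 0x00, concept 0x07, outside the OSINT block, while the proposed 0x0700 lands inside it.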

New .claude/plans/ogar-vocab-contract-codebook-migration-v1.md (D-OVC-1..4):
host the codebook/ConceptDomain/LabelDTO in contract, classids follow 0xDDCC
(mint project 0x01XX + ERP 0x02XX; realign OSINT→0x0700, FMA→Health). The
per-family codebook (D-GV2-2) is the finer scope of the same idea.

NO code minted/rewritten: realigning merged OSINT/FMA rewrites canon
(#557/#560 + CLAUDE.md canon block) → operator sign-off required (plan §5,
three decisions). INTEGRATION_PLANS prepended; ISSUES ISS-CLASSID-OGAR-DRIFT
filed; AGENT_LOG updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CcpLeEC3XK8Eye53GKBVvi