osint/fma: hydrate the bake from a TTL source, not hand-typed nodes#50
Conversation
Close the demo-gap wiring on the buildable surface: `ttl → hydrate → canonical-GUID → .soa`. The bake no longer hand-types 95 invented heart nodes; it parses an FMA `.ttl` fixture and hydrates it. - `data/fma-heart.fixture.ttl` — a FIXTURE (clearly labeled, NOT the real FMA): real FMA class names (Heart, Left ventricle, Myocardium…) + the canonical predicate set (`bfo:part_of`, `rdfs:subClassOf`). The real 266 MB `fma.owl` (OGIT contextId 13, MedCare-rs release asset, absent from this light closure) loads via `lance-graph-rdf` / `lance_graph_ontology::hydrate_fma` at the spine — same shape, more rows. - `fma_ttl.rs` — a minimal line-oriented Turtle reader (the source side of the thread; the spine uses the full OWL loader). 2 unit tests. - `fma.rs` — `hydrate_fma()` replaces `build_heart()`, the light-bake twin of `lance_graph_ontology::hydrate_fma`: walk the `bfo:part_of` partonomy into the canonical 8:8 HHTL cascade (sibling-rank per depth → the `[mixin:instance]` tier), project `rdfs:subClassOf` onto the cross-cutting type ceiling. Nothing is hardcoded to "heart" — the real 75K hydrates through the same walk. Why this shape: the 8:8 cascade key IS OGAR's operator-pinned canon (each tier = a 256×256 centroid tile); SPO is the extraction wire, the GUID-key the hydrated runtime — exactly the forward/backward hydration the spine already runs (ruff `expand`/`reassemble` → `ogar-from-*` → OntologyRegistry). Verified: bake hydrates 20 nodes / 28 edges from the fixture; byte-parity exact (cockpit read offsets match the hydrated GUID across organ→chamber→ wall, ceiling sentinel intact); reader unit tests pass; my files clippy-clean (the `-D warnings` failure is the pre-existing aiwar-ingest dep, untouched); cockpit tsc clean. Browser render not exercised here. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01TzqvDqbFRzyx17EkLKBoZF
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
📝 WalkthroughWalkthroughAdds a hand-authored Turtle fixture ( ChangesFMA Heart Fixture and Hydration
Sequence Diagram(s)sequenceDiagram
participant main
participant hydrate_fma
participant fma_ttl
participant Builder
main->>hydrate_fma: hydrate_fma(FMA_TTL)
hydrate_fma->>fma_ttl: parse(ttl)
fma_ttl-->>hydrate_fma: Fragment { part_of, is_a }
hydrate_fma->>hydrate_fma: compute children map + sibling ranks
loop BFS by depth
hydrate_fma->>Builder: part_of_node(label, depth, rank)
end
loop is_a targets
hydrate_fma->>Builder: type_node(label)
end
hydrate_fma->>Builder: emit REL_PART_OF edges
hydrate_fma->>Builder: emit REL_IS_A edges
hydrate_fma-->>main: Builder
main->>Builder: lookup tissue node + dual-membership proof
main->>Builder: emit_oso1()
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/osint-bake/src/bin/fma.rs`:
- Around line 194-199: The BTreeMap collection in the parent_of variable
silently collapses multi-parent relationships since BTreeMap only keeps one
value per key, but the edge emission logic later still uses all entries from
frag.part_of, causing inconsistency. Modify the map construction logic to detect
when a child component appears more than once in frag.part_of (indicating
multiple parents) and handle this ambiguity by either returning an error or
panicking with a clear message instead of silently dropping parent
relationships. This ensures the addressing and graph edges remain consistent
throughout the partonomy processing.
- Around line 139-140: Replace the unchecked casts to u16 at the identity
assignment in the node creation (line 139 around the `i as u16` cast), the
sibling rank assignments (lines 159, 214, 275), and the edge index assignments
in `emit_oso1` (lines 307-308) with checked conversions using try_into() or
similar Rust fallible conversion patterns. This will prevent silent truncation
when handling datasets with more than 65,535 nodes, ensuring the code properly
handles the real 75K FMA via hydrate_fma without corrupting node indices and
sibling ranks.
In `@crates/osint-bake/src/fma_ttl.rs`:
- Around line 45-49: The parse function in fma_ttl.rs is too lenient when
validating triple grammar. First, enforce the required terminal dot by removing
the unwrap_or fallback so that lines without a trailing dot are rejected instead
of accepted. Second, add a check after the destructuring pattern to ensure there
are no remaining tokens from the split_whitespace iterator, so that lines with
extra trailing tokens beyond the required subject, predicate, and object are
also rejected. This will prevent malformed lines from being silently accepted
and ensure strict adherence to the documented triple format.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 7a1133a7-f273-4a55-bf71-869b8f9f3236
📒 Files selected for processing (5)
cockpit/public/fma.soacrates/osint-bake/data/fma-heart.fixture.ttlcrates/osint-bake/src/bin/fma.rscrates/osint-bake/src/fma_ttl.rscrates/osint-bake/src/lib.rs
| i as u16, // identity — stable node id | ||
| ); |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n '\bas u(8|16)\b' crates/osint-bake/src/bin/fma.rs
python - <<'PY'
from pathlib import Path
from collections import defaultdict
ttl_path = Path("crates/osint-bake/data/fma-heart.fixture.ttl")
lines = ttl_path.read_text().splitlines()
children = defaultdict(list)
nodes = set()
for raw in lines:
line = raw.strip()
if not line or line.startswith("#") or line.startswith("`@prefix`"):
continue
if not line.endswith("."):
continue
s, p, o, *_ = line[:-1].split()
if p in {"bfo:part_of", "fma:regional_part_of", "fma:constitutional_part_of"}:
children[o].append(s)
nodes.add(s)
nodes.add(o)
max_siblings = max((len(v) for v in children.values()), default=0)
print(f"fixture_nodes={len(nodes)}")
print(f"fixture_max_siblings={max_siblings}")
print("u8_sibling_capacity_ok=", max_siblings <= 255)
print("u16_node_capacity_ok=", len(nodes) <= 65535)
PYRepository: AdaWorldAPI/q2
Length of output: 714
🏁 Script executed:
# First, let's map the file structure and get an outline
ast-grep outline crates/osint-bake/src/bin/fma.rs --view expandedRepository: AdaWorldAPI/q2
Length of output: 1097
🏁 Script executed:
# Now let's examine the specific lines mentioned in the review
# Lines 139-140
sed -n '130,150p' crates/osint-bake/src/bin/fma.rs
# Lines 159-160
sed -n '150,170p' crates/osint-bake/src/bin/fma.rs
# Lines 214-215
sed -n '205,225p' crates/osint-bake/src/bin/fma.rs
# Lines 275-276
sed -n '265,285p' crates/osint-bake/src/bin/fma.rsRepository: AdaWorldAPI/q2
Length of output: 3085
🏁 Script executed:
# Let's understand the context better - what are the actual input constraints?
# Check how large the FMA data typically is and what the code is designed to handle
cat crates/osint-bake/data/fma-heart.fixture.ttl | head -30
# Count the actual number of nodes and siblings in the fixture
wc -l crates/osint-bake/data/fma-heart.fixture.ttl
# Let's examine the complete context around line 86 to understand the tier() function
sed -n '80,95p' crates/osint-bake/src/bin/fma.rs
# Check the NodeGuid::new_v2 signature to understand what it expects
rg -A 5 'fn new_v2' crates/osint-bake/src/Repository: AdaWorldAPI/q2
Length of output: 2201
🏁 Script executed:
# Let's also look at the actual limits and constants defined
sed -n '51,83p' crates/osint-bake/src/bin/fma.rs
# Check if there's any documentation about expected data size limits
rg -i 'limit|capacity|max' crates/osint-bake/src/bin/fma.rs | head -20Repository: AdaWorldAPI/q2
Length of output: 2358
🏁 Script executed:
# Let's check the complete hydrate_fma function to understand input size constraints
sed -n '188,250p' crates/osint-bake/src/bin/fma.rs
# Check how idx (the node index) grows and whether it can exceed u16
sed -n '240,290p' crates/osint-bake/src/bin/fma.rsRepository: AdaWorldAPI/q2
Length of output: 4052
🏁 Script executed:
# Let's understand the design intent by looking at comments about HHTL and tiers
sed -n '73,88p' crates/osint-bake/src/bin/fma.rs
# Also check if there's any validation at the API boundary
rg -B 3 -A 3 'hydrate_fma|emit_oso1' crates/osint-bake/src/bin/fma.rs | head -40Repository: AdaWorldAPI/q2
Length of output: 2682
🏁 Script executed:
# Check if there are any tests that verify the constraints or limits
rg -i 'test|assert|expect' crates/osint-bake/src/bin/fma.rs
# Check if there's a separate test file for fma
find crates/osint-bake -name '*test*' -o -name '*fma*'
# Let's also check if this is the only place narrowing casts happen or if they're
# systematically guarded elsewhere in the codebase
rg 'try_from.*expect' crates/osint-bake/src/ | head -20Repository: AdaWorldAPI/q2
Length of output: 499
🏁 Script executed:
# Let's verify the actual design constraints by examining what type_node expects
sed -n '150,168p' crates/osint-bake/src/bin/fma.rs
# And part_of_node signature
sed -n '114,145p' crates/osint-bake/src/bin/fma.rsRepository: AdaWorldAPI/q2
Length of output: 1722
🏁 Script executed:
# Let's examine the NodeGuid::new_v2 signature to understand what parameter types it expects
rg -B 3 -A 10 'fn new_v2' crates/osint-bake/src/
# And check if there are any architectural comments about the 75K FMA mentioned
rg -i '75k|real fma' crates/osint-bake/src/bin/fma.rsRepository: AdaWorldAPI/q2
Length of output: 268
🏁 Script executed:
# Let's verify the actual fixture size and compare to the claimed 75K real FMA
echo "=== Fixture analysis ==="
grep -c 'bfo:part_of\|rdfs:subClassOf' crates/osint-bake/data/fma-heart.fixture.ttl
# Count unique nodes in fixture
grep -o 'fma:[A-Za-z_]*' crates/osint-bake/data/fma-heart.fixture.ttl | sort -u | wc -l
# Let's check if there's any documentation about the scaling claim
rg -i 'real.*fma|75k|65k|overflow' crates/osint-bake/src/Repository: AdaWorldAPI/q2
Length of output: 804
🏁 Script executed:
# Let's examine if there are any guards or comments about the 65535 u16 limit
rg -B 2 -A 2 'as u16.*node|node.*as u16' crates/osint-bake/src/bin/fma.rs
# Check what NodeGuid::new_v2 actually takes and if it documents the limits
rg -B 5 -A 15 'struct NodeGuid\|fn new_v2' crates/osint-bake/src/
# Let's verify the claim about 75K FMA by checking what the actual constraint is
echo "=== Checking if 75K claim is realistic ==="
# The fixture shows: 21 unique nodes, 30 relations
# But 75K nodes would be 2.86x more than u16 can hold (65535)
python3 - <<'PY'
fixture_nodes = 21
fixture_relations = 30
claimed_real_fma = 75000
u16_max = 65535
u8_max = 255
print(f"Fixture nodes: {fixture_nodes}")
print(f"Claimed real FMA: {claimed_real_fma}")
print(f"u16 maximum: {u16_max}")
print(f"Would 75K overflow u16? {claimed_real_fma > u16_max}")
print(f"Overflow by: {claimed_real_fma - u16_max} nodes")
print()
print("Cast risk assessment:")
print(f"- i as u16 (node index) wraps at node {u16_max + 1}")
print(f"- k as u8 (sibling rank) wraps at rank {u8_max + 1}")
PYRepository: AdaWorldAPI/q2
Length of output: 801
🏁 Script executed:
# Let's look for any TODOs, FIXMEs, or documented limitations in fma.rs
rg -i 'todo|fixme|limitation|overflow|warn' crates/osint-bake/src/bin/fma.rsRepository: AdaWorldAPI/q2
Length of output: 152
🏁 Script executed:
# Let's check if hydrate_fma is designed to only work with the fixture or if it's
# actually intended to handle the real FMA
rg 'hydrate_fma|FMA_TTL' crates/osint-bake/src/bin/fma.rs
# Check if there's any architecture documentation about when this would be used
cat crates/osint-bake/src/lib.rs | head -50
# Let's also verify the fixture limitation is intentional
sed -n '179,190p' crates/osint-bake/src/bin/fma.rsRepository: AdaWorldAPI/q2
Length of output: 3743
🏁 Script executed:
# Check if the code would actually work with larger data or if it's architecturally limited
# Look at the complete emit_oso1 function
sed -n '296,320p' crates/osint-bake/src/bin/fma.rs
# Verify whether this is a fixture-only tool or general-purpose
rg -B 5 -A 5 'fn main' crates/osint-bake/src/bin/fma.rsRepository: AdaWorldAPI/q2
Length of output: 1182
Guard narrowing casts in key/rank generation to prevent silent truncation for larger FMA datasets.
The casts at lines 139, 159, 214, and 275 will silently wrap for inputs exceeding their type limits. While the fixture (21 nodes, 4 max siblings) is safe, the code is designed to handle the real 75K FMA via hydrate_fma—which at 75,000 nodes exceeds u16::MAX (65,535). Silent truncation will corrupt node indices and sibling ranks, breaking the byte-deterministic asset guarantee.
Suggested fix (checked conversions)
- .map_or(0, |k| (k as u8) + 1)
+ .map_or(0, |k| {
+ u8::try_from(k + 1).expect("sibling rank exceeds 8-bit tier capacity")
+ })
...
- let t = b.type_node(&fma_ttl::label_of(ty), type_idx.len() as u16);
+ let slot = u16::try_from(type_idx.len()).expect("type index exceeds u16 capacity");
+ let t = b.type_node(&fma_ttl::label_of(ty), slot);
...
- i as u16, // identity — stable node id
+ u16::try_from(i).expect("node index exceeds u16 identity capacity"),
...
- i as u16,
+ u16::try_from(i).expect("node index exceeds u16 identity capacity"),Also apply at lines 307–308 (edge indices in emit_oso1).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| i as u16, // identity — stable node id | |
| ); | |
| u16::try_from(i).expect("node index exceeds u16 identity capacity"), | |
| ); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/osint-bake/src/bin/fma.rs` around lines 139 - 140, Replace the
unchecked casts to u16 at the identity assignment in the node creation (line 139
around the `i as u16` cast), the sibling rank assignments (lines 159, 214, 275),
and the edge index assignments in `emit_oso1` (lines 307-308) with checked
conversions using try_into() or similar Rust fallible conversion patterns. This
will prevent silent truncation when handling datasets with more than 65,535
nodes, ensuring the code properly handles the real 75K FMA via hydrate_fma
without corrupting node indices and sibling ranks.
| let parent_of: BTreeMap<&str, &str> = frag | ||
| .part_of | ||
| .iter() | ||
| .enumerate() | ||
| .map(|(k, t)| b.type_node(t, k as u16)) | ||
| .map(|(c, p)| (c.as_str(), p.as_str())) | ||
| .collect(); | ||
| let mut children: BTreeMap<&str, Vec<&str>> = BTreeMap::new(); |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
parent_of collapses multi-parent partonomy and makes addressing inconsistent with emitted edges.
collect() into BTreeMap<&str, &str> keeps one parent per child, but edge emission later still includes all frag.part_of links. That can produce one address/path while representing different parent edges in the graph.
Suggested fix (fail fast on ambiguous parentage)
- let parent_of: BTreeMap<&str, &str> = frag
- .part_of
- .iter()
- .map(|(c, p)| (c.as_str(), p.as_str()))
- .collect();
+ let mut parent_of: BTreeMap<&str, &str> = BTreeMap::new();
+ for (c, p) in &frag.part_of {
+ if let Some(prev) = parent_of.insert(c.as_str(), p.as_str()) {
+ assert!(
+ prev == p.as_str(),
+ "multiple part_of parents for {}: {} vs {}",
+ c,
+ prev,
+ p
+ );
+ }
+ }Also applies to: 201-205
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/osint-bake/src/bin/fma.rs` around lines 194 - 199, The BTreeMap
collection in the parent_of variable silently collapses multi-parent
relationships since BTreeMap only keeps one value per key, but the edge emission
logic later still uses all entries from frag.part_of, causing inconsistency.
Modify the map construction logic to detect when a child component appears more
than once in frag.part_of (indicating multiple parents) and handle this
ambiguity by either returning an error or panicking with a clear message instead
of silently dropping parent relationships. This ensures the addressing and graph
edges remain consistent throughout the partonomy processing.
| let body = line.strip_suffix('.').unwrap_or(line); | ||
| let mut toks = body.split_whitespace(); | ||
| let (Some(s), Some(p), Some(o)) = (toks.next(), toks.next(), toks.next()) else { | ||
| continue; | ||
| }; |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Enforce the documented triple grammar instead of silently accepting malformed lines.
parse() currently accepts lines without terminal . and extra trailing tokens, which can mask fixture/data errors and produce partial hydration silently.
Suggested fix
- let body = line.strip_suffix('.').unwrap_or(line);
+ let Some(body) = line.strip_suffix('.') else {
+ continue;
+ };
let mut toks = body.split_whitespace();
let (Some(s), Some(p), Some(o)) = (toks.next(), toks.next(), toks.next()) else {
continue;
};
+ if toks.next().is_some() {
+ continue;
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let body = line.strip_suffix('.').unwrap_or(line); | |
| let mut toks = body.split_whitespace(); | |
| let (Some(s), Some(p), Some(o)) = (toks.next(), toks.next(), toks.next()) else { | |
| continue; | |
| }; | |
| let Some(body) = line.strip_suffix('.') else { | |
| continue; | |
| }; | |
| let mut toks = body.split_whitespace(); | |
| let (Some(s), Some(p), Some(o)) = (toks.next(), toks.next(), toks.next()) else { | |
| continue; | |
| }; | |
| if toks.next().is_some() { | |
| continue; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/osint-bake/src/fma_ttl.rs` around lines 45 - 49, The parse function in
fma_ttl.rs is too lenient when validating triple grammar. First, enforce the
required terminal dot by removing the unwrap_or fallback so that lines without a
trailing dot are rejected instead of accepted. Second, add a check after the
destructuring pattern to ensure there are no remaining tokens from the
split_whitespace iterator, so that lines with extra trailing tokens beyond the
required subject, predicate, and object are also rejected. This will prevent
malformed lines from being silently accepted and ensure strict adherence to the
documented triple format.
What
The
/fmabake no longer hand-types nodes — it hydrates from a TTL source, closing the demo-gap wiring on the buildable surface:hydrate_fma()is the light-bake twin oflance_graph_ontology::hydrate_fma: it walks thebfo:part_ofpartonomy into the canonical HHTL cascade (each node's sibling-rank per depth → the[mixin:instance]8:8 tier) and projectsrdfs:subClassOfonto the cross-cutting global-type ceiling. Nothing is hardcoded to "heart" — the real 75K FMA hydrates through the same walk.Why this shape
The 8:8 cascade key is OGAR's operator-pinned canon (each tier = a 256×256 centroid tile); SPO is the extraction wire, the GUID-key the hydrated runtime — exactly the forward/backward hydration the spine already runs (ruff
expand/reassemble→ogar-from-*→OntologyRegistry). This makes the q2 toy's data hydrated-from-source and its addressing the operator canon.Files
data/fma-heart.fixture.ttl— a labeled FIXTURE (real FMA class names + the canonical predicate set), not the real ontology.fma_ttl.rs— a minimal line-oriented Turtle reader (the source side; 2 unit tests).fma.rs—hydrate_fma()replacesbuild_heart().Verified
Myocardium of left ventricle→HEEL[01:01]·HIP[02:02]·TWIG[03:03], ceiling sentinel intact).-D warningsfailure is the pre-existingaiwar-ingestdep, untouched); cockpittscclean.Honest scope — what this is NOT
fma.owl(OGIT contextId 13, MedCare-rs release asset) is absent from this light closure; it loads vialance-graph-rdf/lance_graph_ontology::hydrate_fmaat the spine — same shape, ~75K rows instead of 20.🤖 Generated with Claude Code
https://claude.ai/code/session_01TzqvDqbFRzyx17EkLKBoZF
Generated by Claude Code
Summary by CodeRabbit
New Features
Tests