Commit 96c1249

Merge pull request #581 from AdaWorldAPI/claude/content-store-contract-draft
feat(contract): ContentStore — content-addressed cold text store (D-CC-ARI-3)
2 parents 2ca26f0 + 6103438 commit 96c1249

3 files changed

Lines changed: 256 additions & 0 deletions

.claude/board/LATEST_STATE.md

Lines changed: 2 additions & 0 deletions
@@ -120,6 +120,8 @@

## Current Contract Inventory (lance-graph-contract)

> **2026-06-21 — ADDED (content-store for the AriGraph/OSINT episodic arc)**: `lance_graph_contract::content_store::{ContentId, SourceSpan, ContentError, ContentStore, ContentSink}` — the content-addressed **cold text/blob store** contract. `ContentId(u64)` = `hash::fnv1a` of the bytes (stable across versions — the correct content address; `DefaultHasher` must never key one; `0` = sentinel). `SourceSpan{ContentId,u32,u32}` = the fixed-size, `Copy` typed form of `template-equivalence`'s `(source_id,start,end)` provenance; `is_cited()` = "no source span → no claim" (non-sentinel content + non-empty span). `ContentStore` (cold read: `resolve(id) -> Option<&[u8]>` zero-copy slice into the mmap/backing store; `resolve_span`/`contains` defaulted) + `ContentSink` (idempotent `put -> ContentId`, dedup by content-address: many episodes → one source row). **Hot/cold firewall (ADR-022)**: the hot path (SIMD sweep, AriGraph edge traversal) touches only the fixed-size `ContentId`/`SourceSpan`; bytes hydrate cold at the membrane (the fingerprint is the hot-path stand-in for text). Nothing variable-length enters the 512 B node. Additive, zero-dep; +6 tests (stable/dedup, idempotent put, resolve_span slice, OOB/missing errors, uncited-rejected, malformed-span `len` saturation); clippy clean. Consumers: `rs-graph-llm/episodic-arc-task` (replaces its local fnv1a), `template-equivalence` (typed provenance). Plan: `.claude/plans/arigraph-osint-episodic-v1.md` (D-CC-ARI-3). Branch `claude/content-store-contract-draft`.
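The entry above keys `ContentId` on `hash::fnv1a` and relies on identical bytes producing identical ids. The body of the crate's `fnv1a` is not part of this commit; assuming it is the standard 64-bit FNV-1a (offset basis `0xcbf29ce484222325`, prime `0x100000001b3`), a minimal standalone sketch of content addressing plus dedup could look like this. `fnv1a_64` and `ingest` are hypothetical names used only for illustration.

```rust
use std::collections::HashMap;

/// Standard 64-bit FNV-1a. ASSUMPTION: `crate::hash::fnv1a` uses these same
/// constants; the workspace implementation is not shown in this commit.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET_BASIS;
    for &b in bytes {
        hash ^= u64::from(b); // FNV-1a order: xor the byte in first...
        hash = hash.wrapping_mul(PRIME); // ...then multiply, wrapping mod 2^64
    }
    hash
}

/// Hypothetical dedup table: content address -> bytes, one row per distinct source.
fn ingest(table: &mut HashMap<u64, Vec<u8>>, bytes: &[u8]) -> u64 {
    let id = fnv1a_64(bytes);
    // Identical bytes land on the same key, so a repeat ingest stores nothing new.
    table.entry(id).or_insert_with(|| bytes.to_vec());
    id
}
```

Because the hash is a fixed function of the bytes (unlike `DefaultHasher`, whose output is not guaranteed stable across Rust releases), the same source re-ingested by another process or version yields the same key.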
> **2026-06-18 — ADDED (probe-excel-compute-dag-v1 Inc 0, the `compute_dag` Core gap)**: `lance_graph_contract::class_view::{ComputeEdge, compute_dag_is_acyclic}` + `ClassView::compute_dag(class) -> &[ComputeEdge]` (default `&[]`, zero-fallback). `ComputeEdge {target: u8, inputs: &'static [u8]}` is the harvest-sourced recompute edge (`emitted_by` target ← `depends_on` inputs; field positions index the class `FieldMask`), `const`-constructible like `MethodSig`/`ActionDef` (the harvest IS the manifest). `compute_dag_is_acyclic` is the **registry-build gate** — a cyclic recompute DAG (formula loop / `@api.depends` cycle / self-loop) is rejected at build (Kahn over ≤64 positions, allocation-free; out-of-range positions ignored, no panic, mirrors `FieldMask::from_positions`). This is the Core home for computed-field recompute *dispatch* that EVERY computed-field AR consumer needs (Odoo `@api.depends`, Excel formulas, medcare lab-trends, woa calc, q2 cells — they reduce to a sheet; `E-EXCEL-SHADER-PROJECTION`) and the NNUE-incremental existence-proof shape (`E-CHESS-TENSOR-PROVEN`). **Layout-preserving**: a default trait method + a free fn, resolution metadata ABOVE the SoA, stores nothing on the row, zero `NODE_ROW_STRIDE`/`ENVELOPE_LAYOUT_VERSION` impact (core-gap-auditor's EXTEND-CORE, never an adapter-state hack). The instance recompute that consumes it is gated per-cell by the cycle-aware `write_row` (`E-SOA-CYCLE-OWNERSHIP`). Additive, zero-dep; +4 tests (default-empty, acyclic-chain, cycle/self-loop/3-cycle rejected, out-of-range ignored); 10/10 class_view, clippy/fmt clean. Sibling `ClassView::constraints` (`validation_kind`-sourced) deferred to Inc-follow-up. Plan: `.claude/plans/probe-excel-compute-dag-v1.md`. Branch `claude/particle-wave-click-epiphany`.
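The entry describes `compute_dag_is_acyclic` as an allocation-free Kahn pass over at most 64 field positions that rejects cycles and self-loops and ignores out-of-range positions. Its body is not in this commit. Assuming only those stated constraints, a minimal sketch with a hypothetical `dag_is_acyclic` can peel positions Kahn-style using `u64` bitmasks: a position is retired once every position it depends on is retired, and anything left over lies on a cycle. The local `ComputeEdge` mirrors the field shape quoted in the entry; this is one way to realize the check, not necessarily the crate's.

```rust
/// Local mirror of the entry's `ComputeEdge { target: u8, inputs: &'static [u8] }`.
#[derive(Clone, Copy, Debug)]
pub struct ComputeEdge {
    pub target: u8,
    pub inputs: &'static [u8],
}

/// Field positions are limited to 0..64 so one `u64` can act as a position set.
const MAX_POS: usize = 64;

/// Hypothetical sketch: `true` if the recompute graph has no cycle (a self-loop
/// counts as a cycle). Uses a fixed `[u64; 64]` plus two `u64` sets, no heap.
pub fn dag_is_acyclic(edges: &[ComputeEdge]) -> bool {
    let mut deps = [0u64; MAX_POS]; // deps[t] = positions that `t` is recomputed from
    let mut present = 0u64; // every in-range position mentioned by some edge
    for e in edges {
        let t = usize::from(e.target);
        if t >= MAX_POS {
            continue; // out-of-range target: ignored, no panic
        }
        present |= 1u64 << t;
        for &i in e.inputs {
            let i = usize::from(i);
            if i < MAX_POS {
                deps[t] |= 1u64 << i;
                present |= 1u64 << i;
            }
        }
    }
    // Kahn-style peeling: retire every position whose remaining dependencies
    // are all retired; stop when a full pass retires nothing.
    let mut retired = 0u64;
    loop {
        let mut progressed = false;
        let mut pending = present & !retired;
        while pending != 0 {
            let n = pending.trailing_zeros() as usize;
            pending &= pending - 1; // clear the lowest set bit
            if deps[n] & !retired == 0 {
                retired |= 1u64 << n;
                progressed = true;
            }
        }
        if !progressed {
            // Positions that were never retired sit on (or behind) a cycle.
            return retired == present;
        }
    }
}
```

Each pass retires at least one position or terminates, so the loop runs at most 64 passes of at most 64 checks.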
> **2026-06-18 — ADDED (D-DO-ARM-1, the OGAR DO arm)**: `lance_graph_contract::action::{ActionState, StateGuard, ActionDef, ClassActions, actions_for, effective_actions, ActionInvocation}` — the Perdurant DO arm completing the OGAR IR (the action-axis sibling of `codegen_manifest`'s `MethodSig`/THINK). Both the 4-agent `sale_order` AR→DO probe (runtime-archaeologist) AND the merged cross-repo PR survey (ruff/OGAR/lance-graph/openproject/tesseract) agreed this was the ONE missing wire: the THINK arm (`classid → ClassView`, `has_function → MethodSig`) is converged + merged; the DO-arm `ActionInvocation`/`ActionDef` type was ABSENT. **`ActionDef`** (static, `const`-constructible, all `&'static`/`Copy`): `predicate` (= harvested `has_function` method), `object_class` (classid), `exec` (`ExecTarget` incl `SurrealQl`), `guard` (`StateGuard` = KausalSpec field==value), `required_role` (RBAC), `overrides` (OGAR `classid→ClassView` inheritance). **`ClassActions`+`actions_for`** (zero-fallback) mirror `ClassMethods`/`methods_for`. **`effective_actions(parent, child)`** = OGAR inheritance on the action axis (child overrides parent by predicate). **`ActionInvocation`** (dynamic, `Copy`): lifecycle `ActionState{Pending→Committed|Failed|Cancelled}` (sticky terminals), S2.5 `cycle` stamp, idempotency/trace keys, HLC `emitted_at_millis`. **`ActionInvocation::commit(def, actor, impact, now)`** is the gated egress — RBAC FIRST (`auth::ActorContext` must hold `required_role` or be admin → else `Failed`), THEN MUL impact (`mul::GateDecision`: `Flow→Committed`+stamped, `Hold→`Pending/escalate, `Block→Cancelled`). This IS "commit to the external consumer (odoo/openproject/woa/tesseract) after the cycle decides sound." Dispatched via `UnifiedStep`/`ExecTarget`, NOT a per-crate endpoint. Additive, zero-dep. +5 tests green. Consumer reference: `docs/OGAR_CONSUMER_API.md`. Branch `claude/soa-write-deinterlace-inc2`.
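The ordering the entry gives for `ActionInvocation::commit` (RBAC first, then the MUL impact decision, with sticky terminal states) can be sketched as a pure state transition. Every type below is a hypothetical, simplified stand-in: the real `commit` takes an `ActionDef`, an `auth::ActorContext`, the impact and a timestamp, and also stamps the cycle, none of which is modeled. Treating "no `required_role`" as always authorized is an assumption.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ActionState {
    Pending,
    Committed,
    Failed,
    Cancelled,
}

/// Hypothetical stand-in for `mul::GateDecision`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum GateDecision {
    Flow,
    Hold,
    Block,
}

/// Hypothetical stand-in for `auth::ActorContext`.
pub struct Actor<'a> {
    pub roles: &'a [&'a str],
    pub is_admin: bool,
}

/// One gated transition: terminal states are sticky, RBAC runs before the impact gate.
pub fn commit_transition(
    state: ActionState,
    required_role: Option<&str>,
    actor: &Actor<'_>,
    impact: GateDecision,
) -> ActionState {
    if state != ActionState::Pending {
        return state; // Committed / Failed / Cancelled never change again
    }
    // RBAC first: admin, no role required (assumption), or the actor holds the role.
    let authorized = actor.is_admin
        || required_role.map_or(true, |role| actor.roles.iter().any(|r| *r == role));
    if !authorized {
        return ActionState::Failed;
    }
    // Only an authorized invocation reaches the MUL impact decision.
    match impact {
        GateDecision::Flow => ActionState::Committed,
        GateDecision::Hold => ActionState::Pending, // stays pending / escalates
        GateDecision::Block => ActionState::Cancelled,
    }
}
```

Checking RBAC before impact means an unauthorized actor is reported as `Failed` even when the impact gate would have blocked the action anyway.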
Lines changed: 253 additions & 0 deletions
@@ -0,0 +1,253 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The Lance Authors

//! `content_store` — content-addressed cold text/blob store contract (zero-dep).
//!
//! The episodic/OSINT **text table**: `ContentId` (the `fnv1a` hash of the bytes)
//! → bytes, resolved **cold, at the membrane** — never in the hot path. This is
//! the typed surface for the rule the OGAR canon + `I-VSA-IDENTITIES` Test 0
//! (register laziness) demand: *the reference is the identity, never a serialized
//! pointer/offset inlined in the SoA*.
//!
//! ## Three invariants this encodes
//!
//! 1. **The join key IS the identity.** Nothing variable-length enters the 512 B
//!    node. The node carries only a fixed-size [`ContentId`] (a value tenant);
//!    the text lives in a columnar table next to it and joins by id. No pointer
//!    field, no budget break.
//! 2. **Content-address, not raw GUID.** OSINT sources are shared (one document
//!    backs many observations). [`ContentId::of`] hashes the bytes, so identical
//!    sources dedup (many episodic edges → one source row). Uses [`crate::hash::fnv1a`]
//!    — **stable across versions/platforms** (unlike `DefaultHasher`, which must
//!    never key a content address; see `TECH_DEBT` re `WitnessEntry::tie_break_hash`).
//! 3. **Hot/cold firewall (ADR-022).** [`ContentStore::resolve`] is the COLD /
//!    membrane surface: bytes are materialized only when genuinely needed (LLM
//!    hydration, rendering, citing). The hot path (SIMD sweep, resonance,
//!    AriGraph edge traversal, family-basin routing) touches only the fixed-size
//!    [`ContentId`] + [`SourceSpan`] — the fingerprint is the hot-path stand-in
//!    for the text; this trait is never called during computation.
//!
//! ## Provenance: `SourceSpan` is the typed `(source_id, start, end)`
//!
//! The merged `template-equivalence` provenance model uses
//! `source_spans: Vec<(String, usize, usize)>` = `(source_id, start, end)`.
//! [`SourceSpan`] is its fixed-size typed form: `source_id` IS a [`ContentId`]
//! (the content-table key), `start`/`end` index into the resolved bytes. The
//! gate "no source span → no claim" is literally [`SourceSpan::is_cited`].

use crate::hash::fnv1a;

/// A content address: the `fnv1a`-64 hash of the stored bytes.
///
/// Identical bytes ⇒ identical id ⇒ natural dedup. `ContentId(0)` is the
/// reserved **empty/sentinel** (no content), mirroring the canon's zero-fallback
/// ladder (a zero tier = "not consulted", never a valid address).
///
/// Note: 64-bit fnv1a is the workspace-canonical hash and is sufficient for
/// OSINT-corpus scale; if a corpus ever approaches birthday-collision range
/// (~2^32 distinct sources), widen to a 128-bit content address — the upgrade
/// is local to this type.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord, Default)]
pub struct ContentId(pub u64);

impl ContentId {
    /// Content-address arbitrary bytes.
    #[must_use]
    pub fn of(bytes: &[u8]) -> Self {
        Self(fnv1a(bytes))
    }

    /// Content-address a string slice.
    #[must_use]
    pub fn of_str(s: &str) -> Self {
        Self(fnv1a(s.as_bytes()))
    }

    /// Whether this is the reserved empty/sentinel address (no content).
    #[must_use]
    pub fn is_sentinel(self) -> bool {
        self.0 == 0
    }
}

/// A provenance reference: which content, and the `[start, end)` byte span within
/// it. Fixed-size and `Copy` — it lives on the episodic node (a value tenant);
/// the bytes resolve cold via [`ContentStore`]. The typed form of
/// `template-equivalence`'s `(source_id, start, end)`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, Default)]
pub struct SourceSpan {
    /// The content-table key (the source the span cites).
    pub content: ContentId,
    /// Inclusive start byte offset into the resolved content.
    pub start: u32,
    /// Exclusive end byte offset.
    pub end: u32,
}

impl SourceSpan {
    /// New span; `end` is clamped to be `>= start`.
    #[must_use]
    pub fn new(content: ContentId, start: u32, end: u32) -> Self {
        Self { content, start, end: end.max(start) }
    }

    /// Span length in bytes. Saturating: a malformed span (`end < start`, only
    /// constructible by bypassing [`new`](Self::new) via the public fields)
    /// reports `0`, consistent with [`is_empty`](Self::is_empty) — never panics
    /// (debug) or wraps to a huge value (release).
    #[must_use]
    pub fn len(self) -> u32 {
        self.end.saturating_sub(self.start)
    }

    /// Whether the span covers zero bytes.
    #[must_use]
    pub fn is_empty(self) -> bool {
        self.end <= self.start
    }

    /// "No source span → no claim": a claim is cited iff it carries a non-empty
    /// span into real (non-sentinel) content. The provenance gate's predicate.
    #[must_use]
    pub fn is_cited(self) -> bool {
        !self.content.is_sentinel() && !self.is_empty()
    }
}

/// Failure resolving content from the cold store.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ContentError {
    /// No content stored under this id.
    NotFound,
    /// The span's `[start, end)` exceeds the resolved content's length.
    SpanOutOfBounds,
}

impl core::fmt::Display for ContentError {
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        match self {
            ContentError::NotFound => write!(f, "content-store: id not found"),
            ContentError::SpanOutOfBounds => write!(f, "content-store: span out of bounds"),
        }
    }
}

/// The content-addressed **cold** store (read side).
///
/// Lives in the zero-dep contract so any consumer can declare it without pulling
/// Arrow/Lance. Implemented downstream by a Lance text table (and, in-RAM, by the
/// AriGraph `EpisodicMemory` / `WitnessCorpus` acting as the cold tier).
/// `resolve` returns a borrow into the backing store (mmap'd Lance buffer or
/// in-RAM `Bytes`), so reads are zero-copy at the membrane.
pub trait ContentStore {
    /// Resolve the full content bytes for an id. `None` if absent. COLD path only.
    fn resolve(&self, id: ContentId) -> Option<&[u8]>;

    /// Resolve a span's bytes (cold). Default composes [`resolve`](Self::resolve)
    /// with a bounds check.
    fn resolve_span(&self, span: SourceSpan) -> Result<&[u8], ContentError> {
        let bytes = self.resolve(span.content).ok_or(ContentError::NotFound)?;
        bytes
            .get(span.start as usize..span.end as usize)
            .ok_or(ContentError::SpanOutOfBounds)
    }

    /// Whether an id is present without committing to a borrow shape.
    fn contains(&self, id: ContentId) -> bool {
        self.resolve(id).is_some()
    }
}

/// The content-addressed store (write side, membrane-only).
///
/// Ingest is idempotent by construction: identical bytes ⇒ same [`ContentId`] ⇒
/// dedup (the many-episodes → one-source rule). Writing happens at the cold
/// membrane during ingestion, never on the hot path.
pub trait ContentSink {
    /// Store `bytes`, returning their content address. Idempotent.
    fn put(&mut self, bytes: &[u8]) -> ContentId;

    /// Store a string slice.
    fn put_str(&mut self, s: &str) -> ContentId {
        self.put(s.as_bytes())
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::collections::HashMap;

    /// Reference in-RAM impl (the cold tier mirror) used to exercise the contract.
    #[derive(Default)]
    struct MemStore {
        map: HashMap<ContentId, Vec<u8>>,
    }
    impl ContentStore for MemStore {
        fn resolve(&self, id: ContentId) -> Option<&[u8]> {
            self.map.get(&id).map(Vec::as_slice)
        }
    }
    impl ContentSink for MemStore {
        fn put(&mut self, bytes: &[u8]) -> ContentId {
            let id = ContentId::of(bytes);
            self.map.entry(id).or_insert_with(|| bytes.to_vec());
            id
        }
    }

    #[test]
    fn content_address_is_stable_and_dedups() {
        let a = ContentId::of_str("the same source document");
        let b = ContentId::of_str("the same source document");
        assert_eq!(a, b); // identical bytes ⇒ identical id (dedup key)
        assert_ne!(a, ContentId::of_str("a different document"));
    }

    #[test]
    fn put_is_idempotent_one_row_per_source() {
        let mut s = MemStore::default();
        let id1 = s.put_str("shared OSINT source");
        let id2 = s.put_str("shared OSINT source"); // many episodes → one source
        assert_eq!(id1, id2);
        assert_eq!(s.map.len(), 1);
    }

    #[test]
    fn resolve_span_returns_the_cited_slice() {
        let mut s = MemStore::default();
        let id = s.put_str("Alice met Bob in Paris.");
        let span = SourceSpan::new(id, 10, 13); // "Bob"
        assert_eq!(s.resolve_span(span).unwrap(), b"Bob");
        assert!(span.is_cited());
    }

    #[test]
    fn out_of_bounds_and_missing_fail() {
        let mut s = MemStore::default();
        let id = s.put_str("short");
        assert_eq!(s.resolve_span(SourceSpan::new(id, 0, 999)), Err(ContentError::SpanOutOfBounds));
        assert_eq!(
            s.resolve_span(SourceSpan::new(ContentId(123), 0, 1)),
            Err(ContentError::NotFound)
        );
    }

    #[test]
    fn uncited_span_is_rejected_by_the_gate() {
        // sentinel content, or empty span ⇒ not a citation
        assert!(!SourceSpan::new(ContentId(0), 0, 5).is_cited());
        assert!(!SourceSpan::new(ContentId(7), 5, 5).is_cited());
        assert!(SourceSpan::new(ContentId(7), 0, 5).is_cited());
    }

    #[test]
    fn malformed_span_len_saturates_not_panics() {
        // Public fields let a consumer build end < start, bypassing new()'s clamp.
        // len() must saturate to 0 (consistent with is_empty), never panic/wrap.
        let bad = SourceSpan { content: ContentId(7), start: 13, end: 0 };
        assert_eq!(bad.len(), 0);
        assert!(bad.is_empty());
        assert!(!bad.is_cited());
    }
}
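The module doc of `content_store.rs` calls `SourceSpan` the fixed-size typed form of `template-equivalence`'s `(source_id, start, end)`, which means narrowing `usize` offsets to `u32`. A minimal sketch of that conversion follows. It assumes the caller has already fetched the source's bytes for the `String` id (how `template-equivalence` ids map to content is not specified in this commit), and it assumes the standard FNV-1a-64 constants for the hash. `span_from_provenance` is hypothetical; it rejects reversed or oversized offsets rather than clamping as `SourceSpan::new` does, so a bad citation surfaces at conversion time.

```rust
/// Local mirrors of the module's types (same shapes as `content_store.rs`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ContentId(pub u64);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SourceSpan {
    pub content: ContentId,
    pub start: u32,
    pub end: u32,
}

/// Standard 64-bit FNV-1a (ASSUMED to match `crate::hash::fnv1a`).
fn fnv1a(bytes: &[u8]) -> u64 {
    bytes.iter().fold(0xcbf2_9ce4_8422_2325_u64, |h, &b| {
        (h ^ u64::from(b)).wrapping_mul(0x0000_0100_0000_01b3)
    })
}

/// Hypothetical converter from a `(source, start, end)` provenance triple whose
/// source bytes are already in hand. Returns `None` for a reversed span, a span
/// past the end of the source, or offsets that do not fit in `u32`.
pub fn span_from_provenance(source: &[u8], start: usize, end: usize) -> Option<SourceSpan> {
    if start > end || end > source.len() {
        return None;
    }
    // Fallible narrowing: a >4 GiB source cannot be cited with u32 offsets.
    let start = u32::try_from(start).ok()?;
    let end = u32::try_from(end).ok()?;
    Some(SourceSpan { content: ContentId(fnv1a(source)), start, end })
}
```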

crates/lance-graph-contract/src/lib.rs

Lines changed: 1 addition & 0 deletions
@@ -62,6 +62,7 @@ pub mod codegen_spine;
pub mod cognitive_shader;
pub mod collapse_gate;
pub mod container;
pub mod content_store;
pub mod counterfactual;
pub mod crystal;
pub mod cycle_accumulator;
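With `content_store` exported from the crate root, a downstream crate implements `ContentStore`/`ContentSink` against its own storage. The trait doc says `resolve` returns a borrow into the backing store so reads are zero-copy. A minimal sketch of that shape follows, assuming a single append-only `Vec<u8>` arena as a stand-in for an mmap'd column and an index from id to `(offset, len)`. `ArenaStore` is hypothetical, the traits are trimmed to their required methods, and the FNV-1a constants are assumed to match `crate::hash::fnv1a`.

```rust
use std::collections::hash_map::{Entry, HashMap};

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ContentId(pub u64);

/// Required methods of the module's two traits (default methods omitted).
pub trait ContentStore {
    fn resolve(&self, id: ContentId) -> Option<&[u8]>;
}

pub trait ContentSink {
    fn put(&mut self, bytes: &[u8]) -> ContentId;
}

/// Standard 64-bit FNV-1a (ASSUMED to match `crate::hash::fnv1a`).
fn fnv1a(bytes: &[u8]) -> u64 {
    bytes.iter().fold(0xcbf2_9ce4_8422_2325_u64, |h, &b| {
        (h ^ u64::from(b)).wrapping_mul(0x0000_0100_0000_01b3)
    })
}

/// Hypothetical arena store: every source lives back to back in one buffer,
/// and the index maps each content address to its `(offset, len)` in it.
#[derive(Default)]
pub struct ArenaStore {
    arena: Vec<u8>,
    index: HashMap<ContentId, (usize, usize)>,
}

impl ContentStore for ArenaStore {
    fn resolve(&self, id: ContentId) -> Option<&[u8]> {
        let &(offset, len) = self.index.get(&id)?;
        // A borrow into the arena: nothing is copied.
        self.arena.get(offset..offset + len)
    }
}

impl ContentSink for ArenaStore {
    fn put(&mut self, bytes: &[u8]) -> ContentId {
        let id = ContentId(fnv1a(bytes));
        // Idempotent: an already-indexed address is not appended again.
        if let Entry::Vacant(slot) = self.index.entry(id) {
            let offset = self.arena.len();
            self.arena.extend_from_slice(bytes);
            slot.insert((offset, bytes.len()));
        }
        id
    }
}
```

The returned slice is tied to `&self`, so callers copy bytes only if they choose to, which is the property the membrane relies on.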
