Merged
12 changes: 7 additions & 5 deletions .claude/plans/ocr-canonical-soa-integration-v1.md
@@ -93,11 +93,13 @@ keyed by `identity`. Not bundled into the node.
 **ValueSchema:** do **NOT** add a 5th `ValueSchema::Ocr` enum variant — that is a
 contract-surface addition against the #496 §0 anti-invention guardrail. Shipped
 `ocr.rs` already transcodes by riding the POC-`Full` default (`classid_read_mode →
-Full`) and writing only the tenants it populates. Post-POC, OCR rides the existing
-**`Compressed`** preset (already = Fingerprint + HelixResidue + TurbovecResidue +
-EntityType) — or, if a distinct tenant set is truly needed, **mint an OCR class** in
-OGAR whose `ClassView` selects existing tenants (the §0-sanctioned opt-in route).
-New capability = new column/class, never a new enum variant.
+Full`) and writing only the tenants it populates. Post-POC, OCR rides **`Full`** —
+the only existing preset carrying the codec residues (HelixResidue + TurbovecResidue)
+AND the hot columns the §4 writeback needs (Energy for confidence, Plasticity for the
+repair stamp). `Compressed` lacks Energy/Plasticity and `Cognitive` lacks the
+residues, so neither fits OCR (codex P2 on #500). A leaner OCR row would need an
+operator-minted preset — that is an operator decision, not a plan default; the rule
+that holds is **no new enum variant from a plan**.
 
 ## 4. Repair: DeepNSM + CAM/PQ nearest-valid-token (D-OCR-52)
 

Comment on lines +96 to +100

P2: Align stale Compressed references with Full requirement

This correction now says post-POC OCR must ride Full because Compressed lacks Energy/Plasticity, but the same plan still tells implementers in §2 and the D-OCR-51 deliverable to use Full POC / Compressed. If the D-OCR-51 work follows those checklist lines, it can reintroduce the exact data-loss path this change is guarding against by selecting Compressed and dropping confidence/repair provenance, so the remaining summaries should be updated to Full or an operator-minted preset.
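The preset rule in this hunk amounts to a coverage check. An existing preset fits OCR only if it carries every tenant that OCR's writeback writes. The sketch below is minimal and assumes tenant sets can be modeled as bitmasks. The `Tenant`/`Preset` enums, their tenant tables, and `fitting_presets` are hypothetical stand-ins, not the `lance-graph-contract` `ValueSchema`/`ValueTenant` API. The assumed sets are:

- `Compressed` follows the removed plan text (Fingerprint + HelixResidue + TurbovecResidue + EntityType).
- `Full` is assumed to hold every tenant in the toy enum.
- `Cognitive` is left out because its tenant list is not given here.

```rust
// Hypothetical model, NOT the lance-graph-contract API: each tenant is one bit
// and a preset is the OR of the tenants it carries.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u16)]
enum Tenant {
    Fingerprint = 1 << 0,
    HelixResidue = 1 << 1,
    TurbovecResidue = 1 << 2,
    EntityType = 1 << 3,
    Meta = 1 << 4,
    Energy = 1 << 5,
    Plasticity = 1 << 6,
}

impl Tenant {
    const ALL: [Tenant; 7] = [
        Tenant::Fingerprint,
        Tenant::HelixResidue,
        Tenant::TurbovecResidue,
        Tenant::EntityType,
        Tenant::Meta,
        Tenant::Energy,
        Tenant::Plasticity,
    ];
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Preset {
    Compressed,
    Full,
}

/// OR a list of tenants into one bitmask.
fn mask(ts: &[Tenant]) -> u16 {
    ts.iter().fold(0, |m, &t| m | t as u16)
}

/// Decode a bitmask back into its tenants, in declaration order.
fn names(m: u16) -> Vec<Tenant> {
    Tenant::ALL.iter().copied().filter(|&t| (m & t as u16) != 0).collect()
}

impl Preset {
    const ALL: [Preset; 2] = [Preset::Compressed, Preset::Full];

    /// Assumed tenant sets: `Compressed` as listed in the old plan text,
    /// `Full` as every tenant of this toy enum.
    fn tenants(self) -> u16 {
        use Tenant::*;
        match self {
            Preset::Compressed => mask(&[Fingerprint, HelixResidue, TurbovecResidue, EntityType]),
            Preset::Full => mask(&Tenant::ALL),
        }
    }

    /// Tenants in `required` that this preset would silently drop (0 = fits).
    fn missing(self, required: u16) -> u16 {
        required & !self.tenants()
    }
}

/// OCR's writeback needs per the plan: codec residues, Energy (confidence),
/// Plasticity (repair stamp).
fn ocr_required() -> u16 {
    use Tenant::*;
    mask(&[HelixResidue, TurbovecResidue, Energy, Plasticity])
}

/// Existing presets that cover `required`. An empty result means an operator
/// decision (mint a preset or class), never a new enum variant.
fn fitting_presets(required: u16) -> Vec<Preset> {
    Preset::ALL.iter().copied().filter(|p| p.missing(required) == 0).collect()
}

fn main() {
    let required = ocr_required();
    for p in Preset::ALL {
        println!("{:?} drops {:?}", p, names(p.missing(required)));
    }
    println!("presets that fit OCR: {:?}", fitting_presets(required));
}
```

The sketch reports what each rejected preset *drops* instead of returning only a yes/no. In this model that is `Energy` and `Plasticity` for `Compressed`, which exposes the data-loss path the review comment warns about.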
6 changes: 4 additions & 2 deletions .claude/plans/ocr-probes-v1.md
@@ -27,8 +27,10 @@
 turbovec PQ residue), attempt to recover the **rank** from the residue bytes ALONE
 (no stored-rank lookup). Needs deepnsm `Codebook` + helix `Signed360` wired in one
 crate (they are not today — that wiring is itself part of the gate).
-- **Pass:** ≥ 99 % of the 4096-word vocab round-trips residue→rank→word exactly.
-- **Fail:** < 99 %, OR recovery requires the original rank as input ⇒ "reversible
+- **Pass:** **100 %** of the 4096-word vocab round-trips residue→rank→word exactly —
+a reversibility gate must be exact; a single miss fails it (a lossy map is NOT
+"reversible"). Any tolerance belongs in a separate *quality* probe, never this gate.
+- **Fail:** any miss, OR recovery requires the original rank as input ⇒ "reversible
 without a hash" is FALSE; the corrected plans already say text = identity →
 content-store lookup, codebook = repair signal (this probe confirms or lifts that).
 - **Cost:** ~80 LOC once deepnsm+helix are co-located; the wiring is the real work.
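The Pass/Fail criteria in this hunk can be sketched in Rust as two separate functions. One is an exact gate that fails on any single miss. The other is a quality rate that is reported but never gates. This sketch rests on several assumptions:

- The residue codec is abstracted as hypothetical `encode`/`decode` closures.
- The function names (`round_trip_misses`, `passes_gate`, `quality`) are illustrative.
- The toy 2-byte codecs stand in for the deepnsm `Codebook` + helix `Signed360` wiring that the probe still needs.

Because `decode` receives only the residue bytes, a decoder that needs the original rank cannot satisfy its signature. That mirrors the probe's "no stored-rank lookup" condition.

```rust
/// Ranks whose residue fails to round-trip residue -> rank -> word exactly.
/// `decode` receives only the residue bytes, never the original rank, so a
/// decoder that needs a stored-rank lookup cannot be plugged in here.
fn round_trip_misses<E, D>(vocab: &[&str], encode: E, decode: D) -> Vec<usize>
where
    E: Fn(usize) -> Vec<u8>,
    D: Fn(&[u8]) -> Option<usize>,
{
    (0..vocab.len())
        .filter(|&rank| {
            let residue = encode(rank);
            let word = decode(residue.as_slice()).and_then(|r| vocab.get(r));
            word != Some(&vocab[rank])
        })
        .collect()
}

/// The reversibility gate is exact: it passes only with zero misses.
fn passes_gate(misses: &[usize]) -> bool {
    misses.is_empty()
}

/// Separate quality probe: a tolerance-friendly rate, never used as the gate.
fn quality(misses: &[usize], vocab_len: usize) -> f64 {
    1.0 - misses.len() as f64 / vocab_len as f64
}

/// Toy lossless codec (hypothetical): rank stored as two big-endian bytes.
fn encode_u16(rank: usize) -> Vec<u8> {
    (rank as u16).to_be_bytes().to_vec()
}

/// Inverse of `encode_u16`; rejects residues of the wrong length.
fn decode_u16(residue: &[u8]) -> Option<usize> {
    if residue.len() == 2 {
        Some(u16::from_be_bytes([residue[0], residue[1]]) as usize)
    } else {
        None
    }
}

fn main() {
    // Tiny stand-in vocabulary; the real probe uses the 4096-word vocab.
    let vocab = ["alpha", "beta", "gamma", "delta"];

    let exact = round_trip_misses(&vocab, encode_u16, decode_u16);
    println!(
        "exact codec: pass = {}, quality = {}",
        passes_gate(&exact),
        quality(&exact, vocab.len())
    );

    // Toy lossy codec: halving the rank merges neighbours, so odd ranks miss.
    let lossy = round_trip_misses(
        &vocab,
        |rank: usize| encode_u16(rank / 2),
        |residue: &[u8]| decode_u16(residue).map(|r| r * 2),
    );
    println!(
        "lossy codec: pass = {}, quality = {}, misses = {:?}",
        passes_gate(&lossy),
        quality(&lossy, vocab.len()),
        lossy
    );
}
```

The rate lives in its own function, following the hunk's rule that any tolerance belongs in a separate quality probe. The lossy toy codec still scores 0.5, yet the gate fails.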
22 changes: 16 additions & 6 deletions crates/lance-graph-contract/src/ocr.rs
@@ -237,9 +237,10 @@ mod tests {
     fn ocr_schema_fit_rides_existing_preset_no_new_variant() {
         // Probe OCR-SCHEMA (.claude/plans/ocr-probes-v1.md): the OCR value tenants
         // fit an EXISTING ValueSchema preset, so a 5th `ValueSchema::Ocr` enum variant
-        // is NOT needed (#496 §0 anti-invention). The codec-residue set OCR rides —
-        // HelixResidue + TurbovecResidue + EntityType (+ Fingerprint) — is exactly
-        // `Compressed`; everything else OCR could want is in the POC `Full` default.
+        // is NOT needed (#496 §0 anti-invention). `Compressed` carries the codec
+        // residues — but OCR also writes confidence→Energy + repair→Plasticity, which
+        // `Compressed` LACKS, so OCR rides `Full` (the only preset with residues AND
+        // the hot lifecycle columns), not `Compressed` (codex P2 on #500).
         let compressed = ValueSchema::Compressed;
         for t in [
             ValueTenant::HelixResidue,
@@ -249,11 +250,20 @@ mod tests {
         ] {
             assert!(
                 compressed.has(t),
-                "Compressed already carries {t:?} — OCR rides it"
+                "Compressed carries the codec residue {t:?}"
             );
         }
-        // The shipped transcode rides POC `Full`, which carries every tenant OCR touches
-        // (incl. Meta anchor / Energy confidence / Plasticity provenance).
+        // ...but NOT the hot columns OCR's writeback needs — Compressed alone drops them.
+        assert!(
+            !compressed.has(ValueTenant::Energy),
+            "Compressed lacks Energy"
+        );
+        assert!(
+            !compressed.has(ValueTenant::Plasticity),
+            "Compressed lacks Plasticity"
+        );
+        // OCR rides `Full`, which carries every tenant OCR touches (residues + Meta
+        // anchor + Energy confidence + Plasticity provenance + EntityType).
         let full = ValueSchema::Full;
         for t in [
             ValueTenant::HelixResidue,