Commit d65dcb7

fix(plans)+test: #500 review — OCR rides Full not Compressed; OCR-RT gate is exact
Two review threads on the merged #500:

- codex P2: "post-POC OCR rides Compressed" was wrong — Compressed lacks Energy+Plasticity, so the schema-gated transcode would silently drop confidence (→Energy) and repair-provenance (→Plasticity). Corrected: OCR rides Full (the only preset with the codec residues AND the hot lifecycle columns). The OCR-SCHEMA contract test now asserts Compressed lacks Energy/Plasticity (regression guard).
- CodeRabbit Major: OCR-RT reversibility gate tightened 99% → 100% exact (a lossy residue→rank map is NOT "reversible"; tolerance moved to a separate quality probe).

https://claude.ai/code/session_01D2WSmezQBNC3bUdHuGfGmo
1 parent e192266 commit d65dcb7

3 files changed

Lines changed: 27 additions & 13 deletions

.claude/plans/ocr-canonical-soa-integration-v1.md

Lines changed: 7 additions & 5 deletions
@@ -93,11 +93,13 @@ keyed by `identity`. Not bundled into the node.
 **ValueSchema:** do **NOT** add a 5th `ValueSchema::Ocr` enum variant — that is a
 contract-surface addition against the #496 §0 anti-invention guardrail. Shipped
 `ocr.rs` already transcodes by riding the POC-`Full` default (`classid_read_mode →
-Full`) and writing only the tenants it populates. Post-POC, OCR rides the existing
-**`Compressed`** preset (already = Fingerprint + HelixResidue + TurbovecResidue +
-EntityType) — or, if a distinct tenant set is truly needed, **mint an OCR class** in
-OGAR whose `ClassView` selects existing tenants (the §0-sanctioned opt-in route).
-New capability = new column/class, never a new enum variant.
+Full`) and writing only the tenants it populates. Post-POC, OCR rides **`Full`**
+the only existing preset carrying the codec residues (HelixResidue + TurbovecResidue)
+AND the hot columns the §4 writeback needs (Energy for confidence, Plasticity for the
+repair stamp). `Compressed` lacks Energy/Plasticity and `Cognitive` lacks the
+residues, so neither fits OCR (codex P2 on #500). A leaner OCR row would need an
+operator-minted preset — that is an operator decision, not a plan default; the rule
+that holds is **no new enum variant from a plan**.

 ## 4. Repair: DeepNSM + CAM/PQ nearest-valid-token (D-OCR-52)
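The preset choice in this hunk comes down to a coverage check: which existing preset carries every tenant OCR writes? A minimal sketch of that check follows. The enums here are hypothetical stand-ins, not the real `lance-graph-contract` definitions: the `Compressed` set follows the plan text, the `Full` set is taken as "everything listed", the `Cognitive` set is an assumption (the plan only states that it lacks the residues), and `presets_covering` is an illustrative helper that does not exist in the crate.

```rust
// Hypothetical stand-ins for the contract crate's types; the real enums may
// have more variants and different tenant sets.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ValueTenant {
    Fingerprint,
    HelixResidue,
    TurbovecResidue,
    EntityType,
    Meta,
    Energy,
    Plasticity,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ValueSchema {
    Full,
    Compressed,
    Cognitive,
}

impl ValueSchema {
    /// Tenants carried by each preset in this sketch.
    pub fn tenants(self) -> &'static [ValueTenant] {
        use ValueTenant::*;
        match self {
            ValueSchema::Full => &[
                Fingerprint, HelixResidue, TurbovecResidue, EntityType, Meta, Energy, Plasticity,
            ],
            // Per the plan: Fingerprint + HelixResidue + TurbovecResidue + EntityType.
            ValueSchema::Compressed => &[Fingerprint, HelixResidue, TurbovecResidue, EntityType],
            // ASSUMPTION: the plan only says Cognitive lacks the residues.
            ValueSchema::Cognitive => &[Meta, Energy, Plasticity],
        }
    }

    pub fn has(self, t: ValueTenant) -> bool {
        self.tenants().contains(&t)
    }
}

/// Tenants the OCR writeback needs, per the plan: both codec residues plus
/// Energy (confidence) and Plasticity (repair stamp).
pub const OCR_TENANTS: [ValueTenant; 4] = [
    ValueTenant::HelixResidue,
    ValueTenant::TurbovecResidue,
    ValueTenant::Energy,
    ValueTenant::Plasticity,
];

/// Illustrative helper: every preset that carries all `required` tenants.
pub fn presets_covering(required: &[ValueTenant]) -> Vec<ValueSchema> {
    [ValueSchema::Full, ValueSchema::Compressed, ValueSchema::Cognitive]
        .iter()
        .copied()
        .filter(|s| required.iter().all(|t| s.has(*t)))
        .collect()
}
```

Under these assumed tenant sets, `presets_covering(&OCR_TENANTS)` returns only `Full`, which matches the corrected plan text.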

.claude/plans/ocr-probes-v1.md

Lines changed: 4 additions & 2 deletions
@@ -27,8 +27,10 @@
 turbovec PQ residue), attempt to recover the **rank** from the residue bytes ALONE
 (no stored-rank lookup). Needs deepnsm `Codebook` + helix `Signed360` wired in one
 crate (they are not today — that wiring is itself part of the gate).
-- **Pass:** ≥ 99 % of the 4096-word vocab round-trips residue→rank→word exactly.
-- **Fail:** < 99 %, OR recovery requires the original rank as input ⇒ "reversible
+- **Pass:** **100 %** of the 4096-word vocab round-trips residue→rank→word exactly —
+a reversibility gate must be exact; a single miss fails it (a lossy map is NOT
+"reversible"). Any tolerance belongs in a separate *quality* probe, never this gate.
+- **Fail:** any miss, OR recovery requires the original rank as input ⇒ "reversible
 without a hash" is FALSE; the corrected plans already say text = identity →
 content-store lookup, codebook = repair signal (this probe confirms or lifts that).
 - **Cost:** ~80 LOC once deepnsm+helix are co-located; the wiring is the real work.
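The tightened Pass/Fail rule can be sketched as an exhaustive round-trip over the vocabulary. This is an illustration under stated assumptions, not the probe itself: `reversibility_gate`, `GateResult`, and the `encode`/`decode` closures are hypothetical names, the real gate would go through the deepnsm `Codebook` and helix `Signed360` once they are wired together, and the final rank→word lookup is omitted. The one constraint carried over from the plan is that `decode` sees only the residue bytes, never the original rank.

```rust
/// Vocabulary size named in the probe.
pub const VOCAB_SIZE: u16 = 4096;

/// Outcome of the gate: exact pass, or the ranks that failed to round-trip.
#[derive(Debug, PartialEq, Eq)]
pub enum GateResult {
    Pass,
    Fail { misses: Vec<u16> },
}

/// Round-trips every rank through `encode` then `decode` and passes only if
/// all of them come back unchanged. `decode` receives the residue bytes alone,
/// so a recovery that needs the stored rank cannot be expressed here.
pub fn reversibility_gate<E, D>(encode: E, decode: D) -> GateResult
where
    E: Fn(u16) -> Vec<u8>,
    D: Fn(&[u8]) -> Option<u16>,
{
    let misses: Vec<u16> = (0..VOCAB_SIZE)
        .filter(|&rank| {
            let residue = encode(rank);
            decode(&residue[..]) != Some(rank)
        })
        .collect();

    if misses.is_empty() {
        GateResult::Pass
    } else {
        // A single miss fails the gate; there is no tolerance parameter.
        GateResult::Fail { misses }
    }
}
```

Returning the missed ranks rather than a percentage keeps the gate binary while still giving a separate quality probe something to measure. A codec with exactly one collision round-trips 4095 of 4096 ranks (about 99.98 %), which would have cleared the old ≥ 99 % threshold but fails here.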

crates/lance-graph-contract/src/ocr.rs

Lines changed: 16 additions & 6 deletions
@@ -237,9 +237,10 @@ mod tests {
     fn ocr_schema_fit_rides_existing_preset_no_new_variant() {
         // Probe OCR-SCHEMA (.claude/plans/ocr-probes-v1.md): the OCR value tenants
         // fit an EXISTING ValueSchema preset, so a 5th `ValueSchema::Ocr` enum variant
-        // is NOT needed (#496 §0 anti-invention). The codec-residue set OCR rides —
-        // HelixResidue + TurbovecResidue + EntityType (+ Fingerprint) — is exactly
-        // `Compressed`; everything else OCR could want is in the POC `Full` default.
+        // is NOT needed (#496 §0 anti-invention). `Compressed` carries the codec
+        // residues — but OCR also writes confidence→Energy + repair→Plasticity, which
+        // `Compressed` LACKS, so OCR rides `Full` (the only preset with residues AND
+        // the hot lifecycle columns), not `Compressed` (codex P2 on #500).
         let compressed = ValueSchema::Compressed;
         for t in [
             ValueTenant::HelixResidue,
@@ -249,11 +250,20 @@ mod tests {
         ] {
             assert!(
                 compressed.has(t),
-                "Compressed already carries {t:?} — OCR rides it"
+                "Compressed carries the codec residue {t:?}"
             );
         }
-        // The shipped transcode rides POC `Full`, which carries every tenant OCR touches
-        // (incl. Meta anchor / Energy confidence / Plasticity provenance).
+        // ...but NOT the hot columns OCR's writeback needs — Compressed alone drops them.
+        assert!(
+            !compressed.has(ValueTenant::Energy),
+            "Compressed lacks Energy"
+        );
+        assert!(
+            !compressed.has(ValueTenant::Plasticity),
+            "Compressed lacks Plasticity"
+        );
+        // OCR rides `Full`, which carries every tenant OCR touches (residues + Meta
+        // anchor + Energy confidence + Plasticity provenance + EntityType).
         let full = ValueSchema::Full;
         for t in [
             ValueTenant::HelixResidue,
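The new negative assertions guard against a specific failure mode: a schema-gated write that quietly discards values for tenants the preset does not carry. The sketch below illustrates that mode only. `gated_write` is a hypothetical function, not the shipped `ocr.rs` transcode; the enums are cut-down stand-ins; and tenant values are simplified to `f32` where the real residues are byte payloads.

```rust
use std::collections::BTreeMap;

// Cut-down, hypothetical stand-ins for the contract types.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum ValueTenant {
    HelixResidue,
    TurbovecResidue,
    Energy,
    Plasticity,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ValueSchema {
    Full,
    Compressed,
}

impl ValueSchema {
    pub fn has(self, t: ValueTenant) -> bool {
        match self {
            // In this sketch Full carries every tenant listed above.
            ValueSchema::Full => true,
            // Compressed carries the codec residues but not Energy/Plasticity.
            ValueSchema::Compressed => matches!(
                t,
                ValueTenant::HelixResidue | ValueTenant::TurbovecResidue
            ),
        }
    }
}

/// Hypothetical schema-gated write: a value is stored only when the preset
/// carries its tenant. The dropped tenants are returned so the loss is
/// observable; a transcode that did not report them would lose data silently.
pub fn gated_write(
    schema: ValueSchema,
    writes: &[(ValueTenant, f32)],
) -> (BTreeMap<ValueTenant, f32>, Vec<ValueTenant>) {
    let mut row = BTreeMap::new();
    let mut dropped = Vec::new();
    for &(tenant, value) in writes {
        if schema.has(tenant) {
            row.insert(tenant, value);
        } else {
            dropped.push(tenant);
        }
    }
    (row, dropped)
}
```

Writing a residue, a confidence, and a repair stamp through `Compressed` keeps only the residue and reports `Energy` and `Plasticity` as dropped; the same writes through `Full` keep all three.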
