
D1.3 decode-kernel + residual composition (Phase 1 scaffold complete, 104/104 tests)#235

Merged
AdaWorldAPI merged 1 commit into main from claude/teleport-session-setup-wMZfb
Apr 21, 2026

@AdaWorldAPI (Owner)

Summary

Final Phase 1 scaffold deliverable — D1.3 decode-kernel trait + residual composition. Scope corrected after loading the canonical architecture docs (cognitive-shader-architecture.md + ripple-dto-contracts.md + encoding-ecosystem.md): this module sits on the hydration / calibration path, NOT the cascade inference path (which uses p64_bridge::CognitiveShader::cascade with 8 predicate planes × bgz17 O(1) distance — no per-inference codec work).

104/104 cognitive-shader-driver --features serve tests pass (+9 new).

What lands

crates/cognitive-shader-driver/src/decode_kernel.rs — ~280 LOC:

```rust
pub trait DecodeKernel: Send + Sync + std::fmt::Debug {
    fn decode(&self, bytes: &[u8]) -> Result<Vec<f32>, DecodeError>;
    fn encode(&self, values: &[f32]) -> Result<Vec<u8>, DecodeError>;
    fn bytes_per_row(&self) -> u32;
    fn dim(&self) -> u32;
    fn signature(&self) -> u64;        // JIT cache key
    fn backend(&self) -> &'static str; // never "scalar" on SoA
}

pub struct StubDecodeKernel { dim, tag }   // byte-exact round-trip for tests (field types elided)

pub struct ResidualComposer {
    base: Box<dyn DecodeKernel>,
    residual: Box<dyn DecodeKernel>, // can itself be a ResidualComposer (depth > 1)
}

pub enum DecodeError {
    SizeMismatch { expected, actual },
    Stage { stage, detail },
}
```
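The commit message describes StubDecodeKernel as a byte-exact f32 ↔ u8 round-trip via native-endian reinterpretation, with backend "stub" and a signature that hashes "stub_decode" + dim + tag. A minimal sketch of that behavior follows; the tag type (u64) and the use of std's DefaultHasher are assumptions, since the PR names neither.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, PartialEq)]
pub enum DecodeError {
    SizeMismatch { expected: usize, actual: usize },
    Stage { stage: usize, detail: String },
}

pub trait DecodeKernel: Send + Sync + std::fmt::Debug {
    fn decode(&self, bytes: &[u8]) -> Result<Vec<f32>, DecodeError>;
    fn encode(&self, values: &[f32]) -> Result<Vec<u8>, DecodeError>;
    fn bytes_per_row(&self) -> u32;
    fn dim(&self) -> u32;
    fn signature(&self) -> u64;
    fn backend(&self) -> &'static str;
}

/// Sketch: lossless f32 <-> u8 reinterpretation, 4 bytes per element.
#[derive(Debug)]
pub struct StubDecodeKernel {
    dim: u32,
    tag: u64, // assumed type; the PR only names the field
}

impl StubDecodeKernel {
    pub fn new(dim: u32, tag: u64) -> Self {
        Self { dim, tag }
    }
}

impl DecodeKernel for StubDecodeKernel {
    fn decode(&self, bytes: &[u8]) -> Result<Vec<f32>, DecodeError> {
        let expected = self.bytes_per_row() as usize;
        if bytes.len() != expected {
            return Err(DecodeError::SizeMismatch { expected, actual: bytes.len() });
        }
        // Reassemble each 4-byte chunk into an f32 using native byte order.
        Ok(bytes
            .chunks_exact(4)
            .map(|c| f32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
            .collect())
    }

    fn encode(&self, values: &[f32]) -> Result<Vec<u8>, DecodeError> {
        let expected = self.dim as usize;
        if values.len() != expected {
            return Err(DecodeError::SizeMismatch { expected, actual: values.len() });
        }
        // Emit the raw native-endian bytes of each f32: no quantization.
        Ok(values.iter().flat_map(|v| v.to_ne_bytes()).collect())
    }

    fn bytes_per_row(&self) -> u32 {
        self.dim * 4
    }

    fn dim(&self) -> u32 {
        self.dim
    }

    fn signature(&self) -> u64 {
        // Hash "stub_decode" + dim + tag, as the commit message describes.
        let mut h = DefaultHasher::new();
        "stub_decode".hash(&mut h);
        self.dim.hash(&mut h);
        self.tag.hash(&mut h);
        h.finish()
    }

    fn backend(&self) -> &'static str {
        "stub"
    }
}
```

If signatures are ever persisted as cache keys across builds, a hasher with a fixed, documented algorithm would be the safer choice: std documents that DefaultHasher's algorithm may change between Rust releases.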

Composition semantics (matches plan D1.3 spec):

encode(v) = [ base.encode(v) ; residual.encode(v - base.decode(base.encode(v))) ]
decode(enc) = base.decode(enc[..base_b]) + residual.decode(enc[base_b..])

Stages compose recursively: the residual slot is itself a Box<dyn DecodeKernel>, so a depth-2 composer holds another ResidualComposer in its residual slot. Tests verify a byte-exact round trip through an all-stub, depth-2 composition (3 stages × 4 dims × 4 bytes = 48 bytes per row).
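The two formulas only pay off once the base stage is lossy, which the all-stub tests cannot show. The sketch below pairs a hypothetical RoundingKernel (not part of this PR; it stores round(v) as an i16) with a lossless RawF32Kernel standing in for StubDecodeKernel, and trims the trait to the four methods the formulas use. The residual stage captures exactly what rounding lost, and nesting works because the residual slot is a Box<dyn DecodeKernel>. Reusing SizeMismatch for the dim check in new() is an assumption of the sketch.

```rust
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    SizeMismatch { expected: usize, actual: usize },
}

// Trimmed trait: only the methods the composition formulas need.
pub trait DecodeKernel: std::fmt::Debug {
    fn decode(&self, bytes: &[u8]) -> Result<Vec<f32>, DecodeError>;
    fn encode(&self, values: &[f32]) -> Result<Vec<u8>, DecodeError>;
    fn bytes_per_row(&self) -> u32;
    fn dim(&self) -> u32;
}

fn check(expected: usize, actual: usize) -> Result<(), DecodeError> {
    if expected == actual { Ok(()) } else { Err(DecodeError::SizeMismatch { expected, actual }) }
}

/// Hypothetical lossy base: stores round(v) as a native-endian i16 (2 bytes each).
#[derive(Debug)]
pub struct RoundingKernel { pub dim: u32 }

impl DecodeKernel for RoundingKernel {
    fn decode(&self, bytes: &[u8]) -> Result<Vec<f32>, DecodeError> {
        check(self.bytes_per_row() as usize, bytes.len())?;
        Ok(bytes.chunks_exact(2).map(|c| i16::from_ne_bytes([c[0], c[1]]) as f32).collect())
    }
    fn encode(&self, values: &[f32]) -> Result<Vec<u8>, DecodeError> {
        check(self.dim as usize, values.len())?;
        Ok(values.iter().flat_map(|v| (v.round() as i16).to_ne_bytes()).collect())
    }
    fn bytes_per_row(&self) -> u32 { self.dim * 2 }
    fn dim(&self) -> u32 { self.dim }
}

/// Lossless stage (raw f32 bytes), playing the role of StubDecodeKernel.
#[derive(Debug)]
pub struct RawF32Kernel { pub dim: u32 }

impl DecodeKernel for RawF32Kernel {
    fn decode(&self, bytes: &[u8]) -> Result<Vec<f32>, DecodeError> {
        check(self.bytes_per_row() as usize, bytes.len())?;
        Ok(bytes.chunks_exact(4).map(|c| f32::from_ne_bytes([c[0], c[1], c[2], c[3]])).collect())
    }
    fn encode(&self, values: &[f32]) -> Result<Vec<u8>, DecodeError> {
        check(self.dim as usize, values.len())?;
        Ok(values.iter().flat_map(|v| v.to_ne_bytes()).collect())
    }
    fn bytes_per_row(&self) -> u32 { self.dim * 4 }
    fn dim(&self) -> u32 { self.dim }
}

#[derive(Debug)]
pub struct ResidualComposer {
    base: Box<dyn DecodeKernel>,
    residual: Box<dyn DecodeKernel>, // may itself be a ResidualComposer
}

impl ResidualComposer {
    /// Rejects stages whose dims differ.
    pub fn new(base: Box<dyn DecodeKernel>, residual: Box<dyn DecodeKernel>) -> Result<Self, DecodeError> {
        check(base.dim() as usize, residual.dim() as usize)?;
        Ok(Self { base, residual })
    }
}

impl DecodeKernel for ResidualComposer {
    // encode(v) = [ base.encode(v) ; residual.encode(v - base.decode(base.encode(v))) ]
    fn encode(&self, v: &[f32]) -> Result<Vec<u8>, DecodeError> {
        let mut out = self.base.encode(v)?;
        let approx = self.base.decode(&out)?;
        let resid: Vec<f32> = v.iter().zip(&approx).map(|(x, a)| x - a).collect();
        out.extend(self.residual.encode(&resid)?);
        Ok(out)
    }
    // decode(enc) = base.decode(enc[..base_b]) + residual.decode(enc[base_b..])
    fn decode(&self, enc: &[u8]) -> Result<Vec<f32>, DecodeError> {
        check(self.bytes_per_row() as usize, enc.len())?;
        let base_b = self.base.bytes_per_row() as usize;
        let mut out = self.base.decode(&enc[..base_b])?;
        let resid = self.residual.decode(&enc[base_b..])?;
        for (dst, &r) in out.iter_mut().zip(&resid) {
            *dst += r;
        }
        Ok(out)
    }
    fn bytes_per_row(&self) -> u32 {
        self.base.bytes_per_row() + self.residual.bytes_per_row()
    }
    fn dim(&self) -> u32 { self.base.dim() }
}
```

With dim 4, the rounding + raw pair costs 4 × 2 + 4 × 4 = 24 bytes per row, and a depth-2 all-raw composer costs 48, matching the figure above. Replacing RoundingKernel with a different base touches only that stage; the composer code is unchanged.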

Tests (9 new)

  • stub_round_trip_is_exact
  • stub_rejects_wrong_input_size
  • residual_compose_round_trip_is_exact_when_both_stubs
  • residual_compose_mismatched_dims_rejected
  • residual_compose_bytes_per_row_sums_stages
  • residual_compose_nested_depth_two_round_trip
  • signatures_distinguish_composer_from_stages
  • signature_depends_on_stage_order (base+residual ≠ residual+base — order is part of identity)
  • composer_backend_reports_stub_when_any_stage_is_stub (weakest-link reporting)
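The last three tests pin down identity and reporting. One way to get an order-dependent signature is to hash a composer tag followed by the base and residual signatures in sequence; the weakest-link backend rule is the one stated in the commit message ("stub" if either stage is stub, else the base's backend). In this sketch the trait is trimmed to those two methods, and the Leaf kernel, the "residual_composer" tag, and the "jit" backend label are all hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Trimmed trait: only the identity/reporting methods.
pub trait DecodeKernel {
    fn signature(&self) -> u64;
    fn backend(&self) -> &'static str;
}

/// Hypothetical leaf kernel with a fixed name and backend label.
pub struct Leaf {
    name: &'static str,
    backend: &'static str,
}

impl DecodeKernel for Leaf {
    fn signature(&self) -> u64 {
        let mut h = DefaultHasher::new();
        self.name.hash(&mut h);
        h.finish()
    }
    fn backend(&self) -> &'static str {
        self.backend
    }
}

pub struct ResidualComposer {
    base: Box<dyn DecodeKernel>,
    residual: Box<dyn DecodeKernel>,
}

impl DecodeKernel for ResidualComposer {
    fn signature(&self) -> u64 {
        // Hash a tag plus both stage signatures in order, so swapping
        // base and residual yields a different cache key, and the composer
        // never shares a key with either of its stages.
        let mut h = DefaultHasher::new();
        "residual_composer".hash(&mut h);
        self.base.signature().hash(&mut h);
        self.residual.signature().hash(&mut h);
        h.finish()
    }
    fn backend(&self) -> &'static str {
        // Weakest link: any stub stage makes the whole composer "stub".
        if self.base.backend() == "stub" || self.residual.backend() == "stub" {
            "stub"
        } else {
            self.base.backend()
        }
    }
}
```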

Scope correction (per loaded orientation)

Before this PR, my framing of "codec kernels" drifted toward treating them as inference-path infrastructure. Reading cognitive-shader-architecture.md lines 582+ made the distinction explicit:

| Path | Uses |
|---|---|
| Cascade inference (Layer 2, per-cycle, ns budget) | p64_bridge::CognitiveShader::cascade(query, radius, layer_mask): 8 predicate planes × bgz17 O(1) palette distance; no Hamming, no POPCNT, table lookup only |
| Hydration / calibration (one-time per model, seconds to minutes) | D1.x codec kernels: decode/encode tested against the token-agreement gate; once a codec graduates, it runs at weight ingest (GGUF → palette + Fingerprint<256>), not per inference |

StubDecodeKernel is the test fixture; real decoders (once D1.1b lands the ndarray jitson_cranelift::JitEngine adapter) replace it. The composition pattern remains stable across that transition.

Phase 1 state after merge

| D-id | Deliverable | Status |
|---|---|---|
| D1.1 | CodecKernelCache<H> scaffold | ✅ Shipped (#233) |
| D1.1b | Adapter to ndarray::hpc::jitson_cranelift::JitEngine | ⏳ Queued |
| D1.2 | Rotation primitives (Identity / Hadamard / OPQ stub) | ✅ Shipped (#234) |
| D1.3 | Decode-kernel trait + residual composition | ✅ This PR |

After merge, the Phase 1 scaffold is complete. D1.1b (real Cranelift wiring) is the only remaining Phase 1 piece: it supplies Box<dyn DecodeKernel> kernels wrapping ndarray's JitEngine, which drop into the ResidualComposer slots currently occupied by StubDecodeKernel, with no composition-layer changes required.

Board hygiene (same commit)

  • STATUS_BOARD.md — D1.3 Queued → In PR.

Test Plan

  • cargo test --manifest-path crates/cognitive-shader-driver/Cargo.toml --features serve — 104/104 pass (+9 new)
  • cargo test -p lance-graph-contract --lib — 147/147 pass (unchanged)
  • cargo test --manifest-path crates/jc/Cargo.toml — 6/6 pass (JC substrate proof unchanged)
  • Rules A-F honored at the composition layer (A/B/E/F apply directly; C defers to per-stage kernel backend)

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

Final Phase 1 scaffold deliverable. Hydration/calibration path, NOT
cascade inference path (per cognitive-shader-architecture.md line 582:
the cascade uses p64_bridge::CognitiveShader::cascade + 8 predicate
planes × bgz17 O(1) distance, no per-inference codec work).

crates/cognitive-shader-driver/src/decode_kernel.rs (~280 LOC):

  DecodeKernel trait — object-safe, Send+Sync+Debug:
    decode(&self, &[u8]) -> Result<Vec<f32>, DecodeError>
    encode(&self, &[f32]) -> Result<Vec<u8>, DecodeError>
    bytes_per_row() -> u32
    dim() -> u32
    signature() -> u64          # JIT cache key
    backend() -> &'static str   # never "scalar" on SoA

  StubDecodeKernel { dim, tag } — byte-exact f32 ↔ u8 round-trip via
    native-endian reinterpret. No quantization, no compression; exists
    so composition plumbing can be tested without a trained palette.
    Backend = "stub". Signature hashes "stub_decode" + dim + tag.

  ResidualComposer { base: Box<dyn DecodeKernel>, residual: Box<dyn DecodeKernel> }
    Two-stage residual composition:
      encode(v) = [base.encode(v); residual.encode(v - base.decode(base.encode(v)))]
      decode(enc) = base.decode(enc[..base_b]) + residual.decode(enc[base_b..])
    Nests recursively — residual slot can itself be a ResidualComposer
    (depth > 1). Rejects mismatched dims at construction.
    Backend = "stub" if either stage is stub, else base's backend
    (weakest-link reporting for latency-critical stages).

  DecodeError { SizeMismatch { expected, actual }, Stage { stage, detail } }

Tests (9 new, all under --features serve):
  - stub_round_trip_is_exact
  - stub_rejects_wrong_input_size
  - residual_compose_round_trip_is_exact_when_both_stubs
    (both stubs = byte-exact; residual all zero; output == input)
  - residual_compose_mismatched_dims_rejected
  - residual_compose_bytes_per_row_sums_stages
  - residual_compose_nested_depth_two_round_trip
    (ResidualComposer whose residual IS another ResidualComposer —
    depth=2 encodes 3 stages, still byte-exact when all stubs)
  - signatures_distinguish_composer_from_stages
  - signature_depends_on_stage_order
    (base+residual vs residual+base produce different signatures)
  - composer_backend_reports_stub_when_any_stage_is_stub

Scope clarification (per orientation loaded this session from
cognitive-shader-architecture.md + ripple-dto-contracts.md):
  - D1.x codec kernels = hydration/calibration path
  - Cascade inference path = p64_bridge::CognitiveShader at L2
  - Real kernels replace StubDecodeKernel once D1.1b lands the
    ndarray::hpc::jitson_cranelift::JitEngine adapter

Board hygiene (CLAUDE.md Mandatory rule):
  STATUS_BOARD.md D1.3 Queued → In PR

Rules honored:
  Rule A — in-place &mut operations via Vec; no manual index math
  Rule B — no std::arch / no hpc::simd_avxNNN reach
  Rule C — n/a at the composition layer (real kernel backend selection
           defers to D1.1b per-stage)
  Rule D — codec params come from CodecParams via Wire DTOs (D0.1-D0.3)
  Rule E — trait methods expose signature + backend + bytes_per_row
  Rule F — no serialization between stages; Vec<f32>/Vec<u8> owned

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3f58967902


Comment on lines +179 to +180:

```rust
for (dst, &r) in out.iter_mut().zip(&residual_v) {
    *dst += r;
```

P2: Reject stage decode length mismatches before summing

ResidualComposer::decode adds stage outputs with zip, which truncates to the shorter vector. If either stage returns fewer than dim() elements, this returns Ok(...) with silently corrupted output instead of surfacing an error, even though the trait contract says decode should produce full-dimension vectors. Add explicit length checks for both decoded stage vectors before the accumulation loop so malformed/buggy stage implementations fail fast.
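A minimal sketch of the check the review asks for, applied to the decode path only. It assumes stage indices 0 (base) and 1 (residual) in DecodeError::Stage and a free-form detail string, none of which the PR specifies; FixedStage is a hypothetical test kernel whose output can be shortened on purpose.

```rust
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    SizeMismatch { expected: usize, actual: usize },
    Stage { stage: usize, detail: String },
}

// Trimmed trait: decode-side methods only.
pub trait DecodeKernel {
    fn decode(&self, bytes: &[u8]) -> Result<Vec<f32>, DecodeError>;
    fn bytes_per_row(&self) -> u32;
    fn dim(&self) -> u32;
}

/// Hypothetical test stage: decodes to all ones, `short_by` elements too few.
pub struct FixedStage {
    dim: u32,
    bytes: u32,
    short_by: usize,
}

impl DecodeKernel for FixedStage {
    fn decode(&self, _bytes: &[u8]) -> Result<Vec<f32>, DecodeError> {
        Ok(vec![1.0; self.dim as usize - self.short_by])
    }
    fn bytes_per_row(&self) -> u32 { self.bytes }
    fn dim(&self) -> u32 { self.dim }
}

pub struct ResidualComposer {
    base: Box<dyn DecodeKernel>,
    residual: Box<dyn DecodeKernel>,
}

impl ResidualComposer {
    pub fn decode(&self, enc: &[u8]) -> Result<Vec<f32>, DecodeError> {
        let base_b = self.base.bytes_per_row() as usize;
        let expected = base_b + self.residual.bytes_per_row() as usize;
        if enc.len() != expected {
            return Err(DecodeError::SizeMismatch { expected, actual: enc.len() });
        }
        let dim = self.base.dim() as usize;
        let mut out = self.base.decode(&enc[..base_b])?;
        let residual_v = self.residual.decode(&enc[base_b..])?;
        // Check both stage outputs before summing: zip() would otherwise
        // silently truncate to the shorter vector.
        for (stage, len) in [(0, out.len()), (1, residual_v.len())] {
            if len != dim {
                return Err(DecodeError::Stage {
                    stage,
                    detail: format!("decoded {len} values, expected {dim}"),
                });
            }
        }
        for (dst, &r) in out.iter_mut().zip(&residual_v) {
            *dst += r;
        }
        Ok(out)
    }
}
```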


@AdaWorldAPI merged commit 6bed7ae into main on Apr 21, 2026
0 of 6 checks passed