diff --git a/.claude/CODING_PRACTICES.md b/.claude/CODING_PRACTICES.md
index 064b0132..1e81cd50 100644
--- a/.claude/CODING_PRACTICES.md
+++ b/.claude/CODING_PRACTICES.md
@@ -258,6 +258,137 @@ scalar fallback INTERNALLY; the consumer never hand-rolls.
 hot path → reject + cite this section.
 Exception: the ndarray crate itself implements backends, not a violation.
 
+### How `ndarray::simd::*` resolves to backends (polyfill chain)
+
+The `simd.rs` module in ndarray is the **single public surface**; it
+re-exports concrete types from backend files based on `cfg` target
+features. Consumers never reach around it. The chain:
+
+```
+  ┌─────────────────────────────────────────────────────────────────┐
+  │  ndarray::simd  (src/simd.rs)   ← the ONLY consumer surface     │
+  │                                                                 │
+  │  Re-exports F32x16 / U8x64 / F16x32 / F64x8 / BF16x32 etc. from │
+  │  the right backend, chosen by cfg(target_feature):              │
+  │                                                                 │
+  │  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐         │
+  │  │ simd_amx.rs  │   │simd_avx512.rs│   │ simd_avx2.rs │         │
+  │  │ Intel AMX    │   │ AVX-512 base │   │ AVX-2 fallbk │         │
+  │  │ tiles +      │   │ F32x16 /     │   │ F32x8 /      │         │
+  │  │ VNNI +       │   │ U8x64 / ...  │   │ F64x4        │         │
+  │  │ TDPBUSD /    │   │ (mandatory   │   │ (cfg-gated   │         │
+  │  │ TDPBF16PS    │   │ floor at     │   │ when build   │         │
+  │  │ via inline   │   │ target-cpu=  │   │ drops to     │         │
+  │  │ asm (stable) │   │ x86-64-v4)   │   │ x86-64-v3)   │         │
+  │  └──────────────┘   └──────────────┘   └──────────────┘         │
+  │         │                │                    │                 │
+  │         ├─ runtime-opt ──┤                    │                 │
+  │         │ (amx_available)                     │                 │
+  │                          │    compile-time    │                 │
+  │                          │     cfg(avx2)      │                 │
+  │                                                                 │
+  │  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐         │
+  │  │ simd_neon.rs │   │ simd_wasm.rs │   │   (scalar)   │         │
+  │  │ aarch64      │   │ wasm32-simd  │   │ last resort  │         │
+  │  │              │   │              │   │ INTERNAL to  │         │
+  │  │              │   │              │   │  each backend│         │
+  │  └──────────────┘   └──────────────┘   └──────────────┘         │
+  │                                                                 │
+  │  hpc/simd_caps.rs  — runtime capability struct                  │
+  │  hpc/amx_matmul.rs — Intel AMX tile primitives (tile_dpbusd /   │
+  │                      tile_dpbf16ps etc.) surfaced for callers   │
+  │                      that want explicit matmul routing          │
+  └─────────────────────────────────────────────────────────────────┘
+```
+
+**Mandatory consumer rule:** only ever write `use ndarray::simd::…`.
+The backend files are private implementation detail — they can be
+reshuffled at any time (new `simd_avx512fp16.rs` shipped in a point
+release, backends split per architecture, etc.) without breaking
+consumers.
+
+**Explicit AMX routing** (when the caller wants to force the tile
+path rather than let `simd::*` infer it): the AMX sibling modules
+(`ndarray::simd_amx::*` and `ndarray::hpc::amx_matmul::*`) are
+**first-class canonical surfaces**, not backend reach. They're
+named at the top level because Intel AMX needs explicit OS
+enablement + XCR0 prctl on Linux + runtime `amx_available()`
+gating that's orthogonal to compile-time cfg.
+
+---
+
+## MANDATORY `cargo clippy` + feature-matrix discipline
+
+Every PR that touches `crates/*/src/` runs this full matrix before
+being declared complete. `--features serve` alone is NOT enough
+(learned the hard way at PR #238 when `--features grpc` and
+`--features lab` silently broke after months of feature-drift).
+
+```bash
+# All four compile-and-warning-clean before commit:
+cargo check  # default
+cargo check --manifest-path crates//Cargo.toml --features serve
+cargo check --manifest-path crates//Cargo.toml --features grpc
+cargo check --manifest-path crates//Cargo.toml --features lab
+
+# Clippy WITH -D warnings (not just --no-deps); catches redundant
+# closures, needless collects, manual Default impls, hidden type
+# complexity, etc.:
+cargo clippy --manifest-path crates//Cargo.toml --features lab -- -D warnings
+cargo clippy --manifest-path crates//Cargo.toml --features serve -- -D warnings
+
+# Full test under the widest feature set:
+cargo test --manifest-path crates//Cargo.toml --features lab --lib
+
+# Doc-tests (separate target; --lib skips them):
+cargo test --manifest-path crates//Cargo.toml --features lab --doc
+```
+
+**Why `--lib` is not enough.** `cargo test` without `--lib` also runs
+integration tests in `tests/` and the doc-tests embedded in `///`
+comments. A doc comment that compiles as prose but fails as code
+is a latent failure; doc-tests catch it. The `--doc` run is cheap
+(seconds) and mandatory.
+
+**Why `--features lab` is not enough.** The `lab` umbrella pulls in
+everything but only exercises the union. `cargo check --features grpc`
+ALONE still needs to work — downstream consumers that only want gRPC
+(not REST) compile grpc-only; if wire.rs is `serve`-gated but grpc.rs
+references it, the grpc-only build breaks silently.
+
+**Fix pattern** (applied in PR #238 `_lab-dtos` internal feature):
+when two features share a dep (serde / serde_json / base64 / bytemuck
+used by both `serve` and `grpc`), factor into an internal feature:
+
+```toml
+[features]
+_lab-dtos = ["dep:serde", "dep:serde_json", "dep:base64", "dep:bytemuck"]
+serve = ["_lab-dtos", "dep:axum", "dep:tokio"]
+grpc = ["_lab-dtos", "dep:prost", "dep:tonic", "dep:tonic-build", "dep:tokio"]
+lab = ["serve", "grpc", "with-engine", "with-planner"]
+```
+
+And widen `pub mod wire` from `#[cfg(feature = "serve")]` to
+`#[cfg(any(feature = "serve", feature = "grpc"))]` so both transports
+see the shared DTOs.
+
+**Reviewer trigger:** a PR whose description cites only
+`--features serve` test results → request re-run across the full
+matrix before approval. The matrix is a first-class part of the
+contract, not an afterthought.
+
+**Rust 1.95 transition note:** `mut ref` / `ref mut` in struct
+pattern field shorthand are now feature-gated (were accidentally
+stable through 1.94). When the toolchain pin bumps, grep both
+`src/` trees:
+
+```bash
+grep -rn "mut ref\b\|ref mut\b" crates/*/src/
+```
+
+Zero hits today across `lance-graph/crates/` + `ndarray/src/`.
+Stay that way.
+
 ---
 
 ## The 3-Way BindSpace Mutation Scheme
diff --git a/crates/cognitive-shader-driver/src/wire.rs b/crates/cognitive-shader-driver/src/wire.rs
index 1805fdfe..1f0dadf7 100644
--- a/crates/cognitive-shader-driver/src/wire.rs
+++ b/crates/cognitive-shader-driver/src/wire.rs
@@ -145,6 +145,7 @@ pub struct WireTensorsResponse {
 /// object after ingress — per Rule F, there is no second deserialise anywhere
 /// in the pipeline after the handler consumes the request.
 #[derive(Debug, Clone, Serialize, Deserialize)]
+#[non_exhaustive]
 pub struct WireCalibrateRequest {
     pub model_path: String,
     pub tensor_name: String,
@@ -183,6 +184,7 @@ fn default_cal_iters() -> usize { 20 }
 fn default_icc_samples() -> usize { 512 }
 
 #[derive(Debug, Clone, Serialize, Deserialize)]
+#[non_exhaustive]
 pub struct WireCalibrateResponse {
     pub tensor_name: String,
     pub dims: Vec,
@@ -246,6 +248,7 @@ pub struct WireResidualSpec {
 }
 
 #[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+#[non_exhaustive]
 pub struct WireCodecParams {
     pub subspaces: u32,
     pub centroids: u32,
@@ -348,6 +351,7 @@ impl TryFrom for CodecParams {
 // ═══════════════════════════════════════════════════════════════════════════
 
 #[derive(Debug, Clone, Serialize, Deserialize)]
+#[non_exhaustive]
 pub struct WireTensorView {
     /// [rows, cols] in elements (not bytes). Actual byte size inferred from lane_width.
     pub shape: [u32; 2],
@@ -955,6 +959,7 @@ pub enum WireBaseline {
 /// `top1_rate = 0.0` and `candidate_latency_us = 0`. D2.1–D2.3 land the
 /// real decode-and-compare loop.
 #[derive(Debug, Clone, Serialize, Deserialize)]
+#[non_exhaustive]
 pub struct WireTokenAgreement {
     /// Model root directory (safetensors + config.json). Passed to
     /// `auto_detect::detect` to infer lane width + architecture defaults
@@ -974,6 +979,7 @@ pub struct WireTokenAgreement {
 
 /// `POST /v1/shader/token-agreement` response.
 #[derive(Debug, Clone, Serialize, Deserialize)]
+#[non_exhaustive]
 pub struct WireTokenAgreementResult {
     /// Top-1 token-match rate across the full prompt set. Pass gate: ≥ 0.99.
     pub top1_rate: f32,
@@ -1049,6 +1055,7 @@ pub enum WireMeasure {
 /// × |distances| × |lane_widths|. Clients SHOULD keep the product ≤ a few
 /// hundred to fit in one JIT kernel cache warm-up round.
 #[derive(Debug, Clone, Serialize, Deserialize)]
+#[non_exhaustive]
 pub struct WireSweepGrid {
     #[serde(default = "default_subspaces_axis")]
     pub subspaces: Vec,
@@ -1131,6 +1138,7 @@ impl WireSweepGrid {
 /// `POST /v1/shader/sweep` request. Client submits one grid + a measure
 /// set; server enumerates + calibrates + token-agreements each grid point.
 #[derive(Debug, Clone, Serialize, Deserialize)]
+#[non_exhaustive]
 pub struct WireSweepRequest {
     pub tensor_path: String,
     pub grid: WireSweepGrid,
@@ -1156,6 +1164,7 @@ fn default_measure_set() -> Vec {
 /// One grid-point result, streamed by the sweep handler. Carries the
 /// candidate that produced it + optional per-measure payloads.
 #[derive(Debug, Clone, Serialize, Deserialize)]
+#[non_exhaustive]
 pub struct WireSweepResult {
     /// Zero-based grid index (0 .. grid.cardinality()).
     pub grid_index: u32,
@@ -1179,6 +1188,7 @@ pub struct WireSweepResult {
 /// `POST /v1/shader/sweep` response for batch (non-streaming) clients.
 /// Streaming clients receive one `WireSweepResult` per SSE event instead.
 #[derive(Debug, Clone, Serialize, Deserialize)]
+#[non_exhaustive]
 pub struct WireSweepResponse {
     pub label: String,
     pub cardinality: u32,
diff --git a/scripts/codec_sweep.sh b/scripts/codec_sweep.sh
index e47803f5..a55112ff 100755
--- a/scripts/codec_sweep.sh
+++ b/scripts/codec_sweep.sh
@@ -56,6 +56,33 @@
 echo "$response" | jq '.'
 echo
 echo "=== Stub honesty check ==="
-stub_flag=$(echo "$response" | jq '.results[0].stub // "no results"')
+# Per EPIPHANIES.md 2026-04-20 "D0.2 stub flag is anti-#219 defense at
+# the type level" — the check MUST fail the script (not just log) when
+# the flag is absent or false. Until D2.2 lands real decode-and-compare,
+# Phase 0/2 runs return stub:true. A non-stub response here means
+# either the wrong endpoint was hit, the response was malformed, or
+# (worst case) the server silently shipped non-stub code and this
+# script is now pretending synthetic numbers are real.
+
+stub_flag=$(echo "$response" | jq -r '.results[0].stub | if . == null then "missing" else tostring end')  # jq's // would also replace false, making the false) branch unreachable
 echo "results[0].stub = $stub_flag"
-echo "Expected: true (Phase 0 stub; D2.2 flips to false when real decode lands)."
+
+case "$stub_flag" in
+  true)
+    echo "OK — Phase 0 stub honored. (D2.2 will flip this to false when real decode lands;"
+    echo "     at that point, flip this check too.)"
+    ;;
+  false)
+    echo "FAIL — results[0].stub is false but D2.2 has not landed." >&2
+    echo "       This script refuses to treat non-stub output as real during Phase 0." >&2
+    echo "       Either the server is running non-scaffold code (update this check)," >&2
+    echo "       or the request hit the wrong endpoint / unexpected handler." >&2
+    exit 3
+    ;;
+  *)
+    echo "FAIL — results[0].stub missing or unparseable (got: $stub_flag)." >&2
+    echo "       Response may be malformed or an error payload." >&2
+    echo "       Inspect the --- response --- section above." >&2
+    exit 3
+    ;;
+esac
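The three-way stub check in the `scripts/codec_sweep.sh` hunk above can be exercised offline against canned payloads. The sketch below is only an illustration and is not part of the script: the helper names `extract_stub` and `classify_stub` and the sample payloads are hypothetical. It extracts the flag with an explicit null test because jq's `//` alternative operator substitutes its right-hand side for `false` as well as `null`, so a `false` flag would otherwise be reported as missing.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of scripts/codec_sweep.sh.

# Map an already-extracted flag to a verdict on stdout and an exit status.
classify_stub() {
  case "$1" in
    true)  echo "ok";       return 0 ;;
    false) echo "non-stub"; return 3 ;;
    *)     echo "missing";  return 3 ;;
  esac
}

# Read a JSON response on stdin and print "true", "false", or "missing".
# The explicit null test keeps a literal false distinguishable from absence.
extract_stub() {
  jq -r '.results[0].stub | if . == null then "missing" else tostring end'
}

# Demo over made-up payloads; skipped when jq is not installed.
if command -v jq >/dev/null 2>&1; then
  for payload in \
    '{"results":[{"stub":true}]}' \
    '{"results":[{"stub":false}]}' \
    '{"error":"boom"}'
  do
    flag=$(printf '%s' "$payload" | extract_stub)
    verdict=$(classify_stub "$flag") || true
    echo "$payload -> $flag -> $verdict"
  done
fi
```

Keeping classification separate from extraction means the exit-code contract (0 for an honored stub, 3 otherwise) can be checked without a running server or jq.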