@@ -2973,3 +2973,90 @@ The architecture's five consumer perspectives are not layers — they're project
29732973 **SoA vs Functional is not a choice, it's a WHERE.** BindSpace is SoA (columnar storage for SIMD). The algebra on it is Functional (methods on carriers). The SoA carries the state; the Functional methods transform it. Both exist simultaneously on the same data. The "struct of arrays vs object thinks for itself" tension resolves as: the ARRAY is the SoA, the ELEMENT (row, trajectory, fingerprint) thinks for itself via methods.
29742974
29752975Cross-ref: CLAUDE.md §The Stance (AGI-as-glove, SoA columns ARE the AGI surface), lab-vs-canonical-surface.md (I1-I11 invariants), ExternalMembrane (contract::external_membrane), BindSpace (cognitive-shader-driver::bindspace).
2976+
2977+ ## 2026-04-26 — FINDING: distance dispatch must be type-intrinsic, not crate-boundary-crossing
2978+
2979+ **Status:** FINDING
2980+ **Owner scope:** @family-codec-smith, @truth-architect, @host-glove-designer
2981+
2982+ The struct-of-arrays (BindSpace, RenderFrame, Arrow columns) carries heterogeneous
2983+ fingerprint types that each need a DIFFERENT distance function:
2984+
2985+ | Type | Distance | Where it lives | Notes |
2986+ | ---| ---| ---| ---|
2987+ | ` Binary16K = [u64; 256] ` | Hamming (popcount of XOR) | ` ndarray::hpc::bitwise::hamming_distance_raw ` | 16384-bit, SIMD VPOPCNTDQ |
2988+ | ` Vsa16kF32 = [f32; 16_384] ` | Cosine → FisherZ transform | ` ndarray::hpc::heel_f64x8::cosine_f64_simd ` | f32 dot/norm via F32x16 FMA |
2989+ | ` CamPqCode = [u8; 6] ` | ADC (asymmetric distance computation) | ` ndarray::hpc::cam_pq::adc_distance ` | Precomputed distance tables, O(1) |
2990+ | ` PaletteEdge = [u8; 3] ` | Palette L1 (lookup table) | ` ndarray::hpc::palette_distance::SpoDistanceMatrices::distance ` | bgz17 256×256 table, 1.8 ns |
2991+ | ` Base17 = [u8; 17] ` | Palette nearest (codebook search) | ` bgz17::Palette::nearest ` | 256 centroids, should use precomputed table |
2992+ | ` HighHeelBGZ ` container | Cascade (HHTL skip → palette → ADC fallback) | ` ndarray::hpc::cascade ` + ` bgz-tensor::hhtl_cache ` | Multi-level, route by ` RouteAction ` |
2993+
2994+ **The problem:** When the SoA's columns hold different carrier types (e.g., one column
2995+ is Binary16K, another is CamPqCode), the distance dispatch currently happens at the call
2996+ site: the caller must know which distance function to use. This works inside a single
2997+ crate, but when the SoA lives in crate A (e.g., `cognitive-shader-driver::BindSpace`) and
2998+ the distance kernel lives in crate B (e.g., `ndarray::hpc::bitwise`), every call crosses
2999+ a crate boundary. That boundary is zero-cost for `#[inline]` functions, but NOT zero-cost
3000+ if the call goes through a trait object (`dyn DistanceFn`) or any other form of dynamic
3001+ dispatch.
3002+
3003+ **The solution is type-intrinsic dispatch, not dynamic dispatch:**
3004+
3005+ The distance function should be a method ON the carrier type, not a free function
3006+ called FROM the SoA consumer. This follows the "object speaks for itself" doctrine
3007+ (CLAUDE.md §The Click):
3008+
3009+ ```rust
3010+ // WRONG: the caller must know which distance function applies:
3011+ let d = hamming_distance_raw(fp_a.as_bytes(), fp_b.as_bytes()); // crate boundary
3012+
3013+ // RIGHT: the type carries its own distance:
3014+ let d = fp_a.distance(&fp_b); // monomorphized, inlined, zero boundary tax
3015+ ```
3016+
3017+ The contract already has ` CodecRoute: Passthrough | CamPq ` which names the regime.
3018+ What's missing is a ` Distance ` trait that each carrier implements:
3019+
3020+ ```rust
3021+ pub trait Distance: Sized {
3022+     fn distance(&self, other: &Self) -> u32;
3023+     fn similarity(&self, other: &Self) -> f32 {
3024+         1.0 - (self.distance(other) as f32 / Self::MAX_DISTANCE as f32)
3025+     }
3026+     const MAX_DISTANCE: u32; // largest value distance() can return for this carrier
3027+ }
3028+ ```
3029+
3030+ Implementations:
3031+ - ` impl Distance for [u64; 256] ` → ` hamming_distance_raw ` (inline, SIMD)
3032+ - ` impl Distance for CamPqCode ` → ADC lookup (precomputed table ref)
3033+ - ` impl Distance for PaletteEdge ` → palette L1 table lookup
3034+ - ` impl Distance for Vsa16kF32 ` → cosine → FisherZ (F32x16 FMA)
3035+
3036+ Generic code over this trait is monomorphized at compile time: no dynamic dispatch, no crate
3037+ boundary tax. The SoA iterates paired columns with `a.iter().zip(b).map(|(x, y)| x.distance(y))` and
3038+ the correct distance function is selected by TYPE, not by runtime enum match.
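
The column iteration described above can be sketched as a pair of generic helpers. Everything here apart from the `Distance` trait shape is hypothetical: `Byte` is a toy one-byte carrier invented to keep the example small, and `nearest` and `pairwise` are illustrative names, not existing functions in any of the crates mentioned.

```rust
pub trait Distance: Sized {
    const MAX_DISTANCE: u32;
    fn distance(&self, other: &Self) -> u32;
}

/// Hypothetical toy carrier: one byte compared by Hamming distance.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Byte(pub u8);

impl Distance for Byte {
    const MAX_DISTANCE: u32 = 8;

    #[inline]
    fn distance(&self, other: &Self) -> u32 {
        (self.0 ^ other.0).count_ones()
    }
}

/// Scans one column for the row closest to `query`.
/// Generic over `T`, so the compiler emits one copy per carrier type
/// with a static (inlinable) call to `T::distance`.
pub fn nearest<T: Distance>(column: &[T], query: &T) -> Option<(usize, u32)> {
    column
        .iter()
        .enumerate()
        .map(|(i, row)| (i, query.distance(row)))
        .min_by_key(|&(_, d)| d)
}

/// Row-by-row distances between two columns of the same carrier type.
pub fn pairwise<T: Distance>(a: &[T], b: &[T]) -> Vec<u32> {
    a.iter().zip(b).map(|(x, y)| x.distance(y)).collect()
}

fn main() {
    let column = [Byte(0b1111_0000), Byte(0b0000_0001), Byte(0b0000_0011)];
    let query = Byte(0b0000_0011);

    println!("nearest  = {:?}", nearest(&column, &query));
    println!("pairwise = {:?}", pairwise(&column, &[Byte(0); 3]));
}
```

Neither helper names a concrete kernel; swapping `Byte` for another carrier changes which `distance` body is compiled in, with no `match` on a runtime tag.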
3039+
3040+ **Where this trait should live:** `lance-graph-contract` (zero deps). The
3041+ implementations live in ndarray (for SIMD kernels) or in the carrier crate
3042+ (for precomputed tables). The contract defines the interface; ndarray provides
3043+ the hardware acceleration; the SoA consumer never needs to know which distance
3044+ kernel runs.
3045+
3046+ **Hard-coded dispatch at the call site is fine.** When `BindSpace` calls
3047+ `hamming_distance_raw` on its `content` column, that's a direct, statically resolved
3048+ call into ndarray that the compiler can inline. The problem only arises if we try to
3049+ make the SoA generic over distance type via `dyn` trait objects. Don't do that.
3050+ Keep the dispatch compile-time via generics or type-specific methods. The SoA pays zero
3051+ boundary tax because generic code is monomorphized in the calling crate, which erases the crate boundary.
3052+
3053+ **FisherZ note:** Cosine similarity ∈ [-1, 1] does not average linearly. The
3054+ FisherZ transform `z = atanh(r)` maps it to an approximately normally distributed variable that
3055+ can be averaged, then `r = tanh(z)` maps back. This matters when the SoA
3056+ accumulates similarities across columns (e.g., weighted multi-column distance).
3057+ The `Distance` trait should expose `fn similarity_z(&self, other: &Self) -> f32`
3058+ for the FisherZ-transformed variant, defaulting to `atanh(similarity())`, with the input clamped away from ±1, where `atanh` diverges.
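
To make the averaging step concrete, here is a minimal sketch of a weighted mean taken in z-space. The function names `fisher_z` and `mean_similarity_z` and the clamp limit are assumptions made for illustration; the log itself only specifies `z = atanh(r)` and `r = tanh(z)`.

```rust
/// Fisher z-transform of a similarity in [-1, 1].
/// The clamp limit is an assumed value; it keeps atanh finite at r = ±1.
fn fisher_z(r: f32) -> f32 {
    const LIMIT: f32 = 0.999_999;
    r.clamp(-LIMIT, LIMIT).atanh()
}

/// Weighted mean of similarities, averaged in z-space and mapped back.
/// Assumes `rs` and `ws` have equal length and the weights sum to a
/// nonzero value.
fn mean_similarity_z(rs: &[f32], ws: &[f32]) -> f32 {
    let weight_sum: f32 = ws.iter().sum();
    let z_mean = rs
        .iter()
        .zip(ws)
        .map(|(r, w)| w * fisher_z(*r))
        .sum::<f32>()
        / weight_sum;
    z_mean.tanh()
}

fn main() {
    let rs = [0.9_f32, 0.1];
    let ws = [1.0_f32, 1.0];
    let arithmetic = (rs[0] + rs[1]) / 2.0;

    println!("arithmetic mean = {arithmetic}");
    println!("fisher-z mean   = {}", mean_similarity_z(&rs, &ws));
}
```

For the two columns in `main`, the z-space mean lands above the plain arithmetic mean of 0.5, because `atanh` stretches values near the ends of the range before they are averaged.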
3059+
3060+ Cross-ref: CLAUDE.md §The Click ("object speaks for itself"), I1 Codec Regime
3061+ Split (` CodecRoute ` ), ` contract::cam::DistanceTableProvider ` (existing trait for
3062+ ADC), ` ndarray::hpc::bitwise::hamming_distance_raw ` , ` ndarray::hpc::palette_distance ` .