Skip to content

Commit e956e9d

Browse files
authored
Merge pull request #147 from AdaWorldAPI/claude/sprint-12-qualia-stream-w-f4
impl(sprint-12/wave-F): D-CSV-11 vertical streaming scaffolds (QualiaStream + InferenceStream + SplatFieldStream)
2 parents 2a3885d + 4770a87 commit e956e9d

8 files changed

Lines changed: 794 additions & 19 deletions

File tree

scripts/miri-tests.sh

Lines changed: 80 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -1,18 +1,87 @@
11
#!/bin/sh
2+
#
3+
# Miri test runner — ephemeral nightly, scoped to this script ONLY.
4+
#
5+
# Rules of the road (do not violate):
6+
# * The repo's default toolchain is stable (see rust-toolchain.toml).
7+
# `cargo build`, `cargo test`, `cargo clippy`, CI's clippy / tests jobs
8+
# all use stable. Nothing else opts into nightly.
9+
# * Miri requires nightly because `src/simd_nightly/` is gated on
10+
# `#![feature(portable_simd)]` (unstable issue #86656), and Miri itself
11+
# ships only on nightly. This script invokes nightly via `+nightly`,
12+
# which is an ephemeral, per-invocation switch — it does NOT change
13+
# the default toolchain.
14+
# * The `nightly-simd` cargo feature is enabled here ONLY. It routes
15+
# `crate::simd::*` through `core::simd::*` (the std polyfill) instead
16+
# of the architecture-specific `_mm*_*` intrinsics, so Miri can
17+
# actually execute the SIMD code paths. Production builds (and CI's
18+
# clippy / tests on stable) keep using the intrinsics backend.
19+
# * `blas` is excluded because Miri cannot FFI into `cblas_gemm`.
20+
#
21+
# If Miri stays clean, the matching CI job at `.github/workflows/ci.yaml`
22+
# § miri promotes this from optional → required.
223

324
set -x
425
set -e
526

6-
# We rely on layout-dependent casts, which should be covered with #[repr(transparent)]
7-
# This should catch if we missed that
8-
RUSTFLAGS="-Zrandomize-layout"
27+
# Idempotent install of the miri component on nightly. No-op when already
28+
# present (rustup short-circuits). Safe in CI fresh checkouts.
29+
rustup component add miri --toolchain nightly >/dev/null 2>&1 || \
30+
rustup +nightly component add miri
931

10-
# Miri reports a stacked borrow violation deep within rayon, in a crate called crossbeam-epoch
11-
# The crate has a PR to fix this: https://github.com/crossbeam-rs/crossbeam/pull/871
12-
# but using Miri's tree borrow mode may resolve it for now.
13-
# Disabled until we can figure out a different rayon issue: https://github.com/rust-lang/miri/issues/1371
14-
# MIRIFLAGS="-Zmiri-tree-borrows"
32+
# Layout randomisation — catches missing `#[repr(transparent)]` and similar
33+
# layout-dependent UB. Cheap; always on.
34+
export RUSTFLAGS="-Zrandomize-layout"
1535

16-
# General tests
17-
# Note that we exclude blas feature because Miri can't do cblas_gemm
18-
cargo miri nextest run -v -p ndarray -p ndarray-rand --features approx,serde
36+
# Miri reports a stacked borrow violation deep within rayon's
37+
# crossbeam-epoch. Upstream fix: crossbeam PR #871.
38+
# Tree-borrow mode resolves it but trips a different rayon issue
39+
# (rust-lang/miri#1371). Left disabled until both upstream stories close.
40+
# export MIRIFLAGS="-Zmiri-tree-borrows"
41+
42+
# Architectural limit on the Miri sweep:
43+
#
44+
# `crate::simd::*` (the production dispatch in `src/simd.rs`) re-exports
45+
# from `simd_avx512` / `simd_avx2` / `simd_neon`, which call `_mm*_*` /
46+
# `vget*` intrinsics directly. Miri rejects those with "calling a
47+
# function that requires unavailable target features: avx" because the
48+
# Miri target doesn't enable AVX/AVX2/AVX-512/NEON target features.
49+
#
50+
# The `nightly-simd` feature ships a parallel module `crate::simd_nightly`
51+
# (the 24-type `core::simd` polyfill, at full parity with the 24 types
52+
# defined across `simd_avx2.rs` + `simd_avx512.rs` — landed in PR #146)
53+
# which IS Miri-checkable. But the default `crate::simd::*` dispatch is
54+
# NOT routed through it; consumer modules that import `crate::simd::F32x16`
55+
# (most of `hpc::*` + the `simd::tests::*` suite) go through intrinsics.
56+
# The polyfill is no longer the bottleneck — the missing piece is a
57+
# `cfg(miri)` switch in `src/simd.rs` that re-exports from `simd_nightly`
58+
# instead of `simd_avx*` under Miri.
59+
cargo +nightly miri nextest run -v \
60+
--no-fail-fast \
61+
-p ndarray -p ndarray-rand \
62+
--features approx,serde,nightly-simd \
63+
-E '!(
64+
test(/^hpc::/) - test(/^hpc::byte_scan/)
65+
) and !test(/^simd::tests::/)
66+
and !test(/^hpc::framebuffer::pyramid_tests::/)
67+
'
68+
#
69+
# Filter rationale (3-clause AND):
70+
#
71+
# 1. `!(test(/^hpc::/) - test(/^hpc::byte_scan/))`
72+
# Skip everything in `hpc::*` EXCEPT `hpc::byte_scan` (the scalar-fallback
73+
# path validated against the `cfg(miri)` SimdCaps bypass).
74+
#
75+
# 2. `!test(/^simd::tests::/)`
76+
# Skip the `simd::tests::*` suite. These exercise `crate::simd::F32x16`
77+
# etc. directly — types that re-export AVX/AVX2/AVX-512 intrinsics. Miri
78+
# rejects every one with "calling a function that requires unavailable
79+
# target features: avx". Same architectural class as `hpc::*`. Will
80+
# become miri-runnable when `crate::simd::*` gains a cfg(miri) dispatch
81+
# through `simd_nightly`.
82+
#
83+
# 3. `!test(/^hpc::framebuffer::pyramid_tests::/)`
84+
# The 3 pyramid tests take 19+ minutes EACH under Miri (large 2D scan
85+
# loops over SIMD-shaped data). Not a UB signal — pure runtime cost.
86+
# Re-enable once the test fixtures are sized down or the loops are
87+
# cfg(miri)-shortened.

src/hpc/mod.rs

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -236,6 +236,11 @@ pub mod framebuffer;
236236
/// Transcoded from Opus CELT for the HHTL cascade → waveform pipeline.
237237
pub mod audio;
238238

239+
/// Vertical streaming structs for the EdgeColumn SoA (D-CSV-11b, sprint-12).
240+
/// Per cognitive-substrate-convergence-v1.md §5 L-20.
241+
#[allow(missing_docs)]
242+
pub mod stream;
243+
239244
#[cfg(all(test, feature = "hpc-extras"))]
240245
mod e2e_tests {
241246
//! End-to-end pipeline test: Fingerprint → Node → Seal → Cascade → CLAM → Causality → BNN

src/hpc/simd_caps.rs

Lines changed: 35 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -100,8 +100,41 @@ pub fn simd_caps() -> SimdCaps {
100100
}
101101

102102
impl SimdCaps {
103+
/// Miri-only: CPUID inline asm is unsupported by Miri (it can't simulate
104+
/// CPU feature detection). Return an all-scalar capability set so any
105+
/// test reaching this LazyLock under Miri exercises the scalar fallback
106+
/// paths instead of aborting on the `__cpuid_count` call. Scoped to
107+
/// `cfg(miri)` — production builds and stable CI use the real detection
108+
/// below.
109+
#[cfg(miri)]
110+
fn detect() -> Self {
111+
Self {
112+
avx2: false,
113+
avx512f: false,
114+
avx512bw: false,
115+
avx512vl: false,
116+
avx512vpopcntdq: false,
117+
sse41: false,
118+
sse2: false,
119+
fma: false,
120+
avx512vnni: false,
121+
avx512vbmi: false,
122+
amx_tile: false,
123+
amx_int8: false,
124+
amx_bf16: false,
125+
avx512bf16: false,
126+
avxvnniint8: false,
127+
neon: false,
128+
asimd_dotprod: false,
129+
fp16: false,
130+
aes: false,
131+
sha2: false,
132+
crc32: false,
133+
}
134+
}
135+
103136
/// Detect CPU capabilities at runtime.
104-
#[cfg(target_arch = "x86_64")]
137+
#[cfg(all(target_arch = "x86_64", not(miri)))]
105138
fn detect() -> Self {
106139
// `__cpuid_count` is safe on x86_64 (Rust 1.87+): CPUID is always
107140
// available on x86_64 (guaranteed by the ABI) and has no side effects
@@ -140,7 +173,7 @@ impl SimdCaps {
140173
/// AArch64: detect NEON sub-features via `is_aarch64_feature_detected!`.
141174
/// NEON itself is mandatory (always true). The sub-features distinguish
142175
/// Pi Zero 2 W / Pi 3 (A53) from Pi 4 (A72) from Pi 5 (A76).
143-
#[cfg(target_arch = "aarch64")]
176+
#[cfg(all(target_arch = "aarch64", not(miri)))]
144177
fn detect() -> Self {
145178
Self {
146179
// x86 fields: all false on ARM

src/hpc/stream/inference.rs

Lines changed: 219 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,219 @@
1+
//! InferenceStream — forward-iterator over a borrowed `&[InferenceRow]` slice.
2+
//! Per cognitive-substrate-convergence-v1.md §5 L-20: vertical streaming
3+
//! over the inference-mantissa lane of the EdgeColumn SoA. Used by the
4+
//! integer-SIMD MUL evaluation hot path (D-CSV-8 sprint-12 SIMD vec).
5+
//!
6+
//! Pure iterator scaffold; `par_inference_stream` rayon variant is sprint-13+.
7+
8+
// Local mirror of CausalEdge64 shape (bit-compatible with causal_edge::CausalEdge64).
9+
// No cross-crate import: ndarray is the producer; causal-edge is the consumer.
10+
11+
/// A single row of the EdgeColumn SoA, bit-compatible with
12+
/// `causal_edge::CausalEdge64` v2 layout.
13+
///
14+
/// Fields of interest for the inference-mantissa lane:
15+
/// - bits 46-49: signed 4-bit inference mantissa (−8..+7)
16+
/// - bits 53-58: W-slot corpus root handle (0..=63)
17+
#[repr(C, align(8))]
18+
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug, Default)]
19+
pub struct InferenceRow(pub u64);
20+
21+
impl InferenceRow {
22+
/// Read the 4-bit signed mantissa at bits 46-49 (matches causal-edge v2
23+
/// `inference_mantissa()` exactly — see `causal-edge/src/layout.rs`).
24+
///
25+
/// Sign-extension: extract 4-bit unsigned value, then sign-extend to i8
26+
/// via arithmetic left-shift trick: `(raw << 4) >> 4`.
27+
#[inline]
28+
pub fn inference_mantissa(&self) -> i8 {
29+
let raw = ((self.0 >> 46) & 0xF) as i8;
30+
(raw << 4) >> 4 // sign-extend 4 → 8 bits
31+
}
32+
33+
/// Read the W-slot at bits 53-58 (6 bits, 0..=63).
34+
///
35+
/// The W-slot is the witness corpus root handle per CausalEdge64 v2 L-6.
36+
/// Returns 0 for zero-initialized rows.
37+
#[inline]
38+
pub fn w_slot(&self) -> u8 {
39+
((self.0 >> 53) & 0x3F) as u8
40+
}
41+
}
42+
43+
/// Forward-iterator over a borrowed slice of [`InferenceRow`] values.
44+
///
45+
/// Provides vertical streaming access to the inference-mantissa lane of the
46+
/// EdgeColumn SoA. Yields `(index, &InferenceRow)` tuples so callers can
47+
/// correlate back to the originating row without maintaining external counters.
48+
///
49+
/// # Example
50+
/// ```rust
51+
/// use ndarray::hpc::stream::inference::{InferenceRow, InferenceStream};
52+
///
53+
/// let rows = vec![InferenceRow(0), InferenceRow(1 << 46)];
54+
/// let mut stream = InferenceStream::new(&rows);
55+
/// assert_eq!(stream.len(), 2);
56+
/// let (idx, row) = stream.next().unwrap();
57+
/// assert_eq!(idx, 0);
58+
/// ```
59+
pub struct InferenceStream<'a> {
60+
rows: &'a [InferenceRow],
61+
cursor: usize,
62+
}
63+
64+
impl<'a> InferenceStream<'a> {
65+
/// Construct a new stream over the given slice. The cursor starts at 0.
66+
pub fn new(rows: &'a [InferenceRow]) -> Self {
67+
Self { rows, cursor: 0 }
68+
}
69+
70+
/// Total number of rows in the underlying slice (not remaining).
71+
pub fn len(&self) -> usize {
72+
self.rows.len()
73+
}
74+
75+
/// Returns `true` if the underlying slice is empty.
76+
pub fn is_empty(&self) -> bool {
77+
self.rows.is_empty()
78+
}
79+
80+
/// Number of rows not yet yielded by the iterator.
81+
pub fn remaining(&self) -> usize {
82+
self.rows.len().saturating_sub(self.cursor)
83+
}
84+
85+
/// Reset the cursor to the beginning so the stream can be iterated again.
86+
pub fn reset(&mut self) {
87+
self.cursor = 0;
88+
}
89+
}
90+
91+
impl<'a> Iterator for InferenceStream<'a> {
92+
type Item = (usize, &'a InferenceRow);
93+
94+
fn next(&mut self) -> Option<Self::Item> {
95+
if self.cursor < self.rows.len() {
96+
let i = self.cursor;
97+
self.cursor += 1;
98+
Some((i, &self.rows[i]))
99+
} else {
100+
None
101+
}
102+
}
103+
104+
fn size_hint(&self) -> (usize, Option<usize>) {
105+
let rem = self.remaining();
106+
(rem, Some(rem))
107+
}
108+
}
109+
110+
impl<'a> ExactSizeIterator for InferenceStream<'a> {
111+
fn len(&self) -> usize {
112+
self.remaining()
113+
}
114+
}
115+
116+
#[cfg(test)]
117+
mod tests {
118+
use super::*;
119+
120+
#[test]
121+
fn test_inference_stream_empty() {
122+
let rows: &[InferenceRow] = &[];
123+
let mut stream = InferenceStream::new(rows);
124+
assert!(stream.is_empty());
125+
assert_eq!(stream.len(), 0);
126+
assert_eq!(stream.remaining(), 0);
127+
assert!(stream.next().is_none());
128+
}
129+
130+
#[test]
131+
fn test_inference_stream_yields_all() {
132+
let rows = vec![InferenceRow(0), InferenceRow(1), InferenceRow(2)];
133+
let stream = InferenceStream::new(&rows);
134+
let collected: Vec<_> = stream.collect();
135+
assert_eq!(collected.len(), 3);
136+
assert_eq!(collected[0].0, 0);
137+
assert_eq!(collected[1].0, 1);
138+
assert_eq!(collected[2].0, 2);
139+
assert_eq!(collected[0].1 as *const _, &rows[0] as *const _);
140+
assert_eq!(collected[2].1 as *const _, &rows[2] as *const _);
141+
}
142+
143+
#[test]
144+
fn test_mantissa_signed_extraction() {
145+
// Pack bits 46-49 = 0b1111 = 15 (raw), which is -1 in 4-bit two's complement.
146+
let raw_bits: u64 = 0b1111u64 << 46;
147+
let row = InferenceRow(raw_bits);
148+
assert_eq!(row.inference_mantissa(), -1);
149+
150+
// Pack bits 46-49 = 0b0111 = 7 (raw), positive maximum.
151+
let row_pos = InferenceRow(0b0111u64 << 46);
152+
assert_eq!(row_pos.inference_mantissa(), 7);
153+
154+
// Pack bits 46-49 = 0b1000 = 8 (raw), which is -8 in 4-bit two's complement.
155+
let row_min = InferenceRow(0b1000u64 << 46);
156+
assert_eq!(row_min.inference_mantissa(), -8);
157+
158+
// Zero mantissa.
159+
let row_zero = InferenceRow(0);
160+
assert_eq!(row_zero.inference_mantissa(), 0);
161+
}
162+
163+
#[test]
164+
fn test_w_slot_extraction() {
165+
// Pack bits 53-58 = 0b111111 = 63 (maximum W-slot value).
166+
let raw_bits: u64 = 0b111111u64 << 53;
167+
let row = InferenceRow(raw_bits);
168+
assert_eq!(row.w_slot(), 63);
169+
170+
// W-slot = 0 (zero row).
171+
let row_zero = InferenceRow(0);
172+
assert_eq!(row_zero.w_slot(), 0);
173+
174+
// W-slot = 1.
175+
let row_one = InferenceRow(1u64 << 53);
176+
assert_eq!(row_one.w_slot(), 1);
177+
178+
// W-slot = 32 (bit 58 set, bit 53 clear).
179+
let row_32 = InferenceRow(32u64 << 53);
180+
assert_eq!(row_32.w_slot(), 32);
181+
}
182+
183+
#[test]
184+
fn test_remaining_decrements() {
185+
let rows = vec![InferenceRow(0); 4];
186+
let mut stream = InferenceStream::new(&rows);
187+
assert_eq!(stream.remaining(), 4);
188+
stream.next();
189+
assert_eq!(stream.remaining(), 3);
190+
stream.next();
191+
assert_eq!(stream.remaining(), 2);
192+
stream.next();
193+
assert_eq!(stream.remaining(), 1);
194+
stream.next();
195+
assert_eq!(stream.remaining(), 0);
196+
// Exhausted: remaining stays 0.
197+
stream.next();
198+
assert_eq!(stream.remaining(), 0);
199+
}
200+
201+
#[test]
202+
fn test_reset_restarts() {
203+
let rows = vec![InferenceRow(10), InferenceRow(20)];
204+
let mut stream = InferenceStream::new(&rows);
205+
206+
// Exhaust the stream.
207+
assert!(stream.next().is_some());
208+
assert!(stream.next().is_some());
209+
assert!(stream.next().is_none());
210+
assert_eq!(stream.remaining(), 0);
211+
212+
// After reset, the stream yields from the beginning again.
213+
stream.reset();
214+
assert_eq!(stream.remaining(), 2);
215+
let first = stream.next().unwrap();
216+
assert_eq!(first.0, 0);
217+
assert_eq!(first.1 .0, 10);
218+
}
219+
}

src/hpc/stream/mod.rs

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,15 @@
1+
//! Vertical streaming structs for the SoA columns.
2+
//! Per cognitive-substrate-convergence-v1.md §5 L-20.
3+
//!
4+
//! Sprint-12 scope (W-F4/5/6): `QualiaStream` + `InferenceStream` +
5+
//! `SplatFieldStream` forward-iterator scaffolds. Sprint-13+:
6+
//! `par_*` rayon variants once rayon is wired into the ndarray
7+
//! feature gate.
8+
9+
pub mod inference;
10+
pub mod qualia;
11+
pub mod splat_field;
12+
13+
pub use inference::{InferenceRow, InferenceStream};
14+
pub use qualia::{QualiaI4Row, QualiaStream};
15+
pub use splat_field::{SplatField, SplatFieldStream};

0 commit comments

Comments
 (0)