Commit c24808f

WIP: simd_nightly module scaffold + 9/12 fleet files landed
Round-3-portable-simd fleet is in flight. Scaffold + 9 of 12 agent files already landed; 3 still working (u8_types, exotic_methods, tests). Committing the in-flight state per stop-hook policy; the remaining agents will land in follow-up commits before the draft PR opens.

Scaffold:
- `src/simd_nightly/mod.rs` — module aggregator with flat re-exports
- `src/simd_nightly/_original_draft.rs` — preserved 5-type draft for agents to reference / supersede
- `src/lib.rs` — `#![cfg_attr(feature = "nightly-simd", feature(portable_simd))]` crate-level gate + `pub mod simd_nightly;`
- `Cargo.toml` — `nightly-simd = ["std"]` feature
- `.claude/board/AGENT_LOG.md` — round-3-portable-simd manifest + early agent backfills (will receive more entries as remaining agents complete)

9/12 fleet files (line counts at this commit):
- f32_types.rs (393) — agent #1: F32x16, F32x8
- f64_types.rs (345) — agent #2: F64x8, F64x4
- u_word_types.rs (145) — agent #4: U16x32, U32x16, U32x8, U64x8, U64x4
- i8_types.rs (266) — agent #5: I8x32, I8x64
- i_word_types.rs (430) — agent #6: I16x16, I16x32, I32x16, I64x8
- masks.rs (188) — agent #7: F32Mask16, F32Mask8, F64Mask8, F64Mask4
- bf16_types.rs (285) — agent #8: BF16x16, BF16x8 (scalar emulation)
- f16_types.rs (254) — agent #9: F16x16 (scalar emulation)
- ops.rs (273) — agent #10: Add/Sub/Mul/Div/BitAnd/BitOr/BitXor/Default impl macros across all types

3/12 still in flight:
- u8_types.rs — agent #3: U8x32, U8x64
- exotic_methods.rs — agent #11: permute_bytes / shuffle_bytes / mask_blend / unpack_lo_epi8 / unpack_hi_epi8 / nibble_popcount_lut scalar fallbacks for U8x32/U8x64
- tests.rs — agent #12: parity tests vs scalar reference

Verification deferred: `cargo +nightly check --features nightly-simd` will run after the last 3 agents land + the meta-orchestrator synthesis pass.
1 parent 74b1858 commit c24808f

19 files changed

Lines changed: 3370 additions & 0 deletions
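
The commit message describes `ops.rs` as a set of macros that stamp out operator impls across all types. The following is a minimal sketch of that pattern, not the committed code. It assumes array-backed stand-in types (`F32x4`, `I32x4`) so that it compiles on stable Rust, whereas the real wrappers sit on `core::simd` and need nightly. The macro names `impl_lanewise_op` and `impl_arith` are hypothetical.

```rust
use std::ops::{Add, Mul, Sub};

/// Hypothetical array-backed stand-ins for the wrapper types.
/// (Assumption: the real types wrap `core::simd` vectors instead.)
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct F32x4(pub [f32; 4]);

#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct I32x4(pub [i32; 4]);

/// Implements one binary operator trait, lane by lane, for one type.
macro_rules! impl_lanewise_op {
    ($ty:ident, $trait:ident, $method:ident, $op:tt) => {
        impl $trait for $ty {
            type Output = $ty;

            #[inline]
            fn $method(self, rhs: $ty) -> $ty {
                let mut out = self.0;
                for (o, r) in out.iter_mut().zip(rhs.0.iter()) {
                    *o = *o $op *r;
                }
                $ty(out)
            }
        }
    };
}

/// Applies the arithmetic operator set to every listed type.
macro_rules! impl_arith {
    ($($ty:ident),+) => {
        $(
            impl_lanewise_op!($ty, Add, add, +);
            impl_lanewise_op!($ty, Sub, sub, -);
            impl_lanewise_op!($ty, Mul, mul, *);
        )+
    };
}

impl_arith!(F32x4, I32x4);
```

With this in place, `F32x4([1.0; 4]) + F32x4([0.5; 4])` yields a vector of `1.5` in every lane. The sketch leaves out `Div` and the bitwise operators; integer lanes here use Rust's default arithmetic, which panics on overflow in debug builds, so a real implementation would need to pick wrapping or saturating semantics explicitly.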

.claude/board/AGENT_LOG.md

Lines changed: 36 additions & 0 deletions
@@ -1178,3 +1178,39 @@ SIMD savings disappear below GPU baseline.
 integration candidate. The performance levers are GPU shader
 optimization + wgpu buffer bandwidth — outside ndarray's scope.
 
+
+
+# ═══════════════════════════════════════════════════════════════════
+# Round 3-portable-simd — full 30-type coverage for crate::simd_nightly
+# ═══════════════════════════════════════════════════════════════════
+
+> **Branch:** `claude/portable-simd-nightly`
+> **Goal:** expand `src/simd_nightly/` from 5-type draft (F32x16, F64x8,
+> U8x64, U32x16, F32Mask16) to full 30-type coverage that mirrors the
+> AVX-512 / AVX2 polyfill surface. Miri-runnable backend wrapping
+> `core::simd::*`.
+> **Fleet:** 12 Sonnet workers + 1 Sonnet meta. Same A2A pattern
+> (`tee -a` to this file).
+> **Permission:** the `.claude/settings.local.json` allow-list set up
+> in round-2 still covers `tee -a /home/user/ndarray/.claude/board/AGENT_LOG.md`.
+
+## Fleet manifest (round 3-portable-simd)
+
+| # | Agent | Scope (file) | Types |
+|---|---|---|---|
+| 1 | f32-wrap | `src/simd_nightly/f32_types.rs` | F32x16, F32x8 |
+| 2 | f64-wrap | `src/simd_nightly/f64_types.rs` | F64x8, F64x4 |
+| 3 | u8-wrap | `src/simd_nightly/u8_types.rs` | U8x32, U8x64 |
+| 4 | u-word-wrap | `src/simd_nightly/u_word_types.rs` | U16x32, U32x16, U64x8 |
+| 5 | i8-wrap | `src/simd_nightly/i8_types.rs` | I8x32, I8x64 |
+| 6 | i-word-wrap | `src/simd_nightly/i_word_types.rs` | I16x16, I16x32, I32x16, I64x8 |
+| 7 | masks-wrap | `src/simd_nightly/masks.rs` | F32Mask16, F64Mask8 |
+| 8 | bf16-emul | `src/simd_nightly/bf16_types.rs` | BF16x16, BF16x8 (scalar emulation — no `core::simd` half-prec) |
+| 9 | f16-emul | `src/simd_nightly/f16_types.rs` | F16x16 (scalar emulation) |
+| 10 | ops-macros | `src/simd_nightly/ops.rs` | Add/Sub/Mul/Div/BitAnd/BitOr/BitXor/Default macros applied to all types |
+| 11 | exotic-fallbacks | `src/simd_nightly/exotic_methods.rs` | permute_bytes, shuffle_bytes scalar fallbacks for U8x32/U8x64 (`core::simd::swizzle` is const N — can't accept runtime idx vector) |
+| 12 | parity-tests | `src/simd_nightly/tests.rs` | Comprehensive parity tests vs simd_avx512 / simd_avx2 references where they exist |
+| M | meta-r3 | synthesis | Sonnet |
+
+## Round-3-portable-simd entries (newest first)
+
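
Manifest row 11 explains why `permute_bytes` and `shuffle_bytes` need scalar fallbacks: the indices arrive as a runtime vector, while the portable swizzle takes compile-time indices. A rough sketch of such a fallback follows. `exotic_methods.rs` had not landed at this commit, so everything here is an assumption: the array-backed `U8x32` stand-in, the choice to mask `permute_bytes` indices to the low 5 bits, and the choice to give `shuffle_bytes` the per-128-bit-lane behaviour of the x86 `pshufb` family (high index bit zeroes the output byte).

```rust
/// Hypothetical array-backed stand-in for the 32-lane byte vector.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct U8x32(pub [u8; 32]);

impl U8x32 {
    /// Full-width permute with runtime indices:
    /// `out[i] = self[idx[i] & 31]` (assumed semantics).
    pub fn permute_bytes(self, idx: U8x32) -> U8x32 {
        let mut out = [0u8; 32];
        for i in 0..32 {
            out[i] = self.0[(idx.0[i] & 31) as usize];
        }
        U8x32(out)
    }

    /// Per-16-byte-lane shuffle (assumed `pshufb`-style semantics):
    /// a set high bit in the index zeroes the output byte, otherwise
    /// the low 4 bits select a byte from the same 16-byte half.
    pub fn shuffle_bytes(self, idx: U8x32) -> U8x32 {
        let mut out = [0u8; 32];
        for i in 0..32 {
            let sel = idx.0[i];
            if sel & 0x80 == 0 {
                let base = i & !15; // 0 for the low half, 16 for the high half
                out[i] = self.0[base + (sel & 15) as usize];
            }
        }
        U8x32(out)
    }
}
```

A plain indexed loop is slow compared with a hardware shuffle, but it is fully defined for any index value, which is what matters for a backend whose purpose is to run under Miri.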

Cargo.toml

Lines changed: 14 additions & 0 deletions
@@ -147,6 +147,20 @@ serde = ["dep:serde"]
 std = ["num-traits/std", "matrixmultiply/std"]
 rayon = ["dep:rayon", "std"]
 
+# Portable-SIMD backend (NIGHTLY ONLY). Routes `crate::simd::*` types
+# through `core::simd::*` instead of the architecture-specific intrinsics
+# in `simd_avx512.rs` / `simd_avx2.rs` / `simd_neon.rs`. The point is
+# miri compatibility: miri can execute `core::simd` semantics but treats
+# `_mm*_*` intrinsics as opaque. With this feature on, miri-run tests
+# exercise the actual SIMD code paths in consumer code (`hpc/byte_scan`,
+# `hpc/framebuffer`, etc.) and catch UB that the intrinsics backend hides.
+#
+# Requires `cargo +nightly` because `src/simd_nightly.rs` is gated on
+# `#![feature(portable_simd)]` (Rust unstable issue #86656). The default
+# build (stable 1.95) does NOT touch this; the existing intrinsics
+# cfg-dispatch in `simd.rs` remains the production path.
+nightly-simd = ["std"]
+
 # HPC extras: blake3 hashing, p64 palette/NARS bridge, fractal manifold.
 # These pull in a non-trivial dependency tree; downstream crates such as
 # burn-ndarray that only need the core array layer can disable this with
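
To make the Miri motivation in the feature comment concrete, here is a sketch of the kind of chunked consumer loop whose bounds logic an interpreter can check. It is not taken from `hpc/byte_scan`; the function name `count_byte`, the 64-byte chunk width, and the use of an unaligned raw-pointer read as a stand-in for a vector load are all assumptions made for illustration.

```rust
/// Counts occurrences of `needle`, reading 64 bytes at a time and
/// finishing with a scalar tail. Hypothetical example, not crate code.
pub fn count_byte(haystack: &[u8], needle: u8) -> usize {
    const LANES: usize = 64;
    let mut count = 0;
    let mut i = 0;

    // Main loop: only runs while a full 64-byte chunk remains.
    while i + LANES <= haystack.len() {
        // SAFETY: `i + LANES <= haystack.len()`, so all 64 bytes read
        // here lie inside the slice. Loosening the loop condition to
        // `i < haystack.len()` would read past the end of the slice,
        // which is undefined behaviour.
        let chunk: [u8; LANES] = unsafe {
            haystack
                .as_ptr()
                .add(i)
                .cast::<[u8; LANES]>()
                .read_unaligned()
        };
        count += chunk.iter().filter(|&&b| b == needle).count();
        i += LANES;
    }

    // Scalar tail for the remaining 0..=63 bytes.
    count += haystack[i..].iter().filter(|&&b| b == needle).count();
    count
}
```

An out-of-bounds read like the one described in the safety comment can return plausible results on real hardware and pass ordinary tests, whereas an interpreter that tracks allocation bounds is in a position to reject it.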

rust_out

4.15 MB
Binary file not shown.

src/lib.rs

Lines changed: 14 additions & 0 deletions
@@ -6,6 +6,12 @@
 // option. This file may not be copied, modified, or distributed
 // except according to those terms.
 #![crate_name = "ndarray"]
+// Crate-level nightly feature gate for the optional `nightly-simd` backend
+// (`src/simd_nightly/`). When the `nightly-simd` cargo feature is OFF
+// (default), this attribute is absent and stable rustc compiles the crate
+// normally. When ON, the crate requires nightly rustc to access
+// `core::simd::*` types.
+#![cfg_attr(feature = "nightly-simd", feature(portable_simd))]
 #![doc(html_root_url = "https://docs.rs/ndarray/0.15/")]
 #![doc(html_logo_url = "https://rust-ndarray.github.io/images/rust-ndarray_logo.svg")]
 #![allow(
@@ -240,6 +246,14 @@ pub(crate) mod simd_avx512;
 #[allow(clippy::all, missing_docs, dead_code, unused_variables, unused_imports)]
 pub mod simd_avx2;
 
+// Portable-SIMD backend — nightly-only. Wraps `core::simd::*` so miri can
+// execute the polyfill paths (intrinsic-based backends are opaque to
+// miri). Gated behind `nightly-simd` feature; the file itself requires
+// `#![feature(portable_simd)]` so it only compiles on nightly rustc.
+#[cfg(feature = "nightly-simd")]
+#[allow(clippy::all, missing_docs)]
+pub mod simd_nightly;
+
 #[cfg(feature = "std")]
 #[allow(clippy::all, missing_docs, dead_code, unused_variables, unused_imports)]
 // AMX is an x86_64-only ISA (Intel Sapphire Rapids+); the module uses

src/simd.rs

Lines changed: 9 additions & 0 deletions
@@ -203,6 +203,15 @@ pub const PREFERRED_I16_LANES: usize = 16;
 // at compile time → all types use native __m512/__m512d/__m512i.
 // The 256-bit types (F32x8, F64x4) also live in simd_avx512 (__m256).
 
+// Note on the `nightly-simd` feature: it adds the `crate::simd_nightly`
+// module (a portable-simd backend wrapping `core::simd`) but does NOT
+// replace the intrinsics dispatch below. Full type-parity coverage
+// would require the nightly module to define ~30 types; the current
+// draft covers 5 (F32x16, F64x8, U8x64, U32x16, F32Mask16). Consumers
+// who want miri-runnable SIMD code import from `simd_nightly`
+// explicitly (e.g. `use ndarray::simd_nightly::F32x16`). The main
+// polyfill via `crate::simd::F32x16` continues to use intrinsics.
+
 #[cfg(all(target_arch = "x86_64", target_feature = "avx512f"))]
 pub use crate::simd_avx512::{
     f32x16,
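
Because the portable backend sits beside the intrinsics dispatch instead of replacing it, the two can be compared lane by lane, which is the job the commit message assigns to `tests.rs`. That file had not landed at this commit, so the following is only a sketch of one way such a parity check could look. `add_backend` is a hypothetical stand-in for whichever backend is under test, and `add_ref` and `first_mismatch` are invented helper names.

```rust
/// Scalar reference: the result every backend is expected to match.
pub fn add_ref<const N: usize>(a: [f32; N], b: [f32; N]) -> [f32; N] {
    std::array::from_fn(|i| a[i] + b[i])
}

/// Hypothetical stand-in for the backend under test. In a real parity
/// test this would call into the SIMD wrapper type instead.
pub fn add_backend<const N: usize>(a: [f32; N], b: [f32; N]) -> [f32; N] {
    let mut out = a;
    for (o, r) in out.iter_mut().zip(b) {
        *o += r;
    }
    out
}

/// Returns the first lane whose bit pattern differs, if any.
/// Comparing bits (not values) distinguishes `0.0` from `-0.0` and
/// treats identical NaN payloads as equal.
pub fn first_mismatch<const N: usize>(got: [f32; N], want: [f32; N]) -> Option<usize> {
    (0..N).find(|&i| got[i].to_bits() != want[i].to_bits())
}
```

A test would then assert `first_mismatch(add_backend(a, b), add_ref(a, b)) == None` over a set of inputs that includes signed zeros and infinities. Bitwise comparison is the strict choice; operations where backends may legitimately differ in rounding would need a tolerance-based check instead.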
