Complete the portable-simd backend started in the scaffold commit.
12 Sonnet agents (round-3-portable-simd fleet) populated each of the
12 sub-files in `src/simd_nightly/` via the A2A blackboard pattern at
`.claude/board/AGENT_LOG.md`.
Total: ~4,022 LOC of wrapper code + 76 parity tests.
Per-file (line counts at commit):
- f32_types.rs (395) — F32x16, F32x8
- f64_types.rs (307) — F64x8, F64x4
- u8_types.rs (1043) — U8x32, U8x64 + 26 in-file tests
- u_word_types.rs (520) — U16x32, U32x16, U32x8, U64x8, U64x4
- i8_types.rs (263) — I8x32, I8x64
- i_word_types.rs (449) — I16x16, I16x32, I32x16, I64x8
- masks.rs (196) — F32Mask16, F32Mask8, F64Mask8, F64Mask4
- bf16_types.rs (248) — BF16x16, BF16x8 (scalar emulation;
core::simd has no half-precision)
- f16_types.rs (220) — F16x16 (scalar IEEE-754 binary16 emulation)
- ops.rs (265) — Add/Sub/Mul/Div/Neg + bitwise + Default
macros, applied to all 17 numeric types
- exotic_methods.rs (329) — permute_bytes / shuffle_bytes / mask_blend /
unpack_lo_epi8 / unpack_hi_epi8 scalar
fallbacks for U8x32 + U8x64 (core::simd
has no native cross-lane byte ops or
bitmask-driven blend)
- tests.rs (815) — 76 parity tests vs scalar reference
30 types total (mirrors the AVX-512 / AVX2 polyfill surface 1:1).
All re-exported flat from `crate::simd_nightly::*` via the mod.rs
aggregator.
Verification:
rustup run nightly cargo check --features nightly-simd -p ndarray --lib
→ Finished, 0 errors
rustup run nightly cargo test --features nightly-simd -p ndarray --lib simd_nightly
→ test result: ok. 153 passed; 0 failed
cargo check --lib (stable, default features, no nightly-simd)
→ Finished, 0 errors (the existing intrinsics dispatch is unchanged)
Cross-agent findings worth folding into a handover note:
- `std::simd::StdFloat` is the trait that provides mul_add/sqrt/round/
floor on core::simd float vectors. `core::simd::num::SimdFloat`
provides reduce/min/max/clamp but NOT mul_add/sqrt/round/floor or the
transcendentals.
- `core::simd::cmp::SimdOrd` is needed for simd_min/simd_max on
integer vectors (SimdPartialOrd alone is not sufficient).
- `core::simd::Mask::to_bitmask()` always returns u64 regardless of
lane count. Wrappers cast `as u8` / `as u16` / `as u32` for narrower
bitmask shapes.
- `core::simd::Simd::swizzle` takes its indices at compile time
(`const N: usize`), so it cannot take a runtime index vector.
permute_bytes / shuffle_bytes need a scalar fallback. This has the
same shape as the AVX-512F-without-VBMI fallback path in
simd_avx512.rs added in PR #142.
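
A minimal sketch of the scalar-fallback shape described in the last finding, written in stable Rust over plain arrays rather than `core::simd` vectors. The name `permute_bytes_scalar` and the modulo wrap of out-of-range indices are assumptions for illustration; the actual wrappers in exotic_methods.rs may treat indices differently.

```rust
/// Scalar byte permute driven by a runtime index array.
/// Hypothetical helper: lane `i` of the output takes `src[idx[i] % N]`.
/// The modulo wrap is an assumption, not confirmed by the commit.
pub fn permute_bytes_scalar<const N: usize>(src: [u8; N], idx: [u8; N]) -> [u8; N] {
    let mut out = [0u8; N];
    for (o, &i) in out.iter_mut().zip(idx.iter()) {
        // The index is only known at run time, so this cannot be
        // expressed as a compile-time swizzle.
        *o = src[(i as usize) % N];
    }
    out
}
```

Because the indices are ordinary data, the loop is plain safe Rust that miri can interpret, which is the property the nightly-simd namespace is after.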
What this enables:
Miri can execute every method here (intrinsics-based backends are
opaque to miri). Consumers who want miri-runnable SIMD tests import
from `ndarray::simd_nightly::*` explicitly. The main polyfill via
`crate::simd::*` continues to use intrinsics — the nightly-simd
feature does NOT replace the production dispatch, it provides a
parallel namespace for miri tooling.
Fleet output in .claude/board/AGENT_LOG.md (round-3-portable-simd
section). 6 of 12 agents hit the same pre-existing AGENT_LOG
write-permission block from round-2; their entries were backfilled by
the main thread.
- Conversion helpers `f32_to_bf16_bits` (>> 16) and `bf16_bits_to_f32` (<< 16) are pure safe Rust.
- 12 unit tests cover splat roundtrip, truncate/expand, slice/array roundtrip, LANES const, and known bit patterns (1.0 = 0x3F80, -1.0 = 0xBF80).
- `rustup run nightly cargo check --features nightly-simd -p ndarray --lib`: zero errors in bf16_types.rs (pre-existing errors in other stub files owned by other agents).
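
The conversion helpers named in this entry can be sketched as follows. The function names come from the log; the bodies are an assumption reconstructed from the `>> 16` / `<< 16` hints and the "truncate/expand" wording, so this version truncates the low mantissa bits with no rounding.

```rust
/// Truncate an f32 to its upper 16 bits (the bfloat16 bit pattern).
/// Assumed behavior: plain truncation, no round-to-nearest.
pub fn f32_to_bf16_bits(x: f32) -> u16 {
    (x.to_bits() >> 16) as u16
}

/// Expand a bfloat16 bit pattern back to f32 by zero-filling the low 16 bits.
pub fn bf16_bits_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}
```

Values whose low 16 mantissa bits are zero, such as 1.0 and -1.0, round-trip exactly; other values lose precision on the way down.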
- `std::simd::StdFloat` is required (not `core::simd::num::SimdFloat`) for `mul_add/sqrt/round/floor`: `core::simd::num::SimdFloat` only covers `reduce_*` and `simd_min/max`, while StdFloat provides the FP math methods.
- Added `U64x4` and `U32x8` to `u_word_types.rs` as `F64x4::to_bits` and `F32x8::to_bits` companion types (agent #4 scope, but stubs were empty; noted in file header).
- Both: `simd_min`, `simd_max` (required `SimdOrd` import in addition to `SimdPartialOrd`)
- Both: `saturating_add`, `saturating_sub`
- Both: `pairwise_avg` via `cast::<u16>()` promotion (no native avg in `core::simd`)
- Both: `cmpeq_mask`, `cmpgt_mask`, `movemask` — U8x64 → `u64`, U8x32 → `u32` (cast from `u64` since `to_bitmask()` always returns `u64`)
- Both: `shr_epi16`, `shl_epi16` via `transmute` to `[u16; N]` scalar loop
- Both: `nibble_popcount_lut()` as `from_array` with replicated 0,1,1,2,… pattern
- Both: `Default` → `splat(0)`
- 26 unit tests covering all methods
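
To illustrate why `pairwise_avg` goes through a `u16` promotion, here is a scalar sketch over plain arrays. The helper name `pairwise_avg_scalar`, the lane-wise reading of "pairwise", and the round-half-up behavior (the convention of `_mm256_avg_epu8`) are assumptions; the log only states that the sum is widened to `u16`.

```rust
/// Lane-wise average of two byte arrays, computed in u16 so that
/// `a + b + 1` cannot overflow. Rounding half up is an assumption.
pub fn pairwise_avg_scalar<const N: usize>(a: [u8; N], b: [u8; N]) -> [u8; N] {
    let mut out = [0u8; N];
    for (o, (&x, &y)) in out.iter_mut().zip(a.iter().zip(b.iter())) {
        // Max intermediate value is 255 + 255 + 1 = 511, which fits in u16.
        *o = ((x as u16 + y as u16 + 1) >> 1) as u8;
    }
    out
}
```

Doing the addition directly in `u8` would wrap for any pair summing above 255, which is what the promotion avoids.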

**Decisions:** `nibble_popcount_lut` kept here (pure `from_array`, no shuffle dependency). `permute_bytes`, `shuffle_bytes`, `mask_blend`, `unpack_lo/hi_epi8` deferred to agent #11 (`exotic_methods.rs`) per spec.
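
A sketch of the replicated table that `nibble_popcount_lut` is described as building, assuming the pattern is the popcount of the low nibble of each lane index (0,1,1,2,… repeated every 16 lanes). The helper name `nibble_popcount_lut_array` is hypothetical and returns a plain array that could be fed to `from_array`.

```rust
/// Build an N-byte table where entry `i` is the popcount of `i & 0x0F`.
/// For N = 32 or 64 this repeats the 16-entry nibble table 2 or 4 times.
pub fn nibble_popcount_lut_array<const N: usize>() -> [u8; N] {
    core::array::from_fn(|i| ((i & 0x0F) as u8).count_ones() as u8)
}
```

Since the table is constant data with no lane movement, it has no dependency on the shuffle fallbacks deferred to `exotic_methods.rs`.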

**Key finding:** `core::simd::Mask::to_bitmask()` returns `u64` for ALL lane widths including 32-lane vectors; U8x32 masks cast `as u32` to match AVX2 shape.
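
The narrowing step can be sketched without `core::simd` as below. The local `wide` value stands in for the `u64` that `to_bitmask()` is reported to return; the helper name `cmpeq_mask_scalar` and the array-based signature are assumptions for illustration.

```rust
/// Compare 32 byte lanes for equality and return one bit per lane.
/// Bit `i` is set when `a[i] == b[i]`, matching the AVX2 movemask shape.
pub fn cmpeq_mask_scalar(a: [u8; 32], b: [u8; 32]) -> u32 {
    // Stand-in for the 64-bit mask produced for every lane count.
    let mut wide: u64 = 0;
    for i in 0..32 {
        if a[i] == b[i] {
            wide |= 1u64 << i;
        }
    }
    // Only the low 32 bits can be set for a 32-lane vector, so the
    // narrowing cast drops nothing.
    wide as u32
}
```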
## 2026-05-13T21:45 — agent #5 i8-wrap (sonnet) [backfilled by main]