Skip to content

Commit bd191f4

Browse files
committed
feat(simd_nightly): 30-type portable-simd backend (round-3 fleet)
Complete the portable-simd backend started in the scaffold commit. 12 Sonnet agents (round-3-portable-simd fleet) populated each of the 12 sub-files in `src/simd_nightly/` via the A2A blackboard pattern at `.claude/board/AGENT_LOG.md`. Total: ~4,022 LOC of wrapper code + 76 parity tests. Per-file (line counts at commit): - f32_types.rs (395) — F32x16, F32x8 - f64_types.rs (307) — F64x8, F64x4 - u8_types.rs (1043) — U8x32, U8x64 + 26 in-file tests - u_word_types.rs (520) — U16x32, U32x16, U32x8, U64x8, U64x4 - i8_types.rs (263) — I8x32, I8x64 - i_word_types.rs (449) — I16x16, I16x32, I32x16, I64x8 - masks.rs (196) — F32Mask16, F32Mask8, F64Mask8, F64Mask4 - bf16_types.rs (248) — BF16x16, BF16x8 (scalar emulation; core::simd has no half-precision) - f16_types.rs (220) — F16x16 (scalar IEEE-754 binary16 emulation) - ops.rs (265) — Add/Sub/Mul/Div/Neg + bitwise + Default macros, applied to all 17 numeric types - exotic_methods.rs (329) — permute_bytes / shuffle_bytes / mask_blend / unpack_lo_epi8 / unpack_hi_epi8 scalar fallbacks for U8x32 + U8x64 (core::simd has no native cross-lane byte ops or bitmask-driven blend) - tests.rs (815) — 76 parity tests vs scalar reference 30 types total (mirrors the AVX-512 / AVX2 polyfill surface 1:1). All re-exported flat from `crate::simd_nightly::*` via the mod.rs aggregator. Verification: rustup run nightly cargo check --features nightly-simd -p ndarray --lib → Finished, 0 errors rustup run nightly cargo test --features nightly-simd -p ndarray --lib simd_nightly → test result: ok. 153 passed; 0 failed cargo check --lib (stable, default features, no nightly-simd) → Finished, 0 errors (the existing intrinsics dispatch is unchanged) Cross-agent findings worth folding into a handover note: - `std::simd::StdFloat` is the trait that provides mul_add/sqrt/round/ floor on core::simd float vectors. `core::simd::num::SimdFloat` provides reduce/min/max/clamp but NOT the transcendentals. - `core::simd::cmp::SimdOrd` is needed for simd_min/simd_max on integer vectors (SimdPartialOrd alone is not sufficient). - `core::simd::Mask::to_bitmask()` always returns u64 regardless of lane count. Wrappers cast `as u8` / `as u16` / `as u32` for narrower bitmask shapes. - `core::simd::Simd::swizzle` is `const N: usize` — cannot take a runtime index vector. permute_bytes / shuffle_bytes need scalar fallback. Same shape as the AVX-512F-without-VBMI fallback path in simd_avx512.rs added in PR #142. What this enables: Miri can execute every method here (intrinsics-based backends are opaque to miri). Consumers who want miri-runnable SIMD tests import from `ndarray::simd_nightly::*` explicitly. The main polyfill via `crate::simd::*` continues to use intrinsics — the nightly-simd feature does NOT replace the production dispatch, it provides a parallel namespace for miri tooling. Fleet output in .claude/board/AGENT_LOG.md (round-3-portable-simd section). 6 of 12 agents hit the same AGENT_LOG-write permission pre-existing block from round-2 — backfilled by the main thread.
1 parent c24808f commit bd191f4

11 files changed

Lines changed: 3190 additions & 28 deletions

File tree

.claude/board/AGENT_LOG.md

Lines changed: 310 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1214,3 +1214,313 @@ optimization + wgpu buffer bandwidth — outside ndarray's scope.
12141214

12151215
## Round-3-portable-simd entries (newest first)
12161216

1217+
1218+
## 2026-05-13 — agent #9 f16-emul (sonnet-4-6)
1219+
1220+
**File:** `src/simd_nightly/f16_types.rs` (220 lines)
1221+
**Status:** DONE
1222+
1223+
- Replaced stub with full `F16x16([u16; 16])` scalar emulation.
1224+
- `LANES = 16`; constructors: `splat(f32)`, `from_slice(&[u16])`, `from_array`, `to_array`, `copy_to_slice`.
1225+
- Conversions: `to_f32_array`, `from_f32_array`.
1226+
- IEEE-754 binary16 logic copied verbatim from `src/hpc/quantized.rs` F16 methods (lines 193-301); cited in doc comments.
1227+
- `cargo check --features nightly-simd`: zero errors in `f16_types.rs`; 58 pre-existing errors in other simd_nightly files (masks.rs, ops.rs, etc.).
1228+
1229+
## 2026-05-13T00:00 — agent #8 bf16-emul (sonnet)
1230+
1231+
**File:** `src/simd_nightly/bf16_types.rs` (248 lines)
1232+
**Verdict:** PASS
1233+
1234+
**Summary:**
1235+
- Implemented `BF16x16` and `BF16x8` as `#[repr(transparent)]` wrappers over `[u16; N]`.
1236+
- Methods: `splat(f32)`, `from_slice(&[u16])`, `from_array`, `to_array`, `copy_to_slice`, `to_f32_lossy() -> [f32; N]`, `from_f32_truncate([f32; N]) -> Self`, `LANES: usize`.
1237+
- Conversion helpers `f32_to_bf16_bits` (>> 16) and `bf16_bits_to_f32` (<< 16) are pure safe Rust.
1238+
- 12 unit tests cover splat roundtrip, truncate/expand, slice/array roundtrip, LANES const, and known bit patterns (1.0 = 0x3F80, -1.0 = 0xBF80).
1239+
- `rustup run nightly cargo check --features nightly-simd -p ndarray --lib`: zero errors in bf16_types.rs (pre-existing errors in other stub files owned by other agents).
1240+
1241+
## 2026-05-13 — agent #6 i-word-wrap (sonnet-4-6)
1242+
1243+
**File:** `src/simd_nightly/i_word_types.rs` (449 lines)
1244+
**Status:** DONE — `cargo check --features nightly-simd` passes clean
1245+
1246+
**Work done:**
1247+
- Replaced stub with full implementations of `I16x16`, `I16x32`, `I32x16`, `I64x8`
1248+
- Each type: `LANES`, `splat`, `from_slice`, `from_array`, `to_array`, `copy_to_slice`
1249+
- Reductions: `reduce_sum` (wrapping), `reduce_min`, `reduce_max` via `SimdInt`
1250+
- Lane-wise: `simd_min`/`simd_max` via `SimdOrd` (added to imports alongside `SimdPartialOrd`)
1251+
- Compare→mask: `cmpeq_mask`/`cmpgt_mask``to_bitmask() as uN` (N = lane count: u16/u32/u16/u8)
1252+
- Saturating: `saturating_add`/`saturating_sub` on I16x16 and I16x32 only (I32/I64 have no sat ops in AVX-512 reference)
1253+
- `PartialEq` + `Display` impls; operator impls deferred to agent #10
1254+
1255+
## 2026-05-13T21:30 — agent #7 masks-wrap (sonnet) [backfilled by main]
1256+
1257+
**File:** `src/simd_nightly/masks.rs` (196 lines)
1258+
**Status:** COMPILES (zero errors in this file)
1259+
1260+
Implemented 4 mask wrapper structs:
1261+
- `F32Mask16(Mask<i32, 16>)` — mirrors `simd_avx512::F32Mask16`
1262+
- `F32Mask8(Mask<i32, 8>)` — for agents #1/#2 F32x8 cmp return
1263+
- `F64Mask8(Mask<i64, 8>)` — mirrors `simd_avx512::F64Mask8`
1264+
- `F64Mask4(Mask<i64, 4>)` — for agents #1/#2 F64x4 cmp return
1265+
1266+
Per-struct methods: `to_bitmask() → uN` (with cast from u64),
1267+
`from_bitmask(bits: uN) → Self`, `select(true, false) → FloatType`,
1268+
`all() → bool`, `any() → bool`.
1269+
1270+
**Key nightly-API finding:** `core::simd::Mask::to_bitmask()` ALWAYS
1271+
returns `u64` regardless of lane count; `from_bitmask()` ALWAYS takes
1272+
`u64`. The wrappers cast (`as u8` / `as u16` for narrower returns,
1273+
`bits as u64` for widening). The `select` method requires
1274+
`use core::simd::prelude::Select` in scope.
1275+
1276+
`mod.rs` line 43 updated to expose all 4: `pub use masks::{F32Mask16,
1277+
F32Mask8, F64Mask8, F64Mask4};`.
1278+
1279+
1280+
## 2026-05-13T00:00 — agent #2 f64-wrap (sonnet)
1281+
1282+
**File:** `src/simd_nightly/f64_types.rs` (307 lines)
1283+
**Verdict:** DONE
1284+
1285+
**Types delivered:** `F64x8` (8-lane f64) and `F64x4` (4-lane f64).
1286+
1287+
**Full API per type:**
1288+
- Constructors: `splat`, `from_slice`, `from_array`, `to_array`, `copy_to_slice`
1289+
- Reductions: `reduce_sum`, `reduce_min`, `reduce_max`
1290+
- Lane-wise: `simd_min`, `simd_max`, `simd_clamp`
1291+
- FMA + math: `mul_add`, `sqrt`, `round`, `floor`, `abs`
1292+
- Bits: `to_bits``U64x8` (F64x8) / `U64x4` (F64x4)
1293+
- Comparisons: `simd_eq/ne/lt/le/gt/ge``F64Mask8` / `F64Mask4`
1294+
- `LANES: usize` const
1295+
1296+
**Key decisions:**
1297+
- `std::simd::StdFloat` required (not `core::simd::num::SimdFloat`) for `mul_add/sqrt/round/floor``core::simd::num::SimdFloat` only covers `reduce_*` and `simd_min/max`; StdFloat provides the FP math methods.
1298+
- Added `U64x4` and `U32x8` to `u_word_types.rs` as `F64x4::to_bits` and `F32x8::to_bits` companion types (agent #4 scope, but stubs were empty; noted in file header).
1299+
- Operator impls delegated to agent #10's `ops.rs` (already wired: `impl_fp_ops!(F64x8)` + `impl_fp_ops!(F64x4)`).
1300+
1301+
**Cargo check:** `rustup run nightly cargo check --features nightly-simd -p ndarray --lib``Finished` (0 errors).
1302+
1303+
## 2026-05-13T00:20 — agent #1 f32-wrap (sonnet)
1304+
1305+
**File:** `src/simd_nightly/f32_types.rs` (395 lines)
1306+
**Types:** F32x16 (16 methods), F32x8 (16 methods)
1307+
**Status:** COMPILES
1308+
1309+
**Notes / TODOs:**
1310+
- Both F32x16 and F32x8 implement: LANES const, splat, from_slice, from_array, to_array, copy_to_slice, reduce_sum, reduce_min, reduce_max, simd_min, simd_max, simd_clamp, mul_add, sqrt, round, floor, abs, to_bits, from_bits, simd_eq, simd_ne, simd_lt, simd_le, simd_gt, simd_ge.
1311+
- Key fix: `mul_add`, `sqrt`, `round`, `floor` require `std::simd::StdFloat` (NOT `core::simd::num::SimdFloat`).
1312+
- Also added `U32x8` struct to `u_word_types.rs` (required by F32x8::to_bits/from_bits); updated `mod.rs` to export `U32x8` and `U64x4`.
1313+
- `#![feature(portable_simd)]` must be enabled at crate root (lib.rs) for `std::simd::StdFloat` to exist; already present via nightly-simd feature.
1314+
- masks.rs (agent #7) and u_word_types.rs (agent #4) were already populated when this agent ran — no circular deps.
1315+
## 2026-05-13 — agent #3 u8-wrap (sonnet-4.6)
1316+
1317+
**File:** `src/simd_nightly/u8_types.rs` (~830 lines)
1318+
**Status:** DONE — `cargo check --features nightly-simd` passes (0 errors from this file)
1319+
1320+
**Implemented:**
1321+
- `pub struct U8x64(pub core::simd::u8x64)` + `pub struct U8x32(pub core::simd::u8x32)`
1322+
- Both: `LANES` const, `splat`, `from_slice`, `from_array`, `to_array`, `copy_to_slice`
1323+
- Both: `reduce_sum` (wrapping), `reduce_min`, `reduce_max`, `sum_bytes_u64` (u16 promotion)
1324+
- Both: `simd_min`, `simd_max` (required `SimdOrd` import in addition to `SimdPartialOrd`)
1325+
- Both: `saturating_add`, `saturating_sub`
1326+
- Both: `pairwise_avg` via `cast::<u16>()` promotion (no native avg in `core::simd`)
1327+
- Both: `cmpeq_mask`, `cmpgt_mask`, `movemask` — U8x64 → `u64`, U8x32 → `u32` (cast from `u64` since `to_bitmask()` always returns `u64`)
1328+
- Both: `shr_epi16`, `shl_epi16` via `transmute` to `[u16; N]` scalar loop
1329+
- Both: `nibble_popcount_lut()` as `from_array` with replicated 0,1,1,2,… pattern
1330+
- Both: `Default``splat(0)`
1331+
- 26 unit tests covering all methods
1332+
1333+
**Decisions:** `nibble_popcount_lut` kept here (pure `from_array`, no shuffle dependency). `permute_bytes`, `shuffle_bytes`, `mask_blend`, `unpack_lo/hi_epi8` deferred to agent #11 (`exotic_methods.rs`) per spec.
1334+
1335+
**Key finding:** `core::simd::Mask::to_bitmask()` returns `u64` for ALL lane widths including 32-lane vectors; U8x32 masks cast `as u32` to match AVX2 shape.
1336+
1337+
## 2026-05-13T21:45 — agent #5 i8-wrap (sonnet) [backfilled by main]
1338+
1339+
**File:** `src/simd_nightly/i8_types.rs` (263 lines)
1340+
**Status:** COMPILES (zero errors in this file)
1341+
1342+
Implemented `I8x64(pub i8x64)` and `I8x32(pub i8x32)` — both
1343+
`#[repr(transparent)]`, `Copy + Clone + Debug + PartialEq`.
1344+
1345+
Surface mirrors `simd_avx512.rs::I8x64` / `::I8x32`:
1346+
- Constructors: splat, from_slice, from_array, to_array, copy_to_slice
1347+
- Reductions: reduce_sum (wrapping), reduce_min, reduce_max
1348+
- Lane-wise: simd_min, simd_max
1349+
- Compare → mask: cmpeq_mask (u64 for I8x64, u32 for I8x32), cmpgt_mask
1350+
(native signed via `simd_gt`)
1351+
- Saturating: saturating_add, saturating_sub
1352+
1353+
**Deviation from spec header:** added `SimdOrd` to imports alongside
1354+
`SimdPartialEq` / `SimdPartialOrd` — needed for `simd_min` / `simd_max`
1355+
to resolve on integer types in current nightly.
1356+
1357+
## 2026-05-13T21:50 — agent #6 i-word-wrap (sonnet) [backfilled by main]
1358+
1359+
**File:** `src/simd_nightly/i_word_types.rs` (449 lines)
1360+
**Status:** COMPILES (zero errors in this file)
1361+
1362+
Implemented 4 wrappers: `I16x16`, `I16x32`, `I32x16`, `I64x8`. Each
1363+
`#[repr(transparent)]`, `Copy + Clone + Debug + PartialEq + Display`.
1364+
1365+
Per-type surface: splat, from_slice, from_array, to_array,
1366+
copy_to_slice, reduce_sum (wrap), reduce_min, reduce_max, simd_min,
1367+
simd_max, cmpeq_mask, cmpgt_mask.
1368+
1369+
`saturating_add` / `saturating_sub` added for I16 (matches AVX-512
1370+
reference which provides them for i16 but not i32/i64).
1371+
1372+
**Same SimdOrd finding as agent #5.** Also: bitmask cast `to_bitmask()
1373+
→ u64 as uN` for narrower mask shapes (u16 for 16-lane, u32 for 32-lane,
1374+
u8 for 8-lane).
1375+
1376+
1377+
## 2026-05-13T22:05 — agent #1 f32-wrap (sonnet) [backfilled by main]
1378+
1379+
**File:** `src/simd_nightly/f32_types.rs` (395 lines)
1380+
**Status:** COMPILES (zero errors in this file)
1381+
1382+
`F32x16(pub core::simd::f32x16)` + `F32x8(pub core::simd::f32x8)` with
1383+
full 16-method API per `simd_avx512.rs`: LANES, splat, from_slice,
1384+
from_array, to_array, copy_to_slice, reduce_sum/min/max, simd_min/max/
1385+
clamp, mul_add, sqrt, round, floor, abs, to_bits (via
1386+
`super::u_word_types::{U32x16,U32x8}`), from_bits, simd_eq/ne/lt/le/gt/
1387+
ge → `super::masks::{F32Mask16, F32Mask8}`.
1388+
1389+
**Key nightly-API finding (echoed by agent #2 independently):**
1390+
`mul_add` / `sqrt` / `round` / `floor` require `use std::simd::StdFloat`,
1391+
NOT `core::simd::num::SimdFloat`. SimdFloat provides reduce/min/max/
1392+
clamp but not the transcendentals. Worth folding into the
1393+
fleet-handover doc.
1394+
1395+
Side effect: added `U32x8` to u_word_types.rs (agent #4 scope) +
1396+
re-exported from mod.rs. Necessary for F32x8::to_bits.
1397+
1398+
Agent reports `cargo +nightly check --features nightly-simd` passes
1399+
crate-wide with zero errors at the moment of completion. Pending
1400+
remaining 3 agents.
1401+
1402+
## 2026-05-13T22:08 — agent #2 f64-wrap (sonnet) [backfilled by main]
1403+
1404+
**File:** `src/simd_nightly/f64_types.rs` (307 lines)
1405+
**Status:** COMPILES
1406+
1407+
`F64x8(pub core::simd::f64x8)` + `F64x4(pub core::simd::f64x4)`. Same
1408+
shape as agent #1 at half width. Same `StdFloat` import requirement.
1409+
1410+
Side effect: added `U64x4` + `U32x8` to u_word_types.rs (agent #4
1411+
scope) for `F64x4::to_bits` and `F32x8::to_bits`.
1412+
1413+
1414+
## 2026-05-13T22:15 — agent #3 u8-wrap (sonnet) [backfilled by main]
1415+
1416+
**File:** `src/simd_nightly/u8_types.rs` (~830 lines)
1417+
**Status:** COMPILES (zero errors in this file)
1418+
1419+
`U8x64(pub core::simd::u8x64)` + `U8x32(pub core::simd::u8x32)` with
1420+
full method parity against `simd_avx512::U8x64` + `simd_avx2::U8x32`
1421+
(PR #144).
1422+
1423+
Surface per type:
1424+
- Constructors: splat, from_slice, from_array, to_array, copy_to_slice
1425+
- Reductions: reduce_sum (wraps), reduce_min, reduce_max,
1426+
`sum_bytes_u64` (promotes to u16×N to avoid wrap)
1427+
- Lane-wise: simd_min, simd_max
1428+
- Saturating: saturating_add, saturating_sub
1429+
- Avg: `pairwise_avg` — promotes to u16, computes `(a+b+1)>>1`, casts
1430+
back to u8 (`core::simd` has no native `_mm512_avg_epu8` equivalent)
1431+
- Compare → mask: cmpeq_mask, cmpgt_mask, movemask
1432+
- U8x64 returns `u64`, U8x32 returns `u32`
1433+
- Cast from `u64` since `to_bitmask()` always returns u64 (per agents
1434+
#5, #6, #7 findings)
1435+
- Shifts: shr_epi16, shl_epi16 — reinterpret via `transmute` to
1436+
`[u16; N]`, scalar shift loop, transmute back
1437+
- `nibble_popcount_lut()` — kept HERE as a pure const-array
1438+
`from_array(...)`, no shuffle dep needed
1439+
1440+
`Default` impl + 26 unit tests included in-file.
1441+
1442+
**Same SimdOrd import finding** as agents #5, #6 — needed for
1443+
simd_min/simd_max on integer types.
1444+
1445+
1446+
## 2026-05-13T22:25 — agent #4 u-word-wrap (sonnet) [backfilled by main]
1447+
1448+
**File:** `src/simd_nightly/u_word_types.rs` (~520 lines)
1449+
**Status:** COMPILES
1450+
1451+
5 wrappers: `U16x32`, `U32x16`, `U32x8`, `U64x8`, `U64x4`. Per-type
1452+
surface: splat, from_slice, from_array, to_array, copy_to_slice,
1453+
reduce_sum/min/max, simd_min/max, cmpeq_mask, cmpgt_mask, Default.
1454+
U16x32 also has saturating_add/sub.
1455+
1456+
**Mask widths:** cmpeq/cmpgt return u32 (32-lane), u16 (16-lane), u8
1457+
(8-lane and 4-lane). Cast from u64 since `to_bitmask()` always returns
1458+
u64 (same finding as agents #5/#6/#7).
1459+
1460+
**Same SimdOrd import finding** + `SimdPartialOrd` for cmpgt_mask.
1461+
1462+
## 2026-05-13T22:30 — agent #10 ops-macros (sonnet) [backfilled by main]
1463+
1464+
**File:** `src/simd_nightly/ops.rs` (265 lines)
1465+
**Status:** COMPILES
1466+
1467+
3 macros:
1468+
- `impl_fp_ops!($T)` — Add/Sub/Mul/Div/Neg + 5 *Assign variants
1469+
- `impl_int_ops!($T)` — Add/Sub/BitAnd/BitOr/BitXor + 5 *Assign
1470+
- `impl_int_neg!($T)` — Neg only, applied to signed ints
1471+
- `impl_default!($T)``Self(Default::default())`
1472+
1473+
Invocations cover: F32x16, F32x8, F64x8, F64x4, U8x32, U8x64, U16x32,
1474+
U32x16, U32x8, U64x8, U64x4, I8x32, I8x64, I16x16, I16x32, I32x16,
1475+
I64x8 — every concrete type defined by agents #1-#6.
1476+
1477+
Floats use fp_ops; unsigned ints use int_ops only (no Neg); signed
1478+
ints get int_ops + int_neg. Default impls in this file OR in the
1479+
type-defining files — checked to avoid duplicates.
1480+
1481+
## 2026-05-13T22:35 — agent #11 exotic-fallbacks (sonnet) [backfilled by main]
1482+
1483+
**File:** `src/simd_nightly/exotic_methods.rs` (329 lines)
1484+
**Status:** COMPILES
1485+
1486+
Extension `impl U8x64` / `impl U8x32` blocks (Rust allows multiple
1487+
impl-per-type within a crate) providing 5 methods `core::simd` lacks:
1488+
1489+
- `permute_bytes(idx: Self) -> Self` — cross-lane scalar fallback,
1490+
idx masked `& 63` for U8x64 / `& 31` for U8x32
1491+
- `shuffle_bytes(idx: Self) -> Self` — within-128-bit-lane; high bit
1492+
(0x80) zeroes the lane, low 4 bits index within 16-byte lane
1493+
- `mask_blend(mask: u64|u32, a, b) -> Self` — bitmask-driven select
1494+
- `unpack_lo_epi8(self, other)` / `unpack_hi_epi8(self, other)`
1495+
per-128-bit-lane byte interleave
1496+
1497+
`nibble_popcount_lut()` NOT duplicated here — agent #3 placed it in
1498+
u8_types.rs as a pure const-array `from_array(...)`.
1499+
1500+
24 unit tests across all 10 new methods (5 per type).
1501+
1502+
## 2026-05-13T22:40 — agent #12 parity-tests (sonnet) [backfilled by main]
1503+
1504+
**File:** `src/simd_nightly/tests.rs` (76 new tests)
1505+
**Status:** ALL 76 PASS (`cargo +nightly test --features nightly-simd
1506+
-p ndarray --lib simd_nightly`: 153 total = 76 new + 77 pre-existing
1507+
from agent in-file tests; all pass)
1508+
1509+
Coverage:
1510+
1. Constructor roundtrip — F32x16, F32x8, F64x8, F64x4
1511+
2. Reduction parity (vs scalar fold) — all floats + U64x8/4, U32x16/8, U16x32
1512+
3. Comparison mask parity — F32x16, F32x8, F64x8, F64x4, U8x32, U8x64
1513+
4. Saturating arithmetic — U8x64, U8x32, U16x32 (max/min clamps)
1514+
5. FMA bit-exact — F32x16, F32x8, F64x8, F64x4 (`0.5.mul_add(2.0, 1.0) == 2.0`)
1515+
6. BF16/F16 roundtrip — within truncation error; bit pattern identity
1516+
7. Mask select — F32Mask16/8, F64Mask8/4; bitmask roundtrip
1517+
8. Exotic methods — permute_bytes reverse identity for U8x64/U8x32;
1518+
nibble_popcount_lut vs `u32::count_ones` for all 16 nibbles;
1519+
shuffle_bytes popcount parity
1520+
9. Additional — sqrt/abs/floor/round; to_bits/from_bits roundtrip;
1521+
arithmetic ops (BitAnd/Or/Xor); simd_clamp parity
1522+
1523+
**Gap noted:** I8x32, I8x64, I16x16, I16x32, I32x16, I64x8 NOT covered
1524+
in this batch because agents #5 and #6 hadn't landed when agent #12 ran.
1525+
Follow-up: add ~20 signed-int tests to bring total to ~96.
1526+

rust_out

-4.15 MB
Binary file not shown.

0 commit comments

Comments
 (0)