Merged
16 changes: 16 additions & 0 deletions .cargo/config-avx512.toml
@@ -0,0 +1,16 @@
[build]
# Explicit AVX-512 config — `x86-64-v4`. Use with:
# cargo --config .cargo/config-avx512.toml build
# cargo --config .cargo/config-avx512.toml test
#
# Compiles `target_feature = "avx512f"` on, so `src/simd.rs` selects the
# `simd_avx512` backend with native `__m512` / `__m512d` / `__m512i`
# storage. Required for the Sapphire Rapids / Granite Rapids hot paths
# (`f32_to_bf16_batch_rne`, the AVX-512BF16 BF16 lanes, the AMX tiles).
#
# Binary produced here will SIGILL on AVX2-only silicon — only use on
# hosts that report `avx512f` in `/proc/cpuinfo`. For shipping a single
# release artifact that adapts at process start, see the LazyLock runtime
# dispatch path in § 7.1 of the architecture doc instead.
[target.'cfg(target_arch = "x86_64")']
rustflags = ["-Ctarget-cpu=x86-64-v4"]
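A minimal sketch of the host check the comment above asks for, done in Rust instead of grepping `/proc/cpuinfo`. The function names `host_has_avx512f` and `preflight_message` are hypothetical and not part of this PR. One caveat: `is_x86_feature_detected!` short-circuits to `true` when the feature is already enabled at compile time, so a probe like this only tells you something if it is built with the default (non-v4) config.

```rust
// Hypothetical preflight probe. Build it with the default config, NOT with
// config-avx512.toml: under `-Ctarget-cpu=x86-64-v4` the macro below is
// resolved at compile time and the check becomes meaningless.

/// True when the running CPU reports AVX-512F.
#[cfg(target_arch = "x86_64")]
pub fn host_has_avx512f() -> bool {
    std::arch::is_x86_feature_detected!("avx512f")
}

/// Non-x86_64 hosts never have AVX-512F.
#[cfg(not(target_arch = "x86_64"))]
pub fn host_has_avx512f() -> bool {
    false
}

/// Human-readable verdict for a deploy script or a startup log line.
pub fn preflight_message() -> &'static str {
    if host_has_avx512f() {
        "avx512f present: an x86-64-v4 build can run on this host"
    } else {
        "avx512f missing: an x86-64-v4 build would SIGILL on this host"
    }
}
```

Calling `preflight_message()` from a small helper binary before launching the v4 artifact is one way to turn a SIGILL into a readable error.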
13 changes: 13 additions & 0 deletions .cargo/config-native.toml
@@ -0,0 +1,13 @@
[build]
# Native build config — `target-cpu = "native"`. Use with:
# cargo --config .cargo/config-native.toml build
# cargo --config .cargo/config-native.toml test
#
# rustc resolves the build host's CPUID at invocation and enables every
# `target_feature` the host CPU advertises. `simd.rs` then picks the
# matching backend (typically `simd_avx512` on modern dev machines).
#
# Produces a binary tuned for the developer's exact silicon. The result
# is NOT portable: do not distribute artifacts built with this config.
[target.'cfg(target_arch = "x86_64")']
rustflags = ["-Ctarget-cpu=native"]
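Because `target-cpu=native` enables whatever the build host advertises, it can be useful to confirm which tier a given build actually compiled in. The sketch below is an illustration only; `compiled_tier` is a hypothetical helper, not something this PR adds. It reads the same `target_feature` cfg flags that the comments above say `simd.rs` keys off.

```rust
/// Reports which SIMD tier was enabled at COMPILE time (not what the
/// running CPU supports). Hypothetical helper for build diagnostics.
pub fn compiled_tier() -> &'static str {
    if cfg!(all(target_arch = "x86_64", target_feature = "avx512f")) {
        "avx512f" // x86-64-v4, or native on an AVX-512 host
    } else if cfg!(all(target_arch = "x86_64", target_feature = "avx2")) {
        "avx2" // x86-64-v3, or native on an AVX2-only host
    } else if cfg!(target_arch = "x86_64") {
        "x86_64-baseline" // no target-cpu applied: generic x86-64 (SSE2)
    } else {
        "non-x86_64"
    }
}
```

Printing `compiled_tier()` in a test or at startup makes it visible when the intended rustflags were not applied to a build.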
28 changes: 25 additions & 3 deletions .cargo/config.toml
@@ -1,4 +1,26 @@
[build]
# No global target-cpu. Each kernel uses #[target_feature(enable = "avx512f")]
# per-function, with LazyLock runtime detection. One binary, all ISAs.
# Railway (AVX-512) and GitHub CI (AVX2) use the same binary.
# Default cargo config — x86-64-v3 (AVX2) baseline. Portable across all
# x86_64 silicon shipping since ~2013 (Haswell+). This is what GitHub CI
# runs against and what `cargo build` produces for general distribution.
#
# Why v3 and not "no target-cpu":
# `src/simd_avx2.rs` composes `F32x16` as two `__m256` halves (AVX
# intrinsics), and the `simd_avx2_*` op funcs use `__m256i` (AVX2).
# Without a global v3 baseline, rustc compiles to x86-64 generic (SSE2)
# and those intrinsics emit instructions the CPU never executes →
# SIGILL at run time, exactly the PR #170 CI failure mode.
#
# AVX-512 builds: use `--config .cargo/config-avx512.toml` (or
# `CARGO_BUILD_RUSTFLAGS='-Ctarget-cpu=x86-64-v4'`). The simd.rs dispatch
# arms key off `target_feature = "avx512f"`; under v4 they pick the
# `simd_avx512` backend (native `__m512` / `__m512d` / `__m512i`).
#
# Build-machine-tuned binaries: use `--config .cargo/config-native.toml`
# (`target-cpu = "native"`); rustc resolves the host CPUID at compile.
#
# Runtime LazyLock dispatch (one release binary, heterogeneous deployment
# silicon) is a fifth opt-in mode — see § 7.1 of
# .claude/knowledge/simd-dispatch-architecture.md. Reserved for the
# release-binary distribution path; never the dev / CI default.
[target.'cfg(target_arch = "x86_64")']
rustflags = ["-Ctarget-cpu=x86-64-v3"]
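The config comments mention a LazyLock runtime dispatch mode but defer the details to § 7.1 of the architecture doc, which is not shown here. The following is therefore only a generic sketch of that pattern under stated assumptions: a function pointer is chosen once, on first use, from a runtime CPU check. All names (`SumFn`, `sum_scalar`, `sum_avx2`, `SUM`, `sum`) are hypothetical, and the crate's real dispatch may look quite different.

```rust
use std::sync::LazyLock;

/// Signature shared by every candidate kernel (hypothetical).
type SumFn = fn(&[f32]) -> f32;

/// Portable fallback, valid on any target.
fn sum_scalar(xs: &[f32]) -> f32 {
    let mut acc = 0.0f32;
    for &x in xs {
        acc += x;
    }
    acc
}

/// Same loop, but compiled with AVX2 enabled for this one function only.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2_impl(xs: &[f32]) -> f32 {
    let mut acc = 0.0f32;
    for &x in xs {
        acc += x;
    }
    acc
}

/// Safe shim so the kernel fits the plain `fn` pointer type.
#[cfg(target_arch = "x86_64")]
fn sum_avx2(xs: &[f32]) -> f32 {
    // SAFETY: this shim is only installed in `SUM` after the runtime
    // check below has confirmed AVX2 support.
    unsafe { sum_avx2_impl(xs) }
}

/// Chosen once, on first use, then reused for every call.
static SUM: LazyLock<SumFn> = LazyLock::new(|| {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            return sum_avx2 as SumFn;
        }
    }
    sum_scalar as SumFn
});

/// Public entry point: one binary, kernel picked at process start.
pub fn sum(xs: &[f32]) -> f32 {
    (*SUM)(xs)
}
```

The trade-off against the compile-time arms is an indirect call per invocation in exchange for a single artifact that adapts to the deployment CPU.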
25 changes: 19 additions & 6 deletions src/simd.rs
@@ -198,10 +198,17 @@ pub const PREFERRED_I16_LANES: usize = 16;
// x86_64: re-export based on tier
// ============================================================================

// Compile-time AVX-512 dispatch via target_feature.
// With target-cpu=x86-64-v4 (.cargo/config.toml), avx512f is enabled
// at compile time → all types use native __m512/__m512d/__m512i.
// The 256-bit types (F32x8, F64x4) also live in simd_avx512 (__m256).
// Compile-time SIMD dispatch via target_feature. The cargo config
// chosen at build (.cargo/config.toml = v3 default / config-avx512.toml
// = v4 / config-native.toml = native) sets the `target_feature` flags
// that select exactly one arm below.
// * v3 / GitHub-CI default → `target_feature = "avx2"` only →
// simd_avx2 backend (F32x16 = two-half (f32x8, f32x8), int wrappers
// are scalar polyfills via the `avx2_int_type!` macro).
// * v4 (or native on AVX-512 host) → `target_feature = "avx512f"` →
// simd_avx512 backend with native __m512 / __m512d / __m512i.
// * aarch64 → simd_neon backend.
// * everything else (wasm32, riscv, etc.) → scalar fallback.

// Note on the `nightly-simd` feature: it adds the `crate::simd_nightly`
// module (a portable-simd backend wrapping `core::simd`) but does NOT
@@ -272,10 +279,16 @@ pub use crate::simd_avx512::{f32_to_bf16_batch_rne, f32_to_bf16_scalar_rne};
#[cfg(all(target_arch = "x86_64", target_feature = "avx512bf16"))]
pub use crate::simd_avx512::{BF16x16, BF16x8};

#[cfg(all(target_arch = "x86_64", not(target_feature = "avx512f")))]
// AVX2 baseline arm — selected by the `x86-64-v3` cargo default. Requires
// `target_feature = "avx2"` explicitly: building x86_64-without-AVX2 (the
// generic `x86-64` baseline = SSE2) would otherwise pick this arm and
// then SIGILL on the `__m256` / `__m256i` intrinsics inside the wrappers.
// Whoever wants no-AVX2 must pick the scalar fallback path (currently
// non-x86 only — see TD-SIMD-7 in the architecture doc).
#[cfg(all(target_arch = "x86_64", target_feature = "avx2", not(target_feature = "avx512f")))]
pub use crate::simd_avx512::{f32x8, f64x4, i16x16, i8x32, F32x8, F64x4, I16x16, I8x32};

#[cfg(all(target_arch = "x86_64", not(target_feature = "avx512f")))]
#[cfg(all(target_arch = "x86_64", target_feature = "avx2", not(target_feature = "avx512f")))]

P1: Restore x86_64 fallback when AVX2 is unavailable

Requiring target_feature = "avx2" on the x86_64 re-export arm removes every F32x16 / F64x8 / integer SIMD type export from x86_64 builds that are not compiled with AVX2 (for example, downstream users building this crate with default x86_64 flags or x86-64-v2). Because this file defines unconditional APIs such as simd_exp_f32(x: F32x16), those builds now fail at compile time with missing type definitions instead of falling back. The .cargo/config.toml in this repo does not protect them, because it is not applied when the crate is built as a dependency in another workspace.
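To illustrate the concern, here is a small self-contained sketch of cfg arms that are exhaustive, so that an unconditional API always finds its type. The `backend` module, the `stand_in_backend!` macro, `BACKEND`, and `lane_sum` are hypothetical stand-ins, and the array-backed `F32x16` is a placeholder rather than the crate's real type.

```rust
// Each arm defines the same names; the predicates are written so that
// exactly one arm matches for ANY target / target_feature combination.
macro_rules! stand_in_backend {
    ($name:literal) => {
        mod backend {
            pub const NAME: &str = $name;
            /// Placeholder for the real SIMD wrapper type.
            #[derive(Clone, Copy)]
            pub struct F32x16 {
                pub lanes: [f32; 16],
            }
        }
    };
}

#[cfg(all(target_arch = "x86_64", target_feature = "avx512f"))]
stand_in_backend!("avx512");

// Deliberately NOT gated on `target_feature = "avx2"`: adding that gate
// would leave x86_64 builds without AVX2 with no arm at all.
#[cfg(all(target_arch = "x86_64", not(target_feature = "avx512f")))]
stand_in_backend!("avx2-or-baseline");

#[cfg(target_arch = "aarch64")]
stand_in_backend!("neon");

#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
stand_in_backend!("scalar");

/// Re-exported under stable names, whichever arm was selected.
pub type F32x16 = backend::F32x16;
pub const BACKEND: &str = backend::NAME;

/// Unconditional API: compiles only if some arm above defined `F32x16`.
pub fn lane_sum(x: F32x16) -> f32 {
    x.lanes.iter().sum()
}
```

If the second arm also required `target_feature = "avx2"`, a default x86_64 build would have no `backend` module and `lane_sum` would fail to compile, which is the failure described above.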

Owner Author


Fixed in e3ad707 (already on the merged branch): reverted the target_feature = "avx2" predicate tightening for exactly this reason. The same root cause surfaced in our CI. The RUSTFLAGS="-D warnings" env in ci.yaml overrides the .cargo/config.toml rustflags entirely (cargo does not merge the two; the env var wins), so even our own GitHub runner built for the x86-64 baseline without target_feature = "avx2" set. No arm matched, and consumer references to crate::simd::F32x16 failed to compile.

Predicate is back to not(avx512f). Per-function #[target_feature(enable = "avx,avx2,fma")] annotations inside simd_avx2.rs gate the actual intrinsic execution at the symbol level; the struct-field types (__m256 / __m256i) are core::arch declarations that don't require AVX/AVX2 at the type level. Downstream consumers building this crate with default x86_64 flags or x86-64-v2 now keep their type exports.
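A minimal sketch of the arrangement described in the reply, assuming nothing about the real `simd_avx2.rs` beyond what is stated there: a wrapper struct holds a `__m256` field with no global AVX flag, while the function that executes AVX instructions carries `#[target_feature(enable = "avx")]`. The names `F32x8Sketch`, `add8_avx`, and `add8` are hypothetical, and the runtime check in `add8` is added here for safety; the reply does not say how the crate guards its calls.

```rust
#[cfg(target_arch = "x86_64")]
mod x86 {
    use core::arch::x86_64::{__m256, _mm256_add_ps, _mm256_loadu_ps, _mm256_storeu_ps};

    /// Declaring a `__m256` field compiles on a plain x86-64 (SSE2) build;
    /// only code that executes AVX instructions needs the feature.
    #[derive(Clone, Copy)]
    pub struct F32x8Sketch(pub __m256);

    /// AVX is enabled for this function only.
    ///
    /// # Safety
    /// The caller must ensure the running CPU supports AVX.
    #[target_feature(enable = "avx")]
    pub unsafe fn add8_avx(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
        unsafe {
            let va = F32x8Sketch(_mm256_loadu_ps(a.as_ptr()));
            let vb = F32x8Sketch(_mm256_loadu_ps(b.as_ptr()));
            let sum = F32x8Sketch(_mm256_add_ps(va.0, vb.0));
            let mut out = [0.0f32; 8];
            _mm256_storeu_ps(out.as_mut_ptr(), sum.0);
            out
        }
    }
}

/// Safe entry point: uses the AVX path when the CPU has it, scalar otherwise.
pub fn add8(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was just confirmed at run time.
            return unsafe { x86::add8_avx(a, b) };
        }
    }
    let mut out = [0.0f32; 8];
    for i in 0..8 {
        out[i] = a[i] + b[i];
    }
    out
}
```

The point of the sketch is the split: the type is available to every x86_64 build, and the feature requirement attaches to the functions that run the intrinsics.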


Generated by Claude Code

pub use crate::simd_avx2::{
f32x16, f64x8, i16x32, i32x16, i64x8, i8x64, u32x16, u64x8, u8x64, F32Mask16, F32x16, F64Mask8, F64x8, I16x32,
I32x16, I64x8, I8x64, U16x32, U32x16, U64x8, U8x64,