Skip to content

Commit 0d22e44

Browse files
AdaWorldAPIclaude
andauthored
feat(quantized): VNNI INT8 GEMM via VPDPBUSD (#128, sprint W3-C)
Closes parity item 12 — INT8 GEMM accelerated via AVX-512 VNNI's VPDPBUSD instruction (4-element u8×i8→i32 dot product). Falls back to scalar int8_gemm_i32 on hardware without VNNI. What ships: - src/hpc/vnni_gemm.rs (387 LOC): int8_gemm_vnni public API, has_vnni() detection, _mm512_dpbusd_epi32 inner kernel, scalar fallback - src/hpc/simd_caps.rs: avx512vnni: bool field added to SimdCaps, is_x86_feature_detected!("avx512vnni") detection wired - src/hpc/mod.rs: pub mod vnni_gemm declaration Hardware coverage: - AVX-512 VNNI: Ice Lake, Sapphire Rapids, Zen 4 (with AVX-512), Tiger Lake - Fallback: any x86_64 / ARM / scalar Tests: 11 passing (4×4, 16×16, 17×17 tail, 1×1 edge, mixed values). Total lib tests: 1817+ pass. Note: type-cast fix applied to _mm512_loadu_si512 / _mm512_storeu_si512 (*const i32 → *const __m512i, *mut i32 → *mut __m512i) per Rust 1.94 intrinsic signatures. https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj Co-authored-by: Claude <noreply@anthropic.com>
1 parent 0c30fe2 commit 0d22e44

3 files changed

Lines changed: 401 additions & 0 deletions

File tree

src/hpc/mod.rs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -411,3 +411,4 @@ mod e2e_tests {
411411
assert!(bnn_result.score > -1.0 && bnn_result.score < 1.0);
412412
}
413413
}
414+
pub mod vnni_gemm;

src/hpc/simd_caps.rs

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -41,6 +41,9 @@ pub struct SimdCaps {
4141
pub sse2: bool,
4242
/// FMA (fused multiply-add).
4343
pub fma: bool,
44+
/// AVX-512 VNNI (VPDPBUSD — u8×i8→i32 dot product of 4-element groups).
45+
/// Present on Ice Lake, Sapphire Rapids, Zen 4 (with AVX-512), Tiger Lake.
46+
pub avx512vnni: bool,
4447

4548
// ── aarch64 (ARM) ──
4649
/// NEON 128-bit SIMD (mandatory on aarch64, always true).
@@ -82,6 +85,7 @@ impl SimdCaps {
8285
sse41: is_x86_feature_detected!("sse4.1"),
8386
sse2: is_x86_feature_detected!("sse2"),
8487
fma: is_x86_feature_detected!("fma"),
88+
avx512vnni: is_x86_feature_detected!("avx512vnni"),
8589
// ARM fields: all false on x86
8690
neon: false,
8791
asimd_dotprod: false,
@@ -107,6 +111,7 @@ impl SimdCaps {
107111
sse41: false,
108112
sse2: false,
109113
fma: false,
114+
avx512vnni: false,
110115
// ARM fields: runtime detection
111116
neon: true, // mandatory on aarch64
112117
asimd_dotprod: std::arch::is_aarch64_feature_detected!("dotprod"),
@@ -129,6 +134,7 @@ impl SimdCaps {
129134
sse41: false,
130135
sse2: false,
131136
fma: false,
137+
avx512vnni: false,
132138
neon: false,
133139
asimd_dotprod: false,
134140
fp16: false,
@@ -150,6 +156,13 @@ impl SimdCaps {
150156
self.avx512bw && self.avx512vpopcntdq
151157
}
152158

159+
/// True if AVX-512 VNNI is available (VPDPBUSD on zmm registers).
160+
/// Present on Ice Lake, Tiger Lake, Sapphire Rapids, Zen 4.
161+
#[inline(always)]
162+
pub fn has_avx512_vnni(self) -> bool {
163+
self.avx512f && self.avx512vnni
164+
}
165+
153166
// ── ARM convenience methods ──
154167

155168
/// True if running on aarch64 with NEON (always true on aarch64).

0 commit comments

Comments
 (0)