Commit 0d22e44
feat(quantized): VNNI INT8 GEMM via VPDPBUSD (#128, sprint W3-C)
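Closes parity item 12: INT8 GEMM accelerated with the AVX-512 VNNI instruction
VPDPBUSD, which multiplies 4 adjacent u8×i8 byte pairs and adds their sum into
each i32 lane (a 4-element u8×i8→i32 dot product with accumulate). Falls back
to the scalar int8_gemm_i32 on hardware without VNNI.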
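The per-lane semantics of VPDPBUSD can be modeled in plain Rust. This is a scalar sketch of one 512-bit VPDPBUSD; the name dpbusd_model is hypothetical and is not part of vnni_gemm.rs. Each of the 16 i32 lanes zero-extends 4 bytes of the first source, sign-extends 4 bytes of the second, multiplies them pairwise, and adds the four products into the accumulator without saturation.

```rust
/// Scalar model of one 512-bit VPDPBUSD (hypothetical helper, not from vnni_gemm.rs).
/// Lane `l` reads bytes 4l..4l+4 from both sources.
pub fn dpbusd_model(acc: [i32; 16], a: &[u8; 64], b: &[i8; 64]) -> [i32; 16] {
    let mut out = acc;
    for lane in 0..16 {
        let mut dot: i32 = 0;
        for t in 0..4 {
            let idx = lane * 4 + t;
            // `a` is zero-extended (unsigned), `b` is sign-extended (signed).
            // Four products of at most |255 * -128| = 32640 cannot overflow i32.
            dot += a[idx] as i32 * b[idx] as i32;
        }
        // Non-saturating form: the accumulate wraps on overflow
        // (the VPDPBUSDS variant would saturate instead).
        out[lane] = out[lane].wrapping_add(dot);
    }
    out
}
```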
What ships:
- src/hpc/vnni_gemm.rs (387 LOC): int8_gemm_vnni public API,
has_vnni() detection, _mm512_dpbusd_epi32 inner kernel, scalar fallback
- src/hpc/simd_caps.rs: avx512vnni: bool field added to SimdCaps, populated
via is_x86_feature_detected!("avx512vnni")
- src/hpc/mod.rs: pub mod vnni_gemm declaration
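A minimal sketch of how the avx512vnni flag and has_vnni() might fit together. It assumes a SimdCaps with only two fields (the real struct carries more) and assumes has_vnni() also requires avx512f, since the kernel operates on 512-bit registers; neither detail is confirmed by this commit. The cfg split keeps the code compiling on ARM and other non-x86 targets, where the flag is simply false.

```rust
/// Hypothetical two-field subset of SimdCaps; the real struct in
/// src/hpc/simd_caps.rs has more fields.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct SimdCaps {
    pub avx512f: bool,
    pub avx512vnni: bool,
}

// x86 / x86_64: query the CPU. std caches the detection result internally,
// so repeated calls are cheap.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn detect_avx512() -> (bool, bool) {
    (
        is_x86_feature_detected!("avx512f"),
        is_x86_feature_detected!("avx512vnni"),
    )
}

// Every other target (ARM, RISC-V, wasm, ...): no AVX-512 at all.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn detect_avx512() -> (bool, bool) {
    (false, false)
}

impl SimdCaps {
    pub fn detect() -> Self {
        let (avx512f, avx512vnni) = detect_avx512();
        SimdCaps { avx512f, avx512vnni }
    }
}

/// Assumed gate for the VNNI kernel: it needs both the VNNI extension and
/// the AVX-512 foundation instructions it builds on.
pub fn has_vnni() -> bool {
    let caps = SimdCaps::detect();
    caps.avx512f && caps.avx512vnni
}
```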
Hardware coverage:
- AVX-512 VNNI: Intel Ice Lake, Tiger Lake, Sapphire Rapids; AMD Zen 4
- Fallback (scalar int8_gemm_i32): x86_64 CPUs without AVX-512 VNNI, ARM, and
any other target
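The fallback path computes the same result with ordinary integer arithmetic. The sketch below is a hypothetical stand-in for int8_gemm_i32: the commit does not show its signature, so the name int8_gemm_i32_ref, the row-major layout, and the u8-activation × i8-weight operand types (the pairing VPDPBUSD itself uses) are all assumptions. The i-p-j loop order streams rows of B and C contiguously.

```rust
/// Hypothetical scalar stand-in for int8_gemm_i32 (real signature not shown in
/// this commit). Row-major: A is m×k (u8), B is k×n (i8), C is m×n (i32).
pub fn int8_gemm_i32_ref(a: &[u8], b: &[i8], c: &mut [i32], m: usize, n: usize, k: usize) {
    assert_eq!(a.len(), m * k, "A must be m*k");
    assert_eq!(b.len(), k * n, "B must be k*n");
    assert_eq!(c.len(), m * n, "C must be m*n");
    c.fill(0);
    for i in 0..m {
        let c_row = &mut c[i * n..(i + 1) * n];
        for p in 0..k {
            // Widen to i32 before multiplying: u8*i8 products need 16 bits.
            let av = a[i * k + p] as i32;
            let b_row = &b[p * n..(p + 1) * n];
            for (cv, &bv) in c_row.iter_mut().zip(b_row) {
                // Wrapping add matches the non-saturating VNNI accumulate.
                *cv = (*cv).wrapping_add(av * bv as i32);
            }
        }
    }
}
```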
Tests: 11 passing (4×4, 16×16, 17×17 tail, 1×1 edge, mixed values).
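The 17×17 case matters because VPDPBUSD consumes K four bytes at a time and N sixteen i32 lanes at a time, so neither dimension divides evenly. The sketch below emulates that blocking in scalar Rust (no intrinsics) and checks it against a naive reference. Zero-padding the K and N tails is an assumption about how vnni_gemm.rs handles them, chosen because a zero byte adds nothing to a dot product. A real kernel would prepack B once and broadcast the four A bytes with a single 32-bit set1; here both are rebuilt per step for clarity.

```rust
/// Scalar model of one 512-bit VPDPBUSD: 16 i32 lanes, 4 u8×i8 products per lane.
fn dpbusd(acc: &mut [i32; 16], a: &[u8; 64], b: &[i8; 64]) {
    for lane in 0..16 {
        let mut dot = 0i32;
        for t in 0..4 {
            dot += a[lane * 4 + t] as i32 * b[lane * 4 + t] as i32;
        }
        acc[lane] = acc[lane].wrapping_add(dot);
    }
}

/// Naive reference GEMM (row-major, u8 × i8 -> i32).
fn gemm_ref(a: &[u8], b: &[i8], m: usize, n: usize, k: usize) -> Vec<i32> {
    let mut c = vec![0i32; m * n];
    for i in 0..m {
        for j in 0..n {
            for p in 0..k {
                c[i * n + j] = c[i * n + j].wrapping_add(a[i * k + p] as i32 * b[p * n + j] as i32);
            }
        }
    }
    c
}

/// VNNI-shaped GEMM (hypothetical): K in groups of 4, N in blocks of 16 lanes,
/// with both tails zero-padded.
fn gemm_vnni_shaped(a: &[u8], b: &[i8], m: usize, n: usize, k: usize) -> Vec<i32> {
    let k_groups = (k + 3) / 4;
    let n_blocks = (n + 15) / 16;
    let mut c = vec![0i32; m * n];
    for i in 0..m {
        for jb in 0..n_blocks {
            let mut acc = [0i32; 16];
            for g in 0..k_groups {
                let mut av = [0u8; 64]; // A[i][4g..4g+4] broadcast to every lane
                let mut bv = [0i8; 64]; // lane l holds B[4g..4g+4][16jb + l]
                for lane in 0..16 {
                    let j = jb * 16 + lane;
                    for t in 0..4 {
                        let p = g * 4 + t;
                        if p < k {
                            av[lane * 4 + t] = a[i * k + p];
                            if j < n {
                                bv[lane * 4 + t] = b[p * n + j];
                            }
                        }
                    }
                }
                dpbusd(&mut acc, &av, &bv);
            }
            // Write back only the valid columns of a partial last block.
            for lane in 0..16 {
                let j = jb * 16 + lane;
                if j < n {
                    c[i * n + j] = acc[lane];
                }
            }
        }
    }
    c
}
```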
Total lib tests: 1817+ pass.
Note: type-cast fix applied to _mm512_loadu_si512 / _mm512_storeu_si512
(*const i32 → *const __m512i, *mut i32 → *mut __m512i) per Rust 1.94
intrinsic signatures.
https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
Co-authored-by: Claude <noreply@anthropic.com>
Parent: 0c30fe2
3 files changed: 401 additions, 0 deletions