Commit bab9f2e
Merge pull request #83 from AdaWorldAPI/claude/setup-embedding-pipeline-Fa65C
docs: clarify VNNI dispatch tiers — F32x16 is the floor, no scalar on x86
avx512vnni (64 MACs) and avxvnniint8 (32 MACs) are mutually exclusive
by hardware generation. The scalar i32 path in matvec_dispatch exists
only for correctness on non-x86 targets. On x86, the thinking engine
dispatches to F32x16 FMA (16 MACs) when no VNNI is detected, so it
never reaches the scalar path.
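
The tier order described above can be sketched as a pure selection function. This is only an illustration under stated assumptions: `Tier`, `CpuCaps`, `select_tier`, and `macs_per_instruction` are hypothetical names, not identifiers from the repository, and the capability flags are passed in as plain booleans rather than probed at runtime. The MAC counts for the three x86 tiers come from the commit message; the scalar count of 1 is an assumption.

```rust
/// Hypothetical tier names; the real identifiers are not shown in this commit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Avx512Vnni,  // 64 MACs, per the commit message
    AvxVnniInt8, // 32 MACs, per the commit message
    F32x16Fma,   // 16 MACs, the floor on x86
    ScalarI32,   // reachable only on non-x86 targets
}

/// Hypothetical capability flags. Real code would fill these from runtime
/// CPU feature detection; they are plain inputs here so the logic is testable.
#[derive(Debug, Clone, Copy)]
struct CpuCaps {
    is_x86: bool,
    avx512vnni: bool,
    avxvnniint8: bool,
}

/// Pick the widest available tier. On x86 the fallback is F32x16 FMA,
/// so the scalar path is never selected there.
fn select_tier(caps: CpuCaps) -> Tier {
    if !caps.is_x86 {
        return Tier::ScalarI32;
    }
    if caps.avx512vnni {
        Tier::Avx512Vnni
    } else if caps.avxvnniint8 {
        Tier::AvxVnniInt8
    } else {
        Tier::F32x16Fma
    }
}

/// MACs per instruction for each tier. The scalar value is an assumption.
fn macs_per_instruction(tier: Tier) -> u32 {
    match tier {
        Tier::Avx512Vnni => 64,
        Tier::AvxVnniInt8 => 32,
        Tier::F32x16Fma => 16,
        Tier::ScalarI32 => 1,
    }
}

fn main() {
    // An x86 CPU with no VNNI support lands on the F32x16 floor.
    let caps = CpuCaps { is_x86: true, avx512vnni: false, avxvnniint8: false };
    let tier = select_tier(caps);
    println!("{:?} ({} MACs)", tier, macs_per_instruction(tier));
}
```

Keeping the selection separate from feature detection makes the "no scalar on x86" rule checkable in a unit test without needing specific hardware.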
https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp
1 file changed
Lines changed: 14 additions & 4 deletions
Diff body not preserved in this capture; only the hunk positions survive:
@@ -201,11 +201,19 @@ line 204 rewritten; lines 206-208 replaced by new lines 206-216
@@ -223,6 +231,8 @@ new lines 234-235 added