Commit e623d39
committed
cuda: add f16 fallbacks to half FA quant builds
HALF now compiles the K>=V half-matrix plus all K/f16 fallback pairs
needed by TurboQuant/TCQ dequant paths: 103 pairs instead of 91.
The previous literal K>=V rule missed reversed K/f16 pairs like
q8_0/f16 that arise when Turbo/TCQ decode-dequant dispatches an
internal f16 transient for the V side. ALL works because it includes
every pair; HALF now covers those too.
Changes:
- CMake (CUDA/HIP/MUSA): full nested loop with f16 predicate and
count assertion (must equal 103)
- fattn.cu: runtime ggml_cuda_fattn_pair_compiled() HALF branch
adds type_K/type_V == F16 fallback check
- fattn.cu: regenerated HALF dispatch block with 12 new K/f16 pairs
- gen-fattn-vec-dispatch.py: f16 fallback predicate, assertions for
ALL=169, HALF=103, DEFAULT=27
- Docs/CI: make HALF the recommended build flag, ALL the escape hatch1 parent a596a5c commit e623d39
14 files changed
Lines changed: 88 additions & 35 deletions
File tree
- .devops
- .github/workflows
- docs
- ggml
- src
- ggml-cuda
- ggml-hip
- ggml-musa
- scripts
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
34 | 34 | | |
35 | 35 | | |
36 | 36 | | |
37 | | - | |
| 37 | + | |
38 | 38 | | |
39 | 39 | | |
40 | 40 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
496 | 496 | | |
497 | 497 | | |
498 | 498 | | |
499 | | - | |
| 499 | + | |
500 | 500 | | |
501 | 501 | | |
502 | 502 | | |
| |||
700 | 700 | | |
701 | 701 | | |
702 | 702 | | |
703 | | - | |
| 703 | + | |
704 | 704 | | |
705 | 705 | | |
706 | 706 | | |
| |||
1101 | 1101 | | |
1102 | 1102 | | |
1103 | 1103 | | |
1104 | | - | |
| 1104 | + | |
1105 | 1105 | | |
1106 | 1106 | | |
1107 | 1107 | | |
| |||
1226 | 1226 | | |
1227 | 1227 | | |
1228 | 1228 | | |
1229 | | - | |
| 1229 | + | |
1230 | 1230 | | |
1231 | 1231 | | |
1232 | 1232 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
22 | 22 | | |
23 | 23 | | |
24 | 24 | | |
25 | | - | |
| 25 | + | |
26 | 26 | | |
27 | 27 | | |
28 | 28 | | |
29 | 29 | | |
30 | 30 | | |
31 | | - | |
| 31 | + | |
32 | 32 | | |
33 | 33 | | |
34 | 34 | | |
| |||
37 | 37 | | |
38 | 38 | | |
39 | 39 | | |
40 | | - | |
| 40 | + | |
41 | 41 | | |
42 | 42 | | |
43 | 43 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
22 | 22 | | |
23 | 23 | | |
24 | 24 | | |
25 | | - | |
| 25 | + | |
26 | 26 | | |
27 | 27 | | |
28 | 28 | | |
29 | 29 | | |
30 | 30 | | |
31 | | - | |
| 31 | + | |
32 | 32 | | |
33 | 33 | | |
34 | 34 | | |
| |||
37 | 37 | | |
38 | 38 | | |
39 | 39 | | |
40 | | - | |
| 40 | + | |
41 | 41 | | |
42 | 42 | | |
43 | 43 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
343 | 343 | | |
344 | 344 | | |
345 | 345 | | |
346 | | - | |
| 346 | + | |
347 | 347 | | |
348 | 348 | | |
349 | 349 | | |
350 | 350 | | |
351 | 351 | | |
352 | | - | |
| 352 | + | |
353 | 353 | | |
354 | 354 | | |
355 | 355 | | |
| |||
358 | 358 | | |
359 | 359 | | |
360 | 360 | | |
361 | | - | |
| 361 | + | |
362 | 362 | | |
363 | 363 | | |
364 | 364 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
297 | 297 | | |
298 | 298 | | |
299 | 299 | | |
300 | | - | |
301 | | - | |
| 300 | + | |
| 301 | + | |
302 | 302 | | |
303 | 303 | | |
304 | 304 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
57 | 57 | | |
58 | 58 | | |
59 | 59 | | |
60 | | - | |
| 60 | + | |
61 | 61 | | |
62 | 62 | | |
63 | 63 | | |
| |||
66 | 66 | | |
67 | 67 | | |
68 | 68 | | |
69 | | - | |
| 69 | + | |
70 | 70 | | |
71 | 71 | | |
72 | 72 | | |
73 | 73 | | |
74 | | - | |
| 74 | + | |
75 | 75 | | |
76 | 76 | | |
77 | 77 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
57 | 57 | | |
58 | 58 | | |
59 | 59 | | |
60 | | - | |
| 60 | + | |
61 | 61 | | |
62 | 62 | | |
63 | 63 | | |
| |||
66 | 66 | | |
67 | 67 | | |
68 | 68 | | |
69 | | - | |
| 69 | + | |
70 | 70 | | |
71 | 71 | | |
72 | 72 | | |
73 | 73 | | |
74 | | - | |
| 74 | + | |
75 | 75 | | |
76 | 76 | | |
77 | 77 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
205 | 205 | | |
206 | 206 | | |
207 | 207 | | |
208 | | - | |
209 | | - | |
| 208 | + | |
| 209 | + | |
210 | 210 | | |
211 | 211 | | |
212 | 212 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
152 | 152 | | |
153 | 153 | | |
154 | 154 | | |
| 155 | + | |
| 156 | + | |
155 | 157 | | |
156 | 158 | | |
157 | 159 | | |
158 | | - | |
| 160 | + | |
159 | 161 | | |
160 | | - | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
| 166 | + | |
161 | 167 | | |
162 | 168 | | |
163 | 169 | | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
164 | 174 | | |
165 | 175 | | |
166 | 176 | | |
| |||
0 commit comments