Commit 225088e
authored
sycl: Improve mul_mat_id memory efficiency and add BF16 fast path (ggml-org#22119)
* sycl: size mul_mat_id staging buffers by routed rows
Previously src1_contiguous/dst_contiguous in ggml_sycl_mul_mat_id were
sized to ggml_nelements(src1/dst), which over-allocates when ne12 > 1
and can fail with UR_RESULT_ERROR_OUT_OF_HOST_MEMORY on Level Zero for
MoE models (notably with --cpu-moe). Size them by the actual number of
routed rows (ids->ne[1] * n_ids) instead.
* sycl: add bf16 mul_mat fast path via DNNL
When src0 is BF16 (commonly the case for lm_head / output.weight), the
existing f16 path is skipped because bf16 isn't covered, and the f32
fallback dequantizes the entire src0 slab to f32 in a single pool alloc
(row_diff*ne00 floats). For large-vocab models this can reach several
GB and fail with UR_RESULT_ERROR_OUT_OF_HOST_MEMORY on Level Zero.
Add a bf16xbf16 -> f32 DNNL matmul fast path that uses the bf16 storage
in place and only materializes a small src1 bf16 conversion buffer. bf16
matmul accumulates in f32, so it's correct even when the op requests
GGML_PREC_F32 (as lm_head does).
- gemm.hpp: map bfloat16 to dnnl::memory::data_type::bf16.
- convert.{hpp,cpp}: expose ggml_get_to_bf16_sycl for f32/f16/bf16 -> bf16.
- ggml-sycl.cpp: take the bf16 path early in ggml_sycl_op_mul_mat_sycl
when DNNL and GGML_SYCL_HAS_BF16 are both available.1 parent 82d3f4d commit 225088e
6 files changed
Lines changed: 70 additions & 10 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
28 | 28 | | |
29 | 29 | | |
30 | 30 | | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
31 | 38 | | |
32 | 39 | | |
33 | 40 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2 | 2 | | |
3 | 3 | | |
4 | 4 | | |
5 | | - | |
6 | | - | |
7 | | - | |
8 | | - | |
9 | | - | |
10 | | - | |
11 | | - | |
12 | 5 | | |
13 | 6 | | |
14 | 7 | | |
| |||
767 | 760 | | |
768 | 761 | | |
769 | 762 | | |
| 763 | + | |
| 764 | + | |
| 765 | + | |
| 766 | + | |
| 767 | + | |
| 768 | + | |
| 769 | + | |
| 770 | + | |
| 771 | + | |
| 772 | + | |
| 773 | + | |
| 774 | + | |
| 775 | + | |
| 776 | + | |
| 777 | + | |
| 778 | + | |
770 | 779 | | |
771 | 780 | | |
772 | 781 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
23 | 23 | | |
24 | 24 | | |
25 | 25 | | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
26 | 31 | | |
27 | 32 | | |
28 | 33 | | |
| |||
35 | 40 | | |
36 | 41 | | |
37 | 42 | | |
| 43 | + | |
38 | 44 | | |
39 | 45 | | |
40 | 46 | | |
41 | 47 | | |
| 48 | + | |
42 | 49 | | |
43 | 50 | | |
| 51 | + | |
44 | 52 | | |
45 | 53 | | |
46 | 54 | | |
| 55 | + | |
47 | 56 | | |
48 | 57 | | |
49 | 58 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
29 | 29 | | |
30 | 30 | | |
31 | 31 | | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
32 | 35 | | |
33 | 36 | | |
34 | 37 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2176 | 2176 | | |
2177 | 2177 | | |
2178 | 2178 | | |
| 2179 | + | |
| 2180 | + | |
| 2181 | + | |
| 2182 | + | |
| 2183 | + | |
| 2184 | + | |
| 2185 | + | |
| 2186 | + | |
| 2187 | + | |
| 2188 | + | |
| 2189 | + | |
| 2190 | + | |
| 2191 | + | |
| 2192 | + | |
| 2193 | + | |
| 2194 | + | |
| 2195 | + | |
| 2196 | + | |
| 2197 | + | |
| 2198 | + | |
| 2199 | + | |
| 2200 | + | |
| 2201 | + | |
| 2202 | + | |
| 2203 | + | |
2179 | 2204 | | |
2180 | 2205 | | |
2181 | 2206 | | |
| |||
3848 | 3873 | | |
3849 | 3874 | | |
3850 | 3875 | | |
3851 | | - | |
3852 | | - | |
| 3876 | + | |
| 3877 | + | |
| 3878 | + | |
3853 | 3879 | | |
3854 | 3880 | | |
3855 | 3881 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
4 | 4 | | |
5 | 5 | | |
6 | 6 | | |
7 | | - | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
8 | 12 | | |
9 | 13 | | |
10 | 14 | | |
| |||
181 | 185 | | |
182 | 186 | | |
183 | 187 | | |
| 188 | + | |
184 | 189 | | |
185 | 190 | | |
186 | 191 | | |
| |||
193 | 198 | | |
194 | 199 | | |
195 | 200 | | |
| 201 | + | |
196 | 202 | | |
197 | 203 | | |
198 | 204 | | |
| |||
0 commit comments