Commit 66c4f9d
ggml-cuda: ds_read_b128 for q4_0 and q4_1 mmq kernels (#21168)
* ds_read_b128 for q4_0 and q4_1 mmq kernels
The current for loop compiles to ds_read_b32 instructions with the HIP compiler; the new code compiles to ds_read_b128 instructions for the same operation, saving some LDS bandwidth. Tested on MI50 and RX6800XT; it's faster on both.
* Vectorized lds load update: used ggml_cuda_get_max_cpy_bytes and ggml_cuda_memcpy_1 functions for generic implementation
* Explicit for loop in mmq, renamed vec into tmp
* Fixed max_cpy usage in the loading loop
* Fixed typo in q4_1 kernel
* Update ggml/src/ggml-cuda/mmq.cuh
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Update ggml/src/ggml-cuda/mmq.cuh
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Update ggml/src/ggml-cuda/mmq.cuh
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Removed trailing white line 500
* Update mmq.cuh: removed other white lines
* Remove trailing whitespaces
---------
Co-authored-by: iacopPBK <iacopPBK@users.noreply.github.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: iacopPBK <iacop@deneb.com>
1 parent 93bdc61 commit 66c4f9d
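
The idea described in the commit message (replace four 4-byte loads with one 16-byte load, sized by a "maximum copy bytes" value) can be sketched on the host in plain C++. This is only an illustrative model, not the code from mmq.cuh: `copy_in_chunks` is a hypothetical helper invented for this sketch, and it does not reproduce the signatures of `ggml_cuda_get_max_cpy_bytes` or `ggml_cuda_memcpy_1`, which are not shown on this page. Whether a real compiler emits `ds_read_b128` also depends on alignment and the target architecture.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of ggml): copies `nbytes` from `src` to `dst`
// in fixed-size chunks of `chunk` bytes and returns how many chunk copies were
// issued. Each chunk copy stands in for one load instruction: chunk == 4
// models a ds_read_b32-style load, chunk == 16 a ds_read_b128-style load.
template <int chunk>
static int copy_in_chunks(void * dst, const void * src, int nbytes) {
    static_assert(chunk == 4 || chunk == 8 || chunk == 16, "unsupported chunk size");
    assert(nbytes % chunk == 0); // the sketch assumes the size divides evenly

    int n_copies = 0;
    for (int offset = 0; offset < nbytes; offset += chunk) {
        // A fixed-size memcpy lets the compiler pick a single wide load/store.
        std::memcpy(static_cast<char *>(dst) + offset,
                    static_cast<const char *>(src) + offset,
                    chunk);
        ++n_copies;
    }
    return n_copies;
}

// Hypothetical wrapper: load four 32-bit values (16 bytes) into a temporary
// buffer `tmp`, using the widest chunk allowed by `max_cpy_bytes`.
// Returns the number of chunk copies that were needed.
static int load_tile_row(int32_t (&tmp)[4], const int32_t * src, int max_cpy_bytes) {
    if (max_cpy_bytes >= 16) {
        return copy_in_chunks<16>(tmp, src, 16);
    }
    if (max_cpy_bytes >= 8) {
        return copy_in_chunks<8>(tmp, src, 16);
    }
    return copy_in_chunks<4>(tmp, src, 16);
}
```

With `max_cpy_bytes` set to 4 the wrapper issues four copies for the 16 bytes, while with 16 it issues one; the data that ends up in `tmp` is identical in both cases, which is the property that makes the wider load a drop-in replacement.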
1 file changed
Lines changed: 27 additions & 10 deletions
Diff content was not captured; only the hunk line ranges are recoverable.

| Hunk | Original file lines | New file lines | Lines deleted | Lines added |
|---|---|---|---|---|
| 1 | 386-402 | 386-410 | 5 | 13 |
| 2 | 489-505 | 497-521 | 5 | 13 |
| 3 | 4170-4172 | 4186-4189 | 0 | 1 |