Commit 2974f4b
ggml-ve: vectorize the YaRN NeoX rope kernel (prompt-eval 1.48x)

ftrace on a 330-token prompt showed that rope took 32% of prompt-eval time
and ran fully scalar (V.OP 0%). Two things were responsible: the per-element
ve_rope_yarn() function call blocked vectorization, and the
theta *= theta_scale recurrence serialized the inner loop (~30M scalar
calls).

Restructure ve_rope_neox_hbm_omp_nocache: precompute theta_scale^i once
(breaks the recurrence so theta[i] is independent), fold YaRN into a
branchless form (ramp_mix == 0 when ext_factor == 0, magnitude scale
precomputed), and drop the function call. The inner loop is then call-free
and branch-free, so NCC vectorizes cos/sin via libsysve.

Result: the rope goes V.OP 0% -> 98.4%, from ~32% of prompt-eval to 0.1%;
interpreter prompt-eval 8.54 -> 12.67 tok/s (1.48x) on Bonsai-8B-VEBP, output
still correct (YaRN math unchanged, just reorganized). Helps every prompt and
will carry into the compiled N>1 path. Not pushed.

1 parent 0fb4f29 commit 2974f4b
1 file changed
Lines changed: 24 additions & 9 deletions