[pull] master from ggml-org:master by pull[bot] · Pull Request #100 · CrazyForks/llama.cpp

pull · 2026-05-26T15:42:37Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)


* ci : reduce [no ci]
* cont : disable sycl, cann + rename caches [no ci]
* cont : cann [no ci]

* hexagon: add support for CONCAT with optimized concat_2d_transposed
  qwen3.5 models are quite heavy on CONCAT with a large, transposed src1.
* hex-concat: use fastdiv in generic version
* hex-concat: make checks for transposed a bit more readable
* hex-concat: reorder dma ops for better pipelining
* hex-cont/cpy: optimize CPY and CONT ops
  The primary change is to avoid scalar divs in the inner loops. We were calling hvx_copy_uu(... type_size) where type_size is not a constexpr. This causes runtime divs by that value, which is normally just 4 or 2 (f32/f16).
* hex-get-rows: optimize GET_ROWS for large rows
  We now use DMA for larger rows and also split them into chunks to improve perf for Qwen3.5 and other models that do lots of GET_ROWS with huge (2MB+) rows. Also bump the DMA queue depth now that we can take advantage of it.
* hex-concat: unroll the inner loops of concat_2d
* hex-concat: more updates to concat_2d to improve perf a bit further
* hex-cpy: fixed n_rows per thread checks in the copy ops
* hmx-fa: fix alignment issues while computing dma sizes
* hex-set-rows: add early returns for idle threads
* hvx-rope: minor optimization to replace loops with fastdiv logic
* hex-rope: replace scalar tail processing with HVX
* hex-rope: optimize rope cache init with HVX
  Add hvx-utils sin/cos helpers that use an approx method (similar to rsqrt, inverse, etc). Use the helpers to optimize ROPE.
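Several of the Hexagon commits above replace runtime integer division with "fastdiv", but the page does not show the code. As a rough illustration only, here is a minimal C++ sketch of one common approach: precompute a multiplier and shift for a loop-invariant divisor, then divide with a multiply-high, an add, and a shift. The names `fastdiv_t`, `fastdiv_init`, and `fastdiv` are hypothetical and are not taken from the Hexagon backend, whose implementation may differ.

```cpp
#include <cstdint>

// Hypothetical helper type, for illustration only (not the backend's code).
struct fastdiv_t {
    uint32_t mp;    // precomputed "magic" multiplier
    uint32_t shift; // ceil(log2(d))
};

// Precompute the constants once per divisor. Precondition: d >= 1.
inline fastdiv_t fastdiv_init(uint32_t d) {
    uint32_t shift = 0;
    while (shift < 32 && (uint64_t{1} << shift) < d) {
        ++shift;
    }
    // mp = floor(2^32 * (2^shift - d) / d) + 1, which always fits in 32 bits.
    const uint64_t mp = ((uint64_t{1} << 32) * ((uint64_t{1} << shift) - d)) / d + 1;
    return { static_cast<uint32_t>(mp), shift };
}

// n / d without a divide instruction: multiply-high, add, shift.
// The add is done in 64 bits so it cannot overflow for any 32-bit n.
inline uint32_t fastdiv(uint32_t n, fastdiv_t fd) {
    const uint64_t hi = (static_cast<uint64_t>(n) * fd.mp) >> 32;
    return static_cast<uint32_t>((hi + n) >> fd.shift);
}

// n % d derived from the quotient.
inline uint32_t fastmod(uint32_t n, uint32_t d, fastdiv_t fd) {
    return n - fastdiv(n, fd) * d;
}
```

The point of the pattern is that `fastdiv_init` runs once outside the inner loop (for example, once per tensor dimension or element size), so the loop body itself never issues a scalar divide.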

* ci : remove vulkan dep from webgpu build
* cont : add ccache to `ubuntu-24-webgpu-wasm`
* ci : fix name + add wasm test

* vulkan: add CONV_SHAPE_64x128 for medium-K conv2d
* vulkan: skip conv2d bounds checks when shapes align with tile sizes
* vulkan: use WG_SIZE=128 for CONV_SHAPE_64x32 conv2d
* vulkan: stage cm2 conv2d accumulator through shmem before global store
* vulkan: add coopmat1 conv2d path
* fallback when using too much shared memory. clean up comments
* Require 16x16x16 and subgroup size 32 or 64
* check whether shared memory is sufficient before overwriting conv2d params with coopmat1 values
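The Vulkan commits above describe two host-side decisions: skipping bounds checks when the problem shape is an exact multiple of the tile size, and using the coopmat1 path only when the device supports 16x16x16, has a subgroup size of 32 or 64, and has enough shared memory. A minimal C++ sketch of those predicates might look like the following. All names (`ConvTile`, `tiles_align`, `can_use_coopmat1`) and the parameter layout are assumptions made for illustration; the actual backend code is not shown on this page.

```cpp
#include <cstdint>

// Hypothetical tile description; field names are illustrative only.
struct ConvTile {
    uint32_t bm, bn, bk; // tile extents along the M, N and K dimensions
};

// True when every dimension is an exact multiple of its tile extent.
// In that case no tile straddles the edge of the problem, so the
// per-element bounds checks can be skipped.
inline bool tiles_align(uint32_t m, uint32_t n, uint32_t k, ConvTile t) {
    return m % t.bm == 0 && n % t.bn == 0 && k % t.bk == 0;
}

// Eligibility test mirroring the conditions listed in the commit message:
// 16x16x16 support, subgroup size 32 or 64, and enough shared memory.
// Meant to be checked before any parameters are overwritten with
// coopmat1 values, so a failed check leaves the fallback path intact.
inline bool can_use_coopmat1(bool has_16x16x16,
                             uint32_t subgroup_size,
                             uint32_t shmem_needed,
                             uint32_t shmem_limit) {
    const bool subgroup_ok = subgroup_size == 32 || subgroup_size == 64;
    return has_16x16x16 && subgroup_ok && shmem_needed <= shmem_limit;
}
```

Evaluating both predicates on the host lets the aligned case select a shader variant without bounds checks, while unaligned or unsupported cases fall back to the general path.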

ggerganov and others added 7 commits May 26, 2026 15:21

ci : reduce (disable SYCL and CANN builds/releases) (#23705)

3dc7684

ci : move sanitizer jobs to self-hosted runners (#23713)

ef41a69

ci : move more CPU jobs to self-hosted runners (#23715)

678d43d

ci : remove vulkan SDK dep from webgpu job (#23718)

3a3ed15

ci : move macos jobs to the apple workflow + fix names (#23721)

5190c2e

pull Bot locked and limited conversation to collaborators May 26, 2026

pull Bot added the ⤵️ pull label May 26, 2026

pull Bot merged commit 5190c2e into CrazyForks:master May 26, 2026

github-actions Bot added the testing, ggml, Vulkan, devops, and Hexagon labels May 26, 2026

pull[bot] merged 7 commits into CrazyForks:master from ggml-org:master

3 participants