[pull] master from ggml-org:master by pull[bot] · Pull Request #113 · CrazyForks/llama.cpp

pull · 2026-05-30T09:42:24Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)


* server: in SSE mode, send HTTP headers when slot starts
* ref to pr
* stream should be false by default

After #23007 reclassified integrated CUDA/HIP devices as IGPU, the device selection logic dropped the local iGPU whenever any RPC server was added, because RPC devices made `model->devices` non-empty. On systems where the "iGPU" is the main compute device (e.g. Strix Halo with 128 GiB of unified memory), this caused all tensors to be allocated on the RPC peer alone and model loading to fail. Gate the iGPU inclusion on `gpus.empty()` instead, so RPC peers no longer suppress the local iGPU.

Closes #23858
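The selection rule described above can be modeled in a few lines. This is a minimal, hypothetical C++ sketch, not the llama.cpp source: the `Device`/`DeviceType` types and the `select_devices` function are invented for illustration, `selected` stands in for `model->devices`, and adding only the first iGPU is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical device classification; NOT the actual ggml/llama.cpp types.
enum class DeviceType { GPU, IGPU, RPC };

struct Device {
    std::string name;
    DeviceType type;
};

// Hypothetical sketch of the fixed selection rule. `selected` plays the role
// of model->devices: RPC peers are already in it before local devices are
// considered.
std::vector<Device> select_devices(const std::vector<Device>& rpc_devices,
                                   const std::vector<Device>& local_devices) {
    std::vector<Device> selected = rpc_devices;

    std::vector<Device> gpus;   // discrete local GPUs
    std::vector<Device> igpus;  // integrated local GPUs
    for (const Device& d : local_devices) {
        if (d.type == DeviceType::GPU) {
            gpus.push_back(d);
        } else if (d.type == DeviceType::IGPU) {
            igpus.push_back(d);
        }
    }
    selected.insert(selected.end(), gpus.begin(), gpus.end());

    // Old gate (buggy): `if (selected.empty() && !igpus.empty())`.
    // Any RPC peer made `selected` non-empty, so the local iGPU was dropped.
    // New gate: only a local discrete GPU suppresses the iGPU.
    if (gpus.empty() && !igpus.empty()) {
        selected.push_back(igpus.front());  // assumption: first iGPU only
    }
    return selected;
}
```

With one RPC peer and only a local iGPU, the sketch returns both devices; under the old gate it would have returned the RPC peer alone, which matches the failure described for Strix Halo systems.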

* ci : ios use macos-15 again
* ci : add and test ccache-clear
* cont : fix
* cont : set permission
* cont : another permission
* cont : token
* cont : print key
* cont : bring back perms
* cont : test windows
* cont : add token
* cont : cleanup
* ci : make release jobs clean-up their ccache

* ci : fix s390x release job
* ci : multi-thread build for `ios-xcode`
* ocd : names

* vulkan: add flash attention bf16 kv support
* vulkan: bf16 FA coopmat1 support
* vulkan: bf16 FA coopmat2 support
* fix FA bf16 f32 fallback
* fix FA bf16 coopmat1 shader
* fix FA bf16 coopmat2 shader
* code cleanup
* cleanup comment change
* address feedback
* add O_TYPE for cm2 FA
* use O_TYPE for gqaStore function
* reduce BFLOAT16 ifdefs

* loongarch : optimize LSX fp16 load/store with native intrinsics
  Use `__lsx_vfcvtl_s_h` and `__lsx_vfcvt_h_s` instead of scalar loops in `__lsx_f16x4_load` and `__lsx_f16x4_store`.
* loongarch : add LSX implementation for q8_0 dot product
* loongarch : add LSX implementation for q6_K dot product
* loongarch : add LSX implementation for iq4_xs dot product
* Improve reduce ops when summing int16 pairs to int32
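For readers unfamiliar with the kernels listed above, the following is a scalar reference of what a q8_0 dot product computes, written to mirror the int16-pair-to-int32 reduction mentioned in the last item. It is a hypothetical sketch, not the LSX code from these commits and not ggml's API: `BlockQ8_0`, `kQK8_0`, and `vec_dot_q8_0_q8_0_ref` are illustrative names, and the per-block scale is simplified to `float` (ggml stores it as fp16).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified q8_0 block (assumption): 32 int8 quants sharing one scale.
// ggml stores the scale as fp16; a float is used here to stay self-contained.
constexpr int kQK8_0 = 32;

struct BlockQ8_0 {
    float d;              // per-block scale
    int8_t qs[kQK8_0];    // quantized values
};

// Scalar reference for sum_b d_x[b] * d_y[b] * sum_i qx[b][i] * qy[b][i].
// The inner loop follows a typical SIMD data flow: int8 x int8 products
// fit in int16, but the sum of two products can reach 32768 (> INT16_MAX),
// so adjacent int16 pairs are widened and summed into int32 lanes.
float vec_dot_q8_0_q8_0_ref(const BlockQ8_0* x, const BlockQ8_0* y,
                            std::size_t n_blocks) {
    assert(n_blocks == 0 || (x != nullptr && y != nullptr));
    float sum = 0.0f;
    for (std::size_t b = 0; b < n_blocks; ++b) {
        int32_t lanes[kQK8_0 / 2];
        for (int i = 0; i < kQK8_0 / 2; ++i) {
            // Each product lies in [-16256, 16384], so int16 holds it.
            const int16_t p0 = static_cast<int16_t>(x[b].qs[2 * i] * y[b].qs[2 * i]);
            const int16_t p1 = static_cast<int16_t>(x[b].qs[2 * i + 1] * y[b].qs[2 * i + 1]);
            lanes[i] = static_cast<int32_t>(p0) + static_cast<int32_t>(p1);
        }
        // Horizontal reduction of the int32 lanes.
        int32_t isum = 0;
        for (int i = 0; i < kQK8_0 / 2; ++i) {
            isum += lanes[i];
        }
        sum += x[b].d * y[b].d * static_cast<float>(isum);
    }
    return sum;
}
```

The widening step is the point of interest: summing two int16 products in int16 could overflow when both inputs are -128, so the pair sum has to land in a 32-bit lane before the horizontal reduction.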

ngxson and others added 6 commits May 30, 2026 00:06

server: in SSE mode, send HTTP headers when slot starts (#23884)

0821c5f


ci : fix s390x release job (#23898)

3375285


pull[bot] locked and limited conversation to collaborators May 30, 2026

pull[bot] added the ⤵️ pull label May 30, 2026

pull[bot] merged commit d48a56e into CrazyForks:master May 30, 2026
11 of 32 checks passed

github-actions[bot] added the examples, server, ggml, Vulkan, and devops labels May 30, 2026


pull[bot] merged 6 commits into CrazyForks:master from ggml-org:master
