Skip to content

Sync master with upstream release b9066#509

Merged
jan-service-account merged 17 commits into
devfrom
update-dev-from-master-2026-05-08-01-04
May 8, 2026
Merged

Sync master with upstream release b9066#509
jan-service-account merged 17 commits into
devfrom
update-dev-from-master-2026-05-08-01-04

Conversation

@jan-service-account
Copy link
Copy Markdown

Updates dev branch with latest release (b9066) from ggml-org/llama.cpp

angt and others added 17 commits May 7, 2026 08:24
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
The error:
./examples/sycl/test.sh: line 122: level_zero:${$GGML_SYCL_DEVICE}: bad
substitution

was thrown whenever the user used this command:
./examples/sycl/test.sh -mg 0

Fix is to get rid of a dollar sign.
* codeowners : add ZenDNN backend codeowner

* codeowners : fix zendnn owners to use individual github handles
* webui: fix ?model= URL param race in router mode

* chore: update webui build output
* add mimo-v2.5 support

* mimo-v2.5: fix modify_tensors row split

* mimi-v2.5: forgot `add_attn_value_scale` plumbing

* mimi-v2.5: fix tp dequant to detect tp rows

* mimo-v2.5: fix TP iteration to be descending

* mimo-v2.5: fix comment

* mimo-v2.5: retain fused qkv

* mimo-v2.5: missed the attn_value scale during merge

* mimo-v2.5: fused QKV needs contiguous for scaling attention value

* mimo-v2.5: move `speech_embeddings.` to TextModel filter_tensors

* Update src/llama-hparams.h

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update src/models/mimo2.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update src/models/mimo2.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update src/models/mimo2.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* mimo-v2.5: include MTP weights in gguf

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Write a readme on Multi-GPU usage in llama.cpp

* Apply suggestions from code review

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Address review comments

* Apply suggestions from code review

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
…gml-org#22149)

* sycl: add FILL, CUMSUM, DIAG, SOLVE_TRI, SSM_SCAN, GATED_DELTA_NET

Signed-off-by: Chun Tao <chun.tao@intel.com>

* Fix abort during test-backend-ops

Signed-off-by: Todd Malsbary <todd.malsbary@intel.com>

* Regenerate ops.md

Signed-off-by: Todd Malsbary <todd.malsbary@intel.com>

* Add scope_dbg_print to newly added SYCL ops.

Also add scope_dbg_print to existing ssm_conv op.

Signed-off-by: Todd Malsbary <todd.malsbary@intel.com>

---------

Signed-off-by: Chun Tao <chun.tao@intel.com>
Signed-off-by: Todd Malsbary <todd.malsbary@intel.com>
Co-authored-by: Chun Tao <chun.tao@intel.com>
Co-authored-by: Todd Malsbary <todd.malsbary@intel.com>
…ml-org#22794)

* tests : add long-seq + tail cases for gated_delta_net

* tests : realistic input ranges for gated_delta_net
* webui: add LLM title generation option

* webui: use chat_template_kwargs for title gen + fix conversation check

* webui: capture firstUserMessage before async streamChatCompletion to fix race condition

* webui: extract LLM title generation into separate method

* webui: use constants and ChatService for LLM generated titles

* webui: rebuild static output

* webui: add LLM title generation setting to new settings location

* webui: use sendMessage in generateTitle

* webui: rebuild static output

* webui: fix formatting

* webui: configurable title prompt, remove think tag regexes, fix TS error

* webui: group title constants into TITLE object, use TruncatedText for CSS truncation and fix race condition

* webui: rebuild static output
…org#22651)

* CUDA: batch out_prod inner loop with cublasSgemmStridedBatched

* CUDA: batch out_prod inner loop with cublasSgemmStridedBatched

* CUDA: add cublasSgemmStridedBatched mapping for HIP and MUSA backends
@jan-service-account jan-service-account merged commit 3bfd8b3 into dev May 8, 2026
9 checks passed
@jan-service-account jan-service-account deleted the update-dev-from-master-2026-05-08-01-04 branch May 8, 2026 01:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.