Sync master with upstream release b9066 #509
Merged
jan-service-account merged 17 commits on May 8, 2026
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
The error:
./examples/sycl/test.sh: line 122: level_zero:${$GGML_SYCL_DEVICE}: bad substitution
was thrown whenever the user ran this command:
./examples/sycl/test.sh -mg 0
The fix is to remove the extra dollar sign.
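The failure mode can be reproduced in isolation. The following is a minimal sketch, assuming bash; it mirrors the expression from the error message, but it is not the actual code from line 122 of `test.sh`, and the `selector` and `broken_status` variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Illustrative only: reproduces the class of error, not the real test.sh code.
GGML_SYCL_DEVICE=0
broken_status=""

# Broken form: "${$VAR}" is not a valid parameter expansion, so bash
# reports "bad substitution". It runs in a child shell so the failure
# does not abort this script.
if ! bash -c 'GGML_SYCL_DEVICE=0; echo "level_zero:${$GGML_SYCL_DEVICE}"' 2>/dev/null; then
    broken_status="bad substitution"
fi

# Fixed form: no dollar sign inside the braces.
selector="level_zero:${GGML_SYCL_DEVICE}"
echo "$selector"
```

Inside `${...}` bash expects a parameter name, not another expansion, which is why the inner `$` has to go.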
…gml-org#22773)
* add fill-mode-forwards
* generated diffs
* codeowners : add ZenDNN backend codeowner
* codeowners : fix zendnn owners to use individual github handles
* webui: fix ?model= URL param race in router mode
* chore: update webui build output
* add mimo-v2.5 support
* mimo-v2.5: fix modify_tensors row split
* mimo-v2.5: forgot `add_attn_value_scale` plumbing
* mimo-v2.5: fix tp dequant to detect tp rows
* mimo-v2.5: fix TP iteration to be descending
* mimo-v2.5: fix comment
* mimo-v2.5: retain fused qkv
* mimo-v2.5: missed the attn_value scale during merge
* mimo-v2.5: fused QKV needs contiguous for scaling attention value
* mimo-v2.5: move `speech_embeddings.` to TextModel filter_tensors
* Update src/llama-hparams.h
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update src/models/mimo2.cpp
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update src/models/mimo2.cpp
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update convert_hf_to_gguf.py
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update convert_hf_to_gguf.py
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update src/models/mimo2.cpp
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* mimo-v2.5: include MTP weights in gguf
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Write a readme on Multi-GPU usage in llama.cpp
* Apply suggestions from code review
  Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Address review comments
* Apply suggestions from code review
  Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
…gml-org#22149)
* sycl: add FILL, CUMSUM, DIAG, SOLVE_TRI, SSM_SCAN, GATED_DELTA_NET
  Signed-off-by: Chun Tao <chun.tao@intel.com>
* Fix abort during test-backend-ops
  Signed-off-by: Todd Malsbary <todd.malsbary@intel.com>
* Regenerate ops.md
  Signed-off-by: Todd Malsbary <todd.malsbary@intel.com>
* Add scope_dbg_print to newly added SYCL ops. Also add scope_dbg_print to existing ssm_conv op.
  Signed-off-by: Todd Malsbary <todd.malsbary@intel.com>
---------
Signed-off-by: Chun Tao <chun.tao@intel.com>
Signed-off-by: Todd Malsbary <todd.malsbary@intel.com>
Co-authored-by: Chun Tao <chun.tao@intel.com>
Co-authored-by: Todd Malsbary <todd.malsbary@intel.com>
…ml-org#22794)
* tests : add long-seq + tail cases for gated_delta_net
* tests : realistic input ranges for gated_delta_net
* webui: add LLM title generation option
* webui: use chat_template_kwargs for title gen + fix conversation check
* webui: capture firstUserMessage before async streamChatCompletion to fix race condition
* webui: extract LLM title generation into separate method
* webui: use constants and ChatService for LLM generated titles
* webui: rebuild static output
* webui: add LLM title generation setting to new settings location
* webui: use sendMessage in generateTitle
* webui: rebuild static output
* webui: fix formatting
* webui: configurable title prompt, remove think tag regexes, fix TS error
* webui: group title constants into TITLE object, use TruncatedText for CSS truncation and fix race condition
* webui: rebuild static output
…org#22651)
* CUDA: batch out_prod inner loop with cublasSgemmStridedBatched
* CUDA: batch out_prod inner loop with cublasSgemmStridedBatched
* CUDA: add cublasSgemmStridedBatched mapping for HIP and MUSA backends
Updates dev branch with latest release (b9066) from ggml-org/llama.cpp