[pull] master from ggml-org:master by pull[bot] · Pull Request #95 · CrazyForks/llama.cpp

pull · 2026-05-25T09:42:30Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)

* ci : remove tag from build-self-hosted.yml
* ci : slim -> self-hosted
* ci : prevent heavy CPU jobs from running on fast runners
* ci : prevent cmake pkg to run on dedicated fast runners
* ci : try to bump 3.11 -> 3.13
* ci : move lint back to 3.11
* ci : back to 3.11
* ci : add comment about UI jobs
* ci : move python requirements check to CPU runners
  this job is a bit slow for a dedicated "fast" runner
* ci : add self-hosted ui workflow
* ci : fix UI naming
* tmp to check if arm64 fast is compatible with all jobs
* revert last commit

* common : add common_chat_split_by_role
* cont : fix spans to reach end of message
* server: fix checkpoints creation
  - extract message_spans from chat templates
  - find the prompt token position before the latest user message
  - split prompt batching at that position
  - create a context checkpoint before the latest user input
  - avoid periodic mid-prompt checkpoints when that position is known
  - handle multimodal prompts when mapping text/template positions to server prompt tokens
  - add --checkpoint-min-step to control minimum spacing between checkpoints
* cont : clean-up
* Support autoparser detection for message barriers
* server: fix message span delimiter and update docs

---------

Co-authored-by: Alde Rojas <hello@alde.dev>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Piotr Wilkin <piotr.wilkin@syndatis.com>

* fix(action): update SpacemiT toolchain URL and version
  Change-Id: If4cc1c738a855274103f8c3ad52daa33528acd0c
* fix(action): add -L flag to curl command for URL redirection
  Change-Id: I9b6c37390f0c7a733a36308c8fb53d22d234ab06

* ggml: implement `gguf_init_from_buffer`
* test: `gguf_init_from_buffer`
* fix: memory breakdown for a model loaded with `no_alloc` from a file is consistent with being loaded from a buffer
* fix: use `GGML_UNUSED`
  Co-authored-by: Copilot <copilot@github.com>
* fix: remove `total_size` from `gguf_reader`
* fix: file offset calculation, rename `offset` to `data_offset`
  Co-authored-by: Copilot <copilot@github.com>
* refactor: extract model loader bug fixes to another PR
* feat: add `gguf_init_from_callback`
* fix: always require a max expected size
* fix: change `gguf_reader_callback_t`'s `output` type to `void *`, change `max_expected_size` and offsets to `uint64_t`
* fix: harden against offset overflow in buffer read
* fix: remove seek behavior from the callback
* feat: `max_chunk_read == 0` means `SIZE_MAX`
* fix: seeking in a gguf file with no tensors

---------

Co-authored-by: Copilot <copilot@github.com>
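The "harden against offset overflow in buffer read" item describes a common pattern for readers that work on an in-memory buffer with `uint64_t` offsets. The following is a minimal sketch of that pattern only. It is not the gguf implementation, and `buffer_reader` and `buffer_read` are hypothetical names introduced here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical in-memory reader (an assumption for illustration, not the gguf API).
struct buffer_reader {
    const uint8_t * data;   // start of the buffer
    uint64_t        size;   // total number of bytes available
    uint64_t        offset; // current read position
};

// Copy `n` bytes into `out` and advance the position.
// Returns false instead of reading out of bounds.
// The bound is checked as `n > size - offset`, so `offset + n` is never
// computed and therefore cannot wrap around for very large `n`.
static bool buffer_read(buffer_reader & r, void * out, uint64_t n) {
    if (r.offset > r.size || n > r.size - r.offset) {
        return false;
    }
    if (n == 0) {
        return true; // nothing to copy
    }
    std::memcpy(out, r.data + r.offset, static_cast<size_t>(n));
    r.offset += n;
    return true;
}
```

A naive check of the form `offset + n <= size` would accept a request such as `n = UINT64_MAX`, because the sum wraps to a small value. The subtraction form avoids this as long as `offset <= size` is verified first.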

ggerganov and others added 17 commits May 25, 2026 08:11

perplexity : fix even more integer overflows (#23623)

6d57c26

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>

vendor : update cpp-httplib to 0.45.1 (#23639)

9627d0f

ui: media attachments before text (#23467)

b964876

* ui: media attachments before text
* fix prettier formatting

ggml : Parallelize quant LUT init (#23595)

826539c

- Use OpenMP to parallelize iq2xs_init_impl and iq3xs_init_impl.
- Move the OpenMP detection from ggml-cpu to ggml-base.
- Update OpenMP dependencies in ggml-config.cmake.in.
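As a rough illustration of what parallelizing a lookup-table initialization with OpenMP can look like, here is a minimal sketch. It does not reproduce `iq2xs_init_impl` or `iq3xs_init_impl`; the table contents (a popcount table) and the name `build_popcount_lut` are assumptions chosen only to keep the example small.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical lookup table: entry i holds the number of set bits in i.
// Each iteration writes only its own slot, so iterations are independent
// and the loop can be divided among threads without synchronization.
// When compiled without OpenMP support the pragma is ignored and the loop
// runs serially, producing the same table.
static std::vector<uint8_t> build_popcount_lut(int n) {
    assert(n >= 0);
    std::vector<uint8_t> lut(static_cast<size_t>(n));
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        uint8_t bits = 0;
        for (unsigned v = static_cast<unsigned>(i); v != 0; v >>= 1) {
            bits = static_cast<uint8_t>(bits + (v & 1u));
        }
        lut[static_cast<size_t>(i)] = bits;
    }
    return lut;
}
```

With GCC or Clang the parallel path is typically enabled with `-fopenmp`. The design point is that the table is sized before the loop and each thread only writes distinct elements.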

ci : install host compiler on android-ndk build (#23630)

d55fb97

llama : document that only one on-device state can be saved per sequence (#23520)

314e729

ci : fix pre-tokenizer-hashes check (#23651)

062d311

server: MTP layer kv-cache should respect draft type ctk (#23646)

6c4cbdc

TP: fix ggml context size calculation (#22616)

ae251b5

* TP: fix ggml context size calculation, memory leak
* move split state cache back into the context
* revert to constant ggml context size for cgraphs
* increase headroom for statically allocated tensors
* remove obsolete include

ggml-alloc: fix out-of-bounds read in ggml_dyn_tallocr_remove_block (ggml/1492)

fa97041

ggml.h: correct ggml_silu_back arg docstring (a=dy, b=x) (ggml/1500)

b251f74
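The docstring fix above concerns argument order: `a` is the incoming gradient `dy` and `b` is the forward input `x`. As a scalar sketch of the underlying math (not ggml's implementation; `silu_back_scalar` is a hypothetical helper), with silu(x) = x·σ(x) the derivative is σ(x)·(1 + x·(1 − σ(x))), and the backward pass multiplies it by `dy`:

```cpp
#include <cassert>
#include <cmath>

// Scalar SiLU backward, using the argument order from the corrected docstring:
// first argument is the upstream gradient dy, second is the forward input x.
// Hypothetical helper for illustration only.
static float silu_back_scalar(float dy, float x) {
    const float s = 1.0f / (1.0f + std::exp(-x)); // sigmoid(x)
    return dy * s * (1.0f + x * (1.0f - s));      // dy * d/dx [x * sigmoid(x)]
}
```

Swapping the two arguments gives a different result in general, which is why the order in the docstring matters.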

ggml : bump version to 0.12.1 (ggml/1508)

ce5890b

sync : ggml

22307b3

pull Bot locked and limited conversation to collaborators May 25, 2026

pull Bot added the ⤵️ pull label May 25, 2026

pull Bot merged commit 22307b3 into CrazyForks:master May 25, 2026
22 of 54 checks passed

github-actions Bot added the documentation, testing, examples, python, server, ggml, devops, script, and server/ui labels May 25, 2026

pull[bot] merged 17 commits into CrazyForks:master from ggml-org:master

15 participants
