Sync master with upstream release b9204 by jan-service-account · Pull Request #519 · janhq/llama.cpp

jan-service-account · 2026-05-18T01:12:21Z

Updates dev branch with latest release (b9204) from ggml-org/llama.cpp

* ngram : reduce noisy logs

The `--embd-normalize` flag was registered only for the embedding and debug examples, so llama-server rejected it and the `/embedding` handler used a hard-coded default of 2 (L2). Add `LLAMA_EXAMPLE_SERVER` to the flag's example set and read `params.embd_normalize` as the handler's default. The per-request `"embd_normalize"` body field still overrides that default.
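To make the default-versus-override behaviour concrete, here is a small Python sketch. It is not llama.cpp code. `normalize_embedding` and `resolve_embd_normalize` are hypothetical helpers, `server_default` stands in for `params.embd_normalize`, and the mode numbering (-1 none, 1 taxicab, 2 euclidean, larger values p-norm) is assumed from the flag's help text. Mode 0 is deliberately left out.

```python
import math
from typing import Any, Dict, List, Sequence


def normalize_embedding(vec: Sequence[float], mode: int) -> List[float]:
    """Scale an embedding vector according to an integer normalisation mode.

    Assumed mode convention (hypothetical sketch, not the project's code):
    -1 = none, 1 = taxicab (L1), 2 = euclidean (L2), p > 2 = p-norm.
    Mode 0 is not covered by this sketch.
    """
    if mode == -1:
        norm = 1.0  # leave the vector untouched
    elif mode == 2:
        norm = math.sqrt(sum(x * x for x in vec))  # euclidean length
    elif mode >= 1:
        # general p-norm; mode 1 reduces to the taxicab norm (sum of |x|)
        norm = sum(abs(x) ** mode for x in vec) ** (1.0 / mode)
    else:
        raise ValueError(f"mode {mode} is not handled by this sketch")
    scale = 1.0 / norm if norm > 0 else 0.0  # an all-zero vector stays zero
    return [x * scale for x in vec]


def resolve_embd_normalize(body: Dict[str, Any], server_default: int) -> int:
    """A per-request "embd_normalize" wins; otherwise use the server-wide default."""
    return int(body.get("embd_normalize", server_default))
```

Under these assumptions, a server started with `--embd-normalize -1` would return unscaled vectors for requests that omit the field, while a request carrying `{"embd_normalize": 2}` would still get L2 scaling.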

* common : remove atomic from json arguments
* common : remove parsing logic on JSON arguments

* ci/run: set explicit SPIR-V Headers search path for macOS vulkan CI

  For whatever reason, the files are under an additional `vulkan/` sub-path of the cmake directory, which matches neither the current LunarG macOS Vulkan SDK layout (`lib/cmake/SPIRV-Headers`) nor what gets installed when you run the cmake build+install for SPIRV-Headers itself, at least on Linux (`share/cmake/SPIRV-Headers`). Setting the path explicitly lets SPIRV-Headers be found, since the CI runner's setup does not currently seem to include it in the list of search locations.

* ggml-vulkan/CMakeLists: add a check for SPIRV-Headers

  The SPIRV-Headers CMake package is installed by that project when it is built and installed. Failing at the configuration step is generally preferable to failing in the middle of a build.

…ers (ggml-org#23089)

* common : delegate assistant continuation to template handler
* server : implement echo parameter to exclude assistant prefill in the response
* server : fix tests for prefill
* server : use existing llama template
* cont : clean up
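A minimal Python sketch of how an echo switch for assistant prefill could behave. Everything here is an assumption for illustration: `split_prefill` and `assemble_reply` are hypothetical helpers, treating a trailing assistant message as the prefill is an assumed convention, and this PR does not state the parameter's default or exact wire format.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]


def split_prefill(messages: List[Message]) -> Tuple[List[Message], str]:
    """Hypothetical: treat a trailing assistant message as the prefill to continue."""
    if messages and messages[-1].get("role") == "assistant":
        return messages[:-1], messages[-1].get("content", "")
    return messages, ""


def assemble_reply(prefill: str, continuation: str, echo: bool) -> str:
    """Hypothetical: echo=True returns prefill + continuation, echo=False only the new text."""
    return prefill + continuation if echo else continuation
```

Keeping the split and the assembly separate means the template handler can render the prefill as part of the prompt while the response layer alone decides whether to repeat it back to the client.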

* llama: avoid copying logits during prompt decode in MTP
* review: update comment
* llama-graph: call set_output for t_h_pre_norm

Cont of ggml-org#22936, forgot to update one site

Branch: ModalityConditionalAdapters
AI-usage: none
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

foldl and others added 14 commits May 17, 2026 02:13

webui: support video files as input (ggml-org#22830)

4f13cb7

ngram : reduce noisy logs (ggml-org#23185)

a16cce8


vulkan: fuse SSM_CONV + BIAS + SILU (ggml-org#22653)

3fbadb0

common : enable streaming JSON argument values (ggml-org#23173)

f4cc787


vulkan: Support unaligned tensors for ROPE (ggml-org#22637)

7ba22c6

vulkan: add cpy bf16 -> f32 pipelines (ggml-org#22677)

fcae601

llama: avoid copying logits during prompt decode in MTP (ggml-org#23198)

3e12fbd


CUDA: Continue directly including cuda/iterator (ggml-org#23102)

84c6782


cmake : do not install conversion script (ggml-org#23204)

e0de4c2

cmake : fix LLAMA_BUILD_UI logic (ggml-org#23190)

8758904

feat: Support d_conv=15 for ssm-conv.cu (ggml-org#23017)

726704a


jan-service-account merged commit 8168daf into dev May 18, 2026
14 checks passed

jan-service-account deleted the update-dev-from-master-2026-05-18-01-12 branch May 18, 2026 01:39

jan-service-account merged 14 commits into dev from update-dev-from-master-2026-05-18-01-12
