Skip to content

Sync master with upstream release b9204#519

Merged
jan-service-account merged 14 commits into
devfrom
update-dev-from-master-2026-05-18-01-12
May 18, 2026
Merged

Sync master with upstream release b9204#519
jan-service-account merged 14 commits into
devfrom
update-dev-from-master-2026-05-18-01-12

Conversation

@jan-service-account
Copy link
Copy Markdown

Updates dev branch with latest release (b9204) from ggml-org/llama.cpp

foldl and others added 14 commits May 17, 2026 02:13
* ngram : reduce noisy logs

* ngram : reduce noisy logs
The --embd-normalize flag was registered only for the embedding and debug
examples, so llama-server rejected it and the /embedding handler used a
hard-coded default of 2 (L2). Add LLAMA_EXAMPLE_SERVER to the flag's
example set and read params.embd_normalize as the handler's default. The
per-request "embd_normalize" body field continues to override.
* common : remove atomic from json arguments

* common : remove parsing logic on JSON arguments
* ci/run: set explicit SPIR-V Headers search path for macOS vulkan CI

For whatever reason, the files are under additional sub-path
`vulkan/` under the cmake directory, which does not match either
current LunarG macOS Vulkan SDK structure (`lib/cmake/SPIRV-Headers`),
nor what gets installed when you run the cmake build+install for
SPIRV-Headers itself on at least Linux (`share/cmake/SPIRV-Headers`).

This allows for SPIRV-Headers to be found, as currently the CI
runner's setup does not seem to include the relevant path in
list of search locations.

* ggml-vulkan/CMakeLists: add a check for SPIRV-Headers

This is installed by the project if it is built and installed.
Receiving an error during the configuration step is generally
preferred to receiving an error in the middle of a build.
…ers (ggml-org#23089)

* common : delegate assistant continuation to template handler

* server : implement echo parameter to exclude assistant prefill in the response

* server : fix tests for prefill

* server : use existing llama template

* cont : clean up
* llama: avoid copying logits during prompt decode in MTP

* review: update comment

* llama-graph: call set_output for t_h_pre_norm
Branch: ModalityConditionalAdapters
AI-usage: none
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
@jan-service-account jan-service-account merged commit 8168daf into dev May 18, 2026
14 checks passed
@jan-service-account jan-service-account deleted the update-dev-from-master-2026-05-18-01-12 branch May 18, 2026 01:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.