Sync master with upstream release b9190 by jan-service-account · Pull Request #518 · janhq/llama.cpp

jan-service-account · 2026-05-17T01:10:54Z

Updates dev branch with latest release (b9190) from ggml-org/llama.cpp

…a-ui` / `LLAMA_UI` naming (ggml-org#23064) * webui: Move static build output from `tools/server/public` to `build/ui` directory * refactor: Move to `tools/ui` * refactor: rename CMake variables and preprocessor defines - Rename LLAMA_BUILD_WEBUI -> LLAMA_BUILD_UI (old kept as deprecated) - Rename LLAMA_USE_PREBUILT_WEBUI -> LLAMA_USE_PREBUILT_UI (old kept as deprecated) - Backward compat: old vars auto-forward to new ones with DEPRECATION warning - Rename internal vars: WEBUI_SOURCE -> UI_SOURCE, WEBUI_SOURCE_DIR -> UI_SOURCE_DIR, etc. - Rename HF bucket: LLAMA_WEBUI_HF_BUCKET -> LLAMA_UI_HF_BUCKET - Emit both LLAMA_BUILD_WEBUI and LLAMA_BUILD_UI preprocessor defines - Emit both LLAMA_WEBUI_DEFAULT_ENABLED and LLAMA_UI_DEFAULT_ENABLED * refactor: rename CLI flags (--webui -> --ui) with backward compat - Add --ui/--no-ui (old --webui/--no-webui kept as deprecated aliases) - Add --ui-config (old --webui-config kept as deprecated alias) - Add --ui-config-file (old --webui-config-file kept as deprecated alias) - Add --ui-mcp-proxy/--no-ui-mcp-proxy (old --webui-mcp-proxy kept as deprecated) - Add new env vars: LLAMA_ARG_UI, LLAMA_ARG_UI_CONFIG, LLAMA_ARG_UI_CONFIG_FILE, LLAMA_ARG_UI_MCP_PROXY - C++ struct fields: params.ui, params.ui_config_json, params.ui_mcp_proxy added alongside old fields - Backward compat: old fields synced to new ones in g_params_to_internals * refactor: update C++ server internals with backward compat - Rename json_webui_settings -> json_ui_settings (both kept in server_context_meta) - Rename params.webui usage -> params.ui (both synced, old still works) - JSON API emits both "ui"/"ui_settings" and "webui"/"webui_settings" keys - Server routes use params.ui_mcp_proxy || params.webui_mcp_proxy - Preprocessor guards use #if defined(LLAMA_BUILD_UI) || defined(LLAMA_BUILD_WEBUI) * refactor: rename CI/CD workflows, artifacts, and build script - Rename webui-build.yml -> ui-build.yml; artifact webui-build -> ui-build - Rename webui-publish.yml -> ui-publish.yml; var HF_BUCKET_WEBUI_STATIC_OUTPUT -> HF_BUCKET_UI_STATIC_OUTPUT - Rename server-webui.yml -> server-ui.yml; job webui-build/checks -> ui-build/checks - Update server.yml: job/artifact refs webui-build -> ui-build - Update release.yml: all webui-build/publish refs -> ui-build/publish; HF_TOKEN_WEBUI_STATIC_OUTPUT -> HF_TOKEN_UI_STATIC_OUTPUT - Update server-self-hosted.yml: webui-build -> ui-build - Update build-self-hosted.yml: HF_WEBUI_VERSION -> HF_UI_VERSION - Rename webui-download.cmake -> ui-download.cmake (internal refs updated) - Update labeler.yml: server/webui -> server/ui path label * docs: update CODEOWNERS and server README docs - Update CODEOWNERS: team ggml-org/llama-webui -> ggml-org/llama-ui, path /tools/server/webui/ -> /tools/ui/ - Update server README.md: CLI tables show --ui flags with deprecated --webui aliases - Update server README-dev.md: "WebUI" -> "UI", paths updated to tools/ui/ * fix: Small fixes for UI build * fix: CMake.txt syntax * chore: Formatting * fix: `.editorconfig` for llama-ui * chore: Formatting * refactor: Use `APP_NAME` in Error route * refactor: Cleanup * refactor: Single migration service * make llama-ui a linkable target * fix: UI Build output * fix: Missing change * fix: separate llama-ui npm build output into build/tools/ui/dist subfolder + use cmake npm build instead of downloading ui-build.yml artifacts in CI * refactor: UI workflows cleanup --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

* NvFP4 quantized LM head support Signed-off-by: ynankani <ynankani@nvidia.com> * Address review commnets Signed-off-by: ynankani <ynankani@nvidia.com> * Add assert for NvFp4 lm head and tied embeddings Signed-off-by: ynankani <ynankani@nvidia.com> * Address review commnets Signed-off-by: ynankani <ynankani@nvidia.com> * Create output_s tensor only when LM head NvFp4 Signed-off-by: ynankani <ynankani@nvidia.com> --------- Signed-off-by: ynankani <ynankani@nvidia.com>

…g#23134)

…gml-org#23127)

…u trigger and items (ggml-org#22736) * fix tab order on attach button, and dont focus on disabled mennu item * add a11y tests

* Fix handling of MCP resource template parameters * Fix formatting for uri-template.test.ts --------- Co-authored-by: kuba <kuba@laptop.local.net>

* spec: support MTP * fix batch size * rename files * cont : simplify (#7) * MTP: clean-up (#9) * MTP: clean-up * review: use llama_context_type instead of llama_graph_type * review: remove llama_model_has_mtp * review: fix convert issues * convert: fix pycheck * review: formatting * use `mtp-` for identifying mtp models * convert: fix mtp conversion * mtp -> draft-mtp * remove unused llama_arch * add need_embd in speculative * llama: allow partial seq_rm for GDN models for speculative decoding Currently speculative checkpoint needs to restart from a checkpoint after some draft tokens are not accepted, this leads to some wastage in running the target again. This PR adds the ability to rollback upto `draft_max` by storing the GDN intermediates. * fix pending state * vulkan: add GDN partial rollback * meta: extend check to axis 1 * metal: add GDN partial rollback Extend the gated delta net kernel to store intermediate states for partial rollback support on the Metal backend. - Add K (snapshot slot count) as a function constant - Read input state from slot 0 of the 3D state tensor - Write intermediate states to different slots during token loop - For K=1, maintain backward-compatible single-slot behavior Ref: ggml-org@8c05923 Assisted-by: llama.cpp:local pi * delta_net_base: use ggml_pad instead of new_tensor * review: add need_rs_seq * review: rename part_bounded to n_rs * review: deslop comments * review: rename, add asserts * server : adjust checkpoint logic (#11) * server : adjust checkpoint logic * cont : rm asserts * server-context: fix early exit * spec : fix compatibility with n-gram and add TODOs (#13) * metal : cleanup * llama : fix faulty bitwise check in recurrent memory * server : disable RS-based MTP in combination with other spec types * spec : add TODOs * cont : fix comment * cont : update comment * common : fix logic for ngram + mtp compat * llama-memory: enable checkpointing with partial rollback * cont: add test-case for loading into a dirty ctx * llama-memory-recurrent: clear rs_idx in clear * download: fix mtp path * llama-arch: fix enorm op * docs: update docs * conversion: fix type annotations --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

In `tools/ui/README.md`, update the relative links, now that the `README.md` file has been moved from `tools/server/webui/` to `tools/ui/`. See ggml-org@59778f0.

That's always how it's done: https://github.com/search?q=path%3ACMakeLists.txt%20%22%24%7BCMAKE_INSTALL_LIBDIR%7D%2Fpkgconfig%22&type=code

…/1477) For a given output position j on the time axis, only input positions i such that i*s0 <= j < i*s0 + K contribute -- i.e. i in [ceil((j - K + 1)/s0), floor(j/s0)] intersected with [0, IL-1]. That's at most ceil(K/s0) values (typically 2 for stride==K/2 transposed convs). The current kernel iterates the full IL range and filters with an `if`, amplifying per-thread work by IL/ceil(K/s0) (~160x for IL=320, K=10, s0=5 -- a representative codec-decoder shape). On Apple M1 the wasted work trips the macOS GPU watchdog (kIOGPUCommandBufferCallbackErrorImpactingInteractivity) on long graphs. Compute i_min, i_max analytically before the inner loop and iterate only [i_min, i_max]. Output is bit-identical (same multiplies and adds in the same order); loop bound shrinks by IL/ceil(K/s0). Tested on M1 with a downstream consumer running a TTS codec at full T_codec; end-to-end codec decode ~3-4x faster, zero watchdog hits across long synthesis runs vs ~30% pre-patch.

* feat: Add request timeout for MCP tool calls in llama-ui * feat: MCP Settings tab with max timeout setting

… primary context (ggml-org#23137)

allozaur and others added 17 commits May 16, 2026 02:02

fix: Add build step using build workflow to publish workflow (ggml-or…

1d9f99a

…g#23134)

ui: untrack settings sync in props effect to prevent reactive loop (g…

366c5e2

…gml-org#23127)

webui : [ChatFormActionAdd][a11y] fix accessibility issues in add men…

1428004

…u trigger and items (ggml-org#22736) * fix tab order on attach button, and dont focus on disabled mennu item * add a11y tests

ui: Fix handling of MCP resource template parameters (ggml-org#23117)

b81c2cd

* Fix handling of MCP resource template parameters * Fix formatting for uri-template.test.ts --------- Co-authored-by: kuba <kuba@laptop.local.net>

vendor : update cpp-httplib to 0.45.0 (ggml-org#23103)

18675b6

ui: Correct links in tools/ui/README.md [no ci] (ggml-org#23139)

25b1bc9

In `tools/ui/README.md`, update the relative links, now that the `README.md` file has been moved from `tools/server/webui/` to `tools/ui/`. See ggml-org@59778f0.

ggml: install ggml.pc in <libdir>/pkgconfig (ggml/1480)

2eb3e6b

That's always how it's done: https://github.com/search?q=path%3ACMakeLists.txt%20%22%24%7BCMAKE_INSTALL_LIBDIR%7D%2Fpkgconfig%22&type=code

ggml : bump version to 0.12.0 (ggml/1494)

e6c37a1

sync : ggml

3a92bc9

ui: Add request timeout for MCP tool calls (ggml-org#23138)

0253fb2

* feat: Add request timeout for MCP tool calls in llama-ui * feat: MCP Settings tab with max timeout setting

vulkan: removed duplicate #include <memory> in headers (ggml-org#23144)

6049906

server: skip device enumeration in router mode to avoid creating CUDA…

64b38b5

… primary context (ggml-org#23137)

server: (router) alloc tmp buffer on heap (ggml-org#23159)

b64739e

jan-service-account merged commit c1531b9 into dev May 17, 2026
14 checks passed

jan-service-account deleted the update-dev-from-master-2026-05-17-01-10 branch May 17, 2026 01:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sync master with upstream release b9190#518

Sync master with upstream release b9190#518
jan-service-account merged 17 commits into
devfrom
update-dev-from-master-2026-05-17-01-10

jan-service-account commented May 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

14 participants

Conversation

jan-service-account commented May 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

14 participants