
chore: bump llama.cpp to b9770 #19

Open
github-actions[bot] wants to merge 1 commit into main from automation/bump-llama-cpp

github-actions[bot] commented Jun 18, 2026
llama.cpp update

Upstream changelog

Release notes for b9770

server: fix remote preset handling, add test (#24938)

  • server: add test for remote preset

  • fix remote preset handling

  • fix

  • fix test

macOS/iOS:

Linux:

Android:

Windows:

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

UI:

Commit range

Commits from b9699 to b9770 (first 80)
  • [SYCL] rename GGML_SYCL_SUPPORT_LEVEL_ZERO (#24719) (9724f66)
  • mtmd: refactor preprocessor, add mtmd_image_preproc_out (#24736) (24bba7b)
  • server: fix router args not being forwarded to child instances (#24760) (968c438)
  • server: (router) rework -hf preset repo (#24739) (552258c)
  • server : return HTTP 400 on invalid grammar (#24144) (#24154) (1078621)
  • ui: provide touch accessible model selection UI (#24604) (2083217)
  • server : add last-5-seconds generation speed display (#24291) (0802307)
  • server: add "schema" and validation (#24150) (e1efd09)
  • server: (router) fix stopping_thread potentially hang (#24728) (fe7c8b2)
  • docs: fix export-lora --lora-scaled syntax [no release] (#24703) (7b6c5a2)
  • hexagon: support for op-trace (fine-grain tracing of HVX/HMX/DMA events) (#24592) (d2c6795)
  • mtmd: refactor llava-uhd overview image handling (always use ov_img_first) (#24769) (060ce1b)
  • cmake : fix ui build with read-only source (#24752) (32eddaf)
  • mtmd: add batching for mtmd-cli, add video tests (#24778) (a6b3260)
  • server: add "X-Accel-Buffering": "no" header to streaming endpoints (#24774) (40f3aaf)
  • Ggml/cuda col2im 1d (#24417) (3a3edc9)
  • mtmd: add batching support for internvl (#24775) (db52540)
  • ggml-cpu: support K tails in power10 Q8/Q4 MMA matmul (#24753) (8141e73)
  • server : consolidate slot selection into get_available_slot (#24755) (80452d6)
  • pi : remove docs from system prompt (#24791) (5bd21b8)
  • ggml : bump version to 0.15.2 (ggml/1548) (1868af1)
  • sync : ggml (5fd2dc2)
  • server: fix non-bound n_discard value (ctx shifting) (#24786) (159d093)
  • spec: support eagle3 for qwen3.5 & 3.6 (#24593) (b14e3fb)
  • mtmd: several bug fixes (#24784) (e2e7a9b)
  • docker : build the UI (#24794) (38724ab)
  • server: add --agent arg, remove redundant webui naming compat (#24801) (8c2d6f6)
  • vendor : update cpp-httplib to 0.48.0 (#24787) (0d2d9cc)
  • arg: Add comment line support to --api-key-file (#23168) (fabde3b)
  • server: remove all internal mentions about "webui" (#24817) (175147e)
  • mtmd, arg: fix utf8 handling on windows (#24779) (e475fa2)
  • server : optimize get_token_probabilities (#24796) (4b48a53)
  • server: refactor child --> router communication (#24821) (2b686a9)
  • ggml-webgpu: add adapter toggles for F16 on Vulkan + NVIDIA (f449e05)
  • convert : more consistent handling of rope_parameters (#24833) (f4043fe)
  • ggml : optimize AMX (#24806) (37a77fb)
  • model : glm-dsa load DSA indexer tensors as optional (#24770) (796f41b)
  • docker : prebuild web UI for s390x build [no release] (#24829) (67e9fd3)
  • server: avoid forwarding auth headers in CORS proxy (#24373) (e27f308)
  • release: add missing link for win opencl adreno arm64 (#24809) (8452824)
  • arg: try fixing test-args-parser randomly fails (#24826) (75f460a)
  • llama : use LLM_KV for quantization_version & file_type (#24802) (84de01a)
  • fix(hexagon): use padded stride for ssm-conv weights (#24470) (4a80943)
  • common/json-schema-to-grammar : align spacing rules with parsers (#24835) (c576070)
  • common/peg : refactor until gbnf grammar generation (#24839) (063d9c1)
  • spec : Support Step3.5/3.7 flash mtp3 (#24340) (d789527)
  • minor : clean-up whitespaces (#24862) (8a118ee)
  • server: real-time model load progress tracking via /models/sse (#24828) (d6d8995)
  • server: add "verbose" field to schema (#24864) (bfa3219)
  • mtmd: add load progress callback (#24865) (2f89acc)
  • jinja : implement call statement (#24847) (bf53382)
  • mtmd: fix mtmd_get_memory_usage (#24867) (0d135df)
  • server: refactor batch construction (#24843) (bddfd2b)
  • server: fix report progress for loading spec models, add "stages" list (#24870) (7c082bc)
  • common/peg : implement ac parser for stricter grammar generation (#24869) (52b3df0)
  • docs/android.md: Add dependency libandroid-spawn for building in termux (#21812) (0ef6f06)
  • server: fix edit_file crash on append at end of file (line_start -1) (#24893) (d0f9d2e)
  • sampling : remove unconditional softmax+sort in top-n-sigma sampler (#22645) (37957e8)
  • [SYCL] support bf16 on bin_bcast OP and unary OPs (#24838) (f8cc15f)
  • ui: model status and load progress via /models/sse feed (#24878) (099b579)
  • server: refactor/generalize input file schema (#24299) (6ee0f65)
  • server: (router) move model downloading to dedicated process (#24834) (721354f)
  • ui: Prioritize favorite models in model selection (#24766) (9c0ac88)
  • server : Add id to tool call responses api (#24882) (dec5ca5)
  • opencl: q8_0 gemv precision improvement (#24923) (23ee879)
  • server: improve user message detection and create checkpoints at every user message (#24176) (73618f2)
  • codeowners: add yomaytk to ggml-webgpu (#24930) (035cd8f)
  • ggml-webgpu: improve MTP inference by using mat-vec path for small batches (#24811) (7c90850)
  • model: Granite Speech Plus (#24818) (a3900a6)
  • vulkan: link ggml-cpu when GGML_VULKAN_CHECK_RESULTS / RUN_TESTS are enabled (#24444) (c926ad0)
  • server: fix remote preset handling, add test (#24938) (75ad0b2)

Web bridge review focus

Please pay extra attention to upstream changes touching:

  • WebGPU, WASM, Emscripten, pthreads, or memory64 build behavior
  • ggml backend APIs used by the bridge
  • model loading, tokenizer, chat template, context/state persistence, or cache semantics
  • CMake/build flags that can affect the generated JS/WASM artifacts
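The review focus above could be pre-triaged mechanically. The sketch below is a hypothetical helper, not part of this workflow: it matches each commit subject from the commit range against keyword patterns for the listed areas. The pattern map (FOCUS_PATTERNS) and the function name flag_commits are illustrative assumptions; a keyword hit only suggests where to look first and does not replace reading the upstream diff.

```python
import re

# Hypothetical keyword map: review-focus area -> regex over commit subjects.
# These patterns are illustrative guesses, not rules the bump workflow applies.
FOCUS_PATTERNS = {
    "webgpu/wasm build": re.compile(r"webgpu|wasm|emscripten|pthread|memory64", re.I),
    "build flags": re.compile(r"\bcmake\b|GGML_[A-Z_]+", re.I),
    "model loading / templates": re.compile(
        r"^model\b|tokeniz|jinja|chat template|^convert\b", re.I
    ),
}


def flag_commits(subjects):
    """Return (subject, [areas]) for every commit subject that hits a focus area.

    Subjects with no match are omitted; order of both subjects and areas
    follows the input list and FOCUS_PATTERNS respectively.
    """
    flagged = []
    for subject in subjects:
        areas = [name for name, pattern in FOCUS_PATTERNS.items() if pattern.search(subject)]
        if areas:
            flagged.append((subject, areas))
    return flagged


if __name__ == "__main__":
    sample = [
        "ggml-webgpu: add adapter toggles for F16 on Vulkan + NVIDIA",
        "cmake : fix ui build with read-only source",
        "server: fix remote preset handling, add test",
    ]
    for subject, areas in flag_commits(sample):
        print(f"{', '.join(areas)}: {subject}")
```

Run over the commit range above, a pass like this would surface the ggml-webgpu commits, the cmake UI build fix, and the jinja call-statement change, while a subject such as "llama : use LLM_KV for quantization_version & file_type" slips through because no pattern covers it. That is why the patterns are a starting point for review rather than a gate.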

Validation

  • Emscripten build passed
  • Browser WebGPU/state-persistence smoke passed
  • Generated bridge artifacts include wasm32 and memory64 outputs
  • No stale hard-coded llama.cpp tag remains in CI/publish defaults
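The last validation item can be approximated with a scan for leftover release tags. This is a minimal sketch under assumptions: it treats any token of the form "b" followed by four or more digits as a llama.cpp tag, and the function names and the file list passed in are hypothetical, since the actual CI check is not shown in this PR.

```python
import re
from pathlib import Path

# Assumed tag shape: "b" + 4 or more digits (e.g. b9770). Word boundaries keep
# hex strings like "b9770ab" and words like "fallback" from matching.
TAG_RE = re.compile(r"\bb\d{4,}\b")


def find_stale_tags_in_text(name, text, current_tag):
    """Return (name, line_no, tag) for each tag in text that differs from current_tag."""
    stale = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in TAG_RE.finditer(line):
            if match.group(0) != current_tag:
                stale.append((name, line_no, match.group(0)))
    return stale


def find_stale_tags(paths, current_tag):
    """Scan each file (e.g. hypothetical CI/publish config paths) for stale tags."""
    stale = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        stale.extend(find_stale_tags_in_text(str(path), text, current_tag))
    return stale


if __name__ == "__main__":
    import sys

    # Usage: python check_tags.py b9770 path/to/workflow.yml ...
    problems = find_stale_tags(sys.argv[2:], sys.argv[1])
    for name, line_no, tag in problems:
        print(f"{name}:{line_no}: stale llama.cpp tag {tag}")
    sys.exit(1 if problems else 0)
```

Point it only at CI and publish defaults; changelogs and PR text like this one legitimately mention older tags such as b9699 and would otherwise be reported.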

Automation behavior

This PR is managed from the stable branch automation/bump-llama-cpp. If another llama.cpp release appears before merge, the scheduled workflow force-pushes to this same branch and retitles this PR instead of opening a duplicate. The workflow skips the bump entirely if a non-automation PR already changes llama_cpp.version.
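The update-or-skip behavior described above can be modeled as a small decision function. This is a sketch, not the workflow's code: OpenPR, plan_bump, and the four outcome strings are hypothetical, and it assumes llama_cpp.version is a file path whose presence in a PR's changed files counts as "changing" it.

```python
from __future__ import annotations

from dataclasses import dataclass

BOT_BRANCH = "automation/bump-llama-cpp"
VERSION_PATH = "llama_cpp.version"  # assumption: treated as a file path


@dataclass
class OpenPR:
    """Hypothetical minimal view of an open pull request."""
    number: int
    head_branch: str
    changed_files: list[str]


def tag_number(tag: str) -> int:
    """Parse a llama.cpp release tag like 'b9770' into 9770 (assumed 'b' + digits)."""
    if not (tag.startswith("b") and tag[1:].isdigit()):
        raise ValueError(f"unexpected tag format: {tag!r}")
    return int(tag[1:])


def plan_bump(open_prs: list[OpenPR], pinned_tag: str, latest_tag: str) -> str:
    """Return 'skip', 'noop', 'update', or 'open' for one scheduled run."""
    # 1. Defer to humans: a non-automation PR already edits the version pin.
    if any(
        pr.head_branch != BOT_BRANCH and VERSION_PATH in pr.changed_files
        for pr in open_prs
    ):
        return "skip"
    # 2. Upstream has nothing newer than what the base branch pins.
    if tag_number(latest_tag) <= tag_number(pinned_tag):
        return "noop"
    # 3. Reuse the stable branch's PR (force-push + retitle) if it exists.
    if any(pr.head_branch == BOT_BRANCH for pr in open_prs):
        return "update"
    return "open"
```

With main still pinned at b9699, as the commit range suggests, each newer release (b9724, b9760, b9770) yields "update", which matches the force-push and retitle events on this PR rather than a fresh PR per release. Checking for a competing human PR first means a manual pin always takes precedence over automation.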

github-actions[bot] force-pushed the automation/bump-llama-cpp branch from fcc8744 to 37a6ad0 on June 19, 2026 13:49
github-actions[bot] changed the title from "chore: bump llama.cpp to b9701" to "chore: bump llama.cpp to b9724" on Jun 19, 2026
github-actions[bot] force-pushed the automation/bump-llama-cpp branch from 37a6ad0 to 83f441d on June 22, 2026 15:32
github-actions[bot] changed the title from "chore: bump llama.cpp to b9724" to "chore: bump llama.cpp to b9760" on Jun 22, 2026
github-actions[bot] force-pushed the automation/bump-llama-cpp branch from 83f441d to 3892492 on June 23, 2026 12:47
github-actions[bot] changed the title from "chore: bump llama.cpp to b9760" to "chore: bump llama.cpp to b9770" on Jun 23, 2026