chore: bump llama.cpp to b9770 by github-actions[bot] · Pull Request #19 · leehack/llama-web-bridge

github-actions · 2026-06-18T13:47:01Z

llama.cpp update

Previous pin: b9699
New pin: b9770
Upstream release: https://github.com/ggml-org/llama.cpp/releases/tag/b9770
Compare: ggml-org/llama.cpp@b9699...b9770
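
The pin change above can be sanity-checked mechanically. The following is a minimal sketch, assuming release tags always have the form `b<number>` as they do in this PR; the helper names `build_number` and `compare_url` are hypothetical and are not part of the bridge or of llama.cpp.

```python
import re

# Assumption: llama.cpp release tags look like "b" followed by a build number.
TAG_PATTERN = re.compile(r"^b(\d+)$")


def build_number(tag: str) -> int:
    """Return the numeric part of a tag such as 'b9770'."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unexpected tag format: {tag!r}")
    return int(match.group(1))


def compare_url(repo: str, old_tag: str, new_tag: str) -> str:
    """Build a GitHub compare URL for two tags, refusing a downgrade."""
    if build_number(new_tag) <= build_number(old_tag):
        raise ValueError(f"{new_tag} is not newer than {old_tag}")
    return f"https://github.com/{repo}/compare/{old_tag}...{new_tag}"


if __name__ == "__main__":
    print(compare_url("ggml-org/llama.cpp", "b9699", "b9770"))
```

Rejecting a non-increasing pin is a design choice for this sketch only; the PR text does not say whether the real workflow performs such a check.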

Upstream changelog

Release notes for b9770

Details

server: fix remote preset handling, add test (#24938)

server: add test for remote preset
fix remote preset handling
fix
fix test

macOS/iOS:

macOS Apple Silicon (arm64)
macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED
macOS Intel (x64)
iOS XCFramework

Linux:

Android:

Android arm64 (CPU)

Windows:

openEuler:

DISABLED
openEuler x86 (310p)
openEuler x86 (910b, ACL Graph)
openEuler aarch64 (310p)
openEuler aarch64 (910b, ACL Graph)

UI:

UI

Commit range

Commits from b9699 to b9770 (first 80)

[SYCL] rename GGML_SYCL_SUPPORT_LEVEL_ZERO (#24719) (9724f66)
mtmd: refactor preprocessor, add mtmd_image_preproc_out (#24736) (24bba7b)
server: fix router args not being forwarded to child instances (#24760) (968c438)
server: (router) rework -hf preset repo (#24739) (552258c)
server : return HTTP 400 on invalid grammar (#24144) (#24154) (1078621)
ui: provide touch accessible model selection UI (#24604) (2083217)
server : add last-5-seconds generation speed display (#24291) (0802307)
server: add "schema" and validation (#24150) (e1efd09)
server: (router) fix stopping_thread potentially hang (#24728) (fe7c8b2)
docs: fix export-lora --lora-scaled syntax [no release] (#24703) (7b6c5a2)
hexagon: support for op-trace (fine-grain tracing of HVX/HMX/DMA events) (#24592) (d2c6795)
mtmd: refactor llava-uhd overview image handling (always use ov_img_first) (#24769) (060ce1b)
cmake : fix ui build with read-only source (#24752) (32eddaf)
mtmd: add batching for mtmd-cli, add video tests (#24778) (a6b3260)
server: add "X-Accel-Buffering": "no" header to streaming endpoints (#24774) (40f3aaf)
Ggml/cuda col2im 1d (#24417) (3a3edc9)
mtmd: add batching support for internvl (#24775) (db52540)
ggml-cpu: support K tails in power10 Q8/Q4 MMA matmul (#24753) (8141e73)
server : consolidate slot selection into get_available_slot (#24755) (80452d6)
pi : remove docs from system prompt (#24791) (5bd21b8)
ggml : bump version to 0.15.2 (ggml/1548) (1868af1)
sync : ggml (5fd2dc2)
server: fix non-bound n_discard value (ctx shifting) (#24786) (159d093)
spec: support eagle3 for qwen3.5 & 3.6 (#24593) (b14e3fb)
mtmd: several bug fixes (#24784) (e2e7a9b)
docker : build the UI (#24794) (38724ab)
server: add --agent arg, remove redundant webui naming compat (#24801) (8c2d6f6)
vendor : update cpp-httplib to 0.48.0 (#24787) (0d2d9cc)
arg: Add comment line support to --api-key-file (#23168) (fabde3b)
server: remove all internal mentions about "webui" (#24817) (175147e)
mtmd, arg: fix utf8 handling on windows (#24779) (e475fa2)
server : optimize get_token_probabilities (#24796) (4b48a53)
server: refactor child --> router communication (#24821) (2b686a9)
ggml-webgpu: add adapter toggles for F16 on Vulkan + NVIDIA (f449e05)
convert : more consistent handling of rope_parameters (#24833) (f4043fe)
ggml : optimize AMX (#24806) (37a77fb)
model : glm-dsa load DSA indexer tensors as optional (#24770) (796f41b)
docker : prebuild web UI for s390x build [no release] (#24829) (67e9fd3)
server: avoid forwarding auth headers in CORS proxy (#24373) (e27f308)
release: add missing link for win opencl adreno arm64 (#24809) (8452824)
arg: try fixing test-args-parser randomly fails (#24826) (75f460a)
llama : use LLM_KV for quantization_version & file_type (#24802) (84de01a)
fix(hexagon): use padded stride for ssm-conv weights (#24470) (4a80943)
common/json-schema-to-grammar : align spacing rules with parsers (#24835) (c576070)
common/peg : refactor until gbnf grammar generation (#24839) (063d9c1)
spec : Support Step3.5/3.7 flash mtp3 (#24340) (d789527)
minor : clean-up whitespaces (#24862) (8a118ee)
server: real-time model load progress tracking via /models/sse (#24828) (d6d8995)
server: add "verbose" field to schema (#24864) (bfa3219)
mtmd: add load progress callback (#24865) (2f89acc)
jinja : implement call statement (#24847) (bf53382)
mtmd: fix mtmd_get_memory_usage (#24867) (0d135df)
server: refactor batch construction (#24843) (bddfd2b)
server: fix report progress for loading spec models, add "stages" list (#24870) (7c082bc)
common/peg : implement ac parser for stricter grammar generation (#24869) (52b3df0)
docs/android.md: Add dependency libandroid-spawn for building in termux (#21812) (0ef6f06)
server: fix edit_file crash on append at end of file (line_start -1) (#24893) (d0f9d2e)
sampling : remove unconditional softmax+sort in top-n-sigma sampler (#22645) (37957e8)
[SYCL] support bf16 on bin_bcast OP and unary OPs (#24838) (f8cc15f)
ui: model status and load progress via /models/sse feed (#24878) (099b579)
server: refactor/generalize input file schema (#24299) (6ee0f65)
server: (router) move model downloading to dedicated process (#24834) (721354f)
ui: Prioritize favorite models in model selection (#24766) (9c0ac88)
server : Add id to tool call responses api (#24882) (dec5ca5)
opencl: q8_0 gemv precision improvement (#24923) (23ee879)
server: improve user message detection and create checkpoints at every user message (#24176) (73618f2)
codeowners: add yomaytk to ggml-webgpu (#24930) (035cd8f)
ggml-webgpu: improve MTP inference by using mat-vec path for small batches (#24811) (7c90850)
model: Granite Speech Plus (#24818) (a3900a6)
vulkan: link ggml-cpu when GGML_VULKAN_CHECK_RESULTS / RUN_TESTS are enabled (#24444) (c926ad0)
server: fix remote preset handling, add test (#24938) (75ad0b2)
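
Each entry in the list above ends with a short commit hash in parentheses, usually preceded by an upstream PR number. The sketch below splits an entry into those parts, assuming that layout holds for every line; `CommitEntry` and `parse_entry` are hypothetical names introduced for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: "<subject> (#<PR>) (<short sha>)", where the PR part may be absent.
ENTRY = re.compile(
    r"^(?P<subject>.*?)\s*(?:\(#(?P<pr>\d+)\)\s*)?\((?P<sha>[0-9a-f]{7,40})\)$"
)


@dataclass(frozen=True)
class CommitEntry:
    subject: str
    pr: Optional[int]  # None when the entry carries no "(#NNNN)" suffix
    sha: str


def parse_entry(line: str) -> CommitEntry:
    """Split one changelog line into subject, PR number, and commit hash."""
    match = ENTRY.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized changelog entry: {line!r}")
    pr = match.group("pr")
    return CommitEntry(
        subject=match.group("subject"),
        pr=int(pr) if pr is not None else None,
        sha=match.group("sha"),
    )


if __name__ == "__main__":
    print(parse_entry("server: fix remote preset handling, add test (#24938) (75ad0b2)"))
    print(parse_entry("sync : ggml (5fd2dc2)"))
```

When an entry carries two PR references, as in the `return HTTP 400 on invalid grammar` line, only the last one is extracted and the earlier one stays in the subject.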

Web bridge review focus

Please pay extra attention to upstream changes touching:

WebGPU, WASM, Emscripten, pthreads, or memory64 build behavior
ggml backend APIs used by the bridge
model loading, tokenizer, chat template, context/state persistence, or cache semantics
CMake/build flags that can affect the generated JS/WASM artifacts
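
One rough way to shortlist commits for the focus areas above is a keyword match on the commit subject. This is only a sketch: the keyword list is an assumption derived from the bullet points, not something the workflow defines, and a substring match will miss relevant changes whose subjects do not mention these terms (for example tokenizer or chat-template changes).

```python
from typing import Iterable, List

# Assumption: these keywords approximate the review-focus areas listed above.
FOCUS_KEYWORDS = (
    "webgpu", "wasm", "emscripten", "pthread", "memory64", "cmake", "ggml",
)


def needs_bridge_review(subject: str) -> bool:
    """Return True if the commit subject mentions any focus keyword."""
    lowered = subject.lower()
    return any(keyword in lowered for keyword in FOCUS_KEYWORDS)


def shortlist(subjects: Iterable[str]) -> List[str]:
    """Keep only the subjects that deserve a closer look, in input order."""
    return [subject for subject in subjects if needs_bridge_review(subject)]


if __name__ == "__main__":
    sample = [
        "ggml-webgpu: add adapter toggles for F16 on Vulkan + NVIDIA",
        "cmake : fix ui build with read-only source",
        "server: fix remote preset handling, add test",
        "ui: Prioritize favorite models in model selection",
    ]
    for subject in shortlist(sample):
        print(subject)
```

The shortlist is a starting point for a human reviewer, not a replacement for reading the compare view.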

Validation

Emscripten build passed
Browser WebGPU/state-persistence smoke passed
Generated bridge artifacts include wasm32 and memory64 outputs
No stale hard-coded llama.cpp tag remains in CI/publish defaults
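
The last check above, that no stale tag remains, could be approximated with a text scan. The sketch below assumes tags appear literally as `b<digits>` in the scanned text; the function name `stale_tags` and the sample text are hypothetical and do not reflect the repository's actual CI files.

```python
import re
from typing import List

# Assumption: a llama.cpp tag is "b" followed by at least four digits.
TAG = re.compile(r"\bb\d{4,}\b")


def stale_tags(text: str, current: str) -> List[str]:
    """Return every tag-like token in `text` that differs from `current`."""
    return sorted({tag for tag in TAG.findall(text) if tag != current})


if __name__ == "__main__":
    # Hypothetical configuration text, for illustration only.
    sample = 'tag: "b9770"\nfallback_tag: b9699\n'
    leftovers = stale_tags(sample, current="b9770")
    if leftovers:
        print("stale tags found:", ", ".join(leftovers))
    else:
        print("no stale tags")
```

The word-boundary anchors keep the pattern from matching inside commit hashes such as `b9724f66`, but a scan like this can still flag unrelated tokens, so any hit would need manual confirmation.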

Automation behavior

This PR is managed from the stable branch automation/bump-llama-cpp. If another llama.cpp release appears before merge, the scheduled workflow updates this same PR instead of opening a duplicate. The workflow skips if a non-automation PR already changes llama_cpp.version.
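
The behavior described above amounts to a three-way decision. The following sketch models it under stated assumptions: the `OpenPR` type and `decide` function are hypothetical, and the "already up to date" case is an added assumption that the description does not mention.

```python
from dataclasses import dataclass
from typing import List

AUTOMATION_BRANCH = "automation/bump-llama-cpp"


@dataclass(frozen=True)
class OpenPR:
    head_branch: str
    touches_version_pin: bool  # True if the PR changes llama_cpp.version


def decide(open_prs: List[OpenPR], pinned: str, latest: str) -> str:
    """Return 'skip', 'update-existing', or 'open-new' for a scheduled run."""
    # Assumption: nothing to do when the pin already matches the latest release.
    if latest == pinned:
        return "skip"
    # A non-automation PR that already changes the pin takes precedence.
    if any(
        pr.touches_version_pin and pr.head_branch != AUTOMATION_BRANCH
        for pr in open_prs
    ):
        return "skip"
    # Reuse the PR on the stable branch instead of opening a duplicate.
    if any(pr.head_branch == AUTOMATION_BRANCH for pr in open_prs):
        return "update-existing"
    return "open-new"


if __name__ == "__main__":
    existing = [OpenPR(AUTOMATION_BRANCH, touches_version_pin=True)]
    print(decide(existing, pinned="b9699", latest="b9770"))
```

Reusing one stable branch is consistent with the force-push and retitle events recorded in this PR's timeline.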

github-actions Bot added the automated and dependencies labels on Jun 18, 2026

github-actions Bot force-pushed the automation/bump-llama-cpp branch from fcc8744 to 37a6ad0 on June 19, 2026 13:49

github-actions Bot changed the title ~~chore: bump llama.cpp to b9701~~ chore: bump llama.cpp to b9724 Jun 19, 2026

github-actions Bot force-pushed the automation/bump-llama-cpp branch from 37a6ad0 to 83f441d on June 22, 2026 15:32

github-actions Bot changed the title ~~chore: bump llama.cpp to b9724~~ chore: bump llama.cpp to b9760 Jun 22, 2026

chore: bump llama.cpp to b9770

3892492

github-actions Bot force-pushed the automation/bump-llama-cpp branch from 83f441d to 3892492 on June 23, 2026 12:47

github-actions Bot changed the title ~~chore: bump llama.cpp to b9760~~ chore: bump llama.cpp to b9770 Jun 23, 2026

github-actions[bot] wants to merge 1 commit into main from automation/bump-llama-cpp
