[pull] master from ggml-org:master by pull[bot] · Pull Request #62 · CrazyForks/llama.cpp

pull · 2026-05-14T21:42:28Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)



…23051)

* fix: Propagate version tag to WebUI asset download in self-hosted CI

* refactor: Apply suggestions from @CISC

  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* fix: Skip npm build when Node.js is not installed

  Avoid 'no such file or directory' errors on CI runners that lack Node.js. Check if npm is available via find_program before attempting npm install + npm run build. Falls back to HF Bucket download.

* fix: Use + separator for ASSETS list to fix Windows build

  Replace fragile \; escaping with a + separator when passing the WebUI asset list via -DASSETS to the download script. On Windows, the \; escaping was not reliably preserved through the CMake build system, causing all asset filenames to be concatenated into one (e.g., 'index.html;bundle.js;bundle.css;loading.html' as a single file), which broke the HF Bucket download and subsequent xxd.cmake step.

  + is safe because it is not special in cmd.exe (unlike | which is a pipe operator), not special in CMake's -D argument parser, and not a valid Windows filename character. CMakeLists.txt joins assets with + and webui-download.cmake splits them back via regex.

* fix: Validate HF_WEBUI_VERSION environment variable with regex

  Add input validation for the HF_WEBUI_VERSION env var to prevent CMake list separator or path-traversal issues in stamp filenames and download URLs. Rejects non-conforming characters early.

* fix: Remove 'latest' fallback for HF_WEBUI_VERSION

  When needs.determine-tag.outputs.tag_name is empty, let CMake's default resolution handle it (empty -> git-based version lookup) instead of falling back to 'latest'. This ensures the sentinel stamp file is consistent with CMake's resolution logic.

* fix: Demote checksum verification failure to warning instead of hard gate

* fix: End line character

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
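The separator and validation fixes described above can be illustrated with a small Python sketch. This is not the CMake code from the commit: the function names, the version pattern, and the rejection of `..` are assumptions made for illustration, since the commit message does not state the actual regex. The sketch also assumes that asset names never contain `+` themselves.

```python
import re

SEPARATOR = "+"

# Hypothetical pattern: the commit message does not give the real regex.
VERSION_RE = re.compile(r"[A-Za-z0-9._-]+")


def join_assets(assets):
    """Join asset names into one string that can be passed as a single -D value."""
    for name in assets:
        # A name containing the separator (or CMake's ';') could not be
        # split back unambiguously, so reject it up front.
        if SEPARATOR in name or ";" in name:
            raise ValueError(f"asset name contains a separator: {name!r}")
    return SEPARATOR.join(assets)


def split_assets(joined):
    """Split the joined string back into the original list of asset names."""
    return [part for part in joined.split(SEPARATOR) if part]


def validate_version(version):
    """Reject version strings that could break stamp filenames or download URLs."""
    if version == "":
        # Empty means "no override": default resolution is left to the caller.
        return version
    if not VERSION_RE.fullmatch(version) or ".." in version:
        raise ValueError(f"invalid version string: {version!r}")
    return version
```

Joining `index.html`, `bundle.js`, `bundle.css`, and `loading.html` this way yields one argument that survives shell quoting, and `split_assets` recovers the four names on the receiving side.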

* ggml-webgpu: make the flash attn vec path compile and size its split/reduce work from the device's reported subgroup range instead of assuming a subgroup size of 32.

* ggml-webgpu: remove the extra max_wg_size >= max_subgroup_size guard. Remove the hardcoded 32 when determining the values of reduce_wg_size and vec_nwg_cap.

* Enable nvidia ci for webgpu
* Address precision issues
* fix placement
* Relax more set_rows and div
* Try relaxing all f16
* formatting and naming
* Add comment explaining max_nmse_err logic

  Added comment referencing pull request for clarification.

* update test scripts
* align CI behavior between linux and android
* remove automatically cancel in 15min
* enable cancel-in-progress
* fix ty check issue
* update and fix pylint issue
* update runner such that we are not restricted by the 15min limit rule
* fix flake8 lint issue
* update runner according to review feedback
* code update according to review feedback
* switch from llama-cli to llama-completion binary with -no-cnv flag

Adds RDNA3 support to the CUDA mma FA kernel. To make the RDNA3 tensor cores work with FP16 accumulation for VKQ, the tiles need to be 32 logical units long in the direction of the attention head; for head sizes 80 and 112, which are not exactly divisible by 32, the regular length of 16 with FP32 accumulation is used instead. The longer tiles also enable more efficient transposition for a warp size of 32, which is why they are also used for RDNA4. However, this scrambles the data layout of the accumulators along the attention head dimension. To prevent accidental misuse I added another entry to ggml_cuda_mma::data_layout. I also tuned the kernel parameters for RDNA3, RDNA4, and CDNA1 in general, during which I discovered that the kernel can be made to work for head sizes up to 256 for CDNA. For RDNA3/4 I was not able to get better performance than the tile kernel for head sizes > 128.
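The tile-length rule for RDNA3 described above can be written as a short Python sketch. The function name and the returned tuple are hypothetical and exist only to make the rule concrete; the real selection happens in the CUDA kernel configuration, which this page does not show.

```python
def select_vkq_tile(head_size):
    """Return (tile_length, accumulator_type) for the VKQ accumulation.

    Hypothetical helper: it only restates the rule from the description,
    it is not code from the kernel.
    """
    if head_size % 32 == 0:
        # The head size is covered exactly by 32-long tiles, so the
        # FP16 accumulation path can be used.
        return (32, "f16")
    # Head sizes such as 80 and 112 leave a remainder of 16, so the
    # regular tile length of 16 with FP32 accumulation is used instead.
    return (16, "f32")
```

For example, head sizes 64 and 128 take the 32-long FP16 path, while 80 and 112 fall back to 16-long tiles with FP32 accumulation.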

am17an and others added 6 commits May 14, 2026 23:55

contributing: new contributors should not submit trivial fixes (#23045)

97b658c

pull Bot locked and limited conversation to collaborators May 14, 2026

pull Bot added the ⤵️ pull label May 14, 2026

pull Bot merged commit 3e037f3 into CrazyForks:master May 14, 2026

github-actions Bot added Nvidia GPU testing examples python server ggml devops WebGPU script labels May 14, 2026

pull[bot] merged 6 commits into CrazyForks:master from ggml-org:master


6 participants
