[pull] master from ggml-org:master#62
Merged
Merged
Conversation
…23051) * fix: Propagate version tag to WebUI asset download in self-hosted CI * refactor: Apply suggestions from @CISC Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * fix: Skip npm build when Node.js is not installed Avoid 'no such file or directory' errors on CI runners that lack Node.js. Check if npm is available via find_program before attempting npm install + npm run build. Falls back to HF Bucket download. * fix: Use + separator for ASSETS list to fix Windows build Replace fragile \; escaping with a + separator when passing the WebUI asset list via -DASSETS to the download script. On Windows, the \; escaping was not reliably preserved through the CMake build system, causing all asset filenames to be concatenated into one (e.g., 'index.html;bundle.js;bundle.css;loading.html' as a single file), which broke the HF Bucket download and subsequent xxd.cmake step. + is safe because it is not special in cmd.exe (unlike | which is a pipe operator), not special in CMake's -D argument parser, and not a valid Windows filename character. CMakeLists.txt joins assets with + and webui-download.cmake splits them back via regex. * fix: Validate HF_WEBUI_VERSION environment variable with regex Add input validation for the HF_WEBUI_VERSION env var to prevent CMake list separator or path-traversal issues in stamp filenames and download URLs. Rejects non-conforming characters early. * fix: Remove 'latest' fallback for HF_WEBUI_VERSION When needs.determine-tag.outputs.tag_name is empty, let CMake's default resolution handle it (empty -> git-based version lookup) instead of falling back to 'latest'. This ensures the sentinel stamp file is consistent with CMake's resolution logic. * fix: Demote checksum verification failure to warning instead of hard gate * fix: End line character --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* ggml-webgpu: makes the flash attn vec path compile and size its split/reduce work from the device’s reported subgroup range instead of assuming 32 subgroup size. * ggml-webgpu: remove the extra max_wg_size >= max_subgroup_size guard. Remove hardcoded 32 when determine the value of reduce_wg_size and vec_nwg_cap
* Enabel nvidia ci for webgpu * Address precision issues * fix placement * Relax more set_rows and div * Try relaxing all f16 * formatting and naming * Add comment explaining max_nmse_err logic Added comment referencing pull request for clarification.
* update test scripts * align CI behavior between linux and android * remove automatically cancel in 15min * enable cancel-in-progress * fix ty check issue * update and fix pylint issue * update runner such that we are not restricted by the 15min limit rule * fix flake8 lint issue * update runner according to review feedback * code update according to review feedback * switch from llama-cli to llama-completion binary with -no-cnv flag
Adds RDNA3 support to the CUDA mma FA kernel. To make the RDNA3 tensor cores work with the FP16 accumulation for VKQ the tiles they need to be 32 logical units long in direction of the attention head; for head sizes 80 and 112 that are not exactly divided by 32 the regular length of 16 with FP32 accumulation is used instead. The longer tiles also enable more efficient transposition for a warp size of 32 which is why it's also used for RDNA4. However, this scrambles the data layout of the accumulators along the attention head dimension. To prevent accidental misuse I added another entry to ggml_cuda_mma::data_layout. I also tuned the kernel parameters for RDNA3, RDNA4, and CDNA1 in general, during which I discovered that the kernel can be made to work for head sizes up to 256 for CDNA. For RDNA3/4 I was not able to get better performance that the tile kernel for head sizes > 128.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )