ggml-webgpu: Update register tiling matmul to use f32 accumulation by reeselevine · Pull Request #21644 · ggml-org/llama.cpp

reeselevine · 2026-04-08T18:58:38Z

Overview

Fixes WebGPU: Qwen2 models produce garbled output (repeated @ token) #21602 by doing the following:
- Updates register tiling matmul to use f32 for accumulation, which increases precision and removes NaNs on some older Qwen models.
- Fixes an odd issue with q4/q5_k quants that only affects Chrome. This may be an issue with Chrome's WebGPU compiler, which we plan to report soon.

This PR also:
- Updates shader batching to 64, which should help performance on iOS while maintaining stability.
- Removes an unnecessary callback_mode() accidentally introduced in ggml-webgpu: address quantization precision and backend lifecycle managment #21521.
- Fixes an issue compiling with WebGPU GPU profiling turned on.
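
For readers unfamiliar with why the accumulator type matters, here is a minimal CPU-side sketch in Python. It is not the WGSL shader from this PR; it only emulates f16 and f32 rounding with the standard `struct` module. The helper names (`to_f16`, `to_f32`, `dot`) and the input values are hypothetical, and rounding each product to f16 is an assumption made for illustration, not a claim about what the old kernel did.

```python
import math
import struct


def to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack finite values beyond the f16 range;
        # emulate an overflow to +/-inf instead (assumed behavior).
        return math.copysign(math.inf, x)


def to_f32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot(a, b, round_fn):
    """Dot product where each product and the running sum are rounded by round_fn."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_fn(acc + round_fn(x * y))
    return acc


# Precision: from 2048 upward, adjacent f16 values are 2 apart,
# so adding 1.0 to the running sum no longer changes it.
ones = [1.0] * 4096
stalled = dot(ones, ones, to_f16)
exact = dot(ones, ones, to_f32)

# Overflow: 300 * 300 = 90000 exceeds the f16 maximum of 65504, so the
# two terms become +inf and -inf, and their sum is NaN.
a = [300.0, 300.0]
b = [300.0, -300.0]
nan_f16 = dot(a, b, to_f16)
zero_f32 = dot(a, b, to_f32)

print(stalled, exact, nan_f16, zero_f32)
```

In this sketch the f16 accumulator stalls at 2048 where f32 reaches 4096, and the overflow case yields NaN where f32 yields 0. Both are failure modes of the kind that accumulating in f32 avoids.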

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: no

abhijitramesh · 2026-04-09T18:01:11Z

The f32 accumulator fix seems to be helping us. I hooked this branch up to wllama and ran the Qwen models; it now works as expected on Firefox and Safari. Chrome still seems to be broken, but it's no longer printing @@@.

Chrome output:

abhijitramesh · 2026-04-10T05:10:17Z

Now it works on Chrome as well!

…gml-org#21644) * Update register tiling matmul to use f32 accumulation * fix profiling code * Fix register tiling matmul for chrome, i'm blaming dawn * Update batch tuning value for iOS * compile fix * Fix use of new load function

reeselevine added 2 commits April 8, 2026 09:19

Update register tiling matmul to use f32 accumulation

e94d13c

fix profiling code

af4c1d5

reeselevine requested a review from a team as a code owner April 8, 2026 18:58

reeselevine mentioned this pull request Apr 8, 2026

WebGPU: Qwen2 models produce garbled output (repeated @ token) #21602

Closed

Fix register tiling matmul for chrome, i'm blaming dawn

ac5267d

github-actions Bot added the ggml and WebGPU labels Apr 9, 2026

reeselevine added 4 commits April 10, 2026 19:21

Update batch tuning value for iOS

4edf91b

compile fix

354cb5c

Merge remote-tracking branch 'upstream/master' into reg-tile-accum-fix

bde5922

Fix use of new load function

0928d31

reeselevine added the merge ready label Apr 13, 2026

reeselevine mentioned this pull request Apr 13, 2026

ggml-webgpu: compute pass batching and removing profiling overhead #21873

Merged

ggerganov merged commit 5a23695 into ggml-org:master Apr 14, 2026
46 of 47 checks passed

ggerganov merged 7 commits into ggml-org:master from reeselevine:reg-tile-accum-fix