Hi, thanks for the great work on Bonsai-8B and the Q1_0 quantization.
I'd like to ask about plans for running Q1_0 models in the browser via WebGPU.
Current state (as far as I can see):
- The Q1_0 custom kernels exist for CUDA (.cu), Metal (.metal), and Vulkan (GLSL/SPIR-V), plus a CPU implementation.
- Upstream ggml-org/llama.cpp has an active WebGPU backend in development (multiple PRs merged in recent weeks, including Intel subgroup matrix support).
- However, the upstream WebGPU backend only targets the standard quant types; there is no WGSL implementation of the Q1_0 kernel yet.
Questions:
- Is a WGSL port of the Q1_0 kernel on your roadmap, or under consideration?
- If not planned internally, would you be open to a community PR once the upstream WebGPU backend stabilizes?
- Any preference between waiting for upstream WebGPU to mature vs. an earlier experimental port?
Running Bonsai-8B purely in-browser (no server, no install) would be a great showcase of the 1-bit quantization story — curious where this sits in your plans.
Thanks!