
WebGPU backend roadmap for Q1_0 kernels? #24

@pmenic

Hi, thanks for the great work on Bonsai-8B and the Q1_0 quantization.

I'd like to ask about plans for running Q1_0 models in the browser via WebGPU.

Current state (as far as I can see):

  • The Q1_0 custom kernels exist for CUDA (.cu), Metal (.metal), and Vulkan (GLSL/SPIR-V), plus CPU.
  • Upstream ggml-org/llama.cpp has an active WebGPU backend in development (multiple PRs merged in recent weeks, including Intel subgroup matrix support).
  • However, the upstream WebGPU backend only covers the standard quant types; there is no WGSL implementation of the Q1_0 kernel yet.

Questions:

  1. Is a WGSL port of the Q1_0 kernel on your roadmap, or under consideration?
  2. If not planned internally, would you be open to a community PR once the upstream WebGPU backend stabilizes?
  3. Any preference between waiting for upstream WebGPU to mature vs. an earlier experimental port?

Running Bonsai-8B purely in-browser (no server, no install) would be a great showcase of the 1-bit quantization story, so I'm curious where this sits in your plans.
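To make the discussion concrete, here is a rough NumPy sketch of what a generic 1-bit scheme looks like on the host side: sign bits packed 8 per byte plus one scale per block. This is purely illustrative and assumes nothing about the real Q1_0 block layout (block size, scale encoding, and bit order are all placeholders here); the actual format is whatever the existing CUDA/Metal/Vulkan kernels define, and a WGSL port would have to reproduce it bit-for-bit.

```python
import numpy as np

# Hypothetical 1-bit layout (NOT the actual Q1_0 format): weights in
# {-1, +1} stored as sign bits, 8 per byte, with one float scale per
# 32-weight block. Block size 32 is an assumption for illustration.
BLOCK = 32

def pack_q1(weights: np.ndarray):
    """Pack a 1-D float array into sign bits plus per-block scales."""
    assert weights.size % BLOCK == 0
    blocks = weights.reshape(-1, BLOCK)
    scales = np.abs(blocks).mean(axis=1)        # one scale per block
    bits = (blocks >= 0).astype(np.uint8)       # bit 1 -> +1, bit 0 -> -1
    packed = np.packbits(bits, axis=1)          # 8 sign bits per byte
    return packed, scales

def unpack_q1(packed: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Reconstruct approximate weights from packed bits and scales."""
    bits = np.unpackbits(packed, axis=1)[:, :BLOCK].astype(np.int8)
    signs = bits * 2 - 1                        # {0, 1} -> {-1, +1}
    return (signs * scales[:, None]).reshape(-1)

w = np.random.randn(64).astype(np.float32)
packed, scales = pack_q1(w)
w_hat = unpack_q1(packed, scales)
# Dequantized values keep the sign of the originals.
assert np.all(np.sign(w_hat) == np.where(w >= 0, 1, -1))
```

A WGSL kernel would do the `unpack_q1` side inside the matmul inner loop (unpack a byte of sign bits, multiply-accumulate against activations, apply the block scale), which is the piece currently missing upstream.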

Thanks!
