
feat(bb/msm): headless WebGPU-only MSM autorun + static build for real-device runs #23471

Draft: AztecBot wants to merge 6 commits into zw/msm-webgpu-experiments-v2 from cb/94fc3cdaedb7


AztecBot commented May 21, 2026

Adds WebGPU MSM and base-field diagnostics and timing on real devices (BrowserStack), without the WASM/COI (cross-origin isolation) path, plus a static build that lets the page load on a phone.

Autorun modes (selected with ?autorun=<mode>, or with the mobile-safe ?cfg=<base64(JSON)>)

  • msm-webgpu: runs one WebGPU MSM (random [kᵢ]G points by default).
  • msm-diag: per-stage diagnostics. MsmV2.runStaged() submits each kernel separately, with a canary readback after every stage, so a device loss can be attributed to the stage that caused it. collectDiagnostics() computes a digest of every stage buffer for M2-vs-S25 comparison, and the page also posts digests of the host-side scalars and points, plus the device.lost status.
  • msm-sweep: WebGPU MSM timing sweep over logn_min..logn_max, reporting the median of reps runs at each size. It skips MsmV2's all-0x01 warm-up, which collapses every point into one bucket and so builds a maximally deep single-submit pair-tree that trips a mobile GPU's watchdog, and instead warms up with the real inputs.
  • bench-field-verify: checks 1M (2^20) montmul, fr_add and fr_sub results against JS BigInt.
  • ?cfg=<base64>: packs all parameters into a single truncation-safe value (BrowserStack mobile workers drop everything after the first &).
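A minimal sketch of how such a cfg value could be built and read back, assuming the page simply base64-decodes a JSON object from the cfg query parameter. buildCfgUrl, readCfg and the AutorunCfg type are hypothetical names; the field names are copied from the example URL under "Run it". The extra encodeURIComponent step is a precaution of this sketch, since standard base64 can contain "+", "/" and "=".

```typescript
// Hypothetical shape of the cfg JSON; field names follow the msm-sweep example
// URL in this PR, the type itself is an illustration.
interface AutorunCfg {
  autorun: string;
  logn_min?: number;
  logn_max?: number;
  reps?: number;
  scalar_seed?: number;
}

// Pack every parameter into one query value. Base64 output never contains "&",
// so a worker that drops everything after the first "&" still sees the whole config.
// encodeURIComponent escapes "+", "/" and "="; URLSearchParams would otherwise
// decode a literal "+" as a space. btoa only accepts Latin-1, fine for ASCII JSON.
function buildCfgUrl(base: string, cfg: AutorunCfg): string {
  const b64 = btoa(JSON.stringify(cfg));
  return `${base}?cfg=${encodeURIComponent(b64)}`;
}

// Page side (assumed): URLSearchParams undoes the percent-encoding, then atob + JSON.parse.
function readCfg(url: string): AutorunCfg | null {
  const raw = new URL(url).searchParams.get("cfg");
  return raw === null ? null : (JSON.parse(atob(raw)) as AutorunCfg);
}

const url = buildCfgUrl("http://localhost:5199/dev/msm-webgpu/index.html", {
  autorun: "msm-sweep", logn_min: 10, logn_max: 18, reps: 3, scalar_seed: 1,
});
console.log(url);
```

Because the whole configuration lives in one value, truncation at the first "&" can only ever remove parameters appended after cfg, never part of the config itself.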

Findings (BrowserStack, identical seeded input via cfg)

  • Field primitives are bit-exact on both M2 and S25 (montmul/add/sub, 1,048,576 operations each, 0 mismatches).
  • MSM compute is correct on the S25: in a per-stage run, every stage buffer hashes identically on M2 and S25 (including bucketResult and redBuf). The only difference, valIdx_scatter, comes from benign atomic ordering: the order in which entries are scattered varies between runs, but bucket sums are order-independent.
  • Timing sweep (median ms, random points, seed=1):

| log₂N | M2 (macOS, Chrome) | S25 (Android, Snapdragon) |
|---|---|---|
| 10 | 12.8 | 3340.9 (hang/recover) |
| 11 | 13.9 | value is not invertible |
| 12 | 15.6 | |
| 13 | 19.6 | |
| 14 | 26.9 | |
| 15 | 38.5 | |
| 16 | 53.4 | |
| 17 | 104.6 | |
| 18 | 171.6 | |

The M2 runs the whole batched MSM cleanly at every size tested. The S25 cannot run the batched single-submit run() reliably: it either hangs and recovers (3.3 s at 2^10) or corrupts the result ("value is not invertible" at 2^11), even though its per-stage execution is correct. The defect therefore lies in the large single command buffer on the Adreno/Dawn-Vulkan path (VK_ERROR_DEVICE_LOST / watchdog), not in the algorithm.
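The staged path that does work on the S25 submits each kernel on its own, so the driver sees several small command buffers instead of one large one, and a device loss can be pinned to a single stage. A minimal sketch of that pattern, assuming a queue object with WebGPU's submit() and onSubmittedWorkDone() methods and a promise standing in for device.lost. QueueLike, Stage and runStagesSeparately are hypothetical names, and waiting on onSubmittedWorkDone() here replaces the canary readback that MsmV2.runStaged() is described as doing.

```typescript
// Structural stand-ins for the WebGPU types, so the sketch does not depend on
// @webgpu/types. A real GPUQueue has submit() and onSubmittedWorkDone().
interface QueueLike<CB> {
  submit(commandBuffers: CB[]): void;
  onSubmittedWorkDone(): Promise<unknown>;
}

// One MSM stage: a name for attribution and a function that encodes its command buffer.
interface Stage<CB> {
  name: string;
  encode(): CB;
}

// Submit each stage on its own and wait for it before encoding the next one.
// `lost` plays the role of device.lost; if it settles first, the error names the
// stage that was in flight and the stages that had already completed.
async function runStagesSeparately<CB>(
  queue: QueueLike<CB>,
  stages: Stage<CB>[],
  lost: Promise<unknown>,
): Promise<string[]> {
  const completed: string[] = [];
  for (const stage of stages) {
    queue.submit([stage.encode()]);
    const outcome = await Promise.race([
      queue.onSubmittedWorkDone().then(() => "done" as const),
      lost.then(() => "lost" as const),
    ]);
    if (outcome === "lost") {
      throw new Error(`device lost during stage "${stage.name}" after [${completed.join(", ")}]`);
    }
    completed.push(stage.name);
  }
  return completed;
}
```

The trade-off is extra submission and synchronization overhead per stage in exchange for bounding how much GPU work any single submit carries.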

  • iPhone: still not reachable via BrowserStack's /5/worker JS Testing API.
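The bit-exactness result above comes from checking GPU field outputs against JS BigInt. A minimal BigInt reference for the three checked operations, assuming operands are held in Montgomery form with radix R. The modulus and R are parameters because the kernels' actual limb layout is not shown in this PR; the BN254 base-field modulus with R = 2^256 is used purely for illustration, and makeFieldRef and the other helpers are hypothetical names.

```typescript
// Non-negative remainder for BigInt (the % operator keeps the sign of x).
const mod = (x: bigint, p: bigint): bigint => ((x % p) + p) % p;

// Square-and-multiply modular exponentiation.
function modPow(base: bigint, exp: bigint, p: bigint): bigint {
  let result = 1n;
  let b = mod(base, p);
  let e = exp;
  while (e > 0n) {
    if (e & 1n) result = (result * b) % p;
    b = (b * b) % p;
    e >>= 1n;
  }
  return result;
}

function makeFieldRef(p: bigint, R: bigint) {
  // p is prime, so R^-1 = R^(p-2) mod p (Fermat's little theorem).
  const rInv = modPow(R, p - 2n, p);
  return {
    toMont: (a: bigint) => mod(a * R, p),
    fromMont: (a: bigint) => mod(a * rInv, p),
    // montmul(aR, bR) = aR * bR * R^-1 = abR (mod p)
    montmul: (a: bigint, b: bigint) => mod(a * b * rInv, p),
    add: (a: bigint, b: bigint) => mod(a + b, p),
    sub: (a: bigint, b: bigint) => mod(a - b, p),
  };
}

// Illustrative parameters (an assumption about the kernels): BN254 base-field
// modulus q with R = 2^256. Substitute the modulus and radix the kernels use.
const BN254_Q = 21888242871839275222246405745257275088696311157297823662689037894645226208583n;
const ref = makeFieldRef(BN254_Q, 1n << 256n);
```

A GPU result for Montgomery-form inputs a and b would then count as a match when it equals ref.montmul(a, b), and likewise for add and sub.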

Run it

cd barretenberg/ts && yarn install
node_modules/.bin/vite build --config dev/msm-webgpu/vite.build.config.ts
node dev/msm-webgpu/scripts/serve-static.mjs --port 5199
# tunnel 5199; on the device (single mobile-safe cfg param):
#   <tunnel>/dev/msm-webgpu/index.html?cfg=<base64({"autorun":"msm-sweep","logn_min":10,"logn_max":18,"reps":3,"scalar_seed":1})>

dist/ is gitignored.
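The msm-sweep run launched above reports, for each size, the median of reps timed runs after a warm-up on the same inputs. A minimal sketch of that loop shape, assuming a hypothetical async runMsm(logn) that performs one MSM on pre-generated inputs and resolves once the result is ready. The names and the single warm-up call per size are illustrative, not the page's actual code.

```typescript
// Median of a non-empty list of timings (mean of the middle two for even lengths).
function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Sweep logn_min..logn_max. Each size gets one untimed warm-up run with its real
// inputs (not a degenerate all-equal-scalar warm-up), then `reps` timed runs.
async function sweep(
  runMsm: (logn: number) => Promise<void>,
  lognMin: number,
  lognMax: number,
  reps: number,
  now: () => number = () => performance.now(),
): Promise<Map<number, number>> {
  const medians = new Map<number, number>();
  for (let logn = lognMin; logn <= lognMax; logn++) {
    await runMsm(logn); // untimed warm-up with the real inputs
    const times: number[] = [];
    for (let r = 0; r < reps; r++) {
      const t0 = now();
      await runMsm(logn);
      times.push(now() - t0);
    }
    medians.set(logn, median(times));
  }
  return medians;
}
```

Reporting the median rather than the mean keeps a single slow outlier (for example a one-off stall) from skewing the per-size figure.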

AztecBot added the claudebox label (Owned by claudebox. It can push to this PR.) on May 21, 2026