feat(bb/msm): headless WebGPU-only MSM autorun + static build for real-device runs #23471
Draft
AztecBot wants to merge 6 commits into
WebGPU MSM + base-field diagnostics & timing on real devices (BrowserStack), without the WASM/COI path, plus a static build so the page loads on a phone.
Autorun modes (`?autorun=` / mobile-safe `?cfg=<base64(JSON)>`)

- `msm-webgpu`: one WebGPU MSM (random `[kᵢ]G` points by default).
- `msm-diag`: per-stage diagnostics. `MsmV2.runStaged()` submits each kernel separately (canary readback per stage → device-loss attribution); `collectDiagnostics()` digests every stage buffer for M2-vs-S25 comparison; posts host scalar/point digests + `device.lost`.
- `msm-sweep`: WebGPU MSM timing sweep over `logn_min..logn_max`, median of `reps`, per size; skips MsmV2's all-`0x01` warm-up (which collapses every point into one bucket → a maximally deep single-submit pair-tree that trips a mobile GPU's watchdog) and warms with the real inputs.
- `bench-field-verify`: 1M `montmul`/`fr_add`/`fr_sub` vs JS BigInt.

`?cfg=<base64>` packs all params into one truncation-safe value (BrowserStack mobile workers drop everything after the first `&`).

Findings (BrowserStack, identical seeded input via `cfg`)

… `bucketResult`, `redBuf`); the only difference, `valIdx_scatter`, is benign atomic-ordering (bucket sums are order-independent).

M2 runs the whole batched MSM cleanly across all sizes. The S25 cannot run the batched single-submit `run()` reliably: it hangs/recovers (3.3 s at 2^10) or corrupts the result (`value is not invertible` at 2^11), even though its per-stage execution is correct. The defect is the large single command buffer on the Adreno/Dawn-Vulkan path (`VK_ERROR_DEVICE_LOST` / watchdog), not the algorithm.

…/5/workerJS-Testing.

Run it

`dist/` is gitignored.
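
The `?cfg=<base64(JSON)>` packing described under the autorun modes can be sketched roughly as follows. This is a minimal illustration, not the page's actual code: the `AutorunConfig` field names, the example values, and the helpers `encodeCfg`, `decodeCfg`, and `truncateAtAmpersand` are all hypothetical, and percent-encoding the base64 text is an added assumption so that `+`, `/`, and `=` survive URL parsing.

```typescript
// Hypothetical parameter shape: key names are illustrative, not the page's real schema.
interface AutorunConfig {
  autorun: string;
  logn_min?: number;
  logn_max?: number;
  reps?: number;
}

// Pack every parameter into a single `cfg` query value.
// btoa only accepts Latin-1 input, so this sketch assumes ASCII-only JSON.
function encodeCfg(cfg: AutorunConfig): string {
  const b64 = btoa(JSON.stringify(cfg));
  // Percent-encode so '+', '/' and '=' are not reinterpreted by URL parsing.
  return "?cfg=" + encodeURIComponent(b64);
}

// Recover the parameters from a query string such as location.search.
function decodeCfg(search: string): AutorunConfig | null {
  const raw = new URLSearchParams(search).get("cfg");
  if (raw === null) return null;
  return JSON.parse(atob(raw)) as AutorunConfig;
}

// Simulate a worker that drops everything after the first '&'.
function truncateAtAmpersand(search: string): string {
  const i = search.indexOf("&");
  return i === -1 ? search : search.slice(0, i);
}

// Example values are made up for illustration.
const packed = encodeCfg({ autorun: "msm-sweep", logn_min: 10, logn_max: 16, reps: 5 });

// The packed form survives truncation with every parameter intact.
const survived = decodeCfg(truncateAtAmpersand(packed + "&ignored=1"));

// A conventional multi-parameter query keeps only its first key.
const plain = truncateAtAmpersand("?autorun=msm-sweep&logn_min=10&logn_max=16&reps=5");
```

With separate `&`-joined parameters only the first key survives the truncation, whereas the packed value contains no `&` to cut at.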