Commit d80f827
🥂 v0.7 gpu-scaffold: omnimcode-gpu crate with wgpu, 4.04x on RX 580 via Vulkan
The user's primary GPU is an AMD Radeon RX 580 (Polaris/gfx803).
Official ROCm dropped Polaris at version 4.0 and Ollama "gets fussy
about it" — the wgpu/Vulkan path avoids that pain entirely.
## Architecture
- New omnimcode-gpu crate:
  - ComputeBackend trait: one method (matmul) for v0.7
  - Matrix: row-major f32 boundary type
  - CpuBackend: naive triple-loop, always-available ground truth
  - WgpuBackend (feature wgpu): Vulkan/Metal/DX12/OpenGL compute
  - pick_backend(): feature + OMC_GPU_BACKEND env-aware
- Naive WGSL matmul kernel (16x16 workgroup, no tiling)
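
The shape of the pieces listed above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual code: the type and trait names come from this commit message, but the field names, the `Matrix::new` constructor, and the `Result<Matrix, String>` return type are assumptions made for the example.

```rust
// Hypothetical sketch: names mirror the commit message, signatures are assumed.

/// Row-major f32 matrix used at the backend boundary.
#[derive(Debug, Clone, PartialEq)]
pub struct Matrix {
    pub rows: usize,
    pub cols: usize,
    /// Element (r, c) lives at index r * cols + c.
    pub data: Vec<f32>,
}

impl Matrix {
    /// Assumed constructor; panics if the buffer length does not match the shape.
    pub fn new(rows: usize, cols: usize, data: Vec<f32>) -> Self {
        assert_eq!(data.len(), rows * cols, "data length must equal rows * cols");
        Self { rows, cols, data }
    }
}

/// Single-method backend trait. The error type is an assumption.
pub trait ComputeBackend {
    fn matmul(&self, a: &Matrix, b: &Matrix) -> Result<Matrix, String>;
}

/// Naive triple-loop reference implementation.
pub struct CpuBackend;

impl ComputeBackend for CpuBackend {
    fn matmul(&self, a: &Matrix, b: &Matrix) -> Result<Matrix, String> {
        if a.cols != b.rows {
            return Err(format!(
                "shape mismatch: {}x{} * {}x{}",
                a.rows, a.cols, b.rows, b.cols
            ));
        }
        let (m, k, n) = (a.rows, a.cols, b.cols);
        let mut out = vec![0.0f32; m * n];
        for i in 0..m {
            for j in 0..n {
                // Dot product of row i of `a` with column j of `b`.
                let mut acc = 0.0f32;
                for p in 0..k {
                    acc += a.data[i * k + p] * b.data[p * n + j];
                }
                out[i * n + j] = acc;
            }
        }
        Ok(Matrix::new(m, n, out))
    }
}
```

A GPU backend would implement the same trait, which is what lets the CPU result act as ground truth for parity checks.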
## Measured on AMD RX 580 (RADV POLARIS10 / Vulkan)
| size (m x k x n) | cpu ms | wgpu ms | speedup | parity |
|---|---|---|---|---|
| 64x64x64 | 0.052 | 0.228 | 0.23x | OK |
| 128x128x128 | 0.281 | 0.340 | 0.83x | OK |
| 256x256x256 | 1.966 | 0.880 | 2.24x | OK |
| 512x512x512 | 14.503 | 4.273 | 3.39x | OK |
| 1024x1024x1024 | 115.516 | 28.577 | 4.04x | OK |
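The crossover falls between 128x128x128 and 256x256x256: the GPU is
still slower at 128 (0.83x) and ahead at 256 (2.24x). At
1024x1024x1024, the GPU is 4.04x faster than the naive CPU baseline.
Parity passes at every size.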
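
For readers who want to re-derive the speedup column, the arithmetic is simply CPU time divided by wgpu time. The sketch below hard-codes the measured rows from this commit message; the function names (`speedup`, `first_gpu_win`, `gflops`) are illustrative and are not part of the crate. The throughput helper assumes the usual 2·n³ operation count for a square matmul.

```rust
/// Measured rows from the commit message: (n, cpu ms, wgpu ms) with m = k = n.
const ROWS: [(u32, f64, f64); 5] = [
    (64, 0.052, 0.228),
    (128, 0.281, 0.340),
    (256, 1.966, 0.880),
    (512, 14.503, 4.273),
    (1024, 115.516, 28.577),
];

/// Speedup as reported in the table: CPU time divided by GPU time.
fn speedup(cpu_ms: f64, wgpu_ms: f64) -> f64 {
    cpu_ms / wgpu_ms
}

/// Smallest measured size at which the GPU beats the CPU, if any.
fn first_gpu_win(rows: &[(u32, f64, f64)]) -> Option<u32> {
    rows.iter()
        .find(|(_, cpu, gpu)| speedup(*cpu, *gpu) > 1.0)
        .map(|(n, _, _)| *n)
}

/// Throughput in GFLOP/s, counting 2 * n^3 floating-point operations.
fn gflops(n: u32, ms: f64) -> f64 {
    let flops = 2.0 * (n as f64).powi(3);
    flops / (ms * 1e-3) / 1e9
}
```

Calling `first_gpu_win(&ROWS)` returns the first measured size where the ratio exceeds 1.0, which only brackets the true crossover between two benchmarked sizes.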
## Why wgpu instead of ROCm
- Official ROCm dropped Polaris at 4.0; unofficial Polaris builds
are fragile.
- wgpu via Vulkan works out of the box on the open-source RADV
driver with no SDK install.
- The ComputeBackend trait is ready for ROCm/CUDA/Metal plug-ins
when running on supported hardware. None of those are in v0.7
because Polaris (the user's target) doesn't benefit from them.
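
As a rough illustration of how an env-aware selector could leave room for later plug-ins, the sketch below separates the decision logic from the environment lookup. Everything here is an assumption except the `OMC_GPU_BACKEND` variable name: the `BackendKind` enum, the accepted values `"cpu"` and `"wgpu"`, and the fallback rules are hypothetical and may differ from what `pick_backend()` actually does.

```rust
use std::env;

/// Hypothetical backend identifiers; further variants could be added later.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendKind {
    Cpu,
    Wgpu,
}

/// Pure selection logic (assumed rules):
/// - "cpu" always selects the CPU backend,
/// - "wgpu" selects wgpu only if the feature is compiled in,
/// - anything else prefers wgpu when available, otherwise CPU.
pub fn choose_backend(env_value: Option<&str>, wgpu_compiled_in: bool) -> BackendKind {
    let default = if wgpu_compiled_in {
        BackendKind::Wgpu
    } else {
        BackendKind::Cpu
    };
    match env_value.map(|v| v.trim().to_ascii_lowercase()).as_deref() {
        Some("cpu") => BackendKind::Cpu,
        Some("wgpu") => default,
        _ => default,
    }
}

/// Reads OMC_GPU_BACKEND and applies the rules above.
pub fn choose_backend_from_env(wgpu_compiled_in: bool) -> BackendKind {
    let value = env::var("OMC_GPU_BACKEND").ok();
    choose_backend(value.as_deref(), wgpu_compiled_in)
}
```

Keeping the decision in a pure function makes it testable without mutating process environment variables; in a real crate the `wgpu_compiled_in` flag would typically come from `cfg!(feature = "wgpu")`.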
## Tests
11/11 GPU tests pass, including wgpu kernel parity check on the
user's actual GPU (max diff < 1e-4).
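
A parity check of this kind can be sketched as a maximum absolute element-wise difference compared against a tolerance. The helper names below (`max_abs_diff`, `parity_ok`) are hypothetical, and the actual test in the crate may be structured differently; only the 1e-4 threshold comes from this commit message.

```rust
/// Largest absolute element-wise difference between two buffers.
/// Returns None if the lengths differ, and NaN if any difference is NaN.
pub fn max_abs_diff(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let mut worst = 0.0f32;
    for (x, y) in a.iter().zip(b) {
        let d = (x - y).abs();
        if d.is_nan() {
            // Propagate NaN so the caller cannot mistake it for a small diff.
            return Some(f32::NAN);
        }
        if d > worst {
            worst = d;
        }
    }
    Some(worst)
}

/// True when the candidate matches the reference within `tol`.
/// Length mismatches and NaN differences both count as failures.
pub fn parity_ok(reference: &[f32], candidate: &[f32], tol: f32) -> bool {
    matches!(max_abs_diff(reference, candidate), Some(d) if d < tol)
}
```

With the CPU result as `reference` and the wgpu result as `candidate`, `parity_ok(cpu, gpu, 1e-4)` corresponds to the threshold quoted above. An absolute tolerance is reasonable for small inputs; a relative tolerance may be a better fit when values grow large.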
## What's NOT in v0.7
- Prometheus integration (v0.8 candidate: route tape_matmul through
this backend when shapes exceed CPU crossover)
- Backward pass on GPU
- Tiled / shared-memory kernels (untuned scaffold)
- f16/bf16
- ROCm/CUDA/Metal backends (trait ready, impls deferred)
## Files
- omnimcode-gpu/{Cargo.toml, README.md, src/{lib,cpu,wgpu_backend}.rs,
shaders/matmul.wgsl, examples/bench_matmul.rs}
- Cargo.toml — workspace member added
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

1 parent 3f9957c commit d80f827
15 files changed
Lines changed: 2220 additions & 24 deletions
File tree
- experiments/prometheus_parity
- omnimcode-gpu
- examples
- shaders
- src