fix Fuzz constant OOM for CUDA #8481
Merging this PR will improve performance by 15.58%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | chunked_varbinview_into_canonical[(1000, 10)] | 177.7 µs | 213.9 µs | -16.94% |
| ⚡ | Simulation | take_10k_random | 255.8 µs | 197.8 µs | +29.27% |
| ⚡ | Simulation | take_10k_contiguous | 276.3 µs | 218.5 µs | +26.46% |
| ⚡ | Simulation | patched_take_10k_contiguous_patches | 291 µs | 232.3 µs | +25.26% |
| ⚡ | Simulation | patched_take_10k_random | 303 µs | 244.2 µs | +24.07% |
| ⚡ | WallTime | cuda/bitpacked_u8/unpack/3bw[100M] | 352 µs | 301.7 µs | +16.69% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[128] | 215.3 ns | 186.1 ns | +15.67% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[1024] | 275.6 ns | 246.4 ns | +11.84% |
Comparing aduffy/debug-cuda-fuzz-oom (18787a8) with develop (85aad72)
Footnotes
- 11 benchmarks were skipped, so the baseline results were used instead.
cargo-fuzz builds the compress_gpu target with AddressSanitizer, which
mprotects the shadow gap PROT_NONE. That collides with the large
virtual-address range the CUDA driver reserves, so CudaSession::try_default()
fails and the target aborts on the very first input ("Failed to initialize
CUDA device 0" at vortex-cuda/src/session.rs). Setting protect_shadow_gap=0
stops ASan protecting the gap and frees the VA range for CUDA, letting device
init succeed.
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
0d37fb9 to 18787a8
Pull request was closed
Summary
The Fuzz check has been failing for the CUDA `compress_gpu` job for a very long time now (at least a month).

After some investigation, it appears this is because ASan mprotects a very large memory region that conflicts with address space the CUDA driver wants to own, so cudaCtx initialization fails with the OOMs we've been seeing.
I've run a sample Fuzz workflow with this change and confirmed we are able to get past context init now.
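
For illustration, here is a minimal sketch of how a launcher could add `protect_shadow_gap=0` to an existing `ASAN_OPTIONS` value without clobbering other options. It is not taken from this PR, which does not show where the option is set. The helper name `with_shadow_gap_unprotected` is hypothetical, and the sketch assumes options are colon-separated.

```rust
/// Hypothetical helper (not from the PR): build an ASAN_OPTIONS value that
/// includes `protect_shadow_gap=0`.
///
/// Assumptions: options are separated by ':'; if the caller already set
/// `protect_shadow_gap` explicitly, the existing value is left untouched.
fn with_shadow_gap_unprotected(existing: Option<&str>) -> String {
    const FLAG: &str = "protect_shadow_gap";

    match existing {
        // The flag is already present: respect the caller's explicit choice.
        Some(opts)
            if opts
                .split(':')
                .any(|kv| kv.split('=').next() == Some(FLAG)) =>
        {
            opts.to_string()
        }
        // Other options are present: append ours after a ':' separator.
        Some(opts) if !opts.is_empty() => format!("{}:{}=0", opts, FLAG),
        // Nothing set (or an empty string): our flag is the whole value.
        _ => format!("{}=0", FLAG),
    }
}
```

A caller would read the current value with `std::env::var("ASAN_OPTIONS").ok()`, pass it through this helper, and export the result in the environment of the fuzz process before it starts, since ASan reads its options at startup.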