Skip to content

fix: CPU refit for BVH AABBs to work around Metal relaxed atomics#15

Draft
jkrumbiegel wants to merge 1 commit intoJuliaGeometry:sd/multitype-vecfrom
jkrumbiegel:jk/fix-metal-bvh-refit
Draft

fix: CPU refit for BVH AABBs to work around Metal relaxed atomics#15
jkrumbiegel wants to merge 1 commit intoJuliaGeometry:sd/multitype-vecfrom
jkrumbiegel:jk/fix-metal-bvh-refit

Conversation

@jkrumbiegel
Copy link
Copy Markdown

Summary

  • The GPU refit_aabbs_kernel uses atomic counters for bottom-up AABB computation (Karras 2012). On Metal, Atomix.jl hardcodes memory_order_relaxed for all atomics, so regular memory writes (node AABB updates) aren't visible across threadgroups. This causes some internal BVH nodes to have stale/zeroed AABBs non-deterministically, leading to missing geometry in renders.
  • Adds CPU refit functions (cpu_refit_blas_aabbs!, cpu_refit_tlas_aabbs!) that recursively recompute all internal AABBs after the GPU kernel. Applied in build_blas, _build_tlas_topology, and refit_tlas!.
  • Adds a determinant threshold (1e-7) in fast_intersect_triangle to avoid division by zero for parallel rays.

How these changes were tested

  • Built BLAS on Metal 5 times before fix: 2–4 zeroed AABBs each time (non-deterministic). After fix: 0 zeroed AABBs in all 5 trials.
  • Rendered a single extruded polygon MWE (2040 faces) that previously had missing chunks — now renders completely.
  • Rendered full volcano contour scene (30 isoband layers, ~100 mesh instances) with no missing geometry.
  • Verified CPU backend is unaffected (the CPU refit is skipped on KA.CPU).

Created with the help of Claude Code

The GPU refit_aabbs_kernel uses atomic counters for bottom-up AABB
computation (Karras 2012). On Metal, Atomix.jl hardcodes
memory_order_relaxed for all atomic operations, ignoring the requested
ordering. This means regular memory writes (node AABB updates) are not
guaranteed to be visible across threadgroups, causing some internal
nodes to end up with stale or zeroed AABBs non-deterministically.

The fix copies nodes to CPU after the GPU refit kernel, recursively
recomputes all internal AABBs, and copies back. Applied to build_blas,
_build_tlas_topology, and refit_tlas!.

Also adds a determinant threshold (1e-7) to fast_intersect_triangle to
avoid division by zero for rays exactly parallel to a triangle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant