Apply GPU optimizations to TLSPH#1139
Conversation
69f3560 to
006d4f5
Compare
006d4f5 to
7518a8c
Compare
There was a problem hiding this comment.
Pull request overview
This PR refactors Total Lagrangian SPH (TLSPH) neighbor interactions to better match GPU-friendly execution patterns (per-particle threading, reduced memory traffic, fewer repeated loads), following the newer PointNeighbors neighbor-iteration approach.
Changes:
- Refactor TLSPH deformation gradient and RHS assembly to use per-particle
@threadedloops withforeach_neighbor, accumulating intoRefs to reduce global writes. - Optimize penalty force and viscosity kernels for GPU performance (preload deformation gradients, use
div_fast, and usesmoothing_kernel_unsafeafter cutoff filtering). - Add a SIMD-based fast path for extracting 2×2 matrices (
extract_smatrix) and wire in theSIMDdependency; update tests’ mock systems accordingly.
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| test/systems/tlsph_system.jl | Updates TLSPH test mock system to a concrete struct (GPU-friendly) and adjusts numeric literals. |
| test/schemes/structure/total_lagrangian_sph/rhs.jl | Updates RHS tests’ mock system layout and adds deformation_gradient stub required by new RHS path. |
| src/schemes/structure/total_lagrangian_sph/viscosity.jl | Threads deformation gradient through viscosity path and applies div_fast in hot divisions. |
| src/schemes/structure/total_lagrangian_sph/system.jl | Reworks deformation gradient assembly into per-particle neighbor loops with reduced memory writes. |
| src/schemes/structure/total_lagrangian_sph/rhs.jl | Reworks RHS assembly similarly; passes deformation gradients into penalty/viscosity for fewer loads. |
| src/schemes/structure/total_lagrangian_sph/penalty_force.jl | Converts penalty force to an in-place accumulator API and switches to unsafe kernel + fast divisions. |
| src/schemes/boundary/wall_boundary/system.jl | Adds smoothing_kernel_unsafe specialization for wall boundary systems. |
| src/general/neighborhood_search.jl | Adds foreach_neighbor wrapper around PointNeighbors neighbor iteration. |
| src/general/abstract_system.jl | Adds extract_smatrix Val-specialization and a SIMD 2D fast path. |
| src/TrixiParticles.jl | Imports SIMD module for use in extract_smatrix. |
| Project.toml | Adds SIMD dependency + compat; minor reordering of weakdeps/extensions entries. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1139 +/- ##
===========================================
- Coverage 89.06% 67.26% -21.81%
===========================================
Files 128 128
Lines 9868 9857 -11
===========================================
- Hits 8789 6630 -2159
- Misses 1079 3227 +2148
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
/run-gpu-tests |
|
/run-gpu-tests |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…rticles.jl into tlsph-gpu-performance
|
/run-gpu-tests |
Based on trixi-framework/PointNeighbors.jl#154. Tests will pass once PointNeighbors 0.6.6 is released.