Skip to content

Apply GPU optimizations to TLSPH#1139

Merged
efaulhaber merged 21 commits intotrixi-framework:mainfrom
efaulhaber:tlsph-gpu-performance
Apr 16, 2026
Merged

Apply GPU optimizations to TLSPH#1139
efaulhaber merged 21 commits intotrixi-framework:mainfrom
efaulhaber:tlsph-gpu-performance

Conversation

@efaulhaber
Copy link
Copy Markdown
Member

@efaulhaber efaulhaber commented Apr 13, 2026

Based on trixi-framework/PointNeighbors.jl#154. Tests will pass once PointNeighbors 0.6.6 is released.

@efaulhaber efaulhaber force-pushed the tlsph-gpu-performance branch from 006d4f5 to 7518a8c Compare April 14, 2026 10:26
@efaulhaber efaulhaber marked this pull request as ready for review April 14, 2026 15:16
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors Total Lagrangian SPH (TLSPH) neighbor interactions to better match GPU-friendly execution patterns (per-particle threading, reduced memory traffic, fewer repeated loads), following the newer PointNeighbors neighbor-iteration approach.

Changes:

  • Refactor TLSPH deformation gradient and RHS assembly to use per-particle @threaded loops with foreach_neighbor, accumulating into Refs to reduce global writes.
  • Optimize penalty force and viscosity kernels for GPU performance (preload deformation gradients, use div_fast, and use smoothing_kernel_unsafe after cutoff filtering).
  • Add a SIMD-based fast path for extracting 2×2 matrices (extract_smatrix) and wire in the SIMD dependency; update tests’ mock systems accordingly.

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
test/systems/tlsph_system.jl Updates TLSPH test mock system to a concrete struct (GPU-friendly) and adjusts numeric literals.
test/schemes/structure/total_lagrangian_sph/rhs.jl Updates RHS tests’ mock system layout and adds deformation_gradient stub required by new RHS path.
src/schemes/structure/total_lagrangian_sph/viscosity.jl Threads deformation gradient through viscosity path and applies div_fast in hot divisions.
src/schemes/structure/total_lagrangian_sph/system.jl Reworks deformation gradient assembly into per-particle neighbor loops with reduced memory writes.
src/schemes/structure/total_lagrangian_sph/rhs.jl Reworks RHS assembly similarly; passes deformation gradients into penalty/viscosity for fewer loads.
src/schemes/structure/total_lagrangian_sph/penalty_force.jl Converts penalty force to an in-place accumulator API and switches to unsafe kernel + fast divisions.
src/schemes/boundary/wall_boundary/system.jl Adds smoothing_kernel_unsafe specialization for wall boundary systems.
src/general/neighborhood_search.jl Adds foreach_neighbor wrapper around PointNeighbors neighbor iteration.
src/general/abstract_system.jl Adds extract_smatrix Val-specialization and a SIMD 2D fast path.
src/TrixiParticles.jl Imports SIMD module for use in extract_smatrix.
Project.toml Adds SIMD dependency + compat; minor reordering of weakdeps/extensions entries.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/general/abstract_system.jl Outdated
Comment thread src/schemes/structure/total_lagrangian_sph/viscosity.jl Outdated
Comment thread src/schemes/structure/total_lagrangian_sph/rhs.jl
Comment thread src/schemes/structure/total_lagrangian_sph/system.jl
@efaulhaber efaulhaber requested a review from svchb April 14, 2026 15:42
Comment thread src/schemes/structure/total_lagrangian_sph/penalty_force.jl Outdated
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 15, 2026

Codecov Report

❌ Patch coverage is 74.71264% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.26%. Comparing base (379db88) to head (d5766f0).

Files with missing lines Patch % Lines
...es/structure/total_lagrangian_sph/penalty_force.jl 14.28% 12 Missing ⚠️
...chemes/structure/total_lagrangian_sph/viscosity.jl 16.66% 5 Missing ⚠️
src/schemes/boundary/wall_boundary/system.jl 0.00% 3 Missing ⚠️
src/general/abstract_system.jl 84.61% 2 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (379db88) and HEAD (d5766f0). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (379db88) HEAD (d5766f0)
total 1 0
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1139       +/-   ##
===========================================
- Coverage   89.06%   67.26%   -21.81%     
===========================================
  Files         128      128               
  Lines        9868     9857       -11     
===========================================
- Hits         8789     6630     -2159     
- Misses       1079     3227     +2148     
Flag Coverage Δ
total ?
unit 67.26% <74.71%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@efaulhaber
Copy link
Copy Markdown
Member Author

/run-gpu-tests

@efaulhaber
Copy link
Copy Markdown
Member Author

/run-gpu-tests

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/schemes/structure/total_lagrangian_sph/system.jl
Comment thread src/general/neighborhood_search.jl
Comment thread src/schemes/structure/total_lagrangian_sph/system.jl
@efaulhaber efaulhaber marked this pull request as draft April 15, 2026 17:01
@efaulhaber
Copy link
Copy Markdown
Member Author

/run-gpu-tests

@efaulhaber efaulhaber marked this pull request as ready for review April 16, 2026 13:13
@efaulhaber efaulhaber merged commit 6ee0bf4 into trixi-framework:main Apr 16, 2026
15 of 17 checks passed
@efaulhaber efaulhaber deleted the tlsph-gpu-performance branch April 16, 2026 14:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants