Commit a98e178

efaulhaber and LasNikas authored
Further improve TLSPH deformation gradient performance on GPUs in 2D (#1149)
* Further improve TLSPH deformation performance on GPUs
* Add comment
* Add news entry
* Update comments

---------

Co-authored-by: Niklas Neher <73897120+LasNikas@users.noreply.github.com>
1 parent 8d7de50 commit a98e178

3 files changed

Lines changed: 19 additions & 8 deletions

NEWS.md

Lines changed: 8 additions & 1 deletion
@@ -4,7 +4,7 @@ TrixiParticles.jl follows the interpretation of
 [semantic versioning (semver)](https://julialang.github.io/Pkg.jl/dev/compatibility/#Version-specifier-format-1)
 used in the Julia ecosystem. Notable changes will be documented in this file for human readability.

-## Version 0.4.4
+## Version 0.5

 ### API Changes

@@ -21,6 +21,13 @@ used in the Julia ecosystem. Notable changes will be documented in this file for
 - Added new validation case hydrostatic water column (#724).
 - Added Carreau–Yasuda non-Newtonian viscosity model (#1010).

+### Performance
+
+- Greatly improved GPU performance of WCSPH and TLSPH
+  (#1128, #1117, #1124, #1125, #1130, #1116, #1139, #1149).
+  See [#1131](https://github.com/trixi-framework/TrixiParticles.jl/issues/1131)
+  for a detailed breakdown including benchmark results.
+
 ### Important Bugfixes

 - Fixed the periodic array of cylinders example file (#975).

src/schemes/structure/total_lagrangian_sph/rhs.jl

Lines changed: 2 additions & 1 deletion
@@ -71,7 +71,8 @@ end
 F_b = @inbounds deformation_gradient(system, neighbor)

 current_pos_diff_ = current_coords_a - current_coords_b
-# On GPUs, convert `Float64` coordinates to `Float32` after computing the difference
+# In mixed-precision simulations, convert from `coordinates_eltype(system)`
+# to `eltype(system)` immediately after computing the difference.
 current_pos_diff = convert.(eltype(system), current_pos_diff_)
 current_distance = norm(current_pos_diff)
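The updated comment describes a convert-after-subtract order. The following is a minimal sketch of why that order can matter, assuming `Float64` stands in for `coordinates_eltype(system)` and `Float32` for `eltype(system)`. The coordinate values and variable names are hypothetical and are not taken from TrixiParticles.jl.

```julia
# Hypothetical 2D particle positions stored in Float64
# (assumed stand-in for `coordinates_eltype(system)`).
coords_a = (1000.0 + 1.0e-4, 2000.0)
coords_b = (1000.0, 2000.0 - 2.0e-4)

# Order used in the diff: subtract in the coordinate precision first,
# then convert the (small) difference to the compute precision.
diff_then_convert = convert.(Float32, coords_a .- coords_b)

# Alternative order: convert each coordinate first, then subtract in Float32.
# The large coordinate values are rounded before the subtraction,
# so the small difference absorbs that rounding error.
convert_then_diff = convert.(Float32, coords_a) .- convert.(Float32, coords_b)

# Absolute error of the first component relative to the intended offset 1.0e-4
err_diff_then_convert = abs(diff_then_convert[1] - 1.0f-4)
err_convert_then_diff = abs(convert_then_diff[1] - 1.0f-4)
```

In this sketch both results have element type `Float32`, but only the first keeps the small offset between the two positions close to its intended value.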

src/schemes/structure/total_lagrangian_sph/system.jl

Lines changed: 9 additions & 6 deletions
@@ -486,9 +486,6 @@ end
 @inline function calc_deformation_grad!(deformation_grad, system, semi)
     (; mass, material_density) = system

-    # Reset deformation gradient
-    set_zero!(deformation_grad)
-
     # For `distance == 0`, the analytical gradient is zero, but the unsafe gradient
     # and the density diffusion divide by zero.
     # To account for rounding errors, we check if `distance` is almost zero.
@@ -530,10 +527,15 @@ end
 grad_kernel = smoothing_kernel_grad_unsafe(system, initial_pos_diff,
                                            initial_distance, particle)

-volume = @inbounds mass[neighbor] / material_density[neighbor]
+# Since this is one of the most performance critical functions, using fast
+# divisions here gives a significant speedup on GPUs.
+# See the docs page "Development" for more details on `div_fast`.
+volume = @inbounds div_fast(mass[neighbor], material_density[neighbor])
 current_coords_b = @inbounds current_coords(system, neighbor)
+
 pos_diff_ = current_coords_a - current_coords_b
-# On GPUs, convert `Float64` coordinates to `Float32` after computing the difference
+# In mixed-precision simulations, convert from `coordinates_eltype(system)`
+# to `eltype(system)` immediately after computing the difference.
 pos_diff = convert.(eltype(system), pos_diff_)

 # The tensor product pos_diff ⊗ (L_{0a} * ∇W) is equivalent to multiplication
@@ -542,7 +544,8 @@ end
 end

 for j in 1:ndims(system), i in 1:ndims(system)
-    @inbounds deformation_grad[i, j, particle] += result[][i, j]
+    # We overwrite every entry of `deformation_grad`, so no `set_zero!` is required.
+    @inbounds deformation_grad[i, j, particle] = result[][i, j]
 end
 end
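The `div_fast` used in the diff is defined elsewhere in TrixiParticles.jl and is not shown in this commit. As a rough illustration only, the sketch below uses Julia's `Base.FastMath.div_fast`, the function that `@fastmath` substitutes for `/`. Whether the package's `div_fast` behaves the same way is an assumption. The helper names `volume_fast` and `volume_exact` and the data are hypothetical.

```julia
# Hypothetical per-particle data in single precision
mass = Float32[0.5, 0.25, 0.125]
material_density = Float32[1000, 1000, 1000]

# Volume via a fast division. `Base.FastMath.div_fast` may trade strict
# IEEE rounding for speed. It is used here only as an assumed stand-in
# for the package's own `div_fast`.
volume_fast(mass, density, i) = @inbounds Base.FastMath.div_fast(mass[i], density[i])

# Volume via the regular division, for comparison
volume_exact(mass, density, i) = @inbounds mass[i] / density[i]

v_fast = volume_fast(mass, material_density, 2)
v_exact = volume_exact(mass, material_density, 2)
```

The two results should agree up to floating-point rounding. Any speedup depends on the hardware and is not demonstrated by this sketch.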

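The last hunk replaces `+=` with `=` and drops the earlier `set_zero!` call. A minimal sketch of that pattern follows, assuming each particle's contributions are first summed into a local value and every output entry is then written exactly once. The function `fill_gradients!` and the data are hypothetical, and a plain `Matrix` is used as the local accumulator for simplicity, which differs from the `result[]` reference in the diff.

```julia
# Sum all contributions for one particle into a local accumulator, then
# write each entry of the output array exactly once. Because every entry
# is overwritten, the output does not need to be zeroed beforehand.
function fill_gradients!(deformation_grad, contributions)
    for particle in axes(deformation_grad, 3)
        # Local accumulator (hypothetical; a plain matrix for readability)
        result = zeros(eltype(deformation_grad), 2, 2)
        for contribution in contributions[particle]
            result .+= contribution
        end

        for j in 1:2, i in 1:2
            deformation_grad[i, j, particle] = result[i, j]
        end
    end

    return deformation_grad
end

# Deliberately start from stale data to show that no reset is needed
deformation_grad = fill(NaN, 2, 2, 2)

A = [1.0 0.0; 0.0 1.0]
B = [0.5 0.0; 0.0 0.5]
C = [2.0 1.0; 0.0 2.0]
contributions = [[A, B], [C]]

fill_gradients!(deformation_grad, contributions)
```

With `+=` in the final loop, the stale `NaN` entries would propagate into the result, which is why that variant needs the array to be reset first.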