Red2Band: revisit distributed panel computation to group communications#1338
Draft
albestro wants to merge 7 commits into
albestro
commented
Aug 20, 2025
c64fdb0 to fa8f577
fa8f577 to f47206d
commit 3e1596821460d0f01008ce73ef22398d546dc532
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date: Tue Oct 7 11:14:42 2025 +0200
drop comment about atomic vs barrier
commit f47206d
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date: Fri Aug 22 14:15:47 2025 +0200
revert mirrored distribution and assign extra work to last thread instead of first
commit 5d9fa58
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date: Thu Aug 21 18:39:35 2025 +0200
reverse work distribution over bulk tasks
tid=0 is kept as last so that it might be the one with less general
work, considering that it might already have to do other stuff (e.g. axpy)
commit 14087d7
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date: Fri Aug 22 10:07:50 2025 +0200
skip w* and w (and shrink allreduce message accordingly) on last reflector
commit fcbb4cc
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date: Thu Aug 21 18:36:47 2025 +0200
tid=0 is not separate from others, so 1 is enough
commit cf60ff8
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date: Wed Aug 20 15:21:10 2025 +0200
move barrier after w computation in the right place
commit 6d19666
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date: Wed Aug 20 14:32:09 2025 +0200
minor changes
commit 32dcdbb
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date: Wed Aug 20 12:52:21 2025 +0200
simplify GER by using a well-formed reflector (i.e. delaying set-band)
commit 4dbb17b
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date: Wed Aug 20 12:27:09 2025 +0200
unify tiles split over threads for all steps
had to move squares computation just before reduce to skip a sync barrier
commit a1b5790
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date: Fri Aug 8 17:06:01 2025 +0200
delete unused code
commit 07d52a5
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date: Fri Aug 8 12:03:44 2025 +0200
remove console output used for debugging
commit f9f1fde
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date: Fri Aug 8 12:01:02 2025 +0200
fix complex: row0 should be conjugate transposed (used lacgv)
commit fc2a7d9
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date: Thu Aug 7 17:40:45 2025 +0200
fix problem for banded matrix with quick return (but set tau = 0)
commit ac53c25
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date: Thu Aug 7 14:21:13 2025 +0200
rename w and its workspaces + avoid reallocation at each iteration
commit 5c75894
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date: Thu Aug 7 12:05:09 2025 +0200
WIP: switch to multi-threaded scal and ger
commit 0d6c3c4
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date: Thu Aug 7 11:11:47 2025 +0200
WIP: basic implementation with a lot of sharp edges
f47206d to 9fdf255
cscs-ci run
This PR explores a possible optimization of the reduction-to-band panel computation: in each reflector computation, the two all-reduce operations are grouped into a single one (carrying more data).
Panel computation without this PR
With this PR, instead of having two separate MPI_Allreduce calls, we group them into a single one (with more data), which requires some reshuffling of the operations.
std::span (see Red2Band: use well-formed reflector for internal panel computations #1348 (comment))
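To illustrate the grouping idea, here is a minimal sketch of packing two reduction payloads into one contiguous buffer so that a single sum all-reduce replaces two. It does not reproduce the PR's code. `simulated_allreduce_sum` is a stand-in for `MPI_Allreduce` with `MPI_SUM`, so the snippet runs without MPI. `LocalContribution`, `norm_sq`, `w`, `pack`, `unpack` and `grouped_allreduce` are hypothetical names loosely borrowed from the commit messages, and the actual message layout used in the PR is an assumption here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for MPI_Allreduce with MPI_SUM (assumption: no real MPI here).
// Each inner vector is one rank's send buffer; the returned vector is the
// element-wise sum that every rank would receive.
std::vector<double> simulated_allreduce_sum(
    const std::vector<std::vector<double>>& per_rank) {
  std::vector<double> result(per_rank.empty() ? 0 : per_rank.front().size(), 0.0);
  for (const auto& buf : per_rank) {
    assert(buf.size() == result.size());  // all ranks must send the same count
    for (std::size_t i = 0; i < result.size(); ++i)
      result[i] += buf[i];
  }
  return result;
}

// Hypothetical per-rank partial results of one reflector computation.
struct LocalContribution {
  double norm_sq;         // local partial sum of squares
  std::vector<double> w;  // local partial contribution to w
};

// Pack both payloads into one buffer: [norm_sq, w[0], w[1], ...].
std::vector<double> pack(const LocalContribution& c) {
  std::vector<double> buf;
  buf.reserve(1 + c.w.size());
  buf.push_back(c.norm_sq);
  buf.insert(buf.end(), c.w.begin(), c.w.end());
  return buf;
}

// Split a reduced buffer back into its two logical parts.
LocalContribution unpack(const std::vector<double>& buf) {
  if (buf.empty())
    return {0.0, {}};
  return {buf.front(), std::vector<double>(buf.begin() + 1, buf.end())};
}

// One reduction over the packed buffers instead of one per payload.
LocalContribution grouped_allreduce(const std::vector<LocalContribution>& ranks) {
  std::vector<std::vector<double>> packed;
  packed.reserve(ranks.size());
  for (const auto& c : ranks)
    packed.push_back(pack(c));
  return unpack(simulated_allreduce_sum(packed));
}
```

Because the sum is element-wise, the packed reduction yields the same values as reducing `norm_sq` and `w` separately. The trade-off is one larger message instead of two smaller ones, and both payloads must be ready before the single call, which is consistent with the reshuffling of operations mentioned above.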