Red2Band: revisit distributed panel computation to group communications #1338

Draft
albestro wants to merge 7 commits into master from alby/r2b-panel-revisited

Conversation

Collaborator

@albestro albestro commented Aug 8, 2025

This PR explores a possible optimization of the reduction-to-band panel computation: merging the two all-reduce operations in each reflector computation into a single one that carries more data.

Panel computation without this PR

for (j : panel.cols()) {
  computeX0AndSquares();    // each rank computes its local sum(squares); only the head rank extracts x0

  allReduce(x0, squares);   // each rank gets the total sum(squares) and x0 (which is effectively broadcast)

  tau = computeReflector(); // now the reflector can be computed
  computeW();               // each rank computes its local part of w = Pt* . v

  allReduce(w);             // each rank gets the final w

  updateTrailingPanel();    // update the trailing panel with v and w
}

With this PR, instead of issuing two separate MPI_Allreduce calls, we group them into a single one (with more data), which requires some reshuffling of the operations:

for (j : panel.cols()) {
  computeX0AndSquares();             // each rank computes its local sum(squares); only the head rank extracts x0
  computeWtmp();                     // each rank starts computing w, skipping the first row

  allReduce(x0, squares, row0, w_s); // x0, sum(squares), row0, sum(w0)

  tau = computeReflector();
  computeW();                        // w = row0.T + (1 / (x0 - y)) * w
  updateTrailingPanel();             // update the trailing panel with v and w
}
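The reshuffling works because the reflector has the form v = [1; x_rest / (x0 - y)], so w = Pt* . v splits into the first row of Pt plus a scaled product that does not depend on the norm. The following is a minimal serial sketch of that equivalence, not the PR's implementation: it is real-valued only (the complex case needs row0 conjugated), it simulates ranks with plain vectors and a row-cyclic distribution, it assumes the sum of squares includes x0 and uses the LAPACK larfg convention for tau, and all names (`allReduceSum`, `twoAllReduces`, `oneAllReduce`, `Scalars`) are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper types for this sketch (not DLA-Future API).
using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // row-major: Mat[i] is row i

struct Reflector { double tau; Vec w; };          // w = Pt^T v
struct Scalars { double tau; double scale; };     // scale = 1 / (x0 - y)

// Element-wise sum over per-rank buffers: stands in for one MPI_Allreduce with MPI_SUM.
Vec allReduceSum(const std::vector<Vec>& per_rank) {
  Vec out(per_rank.front().size(), 0.0);
  for (const Vec& buf : per_rank) {
    assert(buf.size() == out.size());
    for (std::size_t k = 0; k < out.size(); ++k) out[k] += buf[k];
  }
  return out;
}

// Reflector scalars from x0 and the total sum of squares (assumed to include x0).
Scalars computeReflector(double x0, double sum_squares) {
  const double norm = std::sqrt(sum_squares);
  assert(norm > 0.0);
  const double y = (x0 >= 0.0) ? -norm : norm;
  return {(y - x0) / y, 1.0 / (x0 - y)};
}

// Scheme without the PR: reduce [x0, squares], build v, then reduce w.
Reflector twoAllReduces(const Vec& x, const Mat& pt, std::size_t nranks) {
  const std::size_t m = x.size(), n = pt[0].size();
  std::vector<Vec> msg1(nranks, Vec(2, 0.0));
  for (std::size_t i = 0; i < m; ++i) {
    const std::size_t r = i % nranks;  // row i lives on rank i % nranks
    if (i == 0) msg1[r][0] = x[0];     // only the head rank contributes x0
    msg1[r][1] += x[i] * x[i];
  }
  const Vec red1 = allReduceSum(msg1);
  const Scalars s = computeReflector(red1[0], red1[1]);

  std::vector<Vec> msg2(nranks, Vec(n, 0.0));
  for (std::size_t i = 0; i < m; ++i) {
    const double vi = (i == 0) ? 1.0 : x[i] * s.scale;  // v = [1; x_rest / (x0 - y)]
    for (std::size_t c = 0; c < n; ++c) msg2[i % nranks][c] += pt[i][c] * vi;
  }
  return {s.tau, allReduceSum(msg2)};
}

// Scheme with the PR: one message laid out as [x0, squares, row0[0..n), w_tmp[0..n)].
Reflector oneAllReduce(const Vec& x, const Mat& pt, std::size_t nranks) {
  const std::size_t m = x.size(), n = pt[0].size();
  std::vector<Vec> msg(nranks, Vec(2 + 2 * n, 0.0));
  for (std::size_t i = 0; i < m; ++i) {
    Vec& buf = msg[i % nranks];
    buf[1] += x[i] * x[i];
    for (std::size_t c = 0; c < n; ++c) {
      if (i == 0) buf[2 + c] = pt[0][c];          // row0, sent by the head rank
      else buf[2 + n + c] += pt[i][c] * x[i];     // unscaled w, first row skipped
    }
  }
  msg[0][0] = x[0];  // row 0 is owned by rank 0 in this distribution
  const Vec red = allReduceSum(msg);
  const Scalars s = computeReflector(red[0], red[1]);

  Vec w(n);
  for (std::size_t c = 0; c < n; ++c) w[c] = red[2 + c] + s.scale * red[2 + n + c];
  return {s.tau, w};
}
```

Calling both functions on the same column `x` and trailing panel `pt` should give the same `tau` and `w` up to rounding; the merged variant trades a second communication step for a longer message (2 + 2n values instead of 2 and then n).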

Comment thread include/dlaf/eigensolver/reduction_to_band/impl.h Outdated
@albestro albestro force-pushed the alby/r2b-panel-revisited branch 2 times, most recently from c64fdb0 to fa8f577 Compare August 21, 2025 17:23
@msimberg msimberg moved this from In Progress to Todo in DLA-F Planning Sep 11, 2025
@albestro albestro force-pushed the alby/r2b-panel-revisited branch from fa8f577 to f47206d Compare October 7, 2025 07:49
commit 3e1596821460d0f01008ce73ef22398d546dc532
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date:   Tue Oct 7 11:14:42 2025 +0200

    drop comment about atomic vs barrier

commit f47206d
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date:   Fri Aug 22 14:15:47 2025 +0200

    revert mirrored distribution and assign extra work to last thread instead of first

commit 5d9fa58
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date:   Thu Aug 21 18:39:35 2025 +0200

    reverse work distribution over bulk tasks

    tid=0 is kept as last so that it might be the one with less general
    work, considering that it might already have to do other stuff (e.g. axpy)

commit 14087d7
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date:   Fri Aug 22 10:07:50 2025 +0200

    skip w* and w (and shrink allreduce message accordingly) on last reflector

commit fcbb4cc
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date:   Thu Aug 21 18:36:47 2025 +0200

    tid=0 is not separate from others, so 1 is enough

commit cf60ff8
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date:   Wed Aug 20 15:21:10 2025 +0200

    move barrier after w computation in the right place

commit 6d19666
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date:   Wed Aug 20 14:32:09 2025 +0200

    minor changes

commit 32dcdbb
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date:   Wed Aug 20 12:52:21 2025 +0200

    simplify GER by using a well-formed reflector (i.e. delaying set-band)

commit 4dbb17b
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date:   Wed Aug 20 12:27:09 2025 +0200

    unify tiles split over threads for all steps
    had to move squares computation just before reduce to skip a sync barrier

commit a1b5790
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date:   Fri Aug 8 17:06:01 2025 +0200

    delete unused code

commit 07d52a5
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date:   Fri Aug 8 12:03:44 2025 +0200

    remove console output used for debugging

commit f9f1fde
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date:   Fri Aug 8 12:01:02 2025 +0200

    fix complex: row0 should be conjugate transposed (used lacgv)

commit fc2a7d9
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date:   Thu Aug 7 17:40:45 2025 +0200

    fix problem for banded matrix with quick return (but set tau = 0)

commit ac53c25
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date:   Thu Aug 7 14:21:13 2025 +0200

    rename w and its workspaces + avoid reallocation at each iteration

commit 5c75894
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date:   Thu Aug 7 12:05:09 2025 +0200

    WIP: switch to multi-threaded scal and ger

commit 0d6c3c4
Author: Alberto Invernizzi <alberto.invernizzi@cscs.ch>
Date:   Thu Aug 7 11:11:47 2025 +0200

    WIP: basic implementation with a lot of sharp edges
@albestro albestro force-pushed the alby/r2b-panel-revisited branch from f47206d to 9fdf255 Compare October 9, 2025 09:01
@albestro albestro changed the base branch from master to alby/r2b-well-formed-refactoring October 9, 2025 09:23
Collaborator Author

albestro commented Oct 9, 2025

cscs-ci run

@albestro albestro moved this from Todo to Review in DLA-F Planning Oct 20, 2025
Base automatically changed from alby/r2b-well-formed-refactoring to master October 21, 2025 15:24
Projects

Status: Review

2 participants