Add DLAF_WITH_MPI_GPU_AWARE_NO_REDUCE_OPS and adapt spack variant #1354

Open
rasolca wants to merge 6 commits into master from rasolca/openmpi

Collaborator

@rasolca rasolca commented Jan 29, 2026

Open MPI doesn't support GPU-direct reduce operations.

This PR adds the DLAF_WITH_MPI_GPU_AWARE_NO_REDUCE_OPS flag, which forces reduce and allreduce to use CPU memory, and adapts the mpi_gpu_aware spack variant so that it can select either full or limited GPU-aware communication.
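
As a rough illustration of how such a variant could map onto the two CMake options, here is a minimal Python sketch. The variant values (none, limited, full) and the helper name cmake_defines are hypothetical and are not taken from package.py. The sketch also assumes that the existing option is called DLAF_WITH_MPI_GPU_AWARE and that "limited" means GPU-aware communication with reductions forced through CPU memory.

```python
# Hypothetical variant values; the real names used in package.py
# are not shown in this thread.
VARIANT_VALUES = ("none", "limited", "full")


def cmake_defines(mpi_gpu_aware):
    """Translate a (hypothetical) variant value into CMake cache definitions."""
    if mpi_gpu_aware not in VARIANT_VALUES:
        raise ValueError("unknown mpi_gpu_aware value: %r" % (mpi_gpu_aware,))

    # Any value other than "none" enables GPU-aware MPI (assumed option name).
    gpu_aware = mpi_gpu_aware != "none"
    # Only "limited" asks for reduce/allreduce to go through CPU memory.
    no_reduce_ops = mpi_gpu_aware == "limited"

    def on_off(flag):
        return "ON" if flag else "OFF"

    return [
        "-DDLAF_WITH_MPI_GPU_AWARE=" + on_off(gpu_aware),
        "-DDLAF_WITH_MPI_GPU_AWARE_NO_REDUCE_OPS=" + on_off(no_reduce_ops),
    ]
```

Under these assumptions, an Open MPI build would pick the "limited" value, while an MPI implementation with GPU-direct reductions could use "full".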

@rasolca rasolca added this to the v0.11.0 milestone Jan 29, 2026
@rasolca rasolca requested review from albestro and msimberg January 29, 2026 17:19
@rasolca rasolca self-assigned this Jan 29, 2026
@rasolca rasolca added the Type:Bug Something isn't working label Jan 29, 2026
@github-project-automation github-project-automation Bot moved this to In Progress in DLA-F Planning Jan 29, 2026

Collaborator Author

rasolca commented Jan 29, 2026

cscs-ci run

Collaborator

@msimberg msimberg left a comment

Thanks @rasolca, this makes sense. I have a minor comment on the CMake option and one question, for my own understanding, about the variant.

Comment thread CMakeLists.txt Outdated
Comment thread spack/packages/dla-future/package.py
Comment thread spack/packages/dla-future/package.py

Collaborator

@albestro albestro Jan 30, 2026

Just a possible alternative idea

https://godbolt.org/z/86Pdfv8qe

Collaborator Author

rasolca commented Jan 30, 2026

cscs-ci run

Comment thread spack/packages/dla-future/package.py Outdated

Collaborator Author

rasolca commented Feb 27, 2026

cscs-ci run

Collaborator Author

rasolca commented Mar 18, 2026

cscs-ci run

@rasolca rasolca requested a review from msimberg March 18, 2026 14:42

Collaborator

@msimberg msimberg left a comment

Looks good to me, thanks @rasolca.

@albestro's suggestion also looks like a nice idea (https://github.com/eth-cscs/DLA-Future/pull/1354/changes#r2747356273), but I think it can be left as it is.

Collaborator

@albestro albestro left a comment

I've just seen that CI fails, and it might be worth reporting that I'm facing runtime problems with this PR (on JURECA). Not investigated yet.

Collaborator

@albestro albestro left a comment

> I've just seen that CI fails, and it might be worth reporting that I'm facing runtime problems with this PR (on JURECA). Not investigated yet.

My runtime errors have been fixed by commit 8957afe.

Labels

Priority:High Type:Bug Something isn't working

Projects

Status: In Progress

3 participants