Add support for batched tasks.#668
Open
bosilca wants to merge 3 commits intoICLDisco:masterfrom
Open
Conversation
2c004b1 to
fffc3ec
Compare
fffc3ec to
9998554
Compare
abouteiller
reviewed
Feb 26, 2025
| if( NULL != type_property) { | ||
|
|
||
| if (!strcasecmp(type_property->expr->jdf_var, "cuda") | ||
| if (!strncasecmp(type_property->expr->jdf_var, "cuda", 4) /* for batched */ |
Contributor
There was a problem hiding this comment.
Looks like a leftover from a prior iteration of that patchset that used the type=cuda_batched instead of adding a new batched property.
I assume the expectation is that we can have batched and non batched CUDA bodies simultaneously. Did you test this works?
The idea is the following: - tasks incarnations (aka. BODY) can be marked with the "batch" property allowing the runtime to provide the task with the entire list of ready tasks of the execution stream instead of just extracting the head. - this list of ready tasks is in fact a ring, that can then be trimmed by the kernel and divided into batch and the rest. The rest of the tasks will be left in the ring, while the batch group will be submitted for execution. - the kernel also needs to provide a callback into the gpu_task complete_stage, such that the runtime can call the specialized function able to complete all batched tasks. Signed-off-by: George Bosilca <gbosilca@nvidia.com>
Replace the CUDA-specific batch build switch with PARSEC_HAVE_DEV_CAPABILITY_BATCH so batching is a runtime capability shared by all supported device types. Export the new option through parsec_options and PaRSECConfig. Add per-device MCA parameters to disable batching for CPU, recursive, CUDA, HIP, and Level Zero devices. Use shared helpers to sanitize batch chore types in DTD and to gate GPU task-ring batching on the selected device. Teach PTG to accept batch=true for CPU/default bodies as well as typed device bodies, and add CPU batch examples for both PTG and DTD with ctest coverage for the enabled and CPU-disabled DTD paths. Signed-off-by: George Bosilca <gbosilca@nvidia.com>
Replace the CUDA-specific batch build switch with PARSEC_HAVE_DEV_CAPABILITY_BATCH so batching is a runtime capability shared by all supported device types. Export the new option through parsec_options and PaRSECConfig. Add per-device MCA parameters to disable batching for CPU, recursive, CUDA, HIP, and Level Zero devices. Use shared helpers to sanitize batch chore types in DTD and to gate GPU task-ring batching on the selected device. Teach PTG to accept batch=true for CPU/default bodies as well as typed device bodies, and add CPU batch examples for both PTG and DTD with ctest coverage for the enabled and CPU-disabled DTD paths. Signed-off-by: George Bosilca <gbosilca@nvidia.com>
88bbbd3 to
41fa201
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The idea is the following: