[Core][DO NOT MERGE] Improve OpenMP loop scheduling and chunk partitioning configurability in `ParallelUtilities` by loumalouomega · Pull Request #14273 · KratosMultiphysics/Kratos

loumalouomega · 2026-03-10T13:44:28Z

📝 Description

This PR updates shared-memory loop execution in ParallelUtilities to improve load balancing and runtime tunability, while keeping dynamic scheduling as the default, since it was the fastest option in local benchmarking.

This is essentially a retake of #12923.

NOTE: After discussing with @pooyan-dadvand, further changes are needed, and I may pick this up again later.

🚀 Benchmarking

Intel® Core™ Ultra 9 Processor 285HX in Windows 11

Performance improves for large sets; for small sets it is equal or slightly worse.

Intel® Core™ Ultra 9 Processor 285HX in Ubuntu 24.04 via WSL

Consistent speedup of 8x or more.

🔀 What changed

- Added `ChunkPartitioningScheme` with two modes:
  - `DIVIDE_BY_NUMBER_OF_CHUNKS`
  - `DIVIDE_BY_CHUNK_SIZE`
- Extended the `BlockPartition` and `IndexPartition` templates to accept the partitioning scheme as a template parameter.
- Updated the partition constructors to accept a generic `N` (interpreted as chunk count or chunk size depending on the scheme).
- Added global tuning knobs:
  - `ParallelUtilitiesMaxChunkSize` (default 1024)
  - `ParallelUtilitiesMaxNumberOfChunks` (default `Globals::MaxAllowedThreads`, then adapted at init)
- Added env-based runtime configuration in `InitializeNumberOfThreads()`:
  - `KRATOS_PARALLEL_MAX_CHUNK_SIZE`
  - `KRATOS_PARALLEL_MAX_CHUNKS`
  - If `KRATOS_PARALLEL_MAX_CHUNKS` is not set, the default is adjusted to `min(4 * num_threads, ParallelUtilitiesMaxNumberOfChunks)`.
- Set OpenMP loop scheduling to `schedule(dynamic)` in all relevant loops in `parallel_utilities.h`.
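To make the two partitioning modes and the chunk-count default concrete, here is a minimal sketch. It is not the code from this PR: `MakeChunkBoundaries`, `ReadPositiveEnv`, and `ResolveMaxChunks` are hypothetical names, and only the enum mode names, the environment variable name, and the `min(4 * num_threads, max)` rule come from the description above. How the real code validates or caps the environment value is not shown there, so the parsing below is an assumption.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for the PR's enum; only the two mode names come from the description.
enum class ChunkPartitioningScheme { DIVIDE_BY_NUMBER_OF_CHUNKS, DIVIDE_BY_CHUNK_SIZE };

// Returns chunk boundaries b, starting at 0, so that chunk i is [b[i], b[i+1]).
// n is read as a chunk count or as a chunk size depending on TScheme.
template <ChunkPartitioningScheme TScheme>
std::vector<std::size_t> MakeChunkBoundaries(std::size_t size, std::size_t n)
{
    std::vector<std::size_t> bounds;
    bounds.push_back(0);
    if (size == 0 || n == 0) return bounds;  // empty range or invalid n: no chunks

    std::size_t chunk_size = n;  // DIVIDE_BY_CHUNK_SIZE: n is the chunk size itself
    if (TScheme == ChunkPartitioningScheme::DIVIDE_BY_NUMBER_OF_CHUNKS) {
        const std::size_t chunks = std::min(n, size);
        chunk_size = (size + chunks - 1) / chunks;  // ceiling division: at most n chunks
    }
    for (std::size_t begin = chunk_size; begin < size; begin += chunk_size) {
        bounds.push_back(begin);
    }
    bounds.push_back(size);
    return bounds;
}

// Assumed parsing rule: returns fallback if the variable is unset or is not a
// plain positive decimal number.
std::size_t ReadPositiveEnv(const char* name, std::size_t fallback)
{
    const char* raw = std::getenv(name);
    if (raw == nullptr || !std::isdigit(static_cast<unsigned char>(*raw))) return fallback;
    char* end = nullptr;
    const unsigned long long value = std::strtoull(raw, &end, 10);
    return (*end == '\0' && value > 0) ? static_cast<std::size_t>(value) : fallback;
}

// Default from the description: min(4 * num_threads, max_allowed) unless the variable is set.
std::size_t ResolveMaxChunks(std::size_t num_threads, std::size_t max_allowed)
{
    return ReadPositiveEnv("KRATOS_PARALLEL_MAX_CHUNKS", std::min(4 * num_threads, max_allowed));
}
```

With 10 items and `N = 4`, this sketch yields boundaries 0, 3, 6, 9, 10 under `DIVIDE_BY_NUMBER_OF_CHUNKS` and 0, 4, 8, 10 under `DIVIDE_BY_CHUNK_SIZE`.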

📒 Notes

schedule(dynamic) is intentionally kept hardcoded for now based on benchmark results; switching to schedule(runtime) can be done later if runtime policy control is preferred.
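As an illustration of the trade-off in this note, the sketch below runs a chunked loop with a hardcoded `schedule(dynamic)` clause. The function name and the chunking are assumptions made for the example, not the loops in `parallel_utilities.h`.

```cpp
#include <cstddef>
#include <vector>

// Sums the squares of `values`, processing one chunk per loop iteration.
// ASSUMPTION: name and chunking are illustrative only, not Kratos code.
// If built without OpenMP support, the pragma is ignored and the loop runs serially.
double SumOfSquares(const std::vector<double>& values, int chunk_size)
{
    if (chunk_size <= 0) chunk_size = 1;  // guard against a non-positive chunk size
    const int size = static_cast<int>(values.size());
    const int num_chunks = (size + chunk_size - 1) / chunk_size;
    double total = 0.0;

    // schedule(dynamic): idle threads pick up the next unprocessed chunk, which helps
    // when chunks differ in cost. Replacing it with schedule(runtime) would defer the
    // choice to the OMP_SCHEDULE environment variable or omp_set_schedule().
    #pragma omp parallel for schedule(dynamic) reduction(+ : total)
    for (int c = 0; c < num_chunks; ++c) {
        const int begin = c * chunk_size;
        const int end = (begin + chunk_size < size) ? begin + chunk_size : size;
        double partial = 0.0;
        for (int i = begin; i < end; ++i) {
            const double v = values[static_cast<std::size_t>(i)];
            partial += v * v;
        }
        total += partial;
    }
    return total;
}
```

Hardcoding the clause keeps the behavior identical across environments, whereas `schedule(runtime)` would let users change the policy without recompiling, at the cost of results depending on the environment.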

😅 TODO

Benchmark on different operating systems
Benchmark on different CPUs

🆕 Changelog

loumalouomega · 2026-03-10T14:23:50Z

This may conflict with the C++ implementation.

loumalouomega added 8 commits March 9, 2026 16:49

[Core] Add environment variable support for maximum chunks in `ParallelUtilities` and consider `dynamic` schedulling

f9c634f

Increase benchmark input sizes for vector power, reduction, and thread-local storage tests

feacdda

Revert "Increase benchmark input sizes for vector power, reduction, and thread-local storage tests" This reverts commit feacdda.

e69b997

[Core] Change OpenMP scheduling from dynamic to runtime for improved performance

9768af1

Set default value for maximum chunks in InitializeNumberOfThreads based on thread count

76f380a

[Core] Update default maximum chunks in ParallelUtilities to use `Globals::MaxAllowedThreads`

b75eefc

Revert "[Core] Change OpenMP scheduling from dynamic to runtime for improved performance" This reverts commit 9768af1.

964a854

[Core] Introduce chunk partitioning scheme and environment variable support for parallel utilities

c5e258a

loumalouomega requested a review from pooyan-dadvand March 10, 2026 13:44

loumalouomega added the Kratos Core, Performance, and Parallel-SMP (Shared memory parallelism with OpenMP or C++ Threads) labels Mar 10, 2026

loumalouomega added 4 commits March 10, 2026 17:33

[Core] Add parameter for chunk size to block_for_each functions in ParallelUtilities

698f45a

[Core] Refactor chunk size and number of chunks handling in ParallelUtilities; add accessors and mutators

f306f04

[Core] Add thread count retrieval in benchmark functions for improved performance measurement

6c30cff

Merge branch 'master' into core/dynamic-scheduling-omp

8c05ee6

loumalouomega changed the title ~~[Core] Improve OpenMP loop scheduling and chunk partitioning configurability in ParallelUtilities~~ [Core][DO NOT MERGE] Improve OpenMP loop scheduling and chunk partitioning configurability in ParallelUtilities Mar 11, 2026

loumalouomega and others added 12 commits March 11, 2026 13:09

Fix formatting of license comments in parallel utilities header and source files

ea6364d

Merge branch 'core/dynamic-scheduling-omp' of https://github.com/KratosMultiphysics/Kratos into core/dynamic-scheduling-omp

2971d0a

[Core] Update chunk size and memory allocation in ParallelUtilities; replace fixed-size arrays with vectors for improved flexibility

38d0874

[Core] Change default chunk partitioning scheme to DIVIDE_BY_CHUNK_SIZE for BlockPartition and related functions

859ce4c

Merge branch 'master' into core/dynamic-scheduling-omp

116a39e

Add checks for positive chunk sizes in BlockPartition to prevent undefined behavior

632c24c

Looks like python GIL wants to go slower

4b75f6a

Making GIL happy

c827fac

Moving to nightly

71c9c21

Merge branch 'core/dynamic-scheduling-omp' of https://github.com/KratosMultiphysics/Kratos into core/dynamic-scheduling-omp

aee09e6

Consistency

7b85aca

Reorganize test cases: move MPC contact tests from smallSuite to nightlySuite

98ee14f

Revert

f8f6f02

loumalouomega wants to merge 25 commits into master from core/dynamic-scheduling-omp
