Skip to content

Adding different types of parallelism to the elementwise layer#222

Merged
aobolensk merged 51 commits into
mainfrom
AndreySorokin7/Add_parall_ew_layer
Dec 9, 2025
Merged

Adding different types of parallelism to the elementwise layer#222
aobolensk merged 51 commits into
mainfrom
AndreySorokin7/Add_parall_ew_layer

Conversation

@AndreySorokin7

Copy link
Copy Markdown
Collaborator

No description provided.

Comment thread include/layers/EWLayer.hpp Outdated
EWLayerImpl() = delete;
EWLayerImpl(const Shape& shape, std::string function, float alpha = 0.0F,
float beta = 0.0F);
float beta = 0.0F, int type_parall = 0);

@allnes allnes Nov 8, 2025

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a strongly-typed backend enum instead of int for readability and safety.

enum class ParBackend { Seq = 0, Threads = 1, TBB = 2, OMP = 3 };

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Propagate ParBackend through API instead of raw int

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment thread include/layers/EWLayer.hpp Outdated
Comment on lines +75 to +81
int available_threads = -1;
if (type_parall_ == 0) available_threads = 1;
if (type_parall_ == 1)
available_threads = std::thread::hardware_concurrency();
if (type_parall_ == 2)
available_threads = oneapi::tbb::info::default_concurrency();
if (type_parall_ == 3) available_threads = omp_get_max_threads();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please wrap common function for getting thread number

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment thread include/layers/Layer.hpp Outdated
@@ -1,5 +1,11 @@
#pragma once
#include <omp.h>

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guard the OpenMP/TBB includes and add ; otherwise non-OpenMP builds fail.

#ifdef HAS_OPENMP
#include <omp.h>
#endif
#include <thread>
#ifdef HAS_TBB
#include <oneapi/tbb/blocked_range.h>
#include <oneapi/tbb/parallel_for.h>
#include <oneapi/tbb/info.h>
#endif

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment thread include/layers/EWLayer.hpp Outdated
EWLayerImpl() = delete;
EWLayerImpl(const Shape& shape, std::string function, float alpha = 0.0F,
float beta = 0.0F);
float beta = 0.0F, int type_parall = 0);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Propagate ParBackend through API instead of raw int

Comment thread include/layers/Layer.hpp Outdated
};

template <typename Func>
inline void parallel_for(int count, Func func, int mode = 0) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
inline void parallel_for(int count, Func func, int mode = 0) {
inline void parallel_for(int count, Func func, int mode = 0) {
if (count <= 0) return;

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment thread include/layers/Layer.hpp Outdated
@@ -1,5 +1,11 @@
#pragma once
#include <omp.h>

@allnes allnes Nov 9, 2025

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move all backend headers (OpenMP/TBB/Threads) and implementation details into a small parallel module. Expose a single, inline header API so call sites incur no extra call/indirection.

  • include/parallel/parallel.hpp (inline API)
  • include/parallel/backends.hpp (backend helpers; guarded includes)
  • No <omp.h>/TBB headers leaking into layer headers.

Example:

// include/parallel/parallel.hpp
#pragma once
#include <cstddef>

enum class ParBackend { Auto, Seq, Threads, TBB, OMP };

struct ParOptions {
  ParBackend backend = ParBackend::Auto;
  int max_threads = 0;         // 0 = runtime default
  std::size_t min_parallel_n = 4096; // small tasks stay sequential
  std::size_t grain = 1024;    // backend-specific chunk hint
};

// Header-only: one branch + inlined backend
template <class F>
inline void parallel_for(std::size_t n, F&& f, const ParOptions& opt) {
  if (n == 0) return;
  const ParBackend b = select_backend(opt, n); // inline, cheap
  switch (b) {
    case ParBackend::Seq:     return impl_seq(n, f);
    case ParBackend::Threads: return impl_threads(n, f, opt);
    case ParBackend::TBB:     return impl_tbb(n, f, opt);
    case ParBackend::OMP:     return impl_omp(n, f, opt);
    case ParBackend::Auto:    return impl_seq(n, f); // unreachable
  }
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but not add auto

Comment thread include/layers/Layer.hpp Outdated
@@ -1,5 +1,11 @@
#pragma once
#include <omp.h>

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid re-evaluating “Auto” logic every call. Resolve once (feature flags + environment + problem size) and cache in the layer/context.

// Called once per layer or first use
inline ParBackend resolve_auto_once(const ParOptions& opt, std::size_t n) noexcept {
#if defined(HAS_OMP)
  if (n >= opt.min_parallel_n) return ParBackend::OMP;
#elif defined(HAS_TBB)
  if (n >= opt.min_parallel_n) return ParBackend::TBB;
#elif defined(HAS_THREADS)
  if (n >= opt.min_parallel_n) return ParBackend::Threads;
#endif
  return ParBackend::Seq;
}

inline ParBackend select_backend(const ParOptions& opt, std::size_t n) noexcept {
  if (opt.backend != ParBackend::Auto) return opt.backend;
  static ParBackend cached = resolve_auto_once(opt, n); // or store in the layer
  return cached;
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@aobolensk aobolensk left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think we can leave remaining solution basically as is. The problem with OpenMP slowdown is actually reproducible, but I suggest to focus on parallel_for itself. Anyway, this effect is not that visible on matrix multiplication workloads. For further investigation we will take a look at the compilation details (which code it has been lowered to). For now we can proceed as is

@codecov

codecov Bot commented Dec 1, 2025

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 88.00000% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.18%. Comparing base (752c273) to head (d9f3d13).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
include/parallel/backends.hpp 80.00% 3 Missing and 5 partials ⚠️
include/parallel/parallel.hpp 85.00% 0 Missing and 3 partials ⚠️
include/layers/EWLayer.hpp 96.87% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #222      +/-   ##
==========================================
+ Coverage   83.09%   83.18%   +0.09%     
==========================================
  Files          44       46       +2     
  Lines        2271     2349      +78     
  Branches     1349     1397      +48     
==========================================
+ Hits         1887     1954      +67     
- Misses        187      190       +3     
- Partials      197      205       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@AndreySorokin7 AndreySorokin7 requested a review from allnes December 1, 2025 09:26
@AndreySorokin7

Copy link
Copy Markdown
Collaborator Author
image

Comment thread CMakeLists.txt Outdated
if (NOT WIN32)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wall -Wextra -Werror")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -Wextra -Werror")
# if (NOT WIN32)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, revert. Flags are still required

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

70c83f1

I tried to uncomment them, but then I get errors

Comment thread test/single_layer/test_ewlayer.cpp Outdated
end = std::chrono::high_resolution_clock::now();
total_duration =
std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
std::cout << "TBB notmatrix: " << total_duration.count() << " ms"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, clean up prints

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

AndreySorokin7 added 2 commits December 2, 2025 11:46
@allnes

allnes commented Dec 5, 2025

Copy link
Copy Markdown
Member

Proposed fix for the Linux OpenMP failure:

if(NOT WIN32)
    find_package(OpenMP)
endif()

if(OpenMP_FOUND)
    message(STATUS "OpenMP found - enabling parallel support")
    add_compile_definitions(HAS_OPENMP)
    link_libraries(OpenMP::OpenMP_CXX)
else()
    message(STATUS "OpenMP not found - parallel features disabled")
endif()

Notes:

  • Drop the forced CMAKE_BUILD_TYPE=Release so CI Debug matrix keeps working.
  • Linking OpenMP::OpenMP_CXX at the top level propagates -fopenmp to all targets, eliminating the #pragma omp unknown-pragmas error under -Werror.

@aobolensk aobolensk merged commit c115709 into main Dec 9, 2025
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants