
ENH: PCA CPU SPMD support #3585

Open
DDJHB wants to merge 10 commits into uxlfoundation:main from DDJHB:dev_pca_cpu_spmd

Conversation

@DDJHB
Contributor

@DDJHB DDJHB commented Mar 29, 2026

Description

This PR adds distributed (SPMD) training support for PCA on CPU using the covariance method. Previously, the PCA SPMD path was only available on GPU (via SYCL/DPC++); this change enables it for CPU-only environments using MPI or oneCCL communicators.


Checklist:

Completeness and readability

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with updates and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.

CI : http://intel-ci.intel.com/f143b984-b8cb-f1f5-949b-a4bf010d0e2d

The CI failure in the LinuxSklearnex Python311 job is in an IncrementalLinearRegression test and should be unrelated to the PCA changes in this PR.

  • I have extended testing suite if new functionality was introduced in this PR.

New MPI and oneCCL examples have been added.

Performance

not applicable

@DDJHB
Contributor Author

DDJHB commented Mar 29, 2026

/intelci: run

Vika-F added a commit that referenced this pull request Apr 9, 2026
Fixes the ABI breakage introduced by the CPU SPMD algorithms in PRs #3507 and #3585.
Makes the binding type of the destructors (~compute_input(), ~partial_compute_input(), ~train_input(), etc.) stably "WEAK", rather than dependent on the compiler's logic, by adding non-default implementations for them.

Also fixes `set_responses` method in KNN.
mergify Bot pushed a commit that referenced this pull request Apr 9, 2026
(cherry picked from commit 0a5f4cb; same commit message as above)
maria-Petrova pushed a commit that referenced this pull request Apr 9, 2026
(cherry picked from commit 0a5f4cb; same commit message as above)

Co-authored-by: Victoriya Fedotova <victoriya.s.fedotova@intel.com>
@Vika-F
Contributor

Vika-F commented Apr 9, 2026

/intelci: run

@Vika-F Vika-F marked this pull request as ready for review April 10, 2026 12:33
Copilot AI review requested due to automatic review settings April 10, 2026 12:33
Contributor

Copilot AI left a comment


Pull request overview

Adds CPU-side distributed (SPMD) PCA training support (covariance method) and enables the existing SPMD PCA integration test on CPU, alongside new MPI/oneCCL sample programs demonstrating the feature.

Changes:

  • Enable PCA SPMD integration testing on CPU by removing the CPU skip in the SPMD test.
  • Route PCA CPU training dispatch through a universal (single-node + SPMD) kernel path.
  • Implement CPU SPMD covariance-path aggregation (allreduce of partials) and add MPI/CCL PCA distributed samples.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

File | Description
--- | ---
samples/oneapi/cpp/mpi/sources/pca_distr_mpi.cpp | New MPI sample demonstrating distributed PCA training on CPU.
samples/oneapi/cpp/ccl/sources/pca_distr_ccl.cpp | New oneCCL sample demonstrating distributed PCA training on CPU.
cpp/oneapi/dal/algo/pca/test/spmd.cpp | Enables running the PCA SPMD integration test on CPU.
cpp/oneapi/dal/algo/pca/detail/train_ops.cpp | Switches PCA CPU dispatch to a universal SPMD-capable kernel dispatcher.
cpp/oneapi/dal/algo/pca/backend/cpu/train_kernel_cov.cpp | Adds CPU SPMD aggregation logic for covariance-based PCA training.
cpp/oneapi/dal/algo/pca/backend/cpu/finalize_train_kernel_cov.cpp | Adjusts covariance/correlation finalize behavior and variance extraction logic.

Comment on lines +231 to +234

```cpp
/// Use a local (non-SPMD) context so the finalize kernel does NOT
/// dispatch to the SPMD path — the data is already aggregated.
const context_cpu local_ctx;
return pca::backend::finalize_train_kernel_cpu<Float, method::cov, task_t>{}(local_ctx,
```
Copilot AI Apr 10, 2026
`context_cpu local_ctx;` drops the CPU-extension configuration carried by the incoming `ctx` (e.g., from `spmd_host_policy::get_local()`), so CPU dispatch in `finalize_train_kernel_cpu` may use different ISA settings than the rest of the training call. Consider calling the finalize kernel with the original `ctx` (it does not need additional collectives), or otherwise ensure the same enabled CPU extensions are preserved when creating a local context.

Suggested change
```diff
-/// Use a local (non-SPMD) context so the finalize kernel does NOT
-/// dispatch to the SPMD path — the data is already aggregated.
-const context_cpu local_ctx;
-return pca::backend::finalize_train_kernel_cpu<Float, method::cov, task_t>{}(local_ctx,
+/// Finalize on the already aggregated local data while preserving
+/// the CPU-extension configuration carried by the original context.
+return pca::backend::finalize_train_kernel_cpu<Float, method::cov, task_t>{}(ctx,
```

Comment on lines +124 to +127

```cpp
const auto cp =
    row_accessor<const Float>(input.get_partial_crossproduct()).pull({ 0, -1 });
const Float inv_nm1 = Float(1) / (row_count - 1);
Float* vars = arr_vars.get_mutable_data();
```
Copilot AI Apr 10, 2026
In the zscore branch, `inv_nm1` is computed as `1 / (row_count - 1)`. For `row_count == 1` this divides by zero and propagates inf/NaN into the variances. Please add an explicit guard (e.g., require `row_count > 1` for z-score normalization, or define the intended behavior for the single-row case and handle it here).

Comment thread samples/oneapi/cpp/mpi/sources/pca_distr_mpi.cpp Outdated
Comment thread samples/oneapi/cpp/ccl/sources/pca_distr_ccl.cpp Outdated
@Vika-F
Contributor

Vika-F commented Apr 29, 2026

/intelci: run

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Comment on lines +170 to +177

```cpp
auto nobs_nd = pr::table2ndarray<Float>(nobs_local);
auto crossproduct_nd = pr::table2ndarray<Float>(crossproduct_local);
auto sums_nd = pr::table2ndarray<Float>(sums_local);

auto nobs_ary = dal::array<Float>::wrap(nobs_nd.get_mutable_data(), nobs_nd.get_count());
auto crossproduct_ary =
    dal::array<Float>::wrap(crossproduct_nd.get_mutable_data(), crossproduct_nd.get_count());
auto sums_ary = dal::array<Float>::wrap(sums_nd.get_mutable_data(), sums_nd.get_count());
```
Copilot AI Apr 29, 2026
The arrays created with `dal::array<Float>::wrap(...get_mutable_data(), ...)` are non-owning (`empty_delete`). Since `pr::table2ndarray()` allocates and owns a temporary buffer, those buffers are freed when `nobs_nd`/`crossproduct_nd`/`sums_nd` go out of scope, leaving `nobs_table`/`crossproduct_table`/`sums_table` (and the returned aggregated partial result) with dangling pointers. Use an owning `dal::array` for the reduced data (e.g., pull into `dal::array` variables that outlive the tables, or allocate/copy into `dal::array::empty`/`zeros` and reduce those), and wrap the owning arrays into tables.

Comment on lines +158 to +162

```cpp
/// Compute partial crossproduct, sums, and nobs on each rank
partial_train_input<task_t> partial_input({}, data);
auto partial_result =
    pca::backend::partial_train_kernel_cpu<Float, method::cov, task_t>{}(ctx,
                                                                         desc,
```
Copilot AI Apr 29, 2026
`params` is passed into `call_daal_spmd_kernel()` but never used. Beyond the unused-parameter warning risk, this means SPMD CPU training ignores `train_parameters` hyperparameters that are applied in the single-node path via `convert_parameters(params)`. Either plumb `params` through the partial/finalize flow (so hyperparameters affect SPMD as well) or explicitly mark/justify it as unused and ensure behavior is consistent by design.

Suggested change
```diff
-/// Compute partial crossproduct, sums, and nobs on each rank
+/// Compute partial crossproduct, sums, and nobs on each rank.
+/// Forward train parameters to keep SPMD behavior consistent with
+/// the single-node path.
 partial_train_input<task_t> partial_input({}, data);
 auto partial_result =
     pca::backend::partial_train_kernel_cpu<Float, method::cov, task_t>{}(ctx,
                                                                          desc,
+                                                                         params,
```

Comment on lines +124 to +130

```cpp
const auto cp =
    row_accessor<const Float>(input.get_partial_crossproduct()).pull({ 0, -1 });
const Float inv_nm1 = (row_count > 1) ? Float(1) / (row_count - 1) : Float(0);
Float* vars = arr_vars.get_mutable_data();
for (std::int64_t i = 0; i < column_count; ++i) {
    vars[i] = cp[i * column_count + i] * inv_nm1;
}
```
Copilot AI Apr 29, 2026
In the zscore branch this pulls the entire crossproduct table into a temporary contiguous array just to read the diagonal (`row_accessor<...>().pull({0,-1})`). For large feature counts this is an O(p^2) host copy and may offset the intended performance gain. Since the partial crossproduct produced by the CPU kernels is a homogeneous table, consider reading the diagonal directly from its underlying buffer (or otherwise avoid materializing the full matrix) when computing variances.


3 participants