CNF-23818: machineconfigpools: avoid RTE pods duplication when single node matching multiple MCPs by Tal-or · Pull Request #3995 · openshift-kni/numaresources-operator

Tal-or · 2026-05-12T07:50:32Z

Problem Statement
When a node's labels match multiple MachineConfigPools (e.g. a custom pool that extends the worker role), the current DaemonSet's NodeSelector — which copies the MCP's own nodeSelector — causes multiple RTE DaemonSets to target the same node, resulting in duplicate pods.

Root Cause
The MCP nodeSelectors are inherently overlapping: for example, a node with both node-role.kubernetes.io/worker and node-role.kubernetes.io/workerB labels matches the worker MCP and the workerB MCP simultaneously. Using those selectors directly as DaemonSet's NodeSelectors guarantees
collisions.

Suggested Approach
Place a single, NRO-owned label on each node that encodes its primary pool. The DaemonSet's NodeSelector then targets this label instead of the MCP's nodeSelector, making placement 1:1 by construction.

We reuse MCO's own priority logic (master > custom > worker) to determine each node's primary pool, so the behavior is consistent with how MCO itself decides under what MCP the node belongs.

NOTE:
This is not relevant for HyperShift since on HyperShift we're setting the DaemonSet's NodeSelector as the NodePool name which are exclusive for each node, meaning a node can't have more that a single NodePool label.

AIA Human-AI blend, New content, Human-initiated, Reviewed, Claude Opus 4.6 v1.0
Signed-off-by: Talor Itzhak titzhak@redhat.com

openshift-ci · 2026-05-12T07:50:45Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Tal-or

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [Tal-or]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-05-12T07:52:49Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

This PR introduces a primary pool labeling system for nodes and integrates it into the NUMA resources operator controller. Nodes receive a label identifying their primary MachineConfigPool using MCO priority rules, DaemonSet selectors migrate to use that label, and the controller watches nodes to maintain labels during reconciliation while cleaning them up on CR deletion.

Changes

Node Primary Pool Labeling System

Layer / File(s)	Summary
Label key definitions `internal/api/labels/labels.go`	Exports constants `NodePrimaryPool` and `NodeFinalizer` used throughout the labeling and finalizer system.
MachineConfigPool classification library `pkg/machineconfigpools/pools.go`, `pkg/machineconfigpools/pools_test.go`	New package exports functions to classify nodes against MachineConfigPools using label selectors and MCO priority rules (master > custom > worker). Tests cover pool matching, priority precedence, multiple custom pool rejection, and edge cases.
Node labeling implementation `internal/nodes/labels.go`, `internal/nodes/labels_test.go`	Implements `LabelForTrees` to label nodes based on their primary MachineConfigPool and remove stale labels, plus `RemoveAllLabels` for finalizer cleanup. Tests verify primary pool selection, stale label removal, unmanaged pool filtering, platform-specific behavior (HyperShift no-op), and idempotence.
DaemonSet pod selector migration `pkg/objectstate/rte/machineconfigpool.go`	DaemonSet pod `nodeSelector` now unconditionally uses the primary pool label instead of copying MCP node selector, removing dependency on MCP having a valid selector.
Controller node watching and reconciliation `internal/controller/numaresourcesoperator_controller.go`, `config/rbac/role.yaml`	Controller watches Node resources and triggers reconciliation on node label changes. During reconciliation, nodes are labeled via `LabelForTrees`; during CR deletion, the finalizer handler calls `RemoveAllLabels`. RBAC updated to grant node get/list/watch/patch permissions.
E2E assertions `test/e2e/install/install_test.go`	Adds OpenShift-only e2e checks validating the DaemonSet NodeSelector uses the primary-pool label, labeled nodes exist and have non-empty values, and labels are removed after operator deletion; includes helper to list nodes by the label.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 21.43% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main objective: avoiding RTE pod duplication when nodes match multiple MCPs by using primary pool labeling.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The pull request description clearly relates to the changeset, explaining the problem of duplicate RTE pods when nodes match multiple MachineConfigPools and detailing the proposed solution of adding NRO-owned labels to encode primary pool membership.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

internal/controller/numaresourcesoperator_controller_test.go (1)

77-97: ⚡ Quick win

Consider enhancing the fake-node helper to handle NodeSelector.MatchExpressions as a defensive measure.

The helper currently only copies MatchLabels from MCPs. While all current test fixtures use only MatchLabels, the function would miss MatchExpressions if they're added in the future, potentially making tests brittle. Adding support for both selector types would make the helper more resilient to changes in MCP definitions.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/controller/numaresourcesoperator_controller.go`:
- Around line 867-883: In filterTreesForPrimaryPools, avoid calling
mcpools.HasNodesForPool repeatedly for the same MachineConfigPool across trees:
precompute a map[string]bool (and map[string]error if you want to preserve error
logging) by iterating all unique MCPs from the input (use mcp.Name as the key)
and calling HasNodesForPool once per unique pool, then in the existing loop over
tree.MachineConfigPools consult that map to decide whether to append or log;
keep the current error logging behavior (klog.V(...).InfoS) when the precomputed
call returned an error and reuse the stored bool result for has/no-has decisions
instead of re-invoking HasNodesForPool.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 55d876f5-586b-48be-b43d-bc26fe315d2b

📥 Commits

Reviewing files that changed from the base of the PR and between bd67f71 and 9b1247f.

📒 Files selected for processing (4)

internal/controller/numaresourcesoperator_controller.go
internal/controller/numaresourcesoperator_controller_test.go
pkg/machineconfigpools/pools.go
pkg/machineconfigpools/pools_test.go

ffromani

skimmed over it, it seems a good idea worth looking into

openshift-ci-robot · 2026-05-14T07:01:19Z

@Tal-or: This pull request references CNF-23818 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the ticket to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Add a new package that replicates the Machine Config Operator's logic for
determining the primary MachineConfigPool for a given node. MCO uses a
priority-based algorithm (master > custom > worker) to assign exactly one
primary pool per node, even when a node's labels match multiple MCP selectors.

This is important to avoid RTE pods duplication when the node has labels that are matching multiple MCPs.

AIA Human-AI blend, New content, Human-initiated, Reviewed, Claude Opus 4.6 v1.0
Signed-off-by: Talor Itzhak titzhak@redhat.com

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Tal-or · 2026-05-19T07:41:11Z

/retest

Tal-or · 2026-05-28T11:46:56Z

/hold

This requires more thinking, the existing approach is not good enough

openshift-ci-robot · 2026-05-31T16:30:37Z

@Tal-or: This pull request references CNF-23818 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the ticket to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Problem Statement
When a node's labels match multiple MachineConfigPools (e.g. a custom pool that extends the worker role), the current DaemonSet's NodeSelector — which copies the MCP's own nodeSelector — causes multiple RTE DaemonSets to target the same node, resulting in duplicate pods.

Root Cause
The MCP nodeSelectors are inherently overlapping: for example, a node with both node-role.kubernetes.io/worker and node-role.kubernetes.io/workerB labels matches the worker MCP and the workerB MCP simultaneously. Using those selectors directly as DaemonSet's NodeSelectors guarantees
collisions.

Suggested Approach
Place a single, NRO-owned label on each node that encodes its primary pool. The DaemonSet's NodeSelector then targets this label instead of the MCP's nodeSelector, making placement 1:1 by construction.

We reuse MCO's own priority logic (master > custom > worker) to determine each node's primary pool, so the behavior is consistent with how MCO itself decides under what MCP the node belongs.

AIA Human-AI blend, New content, Human-initiated, Reviewed, Claude Opus 4.6 v1.0
Signed-off-by: Talor Itzhak titzhak@redhat.com

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Introduce internal/api/labels with the NRO-owned node label (numa-operator.openshift.io/primary-pool) and finalizer constants. Add pkg/machineconfigpools with MCO-compatible pool-selection logic (ListPools, GetPoolsForNode, GetPrimaryPoolForNode, GetNodesForPool, HasNodesForPool) that determines each node's primary MCP following MCO's master > custom > worker priority. These building blocks are used by the node-labeling logic (next commit) to ensure each node gets exactly one DaemonSet. AIA: Human-AI blend, New content, Human-initiated, Reviewed, Claude Opus 4.6 v1.0 Signed-off-by: Talor Itzhak <titzhak@redhat.com>

Tal-or · 2026-05-31T17:21:38Z

/jira refresh

openshift-ci-robot · 2026-05-31T17:21:42Z

@Tal-or: This pull request references CNF-23818 which is a valid jira issue.

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot · 2026-05-31T17:28:46Z

@Tal-or: This pull request references CNF-23818 which is a valid jira issue.

Details

In response to this:

Problem Statement
When a node's labels match multiple MachineConfigPools (e.g. a custom pool that extends the worker role), the current DaemonSet's NodeSelector — which copies the MCP's own nodeSelector — causes multiple RTE DaemonSets to target the same node, resulting in duplicate pods.

Root Cause
The MCP nodeSelectors are inherently overlapping: for example, a node with both node-role.kubernetes.io/worker and node-role.kubernetes.io/workerB labels matches the worker MCP and the workerB MCP simultaneously. Using those selectors directly as DaemonSet's NodeSelectors guarantees
collisions.

Suggested Approach
Place a single, NRO-owned label on each node that encodes its primary pool. The DaemonSet's NodeSelector then targets this label instead of the MCP's nodeSelector, making placement 1:1 by construction.

We reuse MCO's own priority logic (master > custom > worker) to determine each node's primary pool, so the behavior is consistent with how MCO itself decides under what MCP the node belongs.

NOTE:
This is not relevant for HyperShift since on HyperShift we're setting the DaemonSet's NodeSelector as the NodePool name which are exclusive for each node, meaning a node can't have more that a single NodePool label.

AIA Human-AI blend, New content, Human-initiated, Reviewed, Claude Opus 4.6 v1.0
Signed-off-by: Talor Itzhak titzhak@redhat.com

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Tal-or · 2026-06-02T08:16:11Z

/retest

AIA: Human-AI blend, New content, Human-initiated, Reviewed, Claude Opus 4.6 v1.0 Signed-off-by: Talor Itzhak <titzhak@redhat.com>

When a node's MCO primary pool is not in the NRO NodeGroups (e.g. a custom MCP like workerB), fall back to the best matching managed pool (e.g. worker) instead of silently dropping the node from RTE coverage. This preserves the previous behavior where all nodes matching a managed MCP's NodeSelector receive an RTE pod. AIA: Human-AI blend, New content, Human-initiated, Reviewed, Claude Opus 4.6 v1.0 Signed-off-by: Talor Itzhak <titzhak@redhat.com>

…aemonSet targeting AIA: Human-AI blend, New content, Human-initiated, Reviewed, Claude Opus 4.6 v1.0 Signed-off-by: Talor Itzhak <titzhak@redhat.com>

…labeling AIA: Human-AI blend, New content, Human-initiated, Reviewed, Claude Opus 4.6 v1.0 Signed-off-by: Talor Itzhak <titzhak@redhat.com> Signed-off-by: Talor Itzhak <titzhak@redhat.com>

AIA: Human-AI blend, New content, Human-initiated, Reviewed, Claude Opus 4.6 v1.0 Signed-off-by: Talor Itzhak <titzhak@redhat.com>

openshift-ci · 2026-06-02T11:46:18Z

@Tal-or: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/ci-install-e2e-compact	`183d387`	link	false	`/test ci-install-e2e-compact`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Tal-or · 2026-06-02T13:09:19Z

CI E2E - for tools depends on #4311

openshift-ci Bot requested review from shajmakh and swatisehgal May 12, 2026 07:50

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 12, 2026

coderabbitai Bot reviewed May 12, 2026

View reviewed changes

Comment thread internal/controller/numaresourcesoperator_controller.go Outdated

ffromani reviewed May 12, 2026

View reviewed changes

Comment thread pkg/machineconfigpools/pools.go

Comment thread internal/controller/numaresourcesoperator_controller_test.go Outdated

Tal-or changed the title ~~machineconfigpools: avoid RTE pods duplication when single node matching multiple MCPs~~ CNF-23818: machineconfigpools: avoid RTE pods duplication when single node matching multiple MCPs May 14, 2026

openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 28, 2026

Tal-or force-pushed the primary_pool_selection branch from 9b1247f to 64907fe Compare May 31, 2026 16:22

Tal-or force-pushed the primary_pool_selection branch 3 times, most recently from ea23eca to e608a4f Compare May 31, 2026 17:21

Tal-or force-pushed the primary_pool_selection branch from e608a4f to 15fb24e Compare June 1, 2026 08:53

Tal-or force-pushed the primary_pool_selection branch from 15fb24e to 049d1a4 Compare June 2, 2026 08:18

Tal-or added 5 commits June 2, 2026 12:10

internal/nodes: implement primary-pool node labeling

eed5442

AIA: Human-AI blend, New content, Human-initiated, Reviewed, Claude Opus 4.6 v1.0 Signed-off-by: Talor Itzhak <titzhak@redhat.com>

controller: integrate node labeling, finalizer, and NRO label-based D…

ef998cc

…aemonSet targeting AIA: Human-AI blend, New content, Human-initiated, Reviewed, Claude Opus 4.6 v1.0 Signed-off-by: Talor Itzhak <titzhak@redhat.com>

rbac: grant update on NUMAResourcesOperator and node permissions for …

5af07c9

…labeling AIA: Human-AI blend, New content, Human-initiated, Reviewed, Claude Opus 4.6 v1.0 Signed-off-by: Talor Itzhak <titzhak@redhat.com> Signed-off-by: Talor Itzhak <titzhak@redhat.com>

e2e: verify NRO primary-pool node labels on deploy and cleanup on delete

183d387

AIA: Human-AI blend, New content, Human-initiated, Reviewed, Claude Opus 4.6 v1.0 Signed-off-by: Talor Itzhak <titzhak@redhat.com>

Tal-or force-pushed the primary_pool_selection branch from 049d1a4 to 183d387 Compare June 2, 2026 10:24

Conversation

Tal-or commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

openshift-ci Bot commented May 12, 2026

Uh oh!

coderabbitai Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ffromani left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

openshift-ci-robot commented May 14, 2026 • edited by openshift-ci Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Tal-or commented May 19, 2026

Uh oh!

Tal-or commented May 28, 2026

Uh oh!

openshift-ci-robot commented May 31, 2026 • edited by openshift-ci Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Tal-or commented May 31, 2026

Uh oh!

openshift-ci-robot commented May 31, 2026 • edited by openshift-ci Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

openshift-ci-robot commented May 31, 2026 • edited by openshift-ci Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Tal-or commented Jun 2, 2026

Uh oh!

openshift-ci Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Tal-or commented Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Tal-or commented May 12, 2026 •

edited

Loading

coderabbitai Bot commented May 12, 2026 •

edited

Loading

openshift-ci-robot commented May 14, 2026 •

edited by openshift-ci Bot

Loading

openshift-ci-robot commented May 31, 2026 •

edited by openshift-ci Bot

Loading

openshift-ci-robot commented May 31, 2026 •

edited by openshift-ci Bot

Loading

openshift-ci-robot commented May 31, 2026 •

edited by openshift-ci Bot

Loading

openshift-ci Bot commented Jun 2, 2026 •

edited

Loading