CNF-23818: machineconfigpools: avoid RTE pods duplication when single node matching multiple MCPs#3995
CNF-23818: machineconfigpools: avoid RTE pods duplication when single node matching multiple MCPs#3995Tal-or wants to merge 6 commits into
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Tal-or The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR introduces a primary pool labeling system for nodes and integrates it into the NUMA resources operator controller. Nodes receive a label identifying their primary MachineConfigPool using MCO priority rules, DaemonSet selectors migrate to use that label, and the controller watches nodes to maintain labels during reconciliation while cleaning them up on CR deletion. ChangesNode Primary Pool Labeling System
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal/controller/numaresourcesoperator_controller_test.go (1)
77-97: ⚡ Quick winConsider enhancing the fake-node helper to handle
NodeSelector.MatchExpressionsas a defensive measure.The helper currently only copies
MatchLabelsfrom MCPs. While all current test fixtures use onlyMatchLabels, the function would missMatchExpressionsif they're added in the future, potentially making tests brittle. Adding support for both selector types would make the helper more resilient to changes in MCP definitions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/controller/numaresourcesoperator_controller.go`:
- Around line 867-883: In filterTreesForPrimaryPools, avoid calling
mcpools.HasNodesForPool repeatedly for the same MachineConfigPool across trees:
precompute a map[string]bool (and map[string]error if you want to preserve error
logging) by iterating all unique MCPs from the input (use mcp.Name as the key)
and calling HasNodesForPool once per unique pool, then in the existing loop over
tree.MachineConfigPools consult that map to decide whether to append or log;
keep the current error logging behavior (klog.V(...).InfoS) when the precomputed
call returned an error and reuse the stored bool result for has/no-has decisions
instead of re-invoking HasNodesForPool.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 55d876f5-586b-48be-b43d-bc26fe315d2b
📒 Files selected for processing (4)
internal/controller/numaresourcesoperator_controller.gointernal/controller/numaresourcesoperator_controller_test.gopkg/machineconfigpools/pools.gopkg/machineconfigpools/pools_test.go
ffromani
left a comment
There was a problem hiding this comment.
skimmed over it, it seems a good idea worth looking into
|
@Tal-or: This pull request references CNF-23818 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the ticket to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest |
|
/hold This requires more thinking, the existing approach is not good enough |
9b1247f to
64907fe
Compare
|
@Tal-or: This pull request references CNF-23818 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the ticket to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Introduce internal/api/labels with the NRO-owned node label (numa-operator.openshift.io/primary-pool) and finalizer constants. Add pkg/machineconfigpools with MCO-compatible pool-selection logic (ListPools, GetPoolsForNode, GetPrimaryPoolForNode, GetNodesForPool, HasNodesForPool) that determines each node's primary MCP following MCO's master > custom > worker priority. These building blocks are used by the node-labeling logic (next commit) to ensure each node gets exactly one DaemonSet. AIA: Human-AI blend, New content, Human-initiated, Reviewed, Claude Opus 4.6 v1.0 Signed-off-by: Talor Itzhak <titzhak@redhat.com>
ea23eca to
e608a4f
Compare
|
/jira refresh |
|
@Tal-or: This pull request references CNF-23818 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@Tal-or: This pull request references CNF-23818 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
e608a4f to
15fb24e
Compare
|
/retest |
15fb24e to
049d1a4
Compare
AIA: Human-AI blend, New content, Human-initiated, Reviewed, Claude Opus 4.6 v1.0 Signed-off-by: Talor Itzhak <titzhak@redhat.com>
When a node's MCO primary pool is not in the NRO NodeGroups (e.g. a custom MCP like workerB), fall back to the best matching managed pool (e.g. worker) instead of silently dropping the node from RTE coverage. This preserves the previous behavior where all nodes matching a managed MCP's NodeSelector receive an RTE pod. AIA: Human-AI blend, New content, Human-initiated, Reviewed, Claude Opus 4.6 v1.0 Signed-off-by: Talor Itzhak <titzhak@redhat.com>
…aemonSet targeting AIA: Human-AI blend, New content, Human-initiated, Reviewed, Claude Opus 4.6 v1.0 Signed-off-by: Talor Itzhak <titzhak@redhat.com>
…labeling AIA: Human-AI blend, New content, Human-initiated, Reviewed, Claude Opus 4.6 v1.0 Signed-off-by: Talor Itzhak <titzhak@redhat.com> Signed-off-by: Talor Itzhak <titzhak@redhat.com>
AIA: Human-AI blend, New content, Human-initiated, Reviewed, Claude Opus 4.6 v1.0 Signed-off-by: Talor Itzhak <titzhak@redhat.com>
049d1a4 to
183d387
Compare
|
@Tal-or: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
CI E2E - for tools depends on #4311 |
Problem Statement
When a node's labels match multiple
MachineConfigPools(e.g. a custom pool that extends the worker role), the currentDaemonSet'sNodeSelector— which copies the MCP's own nodeSelector — causes multiple RTEDaemonSetsto target the same node, resulting in duplicate pods.Root Cause
The MCP nodeSelectors are inherently overlapping: for example, a node with both
node-role.kubernetes.io/workerandnode-role.kubernetes.io/workerBlabels matches the worker MCP and the workerB MCP simultaneously. Using those selectors directly asDaemonSet'sNodeSelectorsguaranteescollisions.
Suggested Approach
Place a single, NRO-owned label on each node that encodes its primary pool. The
DaemonSet'sNodeSelectorthen targets this label instead of the MCP's nodeSelector, making placement 1:1 by construction.We reuse MCO's own priority logic (master > custom > worker) to determine each node's primary pool, so the behavior is consistent with how MCO itself decides under what MCP the node belongs.
NOTE:
This is not relevant for HyperShift since on HyperShift we're setting the
DaemonSet'sNodeSelectoras theNodePoolname which are exclusive for each node, meaning a node can't have more that a singleNodePoollabel.AIA Human-AI blend, New content, Human-initiated, Reviewed, Claude Opus 4.6 v1.0
Signed-off-by: Talor Itzhak titzhak@redhat.com