CNF-23239: controller: nrop: per-tree processing and progress by ffromani · Pull Request #3998 · openshift-kni/numaresources-operator

ffromani · 2026-05-12T09:06:28Z

Implement "vertical split" vs. the current "horizontal split".
The split is about how the reconciliation makes progress. In the model we have had until now, retroactively called "horizontal split", every NodeGroup must complete each reconciliation stage before any of them can move on to the next stage. In other words, the slowest NodeGroup blocks all the other NodeGroups at the stage it is stuck at.

With the approach proposed here, "vertical split", each NodeGroup advances independently through the reconciliation steps, and only the globally reported status is pulled to the slowest state. In other words, with N NodeGroups, N-1 can be Available and settled while 1 is still Progressing; the global state will still be Progressing, but changes to the other NodeGroups can happen independently.

To simplify the implementation, a single goroutine handles the reconciliation of all NodeGroups, so in practice a NodeGroup can still influence the others, but this (barring more serious bugs we should fix anyway) can only create reconciliation delays, a far better outcome than the issues caused by the horizontal split.
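
As a rough illustration of the vertical split described above, here is a minimal sketch, not the controller code from this PR. The `phase` type, `reconcileFunc` and `reconcileVertical` are hypothetical names invented for this example; the only points it tries to show are that one loop (a single goroutine) walks every NodeGroup through its own pipeline, and that the global state is the slowest of the per-NodeGroup states.

```go
package main

import "fmt"

// phase is a hypothetical per-NodeGroup outcome, ordered so that a
// higher value means "further from settled".
type phase int

const (
	available phase = iota
	progressing
)

func (p phase) String() string {
	return [...]string{"Available", "Progressing"}[p]
}

// reconcileFunc stands in for the whole per-NodeGroup pipeline;
// it is an illustrative placeholder, not an API of the operator.
type reconcileFunc func(nodeGroup string) phase

// reconcileVertical runs every NodeGroup through its own pipeline in a
// single goroutine. A slow NodeGroup does not stop the loop; it only
// pulls the aggregated (global) phase down to its own.
func reconcileVertical(nodeGroups []string, reconcile reconcileFunc) (map[string]phase, phase) {
	perGroup := make(map[string]phase, len(nodeGroups))
	global := available
	for _, ng := range nodeGroups {
		p := reconcile(ng)
		perGroup[ng] = p
		if p > global {
			global = p
		}
	}
	return perGroup, global
}

func main() {
	groups := []string{"worker-a", "worker-b", "worker-c"}
	perGroup, global := reconcileVertical(groups, func(ng string) phase {
		if ng == "worker-b" {
			return progressing // pretend this NodeGroup is still rolling out
		}
		return available
	})
	for _, ng := range groups {
		fmt.Printf("%s: %s\n", ng, perGroup[ng])
	}
	fmt.Printf("global: %s\n", global)
}
```

In this sketch `worker-a` and `worker-c` settle as Available even though `worker-b` keeps the global state at Progressing, which is the N-1 / 1 situation from the description.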

backportability

This PR is broken down into commits that are intentionally shaped to make the backport down to the distant 4.18 as low risk as possible.
Compatibility layers are added and then removed within the same PR explicitly and only
to enable gradual testing, not because of their own merit.

review notice

Adding and removing code in the same PR makes the review harder, even though the final tree state is identical.
It may be useful to review the final tree state (e.g. git diff ..HEAD) in addition to individual commits, since the end result is identical to a cleaner, main-only series.
The most independent commits are intentionally pushed towards the beginning of the series to enable this review mode as much as possible.

LLM-assisted backport effort summary

Branch	Effort	Notes
4.22	minimal	Essentially clean cherry-picks
4.21	minimal	Essentially clean cherry-picks (after PR #4025 lands)
4.20	very low	Minor controller conflicts (after PR #3801 backport lands)
4.19	low	Minor conflicts; `rte.go` diverges more (after PR #3801 backport lands)
4.18	high	Path renames, no `MachineConfigsState`, `machineconfigpool.go` heavily diverged (see below)

4.18 effort breakdown

Commits	Effort	Nature
1 (Name() helper)	very low	Path adjustment: `api/v1/helper/` → `api/numaresourcesoperator/v1/helper/`
2-3 (conditioninfo, step helpers)	minimal	`step.go` identical; `conditioninfo.go` nearly identical
4 (move helper around)	low	Path adjustment: `internal/controller/` → `controllers/`
5 (rte + compat wrappers)	hard/very hard	Key refactoring; `machineconfigpool.go` is 134 lines (vs 244 on main), no `MachineConfigObjectState` type, no `MachineConfigsState` method; `rte.go` is 436 lines (vs 379) with 288 diff lines
6 (signature refactoring)	low	Path adjustment only
7 (add per-tree functions)	mid/hard	New code, but depends on compat wrappers from commit 5 being adapted first
8 (wire per-tree functions)	hard	Controller rewrite; building blocks from 5-7 must be proven first
9-10 (narrow helpers, rename)	very low	Mechanical signature changes
11 (remove compat wrappers)	mid	Adaptation for different `machineconfigpool.go` baseline
12 (vertical processing)	mid	Controller change; same pattern as main but different baseline

4.21/4.20/4.19 prerequisite

These branches require PR #3801 ("OCPBUGS-84226: per-pool MachineConfig state
with paused MCP awareness") to be backported first. PR #4025 covers 4.21;
4.20 and 4.19 need equivalent backports. Once that prerequisite lands,
these branches have the same MachineConfigObjectState type and
MachineConfigsState signature as main, making the cherry-picks
straightforward.

LLM-assisted risk assessment

The proposed split should significantly reduce backport risk compared to
a simpler, monolithic approach:

Commits 1-3 (Name helper, UpdateMessage, step helpers) are trivial
cherry-picks on all branches. Done and verified quickly.
Commit 4 (move helper) is a pure reorder, low risk.
Commit 5 (rte compat wrappers) is the key risk point — it refactors
the objectstate layer but adds compat wrappers so the controller continues
working unchanged. Existing controller tests validate the compat wrappers
without any controller changes.
Commit 6 (signature refactoring) is mechanical prep work.
Commit 7 (add per-tree functions) adds dead code. If it compiles,
it is correct.
Commit 8 (wire per-tree functions) is the core behavioral change.
By this point all building blocks (1-7) are proven and verified.
Commits 9-10 (narrow helpers, rename) are mechanical signature
narrowing, safe after commit 8 removed the old callers.
Commit 11 (remove compat wrappers) is mechanical cleanup.
Commit 12 (vertical processing) is the final behavioral change,
switching from horizontal to vertical orchestration.

The compat wrappers in commit 5 are the key risk reduction: they allow
validating the rte.go refactoring on any branch (including 4.18, the
hardest target) without touching the controller at all. Once that is
proven, the controller changes build on a solid foundation.
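
For readers unfamiliar with the compat-wrapper idea, a minimal sketch follows. It is an assumption-laden illustration, not the code in commit 5: `tree`, `objectState`, `existingManifests`, `machineConfigsStateForTree` and the `mc-` naming are all hypothetical. It only shows the pattern: the refactored entry point works on one tree, while a wrapper keeps the old all-trees signature so existing callers (and their tests) keep working unchanged.

```go
package main

import "fmt"

// tree is a hypothetical stand-in for a NodeGroup tree, reduced to its pool name.
type tree struct{ poolName string }

// objectState is a hypothetical stand-in for the MachineConfig object state.
type objectState struct{ name string }

// existingManifests mimics an object that used to iterate its own trees.
type existingManifests struct {
	trees []tree
}

// machineConfigsStateForTree is the refactored, per-tree entry point.
// The "mc-" prefix is made up for the example.
func (em *existingManifests) machineConfigsStateForTree(tr tree) []objectState {
	return []objectState{{name: "mc-" + tr.poolName}}
}

// machineConfigsState is the compat wrapper: it keeps the old
// all-trees-at-once signature and delegates to the per-tree function,
// so callers written against the old API need no change.
func (em *existingManifests) machineConfigsState() []objectState {
	var all []objectState
	for _, tr := range em.trees {
		all = append(all, em.machineConfigsStateForTree(tr)...)
	}
	return all
}

func main() {
	em := &existingManifests{trees: []tree{{poolName: "worker-a"}, {poolName: "worker-b"}}}
	for _, st := range em.machineConfigsState() {
		fmt.Println(st.name)
	}
}
```

Once every caller has moved to the per-tree function, the wrapper has no users left and can be deleted, which is the role the description assigns to the later "remove compat wrappers" commit.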

ffromani · 2026-05-12T09:06:37Z

/hold

openshift-ci · 2026-05-12T09:06:43Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani

Needs approval from an approver in each of these files:

~~OWNERS~~ [ffromani]

coderabbitai · 2026-05-12T09:08:27Z

Review details

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: bdc79f9e-53a2-429d-b272-e2835a2f1c62

📥 Commits

Reviewing files that changed from the base of the PR and between ef94726 and 5e69487.

📒 Files selected for processing (11)

api/v1/helper/nodegroup/nodegroup.go
api/v1/helper/nodegroup/nodegroup_test.go
internal/controller/numaresourcesoperator_controller.go
internal/controller/numaresourcesoperator_controller_test.go
internal/reconcile/step.go
internal/reconcile/step_test.go
pkg/objectstate/rte/machineconfigpool.go
pkg/objectstate/rte/machineconfigpool_test.go
pkg/objectstate/rte/rte.go
pkg/status/conditioninfo/conditioninfo.go
pkg/status/conditioninfo/conditioninfo_test.go

Walkthrough

This PR refactors the NUMAResourcesOperator controller from all-trees-at-once reconciliation to per-tree iteration. It separates tree-agnostic object state (RBAC, metrics) from per-tree state derivation, adds Step state predicates and message updates, and replaces multi-phase reconciliation with a single orchestration that applies tree-agnostic resources, then reconciles each tree and aggregates results.

Changes

Per-tree reconciliation refactor with tree-agnostic object state

Layer / File(s)	Summary
Tree naming utility `api/v1/helper/nodegroup/nodegroup.go`, `api/v1/helper/nodegroup/nodegroup_test.go`	`Tree.Name()` returns the associated MCP pool name with safe fallbacks (`<nil>`, `<unknown>`) for nil values.
Step state and message infrastructure `internal/reconcile/step.go`, `internal/reconcile/step_test.go`	`Ongoing()` checks if requeue is pending, `Failed()` checks for errors, and `UpdateMessage()` prepends messages to conditions while preserving result/error state.
Condition message updates `pkg/status/conditioninfo/conditioninfo.go`, `pkg/status/conditioninfo/conditioninfo_test.go`	`ConditionInfo.UpdateMessage()` overwrites/augments existing messages; differs from `WithMessage` which only sets when empty.
Object state tree-agnostic and per-tree separation `pkg/objectstate/rte/rte.go`	`FromClientTreeAgnostic()` replaces `FromClient()`, building core RBAC/metrics without trees. `TreeAgnostic()` replaces `State()`. New `PerTree()` and `PerTreeState()` methods derive per-tree instances on demand.
Machine config pool handling per-tree `pkg/objectstate/rte/machineconfigpool.go`, `pkg/objectstate/rte/machineconfigpool_test.go`	`MachineConfigsState()` now takes explicit `tree` parameter instead of iterating `em.trees` internally. Tests refactored to call per-tree instead of batch.
Controller per-tree reconciliation orchestration `internal/controller/numaresourcesoperator_controller.go`	Replaces multi-phase flow with per-tree orchestration: API reconciliation (early-stop), tree-agnostic setup, per-tree machine-config/daemonset sync, and aggregation of paused MCPs and readiness status into single `Step`.
Controller reconciliation test updates `internal/controller/numaresourcesoperator_controller_test.go`	Two MachineConfigPool entries during wait phase (instead of one), paused-condition checks replacing `Available` assertions, and new async convergence test verifying staged DaemonSet creation as trees become ready.
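
To make the "safe fallbacks" row of the table above concrete, here is a small sketch of a nil-tolerant name helper. The `tree` and `machineConfigPool` types are hypothetical, pared-down stand-ins, and which placeholder corresponds to which nil value is an assumption of this example; the walkthrough only says that `<nil>` and `<unknown>` are returned for nil values.

```go
package main

import "fmt"

// machineConfigPool and tree are hypothetical, pared-down stand-ins
// for the real operator types.
type machineConfigPool struct{ Name string }

type tree struct{ MachineConfigPool *machineConfigPool }

// Name returns the pool name, or a placeholder when something is nil.
// The mapping of placeholders to nil cases is assumed for this sketch.
func (t *tree) Name() string {
	if t == nil {
		return "<nil>"
	}
	if t.MachineConfigPool == nil {
		return "<unknown>"
	}
	return t.MachineConfigPool.Name
}

func main() {
	var missing *tree
	trees := []*tree{
		{MachineConfigPool: &machineConfigPool{Name: "worker-cnf"}},
		{},
		missing,
	}
	for _, t := range trees {
		// Calling a pointer-receiver method on a nil pointer is legal in Go,
		// which is what makes the "<nil>" fallback possible.
		fmt.Println(t.Name())
	}
}
```

A helper shaped like this can be used directly in log and condition messages without a nil check at every call site.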

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 9.38% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'CNF-23239: controller: nrop: per-tree processing and progress' clearly and specifically summarizes the main change: implementing per-tree processing with independent progress tracking in the NROP controller.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The pull request description is directly related to the changeset, clearly explaining the shift from 'horizontal split' to 'vertical split' reconciliation and detailing the changes across multiple files and commits.

Tal-or

I know it's still WIP but already adding my comments so they won't slip.

Tal-or · 2026-05-12T10:48:23Z

+	result.mcpStatuses = syncMachineConfigPoolNodeGroupConfigStatuses(result.mcpStatuses, singleTreeSlice)

-	return intreconcile.StepSuccess()
+	result.step = intreconcile.StepSuccess()


if we have pausedMCPNames, we should not set this StepSuccess() but StepOngoing(), so the controller won't continue to update the DS for this specific tree

StepOngoing() wants to requeue the request though. I'm not 100% sure this is the right approach.

StepSuccess continues to update the DS, and doing that while the MCP is paused might cause a misalignment with the security context/SELinux policy.

Requeuing the request for later is the lesser evil IMO.
Another option is to neither continue nor requeue; a new request will come in anyway once the MCP moves to paused: false, because we're watching it.
Not sure if we have a step that does that though.

we can probably use StepOngoing with a 0 value; this should report the right condition without causing a requeue
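
A minimal sketch of that last idea, under stated assumptions: the `step` type below is a hypothetical simplification and not the `intreconcile` API, and `machineConfigStep` and `wantsRequeue` are invented names. It only illustrates how a zero delay could report Progressing for a tree with paused MCPs while leaving the wake-up to the MCP watch instead of a timer.

```go
package main

import (
	"fmt"
	"time"
)

// step is a hypothetical, simplified stand-in for the operator's reconcile step.
type step struct {
	condition    string        // condition to report for this tree
	requeueAfter time.Duration // zero means: do not schedule a timed requeue
}

func stepSuccess() step { return step{condition: "Available"} }

func stepOngoing(after time.Duration) step {
	return step{condition: "Progressing", requeueAfter: after}
}

// wantsRequeue reports whether the step asks for a timed requeue.
func (s step) wantsRequeue() bool { return s.requeueAfter > 0 }

// machineConfigStep picks the step for one tree: with paused pools it
// reports Progressing but relies on the MCP watch rather than a timer.
func machineConfigStep(pausedMCPNames []string) step {
	if len(pausedMCPNames) > 0 {
		return stepOngoing(0)
	}
	return stepSuccess()
}

func main() {
	for _, s := range []step{
		machineConfigStep([]string{"worker-cnf"}),
		machineConfigStep(nil),
		stepOngoing(5 * time.Second),
	} {
		fmt.Printf("condition=%s requeue=%v\n", s.condition, s.wantsRequeue())
	}
}
```

Whether the real step type treats a zero delay this way would need to be checked against its implementation; a predicate that defines "ongoing" as "requeue pending" would not see a zero-delay step as ongoing.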

openshift-ci-robot · 2026-05-14T06:55:15Z

@ffromani: This pull request references CNF-23239 which is a valid jira issue.

Details

In response to this:

WIP: implement "vertical split" vs current "horizontal split"

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot · 2026-05-26T14:25:50Z

@ffromani: This pull request references CNF-23239 which is a valid jira issue.

Details

In response to this:

implement "vertical split" vs current "horizontal split"
the split is about how the reconciliation makes progress. In the model we had till now, retroactively called "horizontal split", each NodeGroup must make progress on each reconciliation stage before any of these can progress to the next stage. IOW, the slowest NodeGroup blocks all the other NodeGroups at the stage it is stuck at.

With the approach proposed here, "vertical split", each NodeGroup advances independently on the reconciliation step, and only the global reported status is set pulled to the slowest state; IOW, with N nodegroups, N-1 can be Available and settled and 1 still Progressing; the global state will still be Progressing, but changes to the other NodeGroups can happen independently.

To simplify the implementation, a single goroutine handles all the NodeGroup reconciliation, so factually a NodeGroup can still influence the others, but this (bar more serious bugs we should fix anyway) can only create reconciliation delays, a far better stance than the issues caused by the horizontal split.

backportability

This PR is broken down into commits in a backport-friendly way.
Compatibility layers are added and then removed in the context of the same PR
to simplify the backport process and to enable gradual testing, not because
these compatibility layers are necessary or useful besides the backporting process.

LLM-assisted backport effort summary

Branch	Effort	Notes
4.22	~1 hour	Essentially clean cherry-picks
4.21	~2 hours	Minor conflicts
4.20	~3 hours	Minor conflicts
4.19	~5 hours	Moderate conflicts, same architecture
4.18	~2.5-3 days	Significant adaptation (see below)
Total	~3.5-4.5 days

4.18 effort breakdown

Commits	Effort	Nature
1-3 (utilities)	~1-2 hours	Import path adjustments
4 (reordering)	~2 hours	Different baseline
5 (rte + compat)	~1 day	Key refactoring, testable in isolation
6 (dead code)	~half day	Adaptation for missing features
7 (wiring)	~1 day	Controller rewrite, building blocks proven
8-9 (cleanup)	~2-3 hours	Mechanical

LLM-assisted risk assessment

The proposed split should significantly reduce backport risk compared to
a simpler, monolithic approach:

Commits 1-4 are trivial cherry-picks. Done and verified quickly.

Commit 5 is the rte refactoring validated independently -- existing
controller tests prove the compat wrappers work. If something breaks,
the bug is isolated to the rte changes.

Commit 6 adds dead code. If it compiles, it is correct.

Commit 7 is the only commit requiring real adaptation, but by this
point all building blocks (1-6) can be proven and verified.

Commits 8-9 are mechanical.

The compat wrappers in commit 5 are the key risk reduction: they allow
validating the rte.go refactoring in 4.18 (the hardest piece) without
touching the controller at all. Once that is proven, the controller
changes build on a solid foundation.

ffromani · 2026-05-26T14:26:09Z

/hold cancel

Tal-or

Usually it's helpful to break the PR into smaller commits, but here (especially in the last few commits) some of the newer commits override or update the logic of the previous ones.

I know it was done to convey the steps clearly, but it's a bit confusing because now I see that I commented on lines of code that were deleted/moved in later commits.

Anyway, let's address what we have right now and I'll have another round later this week.

Tal-or · 2026-05-26T15:22:47Z

 }

 func (em *ExistingManifests) MachineConfigsState(mf Manifests) ([]MachineConfigObjectState, sets.Set[string]) {
+	var allRet []MachineConfigObjectState


Since this area is going through a major refactoring anyway, it would be nice to make the names of variables and functions more accurate: it's the MachineConfigPool* that we're checking/updating and not MachineConfig*.
As we are all aware, those are related but different objects.

I agree about improving. The thing is, this function wants to compute the existing state of the MachineConfig objects and the desired state of the MachineConfigs, which are indeed related to MachineConfigPools, but the pool is the scoping mechanism.

IOW, we want to reconcile MachineConfigs, and return ObjectState about these.

Tal-or · 2026-05-26T16:01:18Z

+		daemonSets:     make(map[string]daemonSetManifest),
+		machineConfigs: make(map[string]machineConfigManifest),


If that's PerTree, how come there is more than a single DaemonSet and/or MachineConfig?

Mostly because I'm reusing ExistingManifests, which predates the split and had a 1:N relationship because of the horizontal partitioning.

We can probably simplify further as a follow-up to this PR

Tal-or · 2026-05-26T16:04:13Z

 	return nropv1.MachineConfigPool{Name: name}
 }
+
+type treeReconcileResult struct {


a slice of DSs and MCPs implies that this is for all trees, so it should be treesReconcileResult in plural, or return a slice of treeReconcileResult, one per tree

this is meant to be the reconcile result of a single tree (therefore perTreeReconcileResult is probably better), but the problem here is we can't yet enforce the 1:1 mapping between trees and MCPs (https://github.com/openshift-kni/numaresources-operator/blob/main/internal/api/annotations/annotations.go#L25).
Well, there's an argument for doing that in 5.0 (we have had 3 EUS versions past 4.18 already), but then it should be done as a prerequisite of this PR, and would make backporting significantly more complex.
This is why I'm still carrying this forward and postponing to 5.2.

Tal-or · 2026-05-26T16:27:34Z

+		mcResult := treeReconcileResult{}
+		if r.Platform == platform.OpenShift {
+			mcResult = r.reconcileResourceMachineConfig(ctx, instance, treeExisting, tree)
+			if mcResult.step.EarlyStop() /* || mcResult.pausedMCPNames.Len() > 0*/ {


forsaken code comment?

no, this is a gap on my side. Let me have another cleanup pass. Restoring WIP for the time being.

ffromani · 2026-05-26T16:38:07Z

/retitle WIP: CNF-23239: controller: nrop: per-tree processing and progress

thanks for the review. I'll need to do more cleanup

openshift-ci-robot · 2026-05-26T17:04:35Z

@ffromani: This pull request references CNF-23239 which is a valid jira issue.

Details

In response to this:

implement "vertical split" vs current "horizontal split"
the split is about how the reconciliation makes progress. In the model we had till now, retroactively called "horizontal split", each NodeGroup must make progress on each reconciliation stage before any of these can progress to the next stage. IOW, the slowest NodeGroup blocks all the other NodeGroups at the stage it is stuck at.

With the approach proposed here, "vertical split", each NodeGroup advances independently on the reconciliation step, and only the global reported status is set pulled to the slowest state; IOW, with N nodegroups, N-1 can be Available and settled and 1 still Progressing; the global state will still be Progressing, but changes to the other NodeGroups can happen independently.

To simplify the implementation, a single goroutine handles all the NodeGroup reconciliation, so factually a NodeGroup can still influence the others, but this (bar more serious bugs we should fix anyway) can only create reconciliation delays, a far better stance than the issues caused by the horizontal split.

backportability

This PR is broken down into commits in a backport-friendly way.
Compatibility layers are added and then removed in the context of the same PR
to simplify the backport process and to enable gradual testing, not because
these compatibility layers are necessary or useful besides the backporting process.

LLM-assisted backport effort summary

Branch	Effort	Notes
4.22	minimal	Essentially clean cherry-picks
4.21	very low	Minor conflicts
4.20	very low	Minor conflicts
4.19	low/mid	Moderate conflicts, same architecture
4.18	high	Significant adaptation (see below)

4.18 effort breakdown

Commits	Effort	Nature
1-3 (utilities)	minimal/very low	Import path adjustments
4 (reordering)	very low	Different baseline
5 (rte + compat)	hard/very hard	Key refactoring, testable in isolation
6 (dead code)	mid/hard	Adaptation for missing features
7 (wiring)	hard	Controller rewrite, building blocks proven
8-9 (cleanup)	very low	Mechanical

LLM-assisted risk assessment

The proposed split should significantly reduce backport risk compared to
a simpler, monolithic approach:

Commits 1-4 are trivial cherry-picks. Done and verified quickly.

Commit 5 is the rte refactoring validated independently -- existing
controller tests prove the compat wrappers work. If something breaks,
the bug is isolated to the rte changes.

Commit 6 adds dead code. If it compiles, it is correct.

Commit 7 is the only commit requiring real adaptation, but by this
point all building blocks (1-6) can be proven and verified.

Commits 8-9 are mechanical.

The compat wrappers in commit 5 are the key risk reduction: they allow
validating the rte.go refactoring in 4.18 (the hardest piece) without
touching the controller at all. Once that is proven, the controller
changes build on a solid foundation.

openshift-ci-robot · 2026-05-27T12:58:25Z

@ffromani: This pull request references CNF-23239 which is a valid jira issue.

Details

In response to this:

implement "vertical split" vs current "horizontal split"
the split is about how the reconciliation makes progress. In the model we had till now, retroactively called "horizontal split", each NodeGroup must make progress on each reconciliation stage before any of these can progress to the next stage. IOW, the slowest NodeGroup blocks all the other NodeGroups at the stage it is stuck at.

With the approach proposed here, "vertical split", each NodeGroup advances independently on the reconciliation step, and only the global reported status is set pulled to the slowest state; IOW, with N nodegroups, N-1 can be Available and settled and 1 still Progressing; the global state will still be Progressing, but changes to the other NodeGroups can happen independently.

To simplify the implementation, a single goroutine handles all the NodeGroup reconciliation, so factually a NodeGroup can still influence the others, but this (bar more serious bugs we should fix anyway) can only create reconciliation delays, a far better stance than the issues caused by the horizontal split.
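
The difference between the two orchestration models can be illustrated with a minimal sketch. It is not the controller code: the `stage` type, the `horizontal` and `vertical` functions, and the pool names are hypothetical and only mirror the description above, with a single goroutine processing all trees.

```go
package main

import "fmt"

// stage is a hypothetical reconciliation stage for one tree (NodeGroup).
// It returns true when the stage is complete for that tree.
type stage func(tree string) bool

// horizontal runs each stage across all trees before moving to the next
// stage: if any tree is not done with a stage, no tree reaches the next one.
func horizontal(trees []string, stages []stage) map[string]int {
	done := make(map[string]int) // tree -> number of completed stages
	for _, st := range stages {
		allDone := true
		for _, tree := range trees {
			if st(tree) {
				done[tree]++
			} else {
				allDone = false
			}
		}
		if !allDone {
			break // the slowest tree blocks every tree at this stage
		}
	}
	return done
}

// vertical walks each tree through all the stages independently: a tree
// that is stuck only stops its own progress.
func vertical(trees []string, stages []stage) map[string]int {
	done := make(map[string]int)
	for _, tree := range trees {
		for _, st := range stages {
			if !st(tree) {
				break
			}
			done[tree]++
		}
	}
	return done
}

func main() {
	trees := []string{"pool1", "pool2"}
	// Assumption for the example: pool1 is still updating, pool2 is ready.
	mcReady := func(tree string) bool { return tree != "pool1" }
	dsReady := func(tree string) bool { return true }
	stages := []stage{mcReady, dsReady}

	fmt.Println("horizontal:", horizontal(trees, stages))
	fmt.Println("vertical:  ", vertical(trees, stages))
}
```

In this sketch, `pool2` completes only the first stage under the horizontal model because `pool1` holds everyone back, while under the vertical model `pool2` completes both stages.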

backportability

This PR is broken down into commits which are intentionally shaped to make the backport down to the distant 4.18 as low risk as possible.
Compatibility layers are added and then removed in the context of the same PR explicitly and only in order
to enable gradual testing, not because of their own merit.

review notice

Adding and removing code in the same PR makes the review harder, even though the final tree state is identical.
Because the final state is intentionally identical to that of a cleaner, main-only series, it may be useful to review the final tree state in addition to the individual commits.
The most independent commits are intentionally pushed towards the beginning of the series to enable this review mode as much as possible.

LLM-assisted backport effort summary

| Branch | Effort | Notes |
|---|---|---|
| 4.22 | minimal | Essentially clean cherry-picks |
| 4.21 | minimal | Essentially clean cherry-picks (after PR #4025 lands) |
| 4.20 | very low | Minor controller conflicts (after PR #3801 backport lands) |
| 4.19 | low | Minor conflicts; rte.go diverges more (after PR #3801 backport lands) |
| 4.18 | high | Path renames, no MachineConfigsState, machineconfigpool.go heavily diverged (see below) |

4.18 effort breakdown

| Commits | Effort | Nature |
|---|---|---|
| 1 (Name() helper) | very low | Path adjustment: api/v1/helper/ → api/numaresourcesoperator/v1/helper/ |
| 2-3 (conditioninfo, step helpers) | minimal | step.go identical; conditioninfo.go nearly identical |
| 4 (move helper around) | low | Path adjustment: internal/controller/ → controllers/ |
| 5 (rte + compat wrappers) | hard/very hard | Key refactoring; machineconfigpool.go is 134 lines (vs 244 on main), no MachineConfigObjectState type, no MachineConfigsState method; rte.go is 436 lines (vs 379) with 288 diff lines |
| 6 (signature refactoring) | low | Path adjustment only |
| 7 (add per-tree functions) | mid/hard | New code, but depends on compat wrappers from commit 5 being adapted first |
| 8 (wire per-tree functions) | hard | Controller rewrite; building blocks from 5-7 must be proven first |
| 9-10 (narrow helpers, rename) | very low | Mechanical signature changes |
| 11 (remove compat wrappers) | mid | Adaptation for different machineconfigpool.go baseline |
| 12 (vertical processing) | mid | Controller change; same pattern as main but different baseline |

4.21/4.20/4.19 prerequisite

These branches require PR #3801 ("OCPBUGS-84226: per-pool MachineConfig state
with paused MCP awareness") to be backported first. PR #4025 covers 4.21;
4.20 and 4.19 need equivalent backports. Once that prerequisite lands,
these branches have the same MachineConfigObjectState type and
MachineConfigsState signature as main, making the cherry-picks
straightforward.

LLM-assisted risk assessment

The proposed split should significantly reduce backport risk compared to
a simpler, monolithic approach:

Commits 1-3 (Name helper, UpdateMessage, step helpers) are trivial
cherry-picks on all branches. Done and verified quickly.

Commit 4 (move helper) is a pure reorder, low risk.

Commit 5 (rte compat wrappers) is the key risk point — it refactors
the objectstate layer but adds compat wrappers so the controller continues
working unchanged. Existing controller tests validate the compat wrappers
without any controller changes.

Commit 6 (signature refactoring) is mechanical prep work.

Commit 7 (add per-tree functions) adds dead code. If it compiles,
it is correct.

Commit 8 (wire per-tree functions) is the core behavioral change.
By this point all building blocks (1-7) are proven and verified.

Commits 9-10 (narrow helpers, rename) are mechanical signature
narrowing, safe after commit 8 removed the old callers.

Commit 11 (remove compat wrappers) is mechanical cleanup.

Commit 12 (vertical processing) is the final behavioral change,
switching from horizontal to vertical orchestration.

The compat wrappers in commit 5 are the key risk reduction: they allow
validating the rte.go refactoring on any branch (including 4.18, the
hardest target) without touching the controller at all. Once that is
proven, the controller changes build on a solid foundation.


ffromani · 2026-05-27T12:59:27Z

reviewable again

openshift-ci-robot · 2026-05-27T13:00:34Z

@ffromani: This pull request references CNF-23239 which is a valid jira issue.


coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

internal/reconcile/step_test.go (1)
45-61: ⚡ Quick win

Add coverage for StepOngoing(0) semantics

Please add a regression test asserting that StepOngoing(0) is still considered ongoing (and not failed). This is a critical path for paused MCP handling in the controller.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/reconcile/step_test.go` around lines 45 - 61, Add a regression test
that calls StepOngoing(0) and asserts it is ongoing and not failed: create a new
test (e.g., TestStepOngoingZeroIsOngoing) that constructs st := StepOngoing(0)
and uses assert.True(t, st.Ongoing()) and assert.False(t, st.Failed()) to verify
the semantics; place it alongside the existing
TestStepOngoingIsOngoing/TestStepFailedIsFailed/TestStepSuccessIsNeitherOngoingNorFailed
tests.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/controller/numaresourcesoperator_controller_test.go`:
- Around line 2285-2297: Add assertions that the pending tree's DaemonSet is
still absent and the global Available condition is false: attempt to Get the
DaemonSet for mcp1 (using objectnames.GetComponentName(nro.Name, mcp1.Name) and
testNamespace) and expect a NotFound error, then fetch nro and verify
getConditionByType(nro.Status.Conditions, status.ConditionAvailable) is not nil
and its Status equals metav1.ConditionFalse. This complements the existing
checks that pool2 DS exists and Progressing is true.

In `@internal/controller/numaresourcesoperator_controller.go`:
- Around line 822-824: The code currently only logs objState.Error and
continues; change this to fail fast: when objState.Error != nil, log the error
with context and immediately stop processing by returning or propagating the
error from the enclosing reconcile/processing function (do not continue to apply
manifests). Ensure you wrap or annotate objState.Error with a clear message
(e.g., "failed building manifest/state for <object>") before returning so
callers can handle requeueing or surface the failure.

In `@internal/reconcile/step.go`:
- Around line 46-49: The Ongoing() method currently returns true only when
Result.RequeueAfter > 0 which incorrectly treats StepOngoing(0) as not ongoing;
update Step.Ongoing() to treat zero as an ongoing/ progressing value (e.g.,
return rs.Result.RequeueAfter >= 0) or otherwise compare against the defined
sentinel for "no requeue" so that StepOngoing(0) is classified as ongoing;
change the condition in the Ongoing() implementation that references
Result.RequeueAfter so zero is considered progressing (and ensure this aligns
with the StepOngoing(0) semantics).



📥 Commits

Reviewing files that changed from the base of the PR and between a81ec4d and ef94726.

📒 Files selected for processing (11)

api/v1/helper/nodegroup/nodegroup.go
api/v1/helper/nodegroup/nodegroup_test.go
internal/controller/numaresourcesoperator_controller.go
internal/controller/numaresourcesoperator_controller_test.go
internal/reconcile/step.go
internal/reconcile/step_test.go
pkg/objectstate/rte/machineconfigpool.go
pkg/objectstate/rte/machineconfigpool_test.go
pkg/objectstate/rte/rte.go
pkg/status/conditioninfo/conditioninfo.go
pkg/status/conditioninfo/conditioninfo_test.go

ffromani

I acknowledge the review difficulties. I'm not sure how to help further unless we want to completely pivot direction and make a clean-as-possible, main-only, minimal PR. That would make for a trivial backport down to 4.21 or so. Past that, the backport becomes hard and touches branches that are deeper and deeper into their stable phase. What alternatives do we have?

ffromani · 2026-05-27T08:59:35Z

 }

 func (em *ExistingManifests) MachineConfigsState(mf Manifests) ([]MachineConfigObjectState, sets.Set[string]) {
+	var allRet []MachineConfigObjectState


I agree this can be improved. The thing is, this function wants to compute both the existing state and the desired state of the MachineConfig objects, which are indeed related to MachineConfigPools, but the pool is only the scoping mechanism.

IOW, we want to reconcile MachineConfigs, and return ObjectState about these.

ffromani · 2026-05-27T09:02:51Z

+		daemonSets:     make(map[string]daemonSetManifest),
+		machineConfigs: make(map[string]machineConfigManifest),


Mostly because I'm reusing ExistingManifests, which predates the split and had a 1:N relationship because of the horizontal partitioning.

We can probably simplify further as a follow-up to this PR.

ffromani · 2026-05-27T09:09:31Z

 	return nropv1.MachineConfigPool{Name: name}
 }
+
+type treeReconcileResult struct {


This is meant to be the reconcile result of a single tree (therefore perTreeReconcileResult is probably a better name), but the problem here is that we can't yet enforce the 1:1 mapping between trees and MCPs (https://github.com/openshift-kni/numaresources-operator/blob/main/internal/api/annotations/annotations.go#L25).
Well, there's an argument for doing that in 5.0, since we have had 3 EUS versions past 4.18 already, but then it should be done as a prerequisite of this PR, which would make backporting significantly more complex.
This is why I'm still carrying this forward and postponing it to 5.2.

ffromani · 2026-05-28T08:54:58Z

/test ci-e2e-install-hypershift

Exposing the name is useful for logging purposes; no logic is planned upon the name of the objects. Signed-off-by: Francesco Romani <fromani@redhat.com>

This allows updating an existing message because, by design, WithMessage() avoids overwriting. It will be used to summarize the status in the upcoming "vertical" split of the NodeGroups reconciliation. AI-attribution: AIA PAI CeNc Hin R claude-4.6-opus-1M v1.0 Signed-off-by: Francesco Romani <fromani@redhat.com>

Add helpers which we will use in the "vertical split" of nodegroup reconciliation. Naming explanation: the split is about how the reconciliation makes progress. In the model we had till now, retroactively called "horizontal split", each NodeGroup must make progress on each reconciliation stage before any of these can progress to the next stage. With the upcoming proposed approach, the "vertical split", each NodeGroup will advance independently through the reconciliation steps, and only the global reported status is pulled to the slowest state. AI-attribution: AIA PAI Ce Hin R claude-4.6-opus-1M v1.0 Signed-off-by: Francesco Romani <fromani@redhat.com>

to reduce the diff noise. No changes in behavior, trivial code movement. AI-attribution: AIA PAI Ce Hin R claude-4.6-opus-1M v1.0 Signed-off-by: Francesco Romani <fromani@redhat.com>

Add per-tree variants of the key ExistingManifests methods:
- TreeAgnostic(mf): returns tree-independent object states
- PerTreeState(mf, tree): returns object states for a single tree
- FromClientTreeAgnostic(): creates ExistingManifests without tree data
- PerTree(ctx, cli, tree): creates a per-tree ExistingManifests clone
- MachineConfigsStateForTree(mf, tree): per-tree machine config state

The existing State(), FromClient(), and MachineConfigsState() are kept as compatibility wrappers that delegate to the new per-tree functions, so the controller remains unchanged. AI-attribution: AIA PAI CeNc Hin R claude-4.6-opus-1M v1.0 Signed-off-by: Francesco Romani <fromani@redhat.com>
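
The compatibility-wrapper idea in this commit message can be sketched as follows. This is a simplified illustration under stated assumptions, not the `pkg/objectstate/rte` code: the `Manifests` and `ObjectState` types, their fields, and the object names are hypothetical, and the real methods take more parameters than shown here.

```go
package main

import "fmt"

// ObjectState is a hypothetical, simplified object state: only a name.
type ObjectState struct{ Name string }

// Manifests is a hypothetical stand-in which still knows all the trees,
// as the pre-split API did.
type Manifests struct {
	trees []string
}

// TreeAgnostic returns the states which do not depend on any tree.
func (m Manifests) TreeAgnostic() []ObjectState {
	return []ObjectState{{Name: "serviceaccount"}, {Name: "role"}}
}

// PerTreeState returns the states which belong to a single tree.
func (m Manifests) PerTreeState(tree string) []ObjectState {
	return []ObjectState{{Name: "daemonset-" + tree}}
}

// State is the compatibility wrapper: it keeps the old all-trees-at-once
// shape, but is implemented on top of the new per-tree functions, so
// existing callers and their tests exercise the new code paths.
func (m Manifests) State() []ObjectState {
	ret := m.TreeAgnostic()
	for _, tree := range m.trees {
		ret = append(ret, m.PerTreeState(tree)...)
	}
	return ret
}

func main() {
	m := Manifests{trees: []string{"pool1", "pool2"}}
	for _, st := range m.State() {
		fmt.Println(st.Name)
	}
}
```

Because the wrapper only delegates, removing it later is mechanical once callers switch to the per-tree functions.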

trivial signature change to enable upcoming refactoring. No intended change in behavior. AI-attribution: AIA PAI Ce Hin R claude-4.6-opus-1M v1.0 Signed-off-by: Francesco Romani <fromani@redhat.com>

Add the building blocks for per-tree reconciliation without changing existing code paths:
- perTreeResult type to capture per-tree outcomes
- reconcilePerTreeMachineConfig: per-tree machine config reconciliation
- reconcilePerTreeDaemonSet: per-tree daemonset reconciliation
- setupTreeAgnosticManifests: extracted tree-agnostic manifest setup
- applyObjects: generic object state applier
- collectDaemonSets, reduceTreeResults, shouldReplaceStep, treeSummaryMessage: aggregation helpers

These functions are not yet wired into the reconcile loop; they will replace the existing all-trees-at-once functions in a follow-up. AI-attribution: AIA PAI CeNc Hin R claude-4.6-opus-1M v1.0 Signed-off-by: Francesco Romani <fromani@redhat.com>
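
How per-tree outcomes might be aggregated into a single global status can be sketched as below. The names `phase`, `treeResult`, `shouldReplace`, `reduceResults`, and `summaryMessage` are hypothetical and only mirror the roles of the aggregation helpers listed in this commit message. Ranking Degraded above Progressing is an assumption of this sketch; the PR description only states that the global status is pulled to the slowest state.

```go
package main

import "fmt"

// phase is a hypothetical per-tree status, ordered from best to worst.
type phase int

const (
	phaseAvailable phase = iota
	phaseProgressing
	phaseDegraded // assumption: ranked worse than Progressing
)

func (p phase) String() string {
	switch p {
	case phaseAvailable:
		return "Available"
	case phaseProgressing:
		return "Progressing"
	case phaseDegraded:
		return "Degraded"
	}
	return "Unknown"
}

// treeResult is a hypothetical outcome of reconciling one tree.
type treeResult struct {
	Tree  string
	Phase phase
}

// shouldReplace reports whether candidate is "slower" than current.
func shouldReplace(current, candidate phase) bool {
	return candidate > current
}

// reduceResults pulls the global phase to the slowest per-tree phase.
func reduceResults(results []treeResult) phase {
	global := phaseAvailable
	for _, res := range results {
		if shouldReplace(global, res.Phase) {
			global = res.Phase
		}
	}
	return global
}

// summaryMessage counts how many trees are already settled.
func summaryMessage(results []treeResult) string {
	settled := 0
	for _, res := range results {
		if res.Phase == phaseAvailable {
			settled++
		}
	}
	return fmt.Sprintf("%d/%d trees available", settled, len(results))
}

func main() {
	results := []treeResult{
		{Tree: "pool1", Phase: phaseAvailable},
		{Tree: "pool2", Phase: phaseProgressing},
		{Tree: "pool3", Phase: phaseAvailable},
	}
	fmt.Printf("global=%s (%s)\n", reduceResults(results), summaryMessage(results))
}
```

With two trees Available and one Progressing, the reduced global phase is Progressing, matching the N-1 example in the PR description.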

Rewrite reconcileResource to use the per-tree reconcile functions introduced in the previous commit instead of the old horizontal functions. Use FromClientTreeAgnostic for tree-agnostic resource setup and PerTree for per-tree processing. Move dangling resource cleanup into reconcileResource and use horizontal orchestration: all trees complete machine config step first, then all trees complete daemonset step. AI-attribution: AIA PAI Ce Hin R claude-4.6-opus-1M v1.0 Signed-off-by: Francesco Romani <fromani@redhat.com>

Change syncMachineConfigPoolsStatuses and syncMachineConfigPoolNodeGroupConfigStatuses from variadic trees ...nodegroupv1.Tree to single tree nodegroupv1.Tree, removing the outer tree iteration loop from each. These functions are now only called from reconcilePerTreeMachineConfig which already passes a single tree. AI-attribution: AIA PAI Ce Hin R claude-4.6-opus-1M v1.0 Signed-off-by: Francesco Romani <fromani@redhat.com>

…eConfig Rename syncMachineConfigs to syncMachineConfig and change its signature from variadic trees ...nodegroupv1.Tree to single tree nodegroupv1.Tree. Switch the internal call from MachineConfigsState to MachineConfigsStateForTree. Also narrow validateMachineConfigLabels from []nodegroupv1.Tree to single nodegroupv1.Tree, removing the outer tree loop. AI-attribution: AIA PAI Ce Hin R claude-4.6-opus-1M v1.0 Signed-off-by: Francesco Romani <fromani@redhat.com>

Remove the horizontal compat wrappers (State, FromClient, MachineConfigsState without tree param) and the trees field from ExistingManifests, now that the controller uses the per-tree API directly. Rename MachineConfigsStateForTree to MachineConfigsState and update the controller call site and tests accordingly. AI-attribution: AIA PAI CeNc Hin R claude-4.6-opus-1M v1.0 Signed-off-by: Francesco Romani <fromani@redhat.com>

Change reconcileResource from horizontal orchestration (all trees complete machine config step, then all trees complete daemonset step) to vertical processing (each tree progresses through all steps independently). This means a single slow nodegroup no longer blocks progress for the others. Move tree normalization from the pre-processing loop in Reconcile into the per-tree loop in reconcileResource, where it naturally belongs alongside the rest of per-tree processing. Add a test verifying that a tree whose MCP is ready creates its DaemonSet even while another tree's MCP is still pending. AI-attribution: AIA PAI CeNc Hin R claude-4.6-opus-1M v1.0 Signed-off-by: Francesco Romani <fromani@redhat.com>

address coderabbit AI comments AI-attribution: AIA PAI Ce Hin R claude-4.6-opus-1M v1.0 Signed-off-by: Francesco Romani <fromani@redhat.com>

openshift-ci · 2026-07-07T13:17:00Z

@ffromani: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/ci-e2e-install-hypershift | `5e69487` | link | true | `/test ci-e2e-install-hypershift` |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci · 2026-07-11T21:31:04Z

PR needs rebase.

openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 12, 2026

openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 12, 2026

openshift-ci Bot requested review from Tal-or and swatisehgal May 12, 2026 09:06

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 12, 2026

ffromani force-pushed the per-tree-processing branch from b9f2280 to 6aeebce Compare May 12, 2026 09:07

ffromani force-pushed the per-tree-processing branch from 6aeebce to 7a66a48 Compare May 12, 2026 09:15

Tal-or reviewed May 12, 2026

Tal-or changed the title ~~WIP: controller: nrop: per-tree processing and progress~~ CNF-23239: WIP: controller: nrop: per-tree processing and progress May 14, 2026

openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 14, 2026

ffromani force-pushed the per-tree-processing branch from 7a66a48 to bc3c6fe Compare May 15, 2026 13:01

openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 21, 2026

ffromani force-pushed the per-tree-processing branch from bc3c6fe to 9628673 Compare May 26, 2026 14:03

openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 26, 2026

ffromani force-pushed the per-tree-processing branch from 9628673 to c508fc8 Compare May 26, 2026 14:12

ffromani changed the title ~~CNF-23239: WIP: controller: nrop: per-tree processing and progress~~ CNF-23239: controller: nrop: per-tree processing and progress May 26, 2026

openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 26, 2026

Tal-or reviewed May 26, 2026

openshift-ci Bot changed the title ~~CNF-23239: controller: nrop: per-tree processing and progress~~ WIP: [CNF-23239: controller: nrop: per-tree processing and progress](https://github.com/openshift-kni/numaresources-operator/pull/3998#top) May 26, 2026

openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 26, 2026

ffromani changed the title ~~WIP: [CNF-23239: controller: nrop: per-tree processing and progress](https://github.com/openshift-kni/numaresources-operator/pull/3998#top)~~ WIP: CNF-23239: controller: nrop: per-tree processing and progress May 26, 2026

ffromani force-pushed the per-tree-processing branch 2 times, most recently from a847972 to ee816e3 Compare May 27, 2026 12:30

ffromani force-pushed the per-tree-processing branch from ee816e3 to da33cd8 Compare May 27, 2026 12:52

ffromani force-pushed the per-tree-processing branch from da33cd8 to ef94726 Compare May 27, 2026 12:59

ffromani changed the title ~~WIP: CNF-23239: controller: nrop: per-tree processing and progress~~ CNF-23239: controller: nrop: per-tree processing and progress May 27, 2026

openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 27, 2026

coderabbitai Bot reviewed May 27, 2026

Comment thread internal/controller/numaresourcesoperator_controller_test.go

Comment thread internal/controller/numaresourcesoperator_controller.go

Comment thread internal/reconcile/step.go

ffromani commented May 27, 2026

ffromani added 13 commits July 7, 2026 14:22

api: helper: add Name() helper for nodegroup.Tree

b1e6b5a

Exposing the name is useful for logging purposes; no logic is planned upon the name of the objects. Signed-off-by: Francesco Romani <fromani@redhat.com>

controller: chore: move helper around

482e9fb

to reduce the diff noise. No changes in behavior, trivial code movement. AI-attribution: AIA PAI Ce Hin R claude-4.6-opus-1M v1.0 Signed-off-by: Francesco Romani <fromani@redhat.com>

nrop: chore: change signature to enable future refactoring

433837f

trivial signature change to enable upcoming refactoring. No intended change in behavior. AI-attribution: AIA PAI Ce Hin R claude-4.6-opus-1M v1.0 Signed-off-by: Francesco Romani <fromani@redhat.com>

review: address comments

5e69487

address coderabbit AI comments AI-attribution: AIA PAI Ce Hin R claude-4.6-opus-1M v1.0 Signed-off-by: Francesco Romani <fromani@redhat.com>

ffromani force-pushed the per-tree-processing branch from ef94726 to 5e69487 Compare July 7, 2026 12:33

openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 11, 2026
