Skip to content

OCPBUGS-79418: ctrl: sched: add topologySpreadConstraints to deployment#4177

Merged
openshift-merge-bot[bot] merged 5 commits into
openshift-kni:mainfrom
shajmakh:tsc-for-ha-sched
May 29, 2026
Merged

OCPBUGS-79418: ctrl: sched: add topologySpreadConstraints to deployment#4177
openshift-merge-bot[bot] merged 5 commits into
openshift-kni:mainfrom
shajmakh:tsc-for-ha-sched

Conversation

@shajmakh
Copy link
Copy Markdown
Member

@shajmakh shajmakh commented May 25, 2026

To evenly spread the pods across the nodes in a balanced way taking
into account the new replicaset pods while rollout.

previous discussions:
(reverted) #4021
(cancelled due to regressions) #4028

@openshift-ci openshift-ci Bot requested review from mrniranjan and swatisehgal May 25, 2026 13:53
@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 25, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 25, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Scheduler pod placement shifts from affinity rules to hostname-based topology spread constraints. Explicit scheduler replica counts are now validated against auto-detected eligible nodes. Deployment-completion waiting is refactored to accept explicit replica counts and validate generation and per-field replica readiness independently with detailed logging.

Changes

Scheduler topology spread and replica validation

Layer / File(s) Summary
TopologySpreadConstraints pod-placement helper
pkg/objectupdate/sched/sched.go, pkg/objectupdate/sched/sched_test.go
DeploymentTopologySpreadConstraints replaces affinity-based placement; it configures hostname-scoped spread with DoNotSchedule, MaxSkew=1, and pod-template-hash matching, returning error if template has no labels. Unit tests verify error cases and constraint structure.
Controller replica detection and validation
internal/controller/numaresourcesscheduler_controller.go, internal/controller/numaresourcesscheduler_controller_test.go
computeSchedulerReplicas now detects eligible nodes by platform role before validating explicit spec.replicas against that count, returning error if exceeded. Scheduler Deployment sync applies DeploymentTopologySpreadConstraints instead of affinity. Controller tests cover success paths, validation failure, and degradation on node scale-down.
Deployment-completion wait logic
internal/wait/deployment.go
ForDeploymentCompleteWithReplicas handles polling with an explicit expected-replica count. IsDeploymentComplete refactored to accept generation and expected-replicas parameters, performing generation check first, then validating Replicas, UpdatedReplicas, and AvailableReplicas independently with per-field logging. ForDeploymentComplete delegates through the new variant.
Scheduler e2e topology spread tests
test/e2e/sched/sched_test.go
Comprehensive e2e suite validates topology spread behavior: autodetection of replica count from platform nodes, constraint structure verification, replica toggling (0/explicit/auto) and deployment identity stability, distinct-node placement, and cordon-induced topology-failure scenario (pod Pending with Unschedulable). Adds deployment status logging helper.
Serial e2e wait signature updates
test/e2e/serial/tests/configuration.go, test/e2e/serial/tests/scheduler_removal.go, test/e2e/serial/tests/workload_placement_nodelabel.go
Updates wait.IsDeploymentComplete calls to use the new generation- and replica-aware signature, passing deployment.Generation, status, and expected replica count.

Sequence Diagram(s)

sequenceDiagram
  participant NUMAResourcesScheduler
  participant Controller as computeSchedulerReplicas
  participant NodeAPI
  participant KubeAPI
  NUMAResourcesScheduler->>Controller: spec.Replicas (may be nil)
  Controller->>NodeAPI: List nodes by platform role
  NodeAPI-->>Controller: eligible node count
  alt explicit replicas set
    Controller->>Controller: validate spec.Replicas ≤ node count
    alt validation fails
      Controller-->>NUMAResourcesScheduler: error
    else validation passes
      Controller-->>NUMAResourcesScheduler: spec.Replicas
    end
  else autodetect
    Controller-->>NUMAResourcesScheduler: detected node count
  end
  NUMAResourcesScheduler->>KubeAPI: patch Deployment with TopologySpreadConstraints
  KubeAPI-->>NUMAResourcesScheduler: Deployment updated
Loading

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: adding topologySpreadConstraints to the scheduler deployment, which is reflected throughout all modified files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The pull request description accurately describes the changes: adding topologySpreadConstraints to evenly spread scheduler pods across nodes during rollouts, with clear context from previous discussions.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/controller/numaresourcesscheduler_controller.go`:
- Around line 279-285: The current replica validation rejects spec.replicas: 0
when replicaCount is 0; update the logic in the block handling
schedSpec.Replicas so that an explicit zero is always accepted: first check if
*schedSpec.Replicas == 0 and if so return schedSpec.Replicas, nil; otherwise
keep the existing guard that errors when *schedSpec.Replicas > replicaCount
(using the existing replicaCount and nodeRoleDescription symbols) and return the
validated value. This preserves the existing maximum check but allows
intentional scale-to-zero.

In `@test/e2e/sched/sched_test.go`:
- Around line 392-396: The current test compares Deployment.ResourceVersion
immediately (using schedDpPostUpdate.ResourceVersion ==
schedDp.ResourceVersion), which is flaky; replace that assertion with a
Consistently check that the deployment's spec and generation do not change over
a short interval. Specifically, fetch the deployment via
podlist.With(...).DeploymentByOwnerReference(ctx, nroSchedObj.UID) inside a
Gomega Consistently block and assert that schedDpPostUpdate.Spec (or relevant
spec fields) and schedDpPostUpdate.Generation remain equal to the original
schedDp.Spec and schedDp.Generation for the duration; remove the ResourceVersion
equality assertion. Apply the same replacement for the similar check around
schedDpPostUpdate/schedDp at lines 405-409.
- Around line 485-493: The cleanup uses an uninitialized pointer `var node
*corev1.Node`, causing nil deref when calling `e2eclient.Client.Get` and
accessing `node.Spec.Unschedulable`; allocate a concrete Node before use (e.g.,
declare `node := &corev1.Node{}`) in the `DeferCleanup` closure, then call
`e2eclient.Client.Get(ctx, client.ObjectKey{Name: targetNodeName}, node)` and
update `node.Spec.Unschedulable = false` followed by
`e2eclient.Client.Update(ctx, node)` inside the `Eventually` block so
`needToUncordon`, `targetNodeName`, `DeferCleanup`, and the get/update calls
work safely.
- Around line 503-518: In the Eventually polling loop that calls
podlist.With(...).ByDeployment and assigns to recreatedPod, guard against nil
before dereferencing recreatedPod by moving or conditionalizing the klog.InfoS
call and any access to recreatedPod.Namespace/Name until after you assert
recreatedPod is not nil (or wrap the logging in an if recreatedPod != nil
block); ensure g.Expect(recreatedPod).ToNot(BeNil()) still runs in the loop so
the loop can retry when no pending pod is found and only after that check use
recreatedPod fields for logging or further assertions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: bb00a70c-f80a-4212-b730-569aa2e631d4

📥 Commits

Reviewing files that changed from the base of the PR and between b06519a and 9a45f1f.

📒 Files selected for processing (5)
  • internal/controller/numaresourcesscheduler_controller.go
  • internal/controller/numaresourcesscheduler_controller_test.go
  • pkg/objectupdate/sched/sched.go
  • pkg/objectupdate/sched/sched_test.go
  • test/e2e/sched/sched_test.go

Comment thread internal/controller/numaresourcesscheduler_controller.go
Comment thread test/e2e/sched/sched_test.go Outdated
Comment thread test/e2e/sched/sched_test.go Outdated
Comment thread test/e2e/sched/sched_test.go
@ffromani
Copy link
Copy Markdown
Member

/cc

@openshift-ci openshift-ci Bot requested a review from ffromani May 25, 2026 14:10
@shajmakh shajmakh force-pushed the tsc-for-ha-sched branch from 9a45f1f to aad497d Compare May 26, 2026 05:42
@shajmakh
Copy link
Copy Markdown
Member Author

requires #4193

@shajmakh shajmakh force-pushed the tsc-for-ha-sched branch from aad497d to b3da392 Compare May 26, 2026 06:32
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/objectupdate/sched/sched.go`:
- Around line 107-110: The code only rejects nil dp.Spec.Template.Labels but
allows an empty map which yields an empty MatchLabels selector; update the
validation to treat empty maps as invalid by checking both nil and len(labels)
== 0 (i.e., if labels == nil || len(labels) == 0) and return an error before
constructing the selector/MatchLabels so you never build a LabelSelector with an
empty MatchLabels.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 738f118f-0f41-492e-9e1e-9f31e89f752d

📥 Commits

Reviewing files that changed from the base of the PR and between 9a45f1f and b3da392.

📒 Files selected for processing (5)
  • internal/controller/numaresourcesscheduler_controller.go
  • internal/controller/numaresourcesscheduler_controller_test.go
  • pkg/objectupdate/sched/sched.go
  • pkg/objectupdate/sched/sched_test.go
  • test/e2e/sched/sched_test.go

Comment thread pkg/objectupdate/sched/sched.go
Copy link
Copy Markdown
Member

@ffromani ffromani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work. Do we have a test which actually verify the happy path? 3 control-plane nodes, check we autodetect 3 replicas and each one lands on a different node.

Comment thread pkg/objectupdate/sched/sched.go Outdated
Comment thread pkg/objectupdate/sched/sched_test.go Outdated
@shajmakh
Copy link
Copy Markdown
Member Author

Good work. Do we have a test which actually verify the happy path? 3 control-plane nodes, check we autodetect 3 replicas and each one lands on a different node.

we have that as a step in one of the tests:
b3da392#diff-6e473fac5187d020f14665d506a6553280fc9db1866488805157ada1a93e82b0R479

@shajmakh shajmakh force-pushed the tsc-for-ha-sched branch 3 times, most recently from 2207af4 to c2e260d Compare May 26, 2026 08:46
@shajmakh
Copy link
Copy Markdown
Member Author

/retest

@shajmakh shajmakh changed the title ctrl: sched: add topologySpreadConstraints to deployment OCPBUGS-79418: ctrl: sched: add topologySpreadConstraints to deployment May 26, 2026
@openshift-ci-robot
Copy link
Copy Markdown

@shajmakh: This pull request references Jira Issue OCPBUGS-79418, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @rshemtov13

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

To evenly spread the pods across the nodes in a a balanced way taking
into account the new replicaset pods while rollout.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot requested a review from rshemtov13 May 26, 2026 11:50
Copy link
Copy Markdown
Collaborator

@Tal-or Tal-or left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, left minor comment

Comment thread pkg/objectupdate/sched/sched.go
Copy link
Copy Markdown
Member

@ffromani ffromani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add one more e2e test (suggested inline) and we can merge

Comment thread internal/controller/numaresourcesscheduler_controller_test.go Outdated
Comment thread pkg/objectupdate/sched/sched_test.go Outdated
Comment thread test/e2e/sched/sched_test.go
@shajmakh shajmakh force-pushed the tsc-for-ha-sched branch from c2e260d to 098333c Compare May 27, 2026 06:45
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/wait/deployment.go`:
- Around line 53-63: areDeploymentReplicasAvailable currently always returns
true, which makes IsDeploymentComplete succeed as soon as ObservedGeneration
advances; change areDeploymentReplicasAvailable to perform a real availability
check: compare newStatus.Replicas, newStatus.UpdatedReplicas and
newStatus.AvailableReplicas against the desired replicas value and return false
if any required condition is not met (log the mismatches with klog.InfoS as
already used), returning true only when the availability predicate is satisfied;
if strict equality is too strict for your rollout semantics, implement a clear
predicate (e.g., UpdatedReplicas >= replicas and AvailableReplicas >=
minDesired) instead of unconditional true so IsDeploymentComplete behaves
correctly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: a48fb4bf-6aa6-4125-8967-f83a3a69196d

📥 Commits

Reviewing files that changed from the base of the PR and between b3da392 and 098333c.

📒 Files selected for processing (6)
  • internal/controller/numaresourcesscheduler_controller.go
  • internal/controller/numaresourcesscheduler_controller_test.go
  • internal/wait/deployment.go
  • pkg/objectupdate/sched/sched.go
  • pkg/objectupdate/sched/sched_test.go
  • test/e2e/sched/sched_test.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • internal/controller/numaresourcesscheduler_controller.go
  • pkg/objectupdate/sched/sched.go
  • test/e2e/sched/sched_test.go
  • internal/controller/numaresourcesscheduler_controller_test.go
  • pkg/objectupdate/sched/sched_test.go

Comment thread internal/wait/deployment.go
shajmakh added 2 commits May 27, 2026 10:25
PodAntiAffinity is a legacy way to handle HA scenarios and in project
state it caused deadlocks and regressions.

This reverts commit b4e7fd1.

Signed-off-by: Shereen Haj <shajmakh@redhat.com>
It's reasonable to have high-availability enabled to have scheduler pods
number matches the control-plane nodes' number or workers on hypershift.
but a number greater than that makes no point. This commit degrades the
scheduler status in case a higher number of replicas is requested.

Signed-off-by: Shereen Haj <shajmakh@redhat.com>
@shajmakh shajmakh force-pushed the tsc-for-ha-sched branch 3 times, most recently from cc90aeb to 01d534d Compare May 27, 2026 15:17
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/wait/deployment.go (1)

27-40: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

This waiter can still report success before a no-scale rollout is observed.

IsDeploymentComplete(dp.Generation, ...) only proves the status is not older than the pre-existing Deployment generation. When the rollout is triggered indirectly and keeps the same replica count, callers like test/e2e/sched/sched_test.go Line 80 and Line 135 pass a Deployment fetched before the CR update, so ObservedGeneration == dp.Generation is already true and this can return before the new image/spec is actually rolled out.

That weakens the new “generation-aware” contract for exactly the same-replica updates this PR adds. The waiter needs a way to observe a newer Deployment generation for indirect rollouts, or a separate helper for “wait for reconcile after parent object change” instead of reusing the stale generation from the caller.

Also applies to: 66-72

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/wait/deployment.go` around lines 27 - 40, The waiter currently can
return success using the caller's stale dp.Generation; change
ForDeploymentCompleteWithReplicas (and the similar block at 66-72) so it will
not treat the Deployment as complete until a newer generation has been observed:
require that either updatedDp.Generation > dp.Generation or
updatedDp.Status.ObservedGeneration > dp.Generation (i.e., the controller has
observed a newer generation) before allowing IsDeploymentComplete to return
success; in practice, check the
updatedDp.Generation/updatedDp.Status.ObservedGeneration and only call/accept
IsDeploymentComplete when the newer generation has been observed, otherwise keep
polling.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/sched/sched_test.go`:
- Around line 653-660: The helper infoUponDeploymentCompleteFailure logs
deployment status but only iterates pods when
podlist.With(e2eclient.Client).ByDeployment(...) returns an error, which is
inverted; change the control flow so that after calling
podlist.With(...).ByDeployment(ctx, *schedDp) you iterate and log each pod
(using pods range and klog.InfoS with "podName"/"podStatus") when err == nil,
and handle the error case by logging the retrieval error (include err) instead;
update the branches around the call to ByDeployment to reflect this corrected
logic.

In `@test/e2e/serial/tests/scheduler_removal.go`:
- Around line 162-163: The wait call mistakenly calls Interval twice which
overwrites the first interval; update the chain that calls
wait.With(fxt.Client).Interval(2*time.Second).Interval(30*time.Second).ForDeploymentCompleteWithReplicas(...)
so the second Interval becomes a Timeout (e.g., Timeout(30*time.Second)) to
bound the overall wait while keeping a 2s poll cadence; target the wait.With /
Interval / Timeout / ForDeploymentCompleteWithReplicas chain in
scheduler_removal.go and ensure the poll interval remains 2s and the overall
timeout is 30s.

In `@test/e2e/serial/tests/workload_placement.go`:
- Line 286: After triggering a rollout and assigning updatedDp, change calls
that pass the original dp into the generation-aware waiter to instead pass the
updated deployment object returned from the update; specifically update the
three calls to
wait.With(fxt.Client).Interval(...).Timeout(...).ForDeploymentCompleteWithReplicas(...)
so they use updatedDp (the result of the update) rather than dp to ensure the
waiter observes the new generation when waiting for replicas, keeping references
to function ForDeploymentCompleteWithReplicas and variables dp/updatedDp to
locate the sites.

---

Outside diff comments:
In `@internal/wait/deployment.go`:
- Around line 27-40: The waiter currently can return success using the caller's
stale dp.Generation; change ForDeploymentCompleteWithReplicas (and the similar
block at 66-72) so it will not treat the Deployment as complete until a newer
generation has been observed: require that either updatedDp.Generation >
dp.Generation or updatedDp.Status.ObservedGeneration > dp.Generation (i.e., the
controller has observed a newer generation) before allowing IsDeploymentComplete
to return success; in practice, check the
updatedDp.Generation/updatedDp.Status.ObservedGeneration and only call/accept
IsDeploymentComplete when the newer generation has been observed, otherwise keep
polling.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 41902836-39ef-489b-a0b6-58348bb7bdfb

📥 Commits

Reviewing files that changed from the base of the PR and between 098333c and 4f252a0.

📒 Files selected for processing (14)
  • internal/controller/numaresourcesscheduler_controller.go
  • internal/controller/numaresourcesscheduler_controller_test.go
  • internal/wait/deployment.go
  • pkg/objectupdate/sched/sched.go
  • pkg/objectupdate/sched/sched_test.go
  • test/e2e/sched/sched_test.go
  • test/e2e/serial/tests/configuration.go
  • test/e2e/serial/tests/non_regression.go
  • test/e2e/serial/tests/resource_accounting.go
  • test/e2e/serial/tests/scheduler_removal.go
  • test/e2e/serial/tests/workload_overhead.go
  • test/e2e/serial/tests/workload_placement.go
  • test/e2e/serial/tests/workload_placement_nodelabel.go
  • test/e2e/serial/tests/workload_placement_tmpol.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • pkg/objectupdate/sched/sched_test.go
  • pkg/objectupdate/sched/sched.go
  • internal/controller/numaresourcesscheduler_controller.go
  • internal/controller/numaresourcesscheduler_controller_test.go

Comment thread test/e2e/sched/sched_test.go
Comment thread test/e2e/serial/tests/scheduler_removal.go Outdated
Comment thread test/e2e/serial/tests/workload_placement.go Outdated
@shajmakh shajmakh force-pushed the tsc-for-ha-sched branch from b1ee4d6 to ef1ff13 Compare May 28, 2026 07:11
Comment thread internal/wait/deployment.go Outdated
Comment on lines +53 to +75
func areDeploymentReplicasAvailable(newStatus *appsv1.DeploymentStatus, replicas int32) bool {
return newStatus.UpdatedReplicas == replicas &&
newStatus.Replicas == replicas &&
newStatus.AvailableReplicas == replicas
if newStatus.Replicas != replicas {
klog.InfoS("newStatus.Replicas", "expected", replicas, "found", newStatus.Replicas)
return false
}
if newStatus.UpdatedReplicas != replicas {
klog.InfoS("newStatus.UpdatedReplicas", "expected", replicas, "found", newStatus.UpdatedReplicas)
return false
}
if newStatus.AvailableReplicas != replicas {
klog.InfoS("newStatus.AvailableReplicas", "expected", replicas, "found", newStatus.AvailableReplicas)
return false
}
return true
}

func IsDeploymentComplete(dp *appsv1.Deployment, newStatus *appsv1.DeploymentStatus) bool {
return areDeploymentReplicasAvailable(newStatus, *(dp.Spec.Replicas)) &&
newStatus.ObservedGeneration >= dp.Generation
if newStatus.ObservedGeneration < dp.Generation {
klog.InfoS("generation is older than expected to indicate the deployment is complete", "old", dp.Generation, "new", newStatus.ObservedGeneration)
return false
}

return areDeploymentReplicasAvailable(newStatus, *(dp.Spec.Replicas))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit heavy handed. Currently this is called only in test code, but the moment we pull in production code we will have log spam. There are options, but all require more work to fix. Can we simplify this commit or drop it entirely?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was needed for debugging, it essentially describes why the functions fail which is why we have the last fixing commit so this is very helpful details for troubleshooting the tests.
If ever we decide to use that in production, we need to be aware of the old use which is initially meant for tests. we can anyway create a production version and improve it then as we see fit.
Thinking about the backport, I think it's better to keep the old function ForDeploymentComplete to reduce any tests conflicts which will then simplify the commit a bit in terms of the adaptations across the codebase. would that sound good enough?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discouraging production use could be as simple as move this code in a subtree of test/ perhaps test/internal/wait/... but I won't do in this PR.

A way forward would be post a followup PR with first the trivial move and then adding the logs

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually let's move forward and let's revisit later

Comment thread test/e2e/sched/sched_test.go Outdated
// not enough to guarantee that the deployment is ready with the NEW replica, thus need to cover that by
// additional checks as the context requires
// ForDeploymentCompleteWithReplicas waits for the deployment to be complete and have the specified number of replicas.
func (wt Waiter) ForDeploymentCompleteWithReplicas(ctx context.Context, dp *appsv1.Deployment, newExpectedReplicas int32) (*appsv1.Deployment, error) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good addition. If most of the codebase actually uses the pattern

ForDeploymentCompleteWithReplicas(ctx, dp, *dp.Spec.Replicas)

and at glance it seems the case, it's worth keeping the old API ForDeploymentComplete

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intentionally updated the codebase because ForDeploymentComplete without having the correct expected replicas count does not sound complete and can be confusing. Internally in the function, replicas check is a core one and in order to avoid using the incorrect function for deployment waiter I overrode the old API. This was actually an old bug as noted in the omitted comment; the new version is expected to handle all deployments now.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so let's just rename ForDeploymentCompleteWithReplicas to ForDeploymentComplete

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the latest push I restored ForDeploymentComplete to keep the diff as less as possible to reduce conflicts while backporting. I can address changing the whole code base in a followup PR including #4177 (comment)

Comment thread internal/wait/deployment.go
Copy link
Copy Markdown
Member

@ffromani ffromani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
/lgtm

we can defer the remaining comments to followups

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 28, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 28, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani, shajmakh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@shajmakh
Copy link
Copy Markdown
Member Author

/hold
for updates to make the automated backport process smoother

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 28, 2026
@shajmakh shajmakh force-pushed the tsc-for-ha-sched branch from ef1ff13 to e3a5016 Compare May 28, 2026 12:17
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label May 28, 2026
shajmakh added 3 commits May 28, 2026 15:21
To evenly spread the pods across the nodes in a a balanced way taking
into account the new replicaset pods while rollout.

Signed-off-by: Shereen Haj <shajmakh@redhat.com>
The check includes many verification steps that only when all passed the
function pass. Sometimes the check fails without extra details of what
exactly was the step that failed. This commit adds debugging logs to
provide these details.

Signed-off-by: Shereen Haj <shajmakh@redhat.com>
So far the old function was waiting for the deployment to be ready with
the same replicas count as the old one. It is however possible that the
new deployment version includes a different replicas field hence should
expect to a different replicas count. The old version would fail on such
case where the old count vs the new one are not the same.
This commit fixes this issue by introducing new function that accepts the
expected new replicas count and compare against it.
The old function API still exists but is redirected to use the new
function with same old behavior.
Existing tests are not affected by this.

Signed-off-by: Shereen Haj <shajmakh@redhat.com>
@shajmakh shajmakh force-pushed the tsc-for-ha-sched branch from e3a5016 to f30a924 Compare May 28, 2026 12:21
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

@openshift-ci-robot
Copy link
Copy Markdown

@shajmakh: This pull request references Jira Issue OCPBUGS-79418, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @rshemtov13

Details

In response to this:

To evenly spread the pods across the nodes in a balanced way taking
into account the new replicaset pods while rollout.

previous discussions:
(reverted) #4021
(cancelled due to regressions) #4028

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Copy Markdown
Member

@ffromani ffromani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 28, 2026
@shajmakh
Copy link
Copy Markdown
Member Author

/unhold

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 29, 2026
@shajmakh
Copy link
Copy Markdown
Member Author

/cherry-pick release-4.22,release-4.21, release-4.20

@openshift-cherrypick-robot
Copy link
Copy Markdown

@shajmakh: once the present PR merges, I will cherry-pick it on top of release-4.22,release-4.21, in a new PR and assign it to you.

Details

In response to this:

/cherry-pick release-4.22,release-4.21, release-4.20

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-merge-bot openshift-merge-bot Bot merged commit 2f6eb77 into openshift-kni:main May 29, 2026
21 checks passed
@openshift-ci-robot
Copy link
Copy Markdown

@shajmakh: Jira Issue OCPBUGS-79418: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-79418 has been moved to the MODIFIED state.

Details

In response to this:

To evenly spread the pods across the nodes in a balanced way taking
into account the new replicaset pods while rollout.

previous discussions:
(reverted) #4021
(cancelled due to regressions) #4028

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-cherrypick-robot
Copy link
Copy Markdown

@shajmakh: cannot checkout release-4.22,release-4.21,: error checking out "release-4.22,release-4.21,": exit status 1 error: pathspec 'release-4.22,release-4.21,' did not match any file(s) known to git

Details

In response to this:

/cherry-pick release-4.22,release-4.21, release-4.20

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@shajmakh
Copy link
Copy Markdown
Member Author

/cherry-pick release-4.22 release-4.21 release-4.20

@openshift-cherrypick-robot
Copy link
Copy Markdown

@shajmakh: new pull request created: #4265

Details

In response to this:

/cherry-pick release-4.22 release-4.21 release-4.20

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants