conditional use of ObservedGeneration by ffromani · Pull Request #2732 · openshift-kni/numaresources-operator

ffromani · 2025-12-18T10:13:34Z

in PR #2431 we added reporting of ObservedGeneration in the NUMAResourcesOperator.Status. This is a good change and closes a known gap, but we can't just use the value unconditionally in e2e tests, because a long tail of old versions (pre #2722) won't expose this value, which will create a storm of false negatives tests.

Ignoring the value till all the version supports it seems excessive; reverting the tests is wrong, reverting the change seems worse. The best way forward seems to be a more granular check, introduced in this PR.

This creates a challenge for future tests. The best option is probably to use ObservedGeneration only in the new tests and stop refactoring of existing codebase. Either way, we will need to be more careful in the future.

openshift-ci · 2025-12-18T10:13:48Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [ffromani]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

shajmakh · 2025-12-19T12:39:01Z

+	// TODO: until all the version reports ObservedGeneration
+	if serialconfig.Config != nil && !slices.Contains(serialconfig.Config.ClusterTopics.Active, "operobsgen") {
+		return true
+	}


I think this should be removed, we still want to verify the condition is available

not sure I got you right: we check the condition before, don't we?

this API version is very vaguely correlated with the operator version. Change the value to emphasize this fact. Signed-off-by: Francesco Romani <fromani@redhat.com>

the fact it was implicit was an implementation accident we can now rectify. Signed-off-by: Francesco Romani <fromani@redhat.com>

cluster topics are expected not to change mid-run of the serial tests, so collecting them once at startup seems like a good compromise. The topics will be available through the shared serial config object. Signed-off-by: Francesco Romani <fromani@redhat.com>

coderabbitai · 2026-05-14T09:41:13Z

Warning

Rate limit exceeded

@ffromani has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 50 minutes and 6 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 4d2a3d60-950e-49a5-bad6-0ff8e57d9908

📥 Commits

Reviewing files that changed from the base of the PR and between b2dbbae and cffa82f.

📒 Files selected for processing (2)

internal/api/features/_topics.json
test/e2e/serial/tests/configuration.go

📝 Walkthrough

Walkthrough

This PR extends E2E test infrastructure to discover and respect operator feature capabilities. It registers a new "operobsgen" feature topic, introduces in-cluster feature introspection, threads context through test fixtures, and updates test assertions to gracefully handle operators that may lack certain features.

Changes

Feature Topic Introspection for E2E Tests

Layer / File(s)	Summary
Feature topic registry update `internal/api/features/_topics.json`, `internal/api/features/types.go`	Active topics list now includes "operobsgen"; feature API version bumped from v5.0.0 to v1.0.0.
Cluster feature introspection `test/e2e/serial/config/infra.go`	New `FetchClusterTopics` helper executes `--inspect-features` inside the operator pod, unmarshals and validates the result, and gracefully returns empty result if the operator lacks introspection support.
Test fixture context threading `test/internal/fixture/fixture.go`	`SetupWithOptions` now accepts explicit `context.Context` parameter; `Setup` passes `context.Background()` to preserve existing behavior.
E2E config cluster topics integration `test/e2e/serial/config/fixture.go`	Added `ClusterTopics` field to `E2EConfig` struct; refactored `NewFixtureWithOptions` to accept and thread context through fixture and client operations; integrated `FetchClusterTopics` call during setup to populate cluster feature information.
Feature-aware test validation `test/e2e/serial/tests/configuration.go`	Updated `isNROOperSyncedAt` and new `isNROOperUpToDate` helpers to conditionally bypass `ObservedGeneration` checks when "operobsgen" is not in cluster topics; added slices import and minor formatting in daemonset verification.

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'conditional use of ObservedGeneration' directly and clearly describes the main change: introducing conditional logic to handle ObservedGeneration availability across different operator versions.
Description check	✅ Passed	The description is directly related to the changeset, explaining the context of why conditional ObservedGeneration usage is needed and the rationale for this approach over alternatives.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch operobsgen-topic

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/serial/config/infra.go`:
- Around line 231-233: The code currently swallows all errors from the
CommandOnPod call by returning features.NewTopicInfo(), nil; instead, change the
error handling so only known “feature unsupported” conditions are treated as
non-fatal and all other errors are returned (or wrapped) to fail the test.
Locate the CommandOnPod invocation and replace the unconditional if err != nil {
return features.NewTopicInfo(), nil } with logic that inspects the error (type
or message) and for genuine unsupported indicators returns
features.NewTopicInfo(), nil, but otherwise returns nil,
fmt.Errorf("introspection failed: %w", err) (or simply return the error) so real
cluster/test failures aren’t masked.

In `@test/e2e/serial/tests/configuration.go`:
- Around line 1862-1868: The call to the undefined helper isNROOperAvailableAt
in isNROOperUpToDate is causing the build to fail; either replace that call with
the existing helper that matches its intent (e.g., isNROOperAvailable) or
implement a new helper named isNROOperAvailableAt(status
*nropv1.NUMAResourcesOperatorStatus, gen int64) that checks availability against
the provided generation; modify isNROOperUpToDate to call the defined helper
(isNROOperAvailable or the newly implemented isNROOperAvailableAt) so the
function compiles and performs the same generation-aware availability check.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 6d3dd813-6e38-466c-b2ff-b91e934117d2

📥 Commits

Reviewing files that changed from the base of the PR and between ce752f9 and b2dbbae.

📒 Files selected for processing (6)

internal/api/features/_topics.json
internal/api/features/types.go
test/e2e/serial/config/fixture.go
test/e2e/serial/config/infra.go
test/e2e/serial/tests/configuration.go
test/internal/fixture/fixture.go

coderabbitai · 2026-05-14T09:44:56Z

+	if err != nil {
+		return features.NewTopicInfo(), nil
+	}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t swallow all introspection execution failures.

This currently treats every CommandOnPod failure as “feature unsupported”, which can mask real cluster/test failures and silently disable the stricter checks.

Suggested fix

import ( "context" "encoding/json" "fmt" "os" + "strings" "sync" "time" @@ stdout, stderr, err := remoteexec.CommandOnPod(ctx, fxt.K8sClient, pod, cmdline...) - // older version may miss introspection support that's fine if err != nil { - return features.NewTopicInfo(), nil + serr := strings.ToLower(stderr) + // older versions may miss introspection support; only ignore that class of errors + if strings.Contains(serr, "inspect-features") && (strings.Contains(serr, "unknown") || strings.Contains(serr, "not found")) { + return features.NewTopicInfo(), nil + } + return features.NewTopicInfo(), fmt.Errorf("cannot inspect cluster features: %w (stderr=%q)", err, stderr) }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if err != nil {

return features.NewTopicInfo(), nil

}

import (

"context"

"encoding/json"

"fmt"

"os"

"strings"

"sync"

"time"

// ... rest of imports

)

stdout, stderr, err := remoteexec.CommandOnPod(ctx, fxt.K8sClient, pod, cmdline...)

if err != nil {

serr := strings.ToLower(stderr)

// older versions may miss introspection support; only ignore that class of errors

if strings.Contains(serr, "inspect-features") && (strings.Contains(serr, "unknown") || strings.Contains(serr, "not found")) {

return features.NewTopicInfo(), nil

}

return features.NewTopicInfo(), fmt.Errorf("cannot inspect cluster features: %w (stderr=%q)", err, stderr)

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/serial/config/infra.go` around lines 231 - 233, The code currently swallows all errors from the CommandOnPod call by returning features.NewTopicInfo(), nil; instead, change the error handling so only known “feature unsupported” conditions are treated as non-fatal and all other errors are returned (or wrapped) to fail the test. Locate the CommandOnPod invocation and replace the unconditional if err != nil { return features.NewTopicInfo(), nil } with logic that inspects the error (type or message) and for genuine unsupported indicators returns features.NewTopicInfo(), nil, but otherwise returns nil, fmt.Errorf("introspection failed: %w", err) (or simply return the error) so real cluster/test failures aren’t masked.

report if the NUMAResourcesOperator object exposes the ObservedGeneration in status, so we can consume the value in the e2e tests and have more robust runs. Signed-off-by: Francesco Romani <fromani@redhat.com>

We can't reliably depend on isNROOperAvailableAt because older versions (pre 4.21) don't expose the value, so all the checks will fail (observedGeneration stuck at 0). So we need to write tests carefully supporting both flows. Signed-off-by: Francesco Romani <fromani@redhat.com>

Tal-or · 2026-05-14T10:56:34Z

/retest

Tal-or

This PR has good intentions but if I may, I would like to suggest different approach:

backport #2722 all the way to 4.18 - the change is relatively small and scoped.
change the check in the isNROOperSyncedAt function:
from:

if cond.ObservedGeneration != gen {
		return fmt.Errorf("condition at %d expected %d", cond.ObservedGeneration, gen)
	}

to:
if cond.ObservedGeneration != gen && cond.ObservedGeneration != 0 {
		return fmt.Errorf("condition at %d expected %d", cond.ObservedGeneration, gen)
	}

if cond.ObservedGeneration == 0 it's likely because the operator is running the old code so don't fail the test on this part.

Tal-or · 2026-05-14T10:58:08Z

+	// The payload (e.g. active topics) may and will changes even
+	// within the same operator version; e.g. is totally legit
+	// to add more topics during a Z stream.
+	Version = "v1.0.0"


Will be really hard to track, or i'll have to be more blunt we'll simply forget to update this value.
Without an automated check it won't fly IMVHO, which I'm sure will also add more complexity.

likely correct, but using the version as before is pointless. We can just remove the version at that point. Either it's dependable signal or should be ignored.

ffromani · 2026-05-14T11:20:11Z

This PR has good intentions but if I may, I would like to suggest different approach:
1. backport #2722 all the way to 4.18 - the change is relatively small and scoped.

2. change  the check in the `isNROOperSyncedAt` function:
   from:
if cond.ObservedGeneration != gen {
		return fmt.Errorf("condition at %d expected %d", cond.ObservedGeneration, gen)
	}
to:
if cond.ObservedGeneration != gen && cond.ObservedGeneration != 0 {
		return fmt.Errorf("condition at %d expected %d", cond.ObservedGeneration, gen)
	}
if cond.ObservedGeneration == 0 it's likely because the operator is running the old code so don't fail the test on this part.

It's an interesting alternative. I'm ok backporting #2722 down to 4.18, but I'm not sure that is the oldest supported version, and therefore makes this PR useless.

Tal-or · 2026-05-14T11:22:31Z

if cond.ObservedGeneration == 0 it's likely because the operator is running the old code so don't fail the test on this part.

It's an interesting alternative. I'm ok backporting #2722 down to 4.18, but I'm not sure that is the oldest supported version, and therefore makes this PR useless.

yes but adding also suggestion 2 will do.

ffromani · 2026-05-14T11:40:23Z

if cond.ObservedGeneration == 0 it's likely because the operator is running the old code so don't fail the test on this part.

It's an interesting alternative. I'm ok backporting #2722 down to 4.18, but I'm not sure that is the oldest supported version, and therefore makes this PR useless.

yes but adding also suggestion 2 will do.

True. But here's the value proposition of this PR: don't do guesswork with ambiguous valuye (0), but rather get an explicit signal.

ffromani · 2026-05-26T13:50:47Z

in the end #4063 made this unnecessary

openshift-ci Bot requested review from shajmakh and swatisehgal December 18, 2025 10:13

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 18, 2025

ffromani mentioned this pull request Dec 18, 2025

release-4.20: status: set ObservedGeneration in all status conds #2722

Merged

ffromani force-pushed the operobsgen-topic branch from 8e664e1 to c439e46 Compare December 18, 2025 10:24

shajmakh reviewed Dec 19, 2025

View reviewed changes

openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 18, 2026

ffromani added 3 commits May 14, 2026 11:39

intapi: features: fix and clarify version

682ace0

this API version is very vaguely correlated with the operator version. Change the value to emphasize this fact. Signed-off-by: Francesco Romani <fromani@redhat.com>

e2e: fixture: use explicit context

173152b

the fact it was implicit was an implementation accident we can now rectify. Signed-off-by: Francesco Romani <fromani@redhat.com>

ffromani force-pushed the operobsgen-topic branch from c439e46 to b2dbbae Compare May 14, 2026 09:40

openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 14, 2026

coderabbitai Bot reviewed May 14, 2026

View reviewed changes

ffromani added 2 commits May 14, 2026 11:50

topics: add and use operobsgen

8802e54

report if the NUMAResourcesOperator object exposes the ObservedGeneration in status, so we can consume the value in the e2e tests and have more robust runs. Signed-off-by: Francesco Romani <fromani@redhat.com>

ffromani force-pushed the operobsgen-topic branch from b2dbbae to cffa82f Compare May 14, 2026 09:50

Tal-or reviewed May 14, 2026

View reviewed changes

-	if err != nil {
-		return features.NewTopicInfo(), nil
-	}
+import (
+	"context"
+	"encoding/json"
+	"fmt"
+	"os"
+	"strings"
+	"sync"
+	"time"
+	// ... rest of imports
+)
+	stdout, stderr, err := remoteexec.CommandOnPod(ctx, fxt.K8sClient, pod, cmdline...)
+	if err != nil {
+		serr := strings.ToLower(stderr)
+		// older versions may miss introspection support; only ignore that class of errors
+		if strings.Contains(serr, "inspect-features") && (strings.Contains(serr, "unknown") || strings.Contains(serr, "not found")) {
+			return features.NewTopicInfo(), nil
+		}
+		return features.NewTopicInfo(), fmt.Errorf("cannot inspect cluster features: %w (stderr=%q)", err, stderr)
+	}

Conversation

ffromani commented Dec 18, 2025

Uh oh!

openshift-ci Bot commented Dec 18, 2025

Uh oh!

shajmakh Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

ffromani May 14, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Tal-or commented May 14, 2026

Uh oh!

Tal-or left a comment

Choose a reason for hiding this comment

Uh oh!

Tal-or May 14, 2026

Choose a reason for hiding this comment

Uh oh!

ffromani May 14, 2026

Choose a reason for hiding this comment

Uh oh!

ffromani commented May 14, 2026

Uh oh!

Tal-or commented May 14, 2026 • edited by ffromani Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ffromani commented May 14, 2026

Uh oh!

ffromani commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

coderabbitai Bot commented May 14, 2026 •

edited

Loading

Tal-or commented May 14, 2026 •

edited by ffromani

Loading