Skip to content

conditional use of ObservedGeneration#2732

Open
ffromani wants to merge 5 commits into
mainfrom
operobsgen-topic
Open

conditional use of ObservedGeneration#2732
ffromani wants to merge 5 commits into
mainfrom
operobsgen-topic

Conversation

@ffromani
Copy link
Copy Markdown
Member

in PR #2431 we added reporting of ObservedGeneration in the NUMAResourcesOperator.Status. This is a good change and closes a known gap, but we can't just use the value unconditionally in e2e tests, because a long tail of old versions (pre #2722) won't expose this value, which will create a storm of false negatives tests.

Ignoring the value till all the version supports it seems excessive; reverting the tests is wrong, reverting the change seems worse. The best way forward seems to be a more granular check, introduced in this PR.

This creates a challenge for future tests. The best option is probably to use ObservedGeneration only in the new tests and stop refactoring of existing codebase. Either way, we will need to be more careful in the future.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Dec 18, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 18, 2025
Comment thread test/e2e/serial/tests/configuration.go Outdated
Comment on lines +1533 to +1536
// TODO: until all the version reports ObservedGeneration
if serialconfig.Config != nil && !slices.Contains(serialconfig.Config.ClusterTopics.Active, "operobsgen") {
return true
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be removed, we still want to verify the condition is available

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I got you right: we check the condition before, don't we?

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 18, 2026
ffromani added 3 commits May 14, 2026 11:39
this API version is very vaguely correlated with
the operator version. Change the value to emphasize this fact.

Signed-off-by: Francesco Romani <fromani@redhat.com>
the fact it was implicit was an implementation accident
we can now rectify.

Signed-off-by: Francesco Romani <fromani@redhat.com>
cluster topics are expected not to change mid-run
of the serial tests, so collecting them once
at startup seems like a good compromise.

The topics will be available through the shared
serial config object.

Signed-off-by: Francesco Romani <fromani@redhat.com>
@ffromani ffromani force-pushed the operobsgen-topic branch from c439e46 to b2dbbae Compare May 14, 2026 09:40
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 14, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 14, 2026

Review Change Stack

Warning

Rate limit exceeded

@ffromani has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 50 minutes and 6 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 4d2a3d60-950e-49a5-bad6-0ff8e57d9908

📥 Commits

Reviewing files that changed from the base of the PR and between b2dbbae and cffa82f.

📒 Files selected for processing (2)
  • internal/api/features/_topics.json
  • test/e2e/serial/tests/configuration.go
📝 Walkthrough

Walkthrough

This PR extends E2E test infrastructure to discover and respect operator feature capabilities. It registers a new "operobsgen" feature topic, introduces in-cluster feature introspection, threads context through test fixtures, and updates test assertions to gracefully handle operators that may lack certain features.

Changes

Feature Topic Introspection for E2E Tests

Layer / File(s) Summary
Feature topic registry update
internal/api/features/_topics.json, internal/api/features/types.go
Active topics list now includes "operobsgen"; feature API version bumped from v5.0.0 to v1.0.0.
Cluster feature introspection
test/e2e/serial/config/infra.go
New FetchClusterTopics helper executes --inspect-features inside the operator pod, unmarshals and validates the result, and gracefully returns empty result if the operator lacks introspection support.
Test fixture context threading
test/internal/fixture/fixture.go
SetupWithOptions now accepts explicit context.Context parameter; Setup passes context.Background() to preserve existing behavior.
E2E config cluster topics integration
test/e2e/serial/config/fixture.go
Added ClusterTopics field to E2EConfig struct; refactored NewFixtureWithOptions to accept and thread context through fixture and client operations; integrated FetchClusterTopics call during setup to populate cluster feature information.
Feature-aware test validation
test/e2e/serial/tests/configuration.go
Updated isNROOperSyncedAt and new isNROOperUpToDate helpers to conditionally bypass ObservedGeneration checks when "operobsgen" is not in cluster topics; added slices import and minor formatting in daemonset verification.

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'conditional use of ObservedGeneration' directly and clearly describes the main change: introducing conditional logic to handle ObservedGeneration availability across different operator versions.
Description check ✅ Passed The description is directly related to the changeset, explaining the context of why conditional ObservedGeneration usage is needed and the rationale for this approach over alternatives.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch operobsgen-topic

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/serial/config/infra.go`:
- Around line 231-233: The code currently swallows all errors from the
CommandOnPod call by returning features.NewTopicInfo(), nil; instead, change the
error handling so only known “feature unsupported” conditions are treated as
non-fatal and all other errors are returned (or wrapped) to fail the test.
Locate the CommandOnPod invocation and replace the unconditional if err != nil {
return features.NewTopicInfo(), nil } with logic that inspects the error (type
or message) and for genuine unsupported indicators returns
features.NewTopicInfo(), nil, but otherwise returns nil,
fmt.Errorf("introspection failed: %w", err) (or simply return the error) so real
cluster/test failures aren’t masked.

In `@test/e2e/serial/tests/configuration.go`:
- Around line 1862-1868: The call to the undefined helper isNROOperAvailableAt
in isNROOperUpToDate is causing the build to fail; either replace that call with
the existing helper that matches its intent (e.g., isNROOperAvailable) or
implement a new helper named isNROOperAvailableAt(status
*nropv1.NUMAResourcesOperatorStatus, gen int64) that checks availability against
the provided generation; modify isNROOperUpToDate to call the defined helper
(isNROOperAvailable or the newly implemented isNROOperAvailableAt) so the
function compiles and performs the same generation-aware availability check.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 6d3dd813-6e38-466c-b2ff-b91e934117d2

📥 Commits

Reviewing files that changed from the base of the PR and between ce752f9 and b2dbbae.

📒 Files selected for processing (6)
  • internal/api/features/_topics.json
  • internal/api/features/types.go
  • test/e2e/serial/config/fixture.go
  • test/e2e/serial/config/infra.go
  • test/e2e/serial/tests/configuration.go
  • test/internal/fixture/fixture.go

Comment on lines +231 to +233
if err != nil {
return features.NewTopicInfo(), nil
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t swallow all introspection execution failures.

This currently treats every CommandOnPod failure as “feature unsupported”, which can mask real cluster/test failures and silently disable the stricter checks.

Suggested fix
 import (
 	"context"
 	"encoding/json"
 	"fmt"
 	"os"
+	"strings"
 	"sync"
 	"time"
@@
 	stdout, stderr, err := remoteexec.CommandOnPod(ctx, fxt.K8sClient, pod, cmdline...)
-	// older version may miss introspection support that's fine
 	if err != nil {
-		return features.NewTopicInfo(), nil
+		serr := strings.ToLower(stderr)
+		// older versions may miss introspection support; only ignore that class of errors
+		if strings.Contains(serr, "inspect-features") && (strings.Contains(serr, "unknown") || strings.Contains(serr, "not found")) {
+			return features.NewTopicInfo(), nil
+		}
+		return features.NewTopicInfo(), fmt.Errorf("cannot inspect cluster features: %w (stderr=%q)", err, stderr)
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if err != nil {
return features.NewTopicInfo(), nil
}
import (
"context"
"encoding/json"
"fmt"
"os"
"strings"
"sync"
"time"
// ... rest of imports
)
stdout, stderr, err := remoteexec.CommandOnPod(ctx, fxt.K8sClient, pod, cmdline...)
if err != nil {
serr := strings.ToLower(stderr)
// older versions may miss introspection support; only ignore that class of errors
if strings.Contains(serr, "inspect-features") && (strings.Contains(serr, "unknown") || strings.Contains(serr, "not found")) {
return features.NewTopicInfo(), nil
}
return features.NewTopicInfo(), fmt.Errorf("cannot inspect cluster features: %w (stderr=%q)", err, stderr)
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/serial/config/infra.go` around lines 231 - 233, The code currently
swallows all errors from the CommandOnPod call by returning
features.NewTopicInfo(), nil; instead, change the error handling so only known
“feature unsupported” conditions are treated as non-fatal and all other errors
are returned (or wrapped) to fail the test. Locate the CommandOnPod invocation
and replace the unconditional if err != nil { return features.NewTopicInfo(),
nil } with logic that inspects the error (type or message) and for genuine
unsupported indicators returns features.NewTopicInfo(), nil, but otherwise
returns nil, fmt.Errorf("introspection failed: %w", err) (or simply return the
error) so real cluster/test failures aren’t masked.

Comment thread test/e2e/serial/tests/configuration.go Outdated
ffromani added 2 commits May 14, 2026 11:50
report if the NUMAResourcesOperator object exposes
the ObservedGeneration in status, so we can consume
the value in the e2e tests and have more robust runs.

Signed-off-by: Francesco Romani <fromani@redhat.com>
We can't reliably depend on isNROOperAvailableAt
because older versions (pre 4.21) don't expose the value,
so all the checks will fail (observedGeneration stuck at 0).
So we need to write tests carefully supporting both flows.

Signed-off-by: Francesco Romani <fromani@redhat.com>
@ffromani ffromani force-pushed the operobsgen-topic branch from b2dbbae to cffa82f Compare May 14, 2026 09:50
@Tal-or
Copy link
Copy Markdown
Collaborator

Tal-or commented May 14, 2026

/retest

Copy link
Copy Markdown
Collaborator

@Tal-or Tal-or left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR has good intentions but if I may, I would like to suggest different approach:

  1. backport #2722 all the way to 4.18 - the change is relatively small and scoped.
  2. change the check in the isNROOperSyncedAt function:
    from:
if cond.ObservedGeneration != gen {
		return fmt.Errorf("condition at %d expected %d", cond.ObservedGeneration, gen)
	}
to:
if cond.ObservedGeneration != gen && cond.ObservedGeneration != 0 {
		return fmt.Errorf("condition at %d expected %d", cond.ObservedGeneration, gen)
	}

if cond.ObservedGeneration == 0 it's likely because the operator is running the old code so don't fail the test on this part.

// The payload (e.g. active topics) may and will changes even
// within the same operator version; e.g. is totally legit
// to add more topics during a Z stream.
Version = "v1.0.0"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be really hard to track, or i'll have to be more blunt we'll simply forget to update this value.
Without an automated check it won't fly IMVHO, which I'm sure will also add more complexity.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

likely correct, but using the version as before is pointless. We can just remove the version at that point. Either it's dependable signal or should be ignored.

@ffromani
Copy link
Copy Markdown
Member Author

This PR has good intentions but if I may, I would like to suggest different approach:

1. backport #2722 all the way to 4.18 - the change is relatively small and scoped.

2. change  the check in the `isNROOperSyncedAt` function:
   from:
if cond.ObservedGeneration != gen {
		return fmt.Errorf("condition at %d expected %d", cond.ObservedGeneration, gen)
	}
to:
if cond.ObservedGeneration != gen && cond.ObservedGeneration != 0 {
		return fmt.Errorf("condition at %d expected %d", cond.ObservedGeneration, gen)
	}

if cond.ObservedGeneration == 0 it's likely because the operator is running the old code so don't fail the test on this part.

It's an interesting alternative. I'm ok backporting #2722 down to 4.18, but I'm not sure that is the oldest supported version, and therefore makes this PR useless.

@Tal-or
Copy link
Copy Markdown
Collaborator

Tal-or commented May 14, 2026

if cond.ObservedGeneration == 0 it's likely because the operator is running the old code so don't fail the test on this part.

It's an interesting alternative. I'm ok backporting #2722 down to 4.18, but I'm not sure that is the oldest supported version, and therefore makes this PR useless.

yes but adding also suggestion 2 will do.

@ffromani
Copy link
Copy Markdown
Member Author

if cond.ObservedGeneration == 0 it's likely because the operator is running the old code so don't fail the test on this part.

It's an interesting alternative. I'm ok backporting #2722 down to 4.18, but I'm not sure that is the oldest supported version, and therefore makes this PR useless.

yes but adding also suggestion 2 will do.

True. But here's the value proposition of this PR: don't do guesswork with ambiguous valuye (0), but rather get an explicit signal.

@ffromani
Copy link
Copy Markdown
Member Author

in the end #4063 made this unnecessary

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants