conditional use of ObservedGeneration#2732
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ffromani The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
8e664e1 to
c439e46
Compare
| // TODO: until all the version reports ObservedGeneration | ||
| if serialconfig.Config != nil && !slices.Contains(serialconfig.Config.ClusterTopics.Active, "operobsgen") { | ||
| return true | ||
| } |
There was a problem hiding this comment.
I think this should be removed, we still want to verify the condition is available
There was a problem hiding this comment.
not sure I got you right: we check the condition before, don't we?
this API version is very vaguely correlated with the operator version. Change the value to emphasize this fact. Signed-off-by: Francesco Romani <fromani@redhat.com>
the fact it was implicit was an implementation accident we can now rectify. Signed-off-by: Francesco Romani <fromani@redhat.com>
cluster topics are expected not to change mid-run of the serial tests, so collecting them once at startup seems like a good compromise. The topics will be available through the shared serial config object. Signed-off-by: Francesco Romani <fromani@redhat.com>
c439e46 to
b2dbbae
Compare
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis PR extends E2E test infrastructure to discover and respect operator feature capabilities. It registers a new "operobsgen" feature topic, introduces in-cluster feature introspection, threads context through test fixtures, and updates test assertions to gracefully handle operators that may lack certain features. ChangesFeature Topic Introspection for E2E Tests
🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/serial/config/infra.go`:
- Around line 231-233: The code currently swallows all errors from the
CommandOnPod call by returning features.NewTopicInfo(), nil; instead, change the
error handling so only known “feature unsupported” conditions are treated as
non-fatal and all other errors are returned (or wrapped) to fail the test.
Locate the CommandOnPod invocation and replace the unconditional if err != nil {
return features.NewTopicInfo(), nil } with logic that inspects the error (type
or message) and for genuine unsupported indicators returns
features.NewTopicInfo(), nil, but otherwise returns nil,
fmt.Errorf("introspection failed: %w", err) (or simply return the error) so real
cluster/test failures aren’t masked.
In `@test/e2e/serial/tests/configuration.go`:
- Around line 1862-1868: The call to the undefined helper isNROOperAvailableAt
in isNROOperUpToDate is causing the build to fail; either replace that call with
the existing helper that matches its intent (e.g., isNROOperAvailable) or
implement a new helper named isNROOperAvailableAt(status
*nropv1.NUMAResourcesOperatorStatus, gen int64) that checks availability against
the provided generation; modify isNROOperUpToDate to call the defined helper
(isNROOperAvailable or the newly implemented isNROOperAvailableAt) so the
function compiles and performs the same generation-aware availability check.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 6d3dd813-6e38-466c-b2ff-b91e934117d2
📒 Files selected for processing (6)
internal/api/features/_topics.jsoninternal/api/features/types.gotest/e2e/serial/config/fixture.gotest/e2e/serial/config/infra.gotest/e2e/serial/tests/configuration.gotest/internal/fixture/fixture.go
| if err != nil { | ||
| return features.NewTopicInfo(), nil | ||
| } |
There was a problem hiding this comment.
Don’t swallow all introspection execution failures.
This currently treats every CommandOnPod failure as “feature unsupported”, which can mask real cluster/test failures and silently disable the stricter checks.
Suggested fix
import (
"context"
"encoding/json"
"fmt"
"os"
+ "strings"
"sync"
"time"
@@
stdout, stderr, err := remoteexec.CommandOnPod(ctx, fxt.K8sClient, pod, cmdline...)
- // older version may miss introspection support that's fine
if err != nil {
- return features.NewTopicInfo(), nil
+ serr := strings.ToLower(stderr)
+ // older versions may miss introspection support; only ignore that class of errors
+ if strings.Contains(serr, "inspect-features") && (strings.Contains(serr, "unknown") || strings.Contains(serr, "not found")) {
+ return features.NewTopicInfo(), nil
+ }
+ return features.NewTopicInfo(), fmt.Errorf("cannot inspect cluster features: %w (stderr=%q)", err, stderr)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if err != nil { | |
| return features.NewTopicInfo(), nil | |
| } | |
| import ( | |
| "context" | |
| "encoding/json" | |
| "fmt" | |
| "os" | |
| "strings" | |
| "sync" | |
| "time" | |
| // ... rest of imports | |
| ) | |
| stdout, stderr, err := remoteexec.CommandOnPod(ctx, fxt.K8sClient, pod, cmdline...) | |
| if err != nil { | |
| serr := strings.ToLower(stderr) | |
| // older versions may miss introspection support; only ignore that class of errors | |
| if strings.Contains(serr, "inspect-features") && (strings.Contains(serr, "unknown") || strings.Contains(serr, "not found")) { | |
| return features.NewTopicInfo(), nil | |
| } | |
| return features.NewTopicInfo(), fmt.Errorf("cannot inspect cluster features: %w (stderr=%q)", err, stderr) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/e2e/serial/config/infra.go` around lines 231 - 233, The code currently
swallows all errors from the CommandOnPod call by returning
features.NewTopicInfo(), nil; instead, change the error handling so only known
“feature unsupported” conditions are treated as non-fatal and all other errors
are returned (or wrapped) to fail the test. Locate the CommandOnPod invocation
and replace the unconditional if err != nil { return features.NewTopicInfo(),
nil } with logic that inspects the error (type or message) and for genuine
unsupported indicators returns features.NewTopicInfo(), nil, but otherwise
returns nil, fmt.Errorf("introspection failed: %w", err) (or simply return the
error) so real cluster/test failures aren’t masked.
report if the NUMAResourcesOperator object exposes the ObservedGeneration in status, so we can consume the value in the e2e tests and have more robust runs. Signed-off-by: Francesco Romani <fromani@redhat.com>
We can't reliably depend on isNROOperAvailableAt because older versions (pre 4.21) don't expose the value, so all the checks will fail (observedGeneration stuck at 0). So we need to write tests carefully supporting both flows. Signed-off-by: Francesco Romani <fromani@redhat.com>
b2dbbae to
cffa82f
Compare
|
/retest |
Tal-or
left a comment
There was a problem hiding this comment.
This PR has good intentions but if I may, I would like to suggest different approach:
- backport #2722 all the way to 4.18 - the change is relatively small and scoped.
- change the check in the
isNROOperSyncedAtfunction:
from:
if cond.ObservedGeneration != gen {
return fmt.Errorf("condition at %d expected %d", cond.ObservedGeneration, gen)
}
to:
if cond.ObservedGeneration != gen && cond.ObservedGeneration != 0 {
return fmt.Errorf("condition at %d expected %d", cond.ObservedGeneration, gen)
}
if cond.ObservedGeneration == 0 it's likely because the operator is running the old code so don't fail the test on this part.
| // The payload (e.g. active topics) may and will changes even | ||
| // within the same operator version; e.g. is totally legit | ||
| // to add more topics during a Z stream. | ||
| Version = "v1.0.0" |
There was a problem hiding this comment.
Will be really hard to track, or i'll have to be more blunt we'll simply forget to update this value.
Without an automated check it won't fly IMVHO, which I'm sure will also add more complexity.
There was a problem hiding this comment.
likely correct, but using the version as before is pointless. We can just remove the version at that point. Either it's dependable signal or should be ignored.
It's an interesting alternative. I'm ok backporting #2722 down to 4.18, but I'm not sure that is the oldest supported version, and therefore makes this PR useless. |
yes but adding also suggestion |
True. But here's the value proposition of this PR: don't do guesswork with ambiguous valuye (0), but rather get an explicit signal. |
|
in the end #4063 made this unnecessary |
in PR #2431 we added reporting of
ObservedGenerationin theNUMAResourcesOperator.Status. This is a good change and closes a known gap, but we can't just use the value unconditionally in e2e tests, because a long tail of old versions (pre #2722) won't expose this value, which will create a storm of false negatives tests.Ignoring the value till all the version supports it seems excessive; reverting the tests is wrong, reverting the change seems worse. The best way forward seems to be a more granular check, introduced in this PR.
This creates a challenge for future tests. The best option is probably to use ObservedGeneration only in the new tests and stop refactoring of existing codebase. Either way, we will need to be more careful in the future.