OCPNODE-4443: Add runc upgradeable guard to block upgrades on RHEL 10 streams#5891
OCPNODE-4443: Add runc upgradeable guard to block upgrades on RHEL 10 streams#5891bitoku wants to merge 1 commit intoopenshift:mainfrom
Conversation
Add an upgradeable guard that prevents cluster upgrades when any MachineConfigPool targeting a RHEL 10 / CentOS 10 OS image stream has runc as its default container runtime. Extend cfeEvalRunc() to also detect runc set via raw MachineConfig CRI-O drop-in files, not just ContainerRuntimeConfig CRs. Assisted-by: Claude Code <https://claude.com/claude-code>
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@bitoku: This pull request references OCPNODE-4443 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughThe changes add runc container runtime upgrade detection and blocking logic. A new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 10 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
/hold |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: bitoku The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
pkg/operator/status_test.go (1)
1487-1518: Good test coverage for RHEL 9 exemption.This test case properly verifies that pools on RHEL 9 streams are not blocked even when runc is configured. This is critical for ensuring the upgrade guard only affects RHEL 10 / CentOS 10 targets.
Consider adding a similar test case for CentOS 10 to verify that both
osimagestream.StreamNameRHEL10andosimagestream.StreamNameCentOS10trigger the upgrade block.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/operator/status_test.go` around lines 1487 - 1518, Add a test parallel to "runc not blocked on rhel-9 stream" that sets the OSImageStream.Status.DefaultStream to osimagestream.StreamNameCentOS10 (and optionally another for osimagestream.StreamNameRHEL10) and asserts that optr.checkRuncUpgradeableGuard() returns upgradeBlock == true, non-empty message, and no error; reuse the same Operator setup (mcLister with rendered-worker config containing the runc drop-in, osImageStreamLister, fgHandler) and reference the Operator.checkRuncUpgradeableGuard method, the osImageStreamLister/osisIndexer setup, and mockMCPLister/mockMCLister used in the existing test to locate where to add the new test(s).pkg/operator/status.go (1)
506-514: Consider whether TOML parse failures should be non-fatal.Currently, a malformed TOML drop-in in any pool's rendered MachineConfig will cause the entire detection to fail, potentially preventing the operator from evaluating runc status for all pools.
If the TOML is malformed due to a user-provided MachineConfig, consider logging a warning and continuing rather than returning an error. However, if you expect all drop-ins to be well-formed since they're generated by the controller, the current behavior is appropriate.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/operator/status.go` around lines 506 - 514, The TOML decoding failure currently returns an error and aborts detection; change this to log a warning and continue so one malformed CRI-O drop-in doesn't fail the whole routine: in the block that decodes into crioRuntimeConfig using toml.NewDecoder(bytes.NewReader(fileData)).Decode(&cfg), replace the return of fmt.Errorf(...) with a non-fatal log (e.g., klog.Warningf or the package's logger) that includes path and mc.Name and the decode error, then skip using cfg for that drop-in and continue processing to allow effectiveRuntime evaluation from other drop-ins; keep the existing behavior (return error) only if you determine the drop-ins should always be trusted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/operator/status_test.go`:
- Around line 1487-1518: Add a test parallel to "runc not blocked on rhel-9
stream" that sets the OSImageStream.Status.DefaultStream to
osimagestream.StreamNameCentOS10 (and optionally another for
osimagestream.StreamNameRHEL10) and asserts that
optr.checkRuncUpgradeableGuard() returns upgradeBlock == true, non-empty
message, and no error; reuse the same Operator setup (mcLister with
rendered-worker config containing the runc drop-in, osImageStreamLister,
fgHandler) and reference the Operator.checkRuncUpgradeableGuard method, the
osImageStreamLister/osisIndexer setup, and mockMCPLister/mockMCLister used in
the existing test to locate where to add the new test(s).
In `@pkg/operator/status.go`:
- Around line 506-514: The TOML decoding failure currently returns an error and
aborts detection; change this to log a warning and continue so one malformed
CRI-O drop-in doesn't fail the whole routine: in the block that decodes into
crioRuntimeConfig using toml.NewDecoder(bytes.NewReader(fileData)).Decode(&cfg),
replace the return of fmt.Errorf(...) with a non-fatal log (e.g., klog.Warningf
or the package's logger) that includes path and mc.Name and the decode error,
then skip using cfg for that drop-in and continue processing to allow
effectiveRuntime evaluation from other drop-ins; keep the existing behavior
(return error) only if you determine the drop-ins should always be trusted.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 326ee946-c807-400a-9eb7-82bffd5370e1
📒 Files selected for processing (2)
pkg/operator/status.gopkg/operator/status_test.go
|
@bitoku: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Assisted-by: Claude Code https://claude.com/claude-code
- What I did
Block cluster upgrades if any MachineConfigPool targeting a RHEL 10 or CentOS 10 OS image stream has runc configured as the default container runtime. Each pool's effective stream is resolved via the OSImageStream API, so upgraded clusters that remain on RHCOS 9 are not affected.
- How to verify it
e2e, runc cluster upgrade to rhcos 10.
- Description for the changelog
Block cluster upgrades if any MachineConfigPool targeting a RHEL 10 or CentOS 10 OS image stream has runc configured as the default container runtime.
Summary by CodeRabbit
New Features
Tests