Skip to content

OCPNODE-4443: Add runc upgradeable guard to block upgrades on RHEL 10 streams#5891

Open
bitoku wants to merge 1 commit intoopenshift:mainfrom
bitoku:atokubi/runc-upgradeable-guard
Open

OCPNODE-4443: Add runc upgradeable guard to block upgrades on RHEL 10 streams#5891
bitoku wants to merge 1 commit intoopenshift:mainfrom
bitoku:atokubi/runc-upgradeable-guard

Conversation

@bitoku
Copy link
Copy Markdown
Contributor

@bitoku bitoku commented Apr 28, 2026

Assisted-by: Claude Code https://claude.com/claude-code

- What I did

Block cluster upgrades if any MachineConfigPool targeting a RHEL 10 or CentOS 10 OS image stream has runc configured as the default container runtime. Each pool's effective stream is resolved via the OSImageStream API, so upgraded clusters that remain on RHCOS 9 are not affected.

- How to verify it

e2e, runc cluster upgrade to rhcos 10.

- Description for the changelog

Block cluster upgrades if any MachineConfigPool targeting a RHEL 10 or CentOS 10 OS image stream has runc configured as the default container runtime.

Summary by CodeRabbit

  • New Features

    • Cluster upgrades are now blocked when runc is detected as the effective container runtime for RHEL 10/CentOS 10 node pools, with detailed messaging explaining the restriction.
    • Enhanced container runtime detection from multiple configuration sources.
  • Tests

    • Added comprehensive test coverage for the new upgrade gating mechanism and runtime detection logic.

Add an upgradeable guard that prevents cluster upgrades when any
MachineConfigPool targeting a RHEL 10 / CentOS 10 OS image stream
has runc as its default container runtime. Extend cfeEvalRunc() to
also detect runc set via raw MachineConfig CRI-O drop-in files,
not just ContainerRuntimeConfig CRs.

Assisted-by: Claude Code <https://claude.com/claude-code>
@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 28, 2026
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

openshift-ci-robot commented Apr 28, 2026

@bitoku: This pull request references OCPNODE-4443 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Assisted-by: Claude Code https://claude.com/claude-code

- What I did

Block cluster upgrades if any MachineConfigPool targeting a RHEL 10 or CentOS 10 OS image stream has runc configured as the default container runtime. Each pool's effective stream is resolved via the OSImageStream API, so upgraded clusters that remain on RHCOS 9 are not affected.

- How to verify it

e2e, runc cluster upgrade to rhcos 10.

- Description for the changelog

Block cluster upgrades if any MachineConfigPool targeting a RHEL 10 or CentOS 10 OS image stream has runc configured as the default container runtime.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 28, 2026

Walkthrough

The changes add runc container runtime upgrade detection and blocking logic. A new checkRuncUpgradeableGuard() function prevents cluster upgrades when runc is the effective runtime on RHEL 10/CentOS 10 pools, detected from both ContainerRuntimeConfig objects and CRI-O drop-in configuration files in rendered MachineConfig data.

Changes

Cohort / File(s) Summary
Runc Upgrade Guard Implementation
pkg/operator/status.go
Added checkRuncUpgradeableGuard() function with CRI-O TOML parsing logic to detect runc as effective container runtime across pools. Modified cfeEvalRunc to evaluate both ContainerRuntimeConfig CRs and rendered MachineConfig ignition drop-in files under /etc/crio/crio.conf.d/. Sets upgrade block with reason RuncDefaultRuntimeConfigured when runc is detected on RHEL 10/CentOS 10 pools, with OS image stream fallback logic and OCP version-dependent documentation links.
Runc Detection Test Coverage
pkg/operator/status_test.go
Added comprehensive unit tests for runc upgrade gating including mock helpers for constructing MachineConfig CRIO drop-ins, simulating pool selection, and validating upgrade blocks. Covers lower-level detection (detectRuncInRenderedConfigs), evaluation logic (cfeEvalRunc with ContainerRuntimeConfig handling), and feature-stream-aware scenarios using OSImageStream lister.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ⚠️ Warning New tests for runc upgrade guard lack meaningful failure messages on multiple assertions, violating the assertion messaging requirement. Add meaningful failure messages to all assertions lacking them, such as converting assert.NoError(t, err) to include diagnostic text explaining what failed and why.
✅ Passed checks (10 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main change: adding a runc upgradeable guard to block upgrades on RHEL 10 streams. It matches the PR objectives and the file changes (detecting runc configuration and blocking upgrades conditionally).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All test names in newly added test functions are static string literals without dynamic values like UUIDs, timestamps, or runtime-generated identifiers.
Microshift Test Compatibility ✅ Passed The new tests in status_test.go are standard Go unit tests, not Ginkgo e2e tests, so the MicroShift-incompatible API check does not apply.
Single Node Openshift (Sno) Test Compatibility ✅ Passed New tests are standard Go unit tests in pkg/operator/status_test.go, not Ginkgo e2e tests, making this check inapplicable.
Topology-Aware Scheduling Compatibility ✅ Passed Pull request introduces status evaluation and upgrade gating logic with no pod scheduling constraints or topology assumptions across OpenShift topologies.
Ote Binary Stdout Contract ✅ Passed The pull request modifies standard operator library code and unit tests, not OTE binary code. No process-level stdout writes violate the OTE Binary Stdout Contract.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed The pull request adds only standard Go unit tests using the testing package, not Ginkgo e2e tests, making the IPv6/disconnected network check inapplicable.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@bitoku
Copy link
Copy Markdown
Contributor Author

bitoku commented Apr 28, 2026

/hold

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 28, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 28, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bitoku
Once this PR has been reviewed and has the lgtm label, please assign umohnani8 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
pkg/operator/status_test.go (1)

1487-1518: Good test coverage for RHEL 9 exemption.

This test case properly verifies that pools on RHEL 9 streams are not blocked even when runc is configured. This is critical for ensuring the upgrade guard only affects RHEL 10 / CentOS 10 targets.

Consider adding a similar test case for CentOS 10 to verify that both osimagestream.StreamNameRHEL10 and osimagestream.StreamNameCentOS10 trigger the upgrade block.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/status_test.go` around lines 1487 - 1518, Add a test parallel to
"runc not blocked on rhel-9 stream" that sets the
OSImageStream.Status.DefaultStream to osimagestream.StreamNameCentOS10 (and
optionally another for osimagestream.StreamNameRHEL10) and asserts that
optr.checkRuncUpgradeableGuard() returns upgradeBlock == true, non-empty
message, and no error; reuse the same Operator setup (mcLister with
rendered-worker config containing the runc drop-in, osImageStreamLister,
fgHandler) and reference the Operator.checkRuncUpgradeableGuard method, the
osImageStreamLister/osisIndexer setup, and mockMCPLister/mockMCLister used in
the existing test to locate where to add the new test(s).
pkg/operator/status.go (1)

506-514: Consider whether TOML parse failures should be non-fatal.

Currently, a malformed TOML drop-in in any pool's rendered MachineConfig will cause the entire detection to fail, potentially preventing the operator from evaluating runc status for all pools.

If the TOML is malformed due to a user-provided MachineConfig, consider logging a warning and continuing rather than returning an error. However, if you expect all drop-ins to be well-formed since they're generated by the controller, the current behavior is appropriate.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/status.go` around lines 506 - 514, The TOML decoding failure
currently returns an error and aborts detection; change this to log a warning
and continue so one malformed CRI-O drop-in doesn't fail the whole routine: in
the block that decodes into crioRuntimeConfig using
toml.NewDecoder(bytes.NewReader(fileData)).Decode(&cfg), replace the return of
fmt.Errorf(...) with a non-fatal log (e.g., klog.Warningf or the package's
logger) that includes path and mc.Name and the decode error, then skip using cfg
for that drop-in and continue processing to allow effectiveRuntime evaluation
from other drop-ins; keep the existing behavior (return error) only if you
determine the drop-ins should always be trusted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/operator/status_test.go`:
- Around line 1487-1518: Add a test parallel to "runc not blocked on rhel-9
stream" that sets the OSImageStream.Status.DefaultStream to
osimagestream.StreamNameCentOS10 (and optionally another for
osimagestream.StreamNameRHEL10) and asserts that
optr.checkRuncUpgradeableGuard() returns upgradeBlock == true, non-empty
message, and no error; reuse the same Operator setup (mcLister with
rendered-worker config containing the runc drop-in, osImageStreamLister,
fgHandler) and reference the Operator.checkRuncUpgradeableGuard method, the
osImageStreamLister/osisIndexer setup, and mockMCPLister/mockMCLister used in
the existing test to locate where to add the new test(s).

In `@pkg/operator/status.go`:
- Around line 506-514: The TOML decoding failure currently returns an error and
aborts detection; change this to log a warning and continue so one malformed
CRI-O drop-in doesn't fail the whole routine: in the block that decodes into
crioRuntimeConfig using toml.NewDecoder(bytes.NewReader(fileData)).Decode(&cfg),
replace the return of fmt.Errorf(...) with a non-fatal log (e.g., klog.Warningf
or the package's logger) that includes path and mc.Name and the decode error,
then skip using cfg for that drop-in and continue processing to allow
effectiveRuntime evaluation from other drop-ins; keep the existing behavior
(return error) only if you determine the drop-ins should always be trusted.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 326ee946-c807-400a-9eb7-82bffd5370e1

📥 Commits

Reviewing files that changed from the base of the PR and between e0916a2 and 5a0e6e8.

📒 Files selected for processing (2)
  • pkg/operator/status.go
  • pkg/operator/status_test.go

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 28, 2026

@bitoku: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants