Skip to content

[WIP] pkg/operator/encryption/controllers: NewKMSPreflightController scaffolding#2170

Draft
p0lyn0mial wants to merge 3 commits into
openshift:masterfrom
p0lyn0mial:kms-preflight-checks-part2
Draft

[WIP] pkg/operator/encryption/controllers: NewKMSPreflightController scaffolding#2170
p0lyn0mial wants to merge 3 commits into
openshift:masterfrom
p0lyn0mial:kms-preflight-checks-part2

Conversation

@p0lyn0mial
Copy link
Copy Markdown
Contributor

@p0lyn0mial p0lyn0mial commented Apr 21, 2026

Summary by CodeRabbit

  • New Features
    • Added a KMS preflight controller that performs periodic readiness checks for encryption, surfaces degraded/error status to the operator, and records events so administrators can observe preflight failures and readiness progress before encrypted operations are enabled.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 21, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 21, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 21, 2026

Walkthrough

Adds a new KMS preflight controller constructor and implementation that wires APIServer and operator informers, evaluates operator preconditions and provider readiness, executes preflight checks (stubbed), and updates the EncryptionKMSPreflightControllerDegraded operator condition based on results.

Changes

Cohort / File(s) Summary
KMS Preflight Controller
pkg/operator/encryption/controllers/kms_preflight_controller.go
Added NewKMSPreflightController(...) and kmsPreflightController implementation. Creates a library-go factory.Controller with 1m resync, wires API server and operator informers, registers an event recorder, evaluates shouldRunEncryptionController(...), runs runPreflightChecks(ctx) (currently stubbed), and sets EncryptionKMSPreflightControllerDegraded (True on preflight error, False otherwise).

Sequence Diagram(s)

sequenceDiagram
    participant Controller as Controller (kmsPreflightController)
    participant Preconditions as PreconditionsFn
    participant Provider as Provider
    participant Preflight as runPreflightChecks
    participant Operator as OperatorClient/StatusApply

    Controller->>Preconditions: preconditionsFulfilledFn(ctx)
    Controller->>Provider: ShouldRunEncryptionControllers(ctx)
    alt preconditions/provider not ready or error
        Controller->>Operator: apply degraded = False / clear or return
    else ready
        Controller->>Preflight: runPreflightChecks(ctx)
        alt preflight fails
            Controller->>Operator: apply degraded = True (Error, message)
        else preflight succeeds
            Controller->>Operator: apply degraded = False
        end
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 12
✅ Passed checks (12 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding scaffolding for a new KMS preflight controller factory constructor and its initial implementation.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed PR adds controller scaffolding with no test files or Ginkgo test definitions, so no test names to assess.
Test Structure And Quality ✅ Passed This PR does not include Ginkgo test code to review; it is a WIP scaffolding PR with stub implementation.
Microshift Test Compatibility ✅ Passed The pull request adds a controller implementation with no Ginkgo e2e tests present.
Single Node Openshift (Sno) Test Compatibility ✅ Passed PR adds a controller implementation but no new Ginkgo e2e tests, so SNO compatibility evaluation is not applicable.
Topology-Aware Scheduling Compatibility ✅ Passed PR introduces KMS preflight controller and pod template scaffolding without topology-incompatible scheduling constraints or hardcoded replica assumptions.
Ote Binary Stdout Contract ✅ Passed The kms_preflight_controller.go file contains no stdout-writing violations with no fmt.Print*(), log.Print*(), klog calls, or process-level functions.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed The PR adds KMS preflight controller implementation, not e2e test code. Ginkgo test patterns were not found, confirming this is production code only.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 21, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: p0lyn0mial

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 21, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/operator/encryption/controllers/kms_preflight_controller.go (1)

27-28: Replace TODO comment with exported API doc before merge.

On Line 27, convert the placeholder into a real GoDoc comment for NewKMSPreflightController to document intent and inputs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/encryption/controllers/kms_preflight_controller.go` around lines
27 - 28, Replace the TODO comment above NewKMSPreflightController with a proper
GoDoc: write an exported comment that starts with "NewKMSPreflightController"
and briefly describes what the constructor returns, its purpose, and the meaning
of its input parameters (e.g., the controller's responsibilities and any
important preconditions for the arguments); ensure the comment follows GoDoc
style (complete sentence, starts with the function name) so tools and consumers
can discover the API.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/operator/encryption/controllers/kms_preflight_controller.go`:
- Around line 91-94: The method kmsPreflightController.runPreflightChecks
currently always returns nil which masks missing validation; replace the
placeholder success with a failing error until real preflight logic is
implemented so the controller fails closed. Update runPreflightChecks to return
a clear sentinel error (e.g. errors.New("preflight checks not implemented")) and
ensure callers of runPreflightChecks (the kmsPreflightController
reconcile/health logic) treat any non-nil error as a failed preflight so the
controller reports unhealthy instead of appearing healthy.

---

Nitpick comments:
In `@pkg/operator/encryption/controllers/kms_preflight_controller.go`:
- Around line 27-28: Replace the TODO comment above NewKMSPreflightController
with a proper GoDoc: write an exported comment that starts with
"NewKMSPreflightController" and briefly describes what the constructor returns,
its purpose, and the meaning of its input parameters (e.g., the controller's
responsibilities and any important preconditions for the arguments); ensure the
comment follows GoDoc style (complete sentence, starts with the function name)
so tools and consumers can discover the API.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: a0c7ca5a-b59a-4036-bf42-1bee5bc65cd7

📥 Commits

Reviewing files that changed from the base of the PR and between 18e7937 and 41bf1bc.

📒 Files selected for processing (1)
  • pkg/operator/encryption/controllers/kms_preflight_controller.go

Comment thread pkg/operator/encryption/controllers/kms_preflight_controller.go
@p0lyn0mial p0lyn0mial force-pushed the kms-preflight-checks-part2 branch from 41bf1bc to 687510a Compare April 22, 2026 08:58
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/operator/encryption/controllers/kms_preflight_controller.go`:
- Around line 65-67: The deferred status-write currently clobbers any earlier
reconcile error by assigning applyError to err; change this so we preserve both
errors (e.g., use errors.Join or wrap the original err with applyError) instead
of replacing it. Locate the operatorClient.ApplyOperatorStatus call in
kms_preflight_controller.go and, when applyError != nil, set err =
errors.Join(err, applyError) (or err = fmt.Errorf("%w; status apply failed: %v",
err, applyError) if errors.Join isn't available) so the original reconcile
failure and the status-write failure are both preserved.
- Around line 70-76: The current early-return on
shouldRunEncryptionController(...) errors drops updating degradedCondition
(setting it to nil) which leaves stale health; instead, when
shouldRunEncryptionController returns an error, set
KMSPreflightControllerDegraded to True on degradedCondition (use the existing
degradedCondition variable returned/managed in this scope), include the error
details in the condition message/reason (e.g., call
WithStatus(operatorv1.ConditionTrue) and attach err.Error() or a concise
reason), ensure that updated degradedCondition is written back to the operator
status before returning the err so the operator reflects the current failure of
shouldRunEncryptionController.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 1770302b-4a21-4ea2-8d61-896989389b70

📥 Commits

Reviewing files that changed from the base of the PR and between 41bf1bc and 687510a.

📒 Files selected for processing (1)
  • pkg/operator/encryption/controllers/kms_preflight_controller.go

Comment on lines +65 to +67
if applyError := c.operatorClient.ApplyOperatorStatus(ctx, c.controllerInstanceName, status); applyError != nil {
err = applyError
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Preserve the original reconcile error when status apply also fails.

The deferred write currently overwrites any earlier error. If preflight evaluation fails and ApplyOperatorStatus(...) also fails, callers only see the status-write error and lose the root cause. Prefer wrapping/joining instead of replacing err.

Suggested change
 		status := applyoperatorv1.OperatorStatus().WithConditions(degradedCondition)
 		if applyError := c.operatorClient.ApplyOperatorStatus(ctx, c.controllerInstanceName, status); applyError != nil {
-			err = applyError
+			if err == nil {
+				err = applyError
+			} else {
+				err = fmt.Errorf("%v; failed to apply operator status: %w", err, applyError)
+			}
 		}
 	}()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/encryption/controllers/kms_preflight_controller.go` around lines
65 - 67, The deferred status-write currently clobbers any earlier reconcile
error by assigning applyError to err; change this so we preserve both errors
(e.g., use errors.Join or wrap the original err with applyError) instead of
replacing it. Locate the operatorClient.ApplyOperatorStatus call in
kms_preflight_controller.go and, when applyError != nil, set err =
errors.Join(err, applyError) (or err = fmt.Errorf("%w; status apply failed: %v",
err, applyError) if errors.Join isn't available) so the original reconcile
failure and the status-write failure are both preserved.

Comment on lines +70 to +76
if ready, err := shouldRunEncryptionController(c.operatorClient, c.preconditionsFulfilledFn, c.provider.ShouldRunEncryptionControllers); err != nil || !ready {
if err != nil {
degradedCondition = nil
} else {
degradedCondition = degradedCondition.WithStatus(operatorv1.ConditionFalse)
}
return err // we will get re-kicked when the operator status updates
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Report readiness-evaluation failures in KMSPreflightControllerDegraded.

On Line 72, dropping the condition update leaves the previous value in place. After a prior preflight failure/success, a later shouldRunEncryptionController(...) error will keep reporting stale health instead of the current controller failure. This path should write KMSPreflightControllerDegraded=True with the error details rather than setting degradedCondition to nil.

Suggested change
 if ready, err := shouldRunEncryptionController(c.operatorClient, c.preconditionsFulfilledFn, c.provider.ShouldRunEncryptionControllers); err != nil || !ready {
 	if err != nil {
-		degradedCondition = nil
+		degradedCondition = degradedCondition.
+			WithStatus(operatorv1.ConditionTrue).
+			WithReason("Error").
+			WithMessage(err.Error())
 	} else {
 		degradedCondition = degradedCondition.WithStatus(operatorv1.ConditionFalse)
 	}
 	return err // we will get re-kicked when the operator status updates
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if ready, err := shouldRunEncryptionController(c.operatorClient, c.preconditionsFulfilledFn, c.provider.ShouldRunEncryptionControllers); err != nil || !ready {
if err != nil {
degradedCondition = nil
} else {
degradedCondition = degradedCondition.WithStatus(operatorv1.ConditionFalse)
}
return err // we will get re-kicked when the operator status updates
if ready, err := shouldRunEncryptionController(c.operatorClient, c.preconditionsFulfilledFn, c.provider.ShouldRunEncryptionControllers); err != nil || !ready {
if err != nil {
degradedCondition = degradedCondition.
WithStatus(operatorv1.ConditionTrue).
WithReason("Error").
WithMessage(err.Error())
} else {
degradedCondition = degradedCondition.WithStatus(operatorv1.ConditionFalse)
}
return err // we will get re-kicked when the operator status updates
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/encryption/controllers/kms_preflight_controller.go` around lines
70 - 76, The current early-return on shouldRunEncryptionController(...) errors
drops updating degradedCondition (setting it to nil) which leaves stale health;
instead, when shouldRunEncryptionController returns an error, set
KMSPreflightControllerDegraded to True on degradedCondition (use the existing
degradedCondition variable returned/managed in this scope), include the error
details in the condition message/reason (e.g., call
WithStatus(operatorv1.ConditionTrue) and attach err.Error() or a concise
reason), ensure that updated degradedCondition is written back to the operator
status before returning the err so the operator reflects the current failure of
shouldRunEncryptionController.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
pkg/operator/encryption/controllers/kms_preflight_controller.go (2)

126-132: ⚠️ Potential issue | 🟠 Major

Report readiness-evaluation failures in EncryptionKMSPreflightControllerDegraded.

When shouldRunEncryptionController(...) returns an error, setting degradedCondition = nil (line 128) skips the status update entirely. This leaves stale state from a prior sync instead of reflecting the current failure. This path should write EncryptionKMSPreflightControllerDegraded=True with the error details.

Suggested fix
 if ready, err := shouldRunEncryptionController(c.operatorClient, c.preconditionsFulfilledFn, c.provider.ShouldRunEncryptionControllers); err != nil || !ready {
 	if err != nil {
-		degradedCondition = nil
+		degradedCondition = degradedCondition.
+			WithStatus(operatorv1.ConditionTrue).
+			WithReason("Error").
+			WithMessage(err.Error())
 	} else {
 		degradedCondition = degradedCondition.WithStatus(operatorv1.ConditionFalse)
 	}
 	return err // we will get re-kicked when the operator status updates
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/encryption/controllers/kms_preflight_controller.go` around lines
126 - 132, The current error path in shouldRunEncryptionController(...) clears
degradedCondition (degradedCondition = nil) which prevents updating
EncryptionKMSPreflightControllerDegraded; instead, when
shouldRunEncryptionController returns an error you should set degradedCondition
=
degradedCondition.WithStatus(operatorv1.ConditionTrue).WithMessage(err.Error())
(or equivalent) so the controller status writes
EncryptionKMSPreflightControllerDegraded=True with the error details before
returning err; update the branch in the function containing
shouldRunEncryptionController, degradedCondition and the return to set and
retain the degradedCondition with the error info rather than nil so the failure
is reflected in status.

121-123: ⚠️ Potential issue | 🟡 Minor

Preserve the original reconcile error when status apply also fails.

The deferred write currently overwrites any earlier error. If preflight evaluation fails and ApplyOperatorStatus(...) also fails, callers only see the status-write error and lose the root cause.

Suggested fix using error joining
 		status := applyoperatorv1.OperatorStatus().WithConditions(degradedCondition)
 		if applyError := c.operatorClient.ApplyOperatorStatus(ctx, c.controllerInstanceName, status); applyError != nil {
-			err = applyError
+			if err == nil {
+				err = applyError
+			} else {
+				err = fmt.Errorf("%v; failed to apply operator status: %w", err, applyError)
+			}
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/encryption/controllers/kms_preflight_controller.go` around lines
121 - 123, The deferred status-write currently overwrites the original reconcile
error stored in err when c.operatorClient.ApplyOperatorStatus(ctx,
c.controllerInstanceName, status) returns applyError; change the defer logic to
preserve and combine errors instead of replacing them — e.g., if both err and
applyError are non-nil set err = errors.Join(err, applyError), if only
applyError is non-nil set err = applyError, otherwise leave err unchanged;
update imports to include "errors" if needed so the ApplyOperatorStatus defer
merges the errors rather than discarding the original reconcile failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@pkg/operator/encryption/controllers/kms_preflight_controller.go`:
- Around line 126-132: The current error path in
shouldRunEncryptionController(...) clears degradedCondition (degradedCondition =
nil) which prevents updating EncryptionKMSPreflightControllerDegraded; instead,
when shouldRunEncryptionController returns an error you should set
degradedCondition =
degradedCondition.WithStatus(operatorv1.ConditionTrue).WithMessage(err.Error())
(or equivalent) so the controller status writes
EncryptionKMSPreflightControllerDegraded=True with the error details before
returning err; update the branch in the function containing
shouldRunEncryptionController, degradedCondition and the return to set and
retain the degradedCondition with the error info rather than nil so the failure
is reflected in status.
- Around line 121-123: The deferred status-write currently overwrites the
original reconcile error stored in err when
c.operatorClient.ApplyOperatorStatus(ctx, c.controllerInstanceName, status)
returns applyError; change the defer logic to preserve and combine errors
instead of replacing them — e.g., if both err and applyError are non-nil set err
= errors.Join(err, applyError), if only applyError is non-nil set err =
applyError, otherwise leave err unchanged; update imports to include "errors" if
needed so the ApplyOperatorStatus defer merges the errors rather than discarding
the original reconcile failure.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 5ae271b2-e73a-4c97-bf82-d81491e0b573

📥 Commits

Reviewing files that changed from the base of the PR and between a815c0f and d8b5e76.

📒 Files selected for processing (1)
  • pkg/operator/encryption/controllers/kms_preflight_controller.go

// The controller creates a ServiceAccount, Role and RoleBinding so that the pod
// can update its own status. These resources are cleaned up when the pod is removed.
// The controller reads these enhanced pod statuses to update its own operator
// status, which is propagated to end users.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mechanism makes sense to me.

Just minor improvement;

Suggested change
// status, which is propagated to end users.
// status and propagates this per provider name to end users.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants