Skip to content

[release-4.20] OCPBUGS-79463: Implement mTLS authentication and authorization for CVO metrics endpoint#1374

Open
DavidHurta wants to merge 13 commits intoopenshift:release-4.20from
DavidHurta:backport-metrics-mtls-release-4.20
Open

[release-4.20] OCPBUGS-79463: Implement mTLS authentication and authorization for CVO metrics endpoint#1374
DavidHurta wants to merge 13 commits intoopenshift:release-4.20from
DavidHurta:backport-metrics-mtls-release-4.20

Conversation

@DavidHurta
Copy link
Copy Markdown
Contributor

@DavidHurta DavidHurta commented Apr 13, 2026

Manual backport of #1326 needed due to conflicts in the metrics_test.go file and vendoreded dependencies (e.g., go.sum).

Also a new e3bdf94 commit was needed to handle the different crypto version of the release.

DavidHurta and others added 12 commits April 13, 2026 16:12
…updates

This commit's goal is to prepare the existing code for mTLS support.

In OpenShift, core operators SHOULD require authentication, and they
SHOULD support TLS client certificate authentication [1]. They also
SHOULD support local authorization and SHOULD allow the well-known
metrics scraping identity [1]. To achieve this, an operator must be able
to verify a client's certificate. To do this, the certificate can be
verified using the certificate authority (CA) bundle located in a
ConfigMap in the kube-system namespace [2].

This would entail an implementation of a new controller to watch the
ConfigMap for changes. To avoid such implementation to avoid
potential bugs and future maintenance, my goal is to utilize the
`k8s.io/apiserver/pkg/server/dynamiccertificates` package for this goal
as the package provides a functionality for this specific use case.

While doing so, we can also rework the existing, a bit complex,
implementation and utilize the package for existing use cases as well
to simplify the logic and use an existing, well-tested library.

[1]: https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#metrics
[2]: https://rhobs-handbook.netlify.app/products/openshiftmonitoring/collecting_metrics.md/#exposing-metrics-for-prometheus

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
(cherry picked from commit 2a432a6)
In OpenShift, core operators SHOULD require authentication and they
SHOULD support TLS client certificate authentication [1]. They also
SHOULD support local authorization and SHOULD allow the well-known
metrics scraping identity [1]. To achieve this, an operator must be able
to verify a client's certificate. To do this, the certificate can be
verified using the certificate authority (CA) bundle located at the
client-ca-file key of the kube-system/extension-apiserver-authentication
ConfigMap [2].

Guarantee failed connections when the config from the GetConfigForClient
method is nil to ensure connections are only using the TLS config
from the serving cert controller.

[1]: https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#metrics
[2]: https://rhobs-handbook.netlify.app/products/openshiftmonitoring/collecting_metrics.md/#exposing-metrics-for-prometheus

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
(cherry picked from commit 622e335)
In OpenShift, core operators SHOULD support local authorization and
SHOULD allow the well-known metrics scraping identity
(system:serviceaccount:openshift-monitoring:prometheus-k8s) to access
the /metrics endpoint. They MAY support delegated authorization check
via SubjectAccessReviews. [1]

The well-known metrics scraping identity's client certificate is issued
for the system:serviceaccount:openshift-monitoring:prometheus-k8s
Common Name (CN) and signed by the kubernetes.io/kube-apiserver-client
signer. [2]

Thus, the commit utilizes this fact to check the client's certificate
for this specific CN value. This is also done by the hardcodedauthorizer
package utilized by other OpenShift operators for the metrics
endpoint [3].

We could utilize the existing bearer token authorization as a fallback.
However, I would like to minimize the attack surface. Especially for
security things that we are implementing and testing, rather than
importing from well-established modules.

The commit implements a user information extraction from a
certificate to minimize the needed dependencies.

[1]: https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#metrics
[2]: https://rhobs-handbook.netlify.app/products/openshiftmonitoring/collecting_metrics.md/#exposing-metrics-for-prometheus
[3]: https://pkg.go.dev/github.com/openshift/library-go/pkg/authorization/hardcodedauthorizer

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
(cherry picked from commit 18495ae)
The `o.HyperShift` option is not available in older release branches.
In newer branches, the option can be utilized.

(cherry picked from commit 554fea0)
In HyperShift, the CVO currently needs to have disabled both
authorization and authentication. Ensure the aspects are disabled so as
not break HyperShift.

However, in the future, the authentication will be enabled using mTLS
and a mounted CA bundle file. Thus, authentication needs to be
configurable.

Authorization needs to be configurable as well because HyperShift
allows a custom monitoring stack to scrape hosted control plane
components. In the future in HyperShift, authentication of the metrics
endpoint of the CVO will be enforced; however, the authorization will be
disabled. This commit prepares the code for these changes.

(cherry picked from commit 3519037)
This is done to provide HTTP return values in failures to comply
with the origin test suite.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
(cherry picked from commit 62f88dd)

# Conflicts:
#	pkg/cvo/metrics_test.go
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 13, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 91b9cc4d-bf85-45d1-ad7a-edb3e5dc32b4

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 13, 2026
@DavidHurta DavidHurta changed the title Backport metrics mtls release 4.20 [release-4.20] OCPBUGS-79463: Implement mTLS authentication and authorization for CVO metrics endpoint Apr 13, 2026
@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Apr 13, 2026
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@DavidHurta: This pull request references Jira Issue OCPBUGS-79463, which is valid. The bug has been moved to the POST state.

7 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.20.z) matches configured target version for branch (4.20.z)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note text is set and does not match the template
  • dependent bug Jira Issue OCPBUGS-77256 is in the state Closed (Done), which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA))
  • dependent Jira Issue OCPBUGS-77256 targets the "4.21.z" version, which is one of the valid target versions: 4.21.0, 4.21.z
  • bug has dependents

Requesting review from QA contact:
/cc @dis016

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot requested a review from dis016 April 13, 2026 15:00
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@DavidHurta: This pull request references Jira Issue OCPBUGS-79463, which is valid.

7 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.20.z) matches configured target version for branch (4.20.z)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note text is set and does not match the template
  • dependent bug Jira Issue OCPBUGS-77256 is in the state Closed (Done), which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA))
  • dependent Jira Issue OCPBUGS-77256 targets the "4.21.z" version, which is one of the valid target versions: 4.21.0, 4.21.z
  • bug has dependents

Requesting review from QA contact:
/cc @dis016

Details

In response to this:

Manual backport needed due to conflicts in the metrics_test.go file and vendoreded dependencies (e.g., go.sum).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@DavidHurta
Copy link
Copy Markdown
Contributor Author

/hold

Need to fix the source code for the failing jobs.

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 13, 2026
This version of crypto requires us to specify the expiration in days
using an integer. This is a testing logic, a one day is sufficient.
@DavidHurta
Copy link
Copy Markdown
Contributor Author

Let me push the changes after the current CI run has finished.

@DavidHurta
Copy link
Copy Markdown
Contributor Author

/hold

Need to verify the HyperShift CI.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 14, 2026

@DavidHurta: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn-techpreview e3bdf94 link true /test e2e-aws-ovn-techpreview

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@DavidHurta
Copy link
Copy Markdown
Contributor Author

/unhold

I am unable to verify that the hosted CVO metrics are being scraped in HyperShift. I have utilized PromeCleus in past testing. However, the needed metrics are not being gathered in 4.20 CI.

I am unholding the PR, as the PR is ready for a review; however, we need to ensure that we do not regress in HyperShift, and the verification should be a part of the overall verification of the PR, as it's not exercised in the executed presubmit CI run.

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 16, 2026
@DavidHurta
Copy link
Copy Markdown
Contributor Author

/label backport-risk-assessed
/label acknowledge-critical-fixes-only

@openshift-ci openshift-ci Bot added backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. labels Apr 16, 2026
@DavidHurta
Copy link
Copy Markdown
Contributor Author

The failing techpreview job looks unrelated.

@DavidHurta
Copy link
Copy Markdown
Contributor Author

@dis016, after the PR is approved, will you be able to verify that the CVO metrics can be scraped in HyperShift?

@DavidHurta
Copy link
Copy Markdown
Contributor Author

/payload-job periodic-ci-openshift-release-main-nightly-4.20-e2e-aws-ovn-serial-1of2 periodic-ci-openshift-release-main-nightly-4.20-e2e-aws-ovn-serial-2of2

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 16, 2026

@DavidHurta: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-4.20-e2e-aws-ovn-serial-1of2
  • periodic-ci-openshift-release-main-nightly-4.20-e2e-aws-ovn-serial-2of2

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ccaf1bc0-399e-11f1-8e52-a623cdd10ba5-0

@hongkailiu
Copy link
Copy Markdown
Member

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 16, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 16, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: DavidHurta, hongkailiu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [DavidHurta,hongkailiu]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. approved Indicates a PR has been approved by an approver from all required OWNERS files. backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants