[release-4.20] OCPBUGS-79463: Implement mTLS authentication and authorization for CVO metrics endpoint#1374
Conversation
…updates This commit's goal is to prepare the existing code for mTLS support. In OpenShift, core operators SHOULD require authentication, and they SHOULD support TLS client certificate authentication [1]. They also SHOULD support local authorization and SHOULD allow the well-known metrics scraping identity [1]. To achieve this, an operator must be able to verify a client's certificate. To do this, the certificate can be verified using the certificate authority (CA) bundle located in a ConfigMap in the kube-system namespace [2]. This would entail an implementation of a new controller to watch the ConfigMap for changes. To avoid such implementation to avoid potential bugs and future maintenance, my goal is to utilize the `k8s.io/apiserver/pkg/server/dynamiccertificates` package for this goal as the package provides a functionality for this specific use case. While doing so, we can also rework the existing, a bit complex, implementation and utilize the package for existing use cases as well to simplify the logic and use an existing, well-tested library. [1]: https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#metrics [2]: https://rhobs-handbook.netlify.app/products/openshiftmonitoring/collecting_metrics.md/#exposing-metrics-for-prometheus Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> (cherry picked from commit 2a432a6)
In OpenShift, core operators SHOULD require authentication and they SHOULD support TLS client certificate authentication [1]. They also SHOULD support local authorization and SHOULD allow the well-known metrics scraping identity [1]. To achieve this, an operator must be able to verify a client's certificate. To do this, the certificate can be verified using the certificate authority (CA) bundle located at the client-ca-file key of the kube-system/extension-apiserver-authentication ConfigMap [2]. Guarantee failed connections when the config from the GetConfigForClient method is nil to ensure connections are only using the TLS config from the serving cert controller. [1]: https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#metrics [2]: https://rhobs-handbook.netlify.app/products/openshiftmonitoring/collecting_metrics.md/#exposing-metrics-for-prometheus Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> (cherry picked from commit 622e335)
…arer token The paths are specified at [1]. [1]: https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md (cherry picked from commit 81f3bb5)
In OpenShift, core operators SHOULD support local authorization and SHOULD allow the well-known metrics scraping identity (system:serviceaccount:openshift-monitoring:prometheus-k8s) to access the /metrics endpoint. They MAY support delegated authorization check via SubjectAccessReviews. [1] The well-known metrics scraping identity's client certificate is issued for the system:serviceaccount:openshift-monitoring:prometheus-k8s Common Name (CN) and signed by the kubernetes.io/kube-apiserver-client signer. [2] Thus, the commit utilizes this fact to check the client's certificate for this specific CN value. This is also done by the hardcodedauthorizer package utilized by other OpenShift operators for the metrics endpoint [3]. We could utilize the existing bearer token authorization as a fallback. However, I would like to minimize the attack surface. Especially for security things that we are implementing and testing, rather than importing from well-established modules. The commit implements a user information extraction from a certificate to minimize the needed dependencies. [1]: https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#metrics [2]: https://rhobs-handbook.netlify.app/products/openshiftmonitoring/collecting_metrics.md/#exposing-metrics-for-prometheus [3]: https://pkg.go.dev/github.com/openshift/library-go/pkg/authorization/hardcodedauthorizer Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> (cherry picked from commit 18495ae)
The `o.HyperShift` option is not available in older release branches. In newer branches, the option can be utilized. (cherry picked from commit 554fea0)
In HyperShift, the CVO currently needs to have disabled both authorization and authentication. Ensure the aspects are disabled so as not break HyperShift. However, in the future, the authentication will be enabled using mTLS and a mounted CA bundle file. Thus, authentication needs to be configurable. Authorization needs to be configurable as well because HyperShift allows a custom monitoring stack to scrape hosted control plane components. In the future in HyperShift, authentication of the metrics endpoint of the CVO will be enforced; however, the authorization will be disabled. This commit prepares the code for these changes. (cherry picked from commit 3519037)
(cherry picked from commit 2f736b5)
(cherry picked from commit 3cd9e22)
(cherry picked from commit e47ae18)
(cherry picked from commit b68827c)
This is done to provide HTTP return values in failures to comply with the origin test suite. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> (cherry picked from commit 62f88dd) # Conflicts: # pkg/cvo/metrics_test.go
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@DavidHurta: This pull request references Jira Issue OCPBUGS-79463, which is valid. The bug has been moved to the POST state. 7 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@DavidHurta: This pull request references Jira Issue OCPBUGS-79463, which is valid. 7 validation(s) were run on this bug
Requesting review from QA contact: DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/hold Need to fix the source code for the failing jobs. |
This version of crypto requires us to specify the expiration in days using an integer. This is a testing logic, a one day is sufficient.
|
Let me push the changes after the current CI run has finished. |
|
/hold Need to verify the HyperShift CI. |
|
@DavidHurta: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/unhold I am unable to verify that the hosted CVO metrics are being scraped in HyperShift. I have utilized PromeCleus in past testing. However, the needed metrics are not being gathered in 4.20 CI. I am unholding the PR, as the PR is ready for a review; however, we need to ensure that we do not regress in HyperShift, and the verification should be a part of the overall verification of the PR, as it's not exercised in the executed presubmit CI run. |
|
/label backport-risk-assessed |
|
The failing techpreview job looks unrelated. |
|
@dis016, after the PR is approved, will you be able to verify that the CVO metrics can be scraped in HyperShift? |
|
/payload-job periodic-ci-openshift-release-main-nightly-4.20-e2e-aws-ovn-serial-1of2 periodic-ci-openshift-release-main-nightly-4.20-e2e-aws-ovn-serial-2of2 |
|
@DavidHurta: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ccaf1bc0-399e-11f1-8e52-a623cdd10ba5-0 |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: DavidHurta, hongkailiu The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Manual backport of #1326 needed due to conflicts in the
metrics_test.gofile and vendoreded dependencies (e.g.,go.sum).Also a new e3bdf94 commit was needed to handle the different
cryptoversion of the release.