Skip to content

E2E: verify RTE metrics endpoint TLS compliance#4259

Open
mrniranjan wants to merge 2 commits into
openshift-kni:mainfrom
mrniranjan:rte_tls_extra_tests
Open

E2E: verify RTE metrics endpoint TLS compliance#4259
mrniranjan wants to merge 2 commits into
openshift-kni:mainfrom
mrniranjan:rte_tls_extra_tests

Conversation

@mrniranjan
Copy link
Copy Markdown
Contributor

Add e2e tests that probe each RTE pod's metrics port directly over port-forward to confirm:

  • the TLS handshake succeeds and the negotiated version meets the cluster TLS profile minimum
  • the /metrics endpoint returns a valid response body
  • connections using a version or cipher incompatible with the cluster profile are rejected

Add FetchMetrics helper in test/internal/tls to perform an HTTPS GET /metrics over a port-forwarded connection, reusing the existing port-forward and TLS probing infrastructure.

AI Attribution: AIA Human-AI blend, New content, Human-initiated, Reviewed, Cursor 3.0.16 v1.0

Add e2e tests that probe each RTE pod's metrics port directly over
port-forward to confirm:
- the TLS handshake succeeds and the negotiated version meets the
  cluster TLS profile minimum
- the /metrics endpoint returns a valid response body
- connections using a version or cipher incompatible with the cluster
  profile are rejected

Add FetchMetrics helper in test/internal/tls to perform an HTTPS GET
/metrics over a port-forwarded connection, reusing the existing
port-forward and TLS probing infrastructure.

AI Attribution: AIA Human-AI blend, New content, Human-initiated, Reviewed, Cursor 3.0.16 v1.0

Signed-off-by: Niranjan M.R <mniranja@redhat.com>
@openshift-ci openshift-ci Bot requested a review from shajmakh May 29, 2026 05:42
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 29, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mrniranjan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot requested a review from swatisehgal May 29, 2026 05:42
@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 29, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 29, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: d40825bc-5dd0-41c3-a539-9a0208358707

📥 Commits

Reviewing files that changed from the base of the PR and between 292c70e and 837c0d5.

📒 Files selected for processing (2)
  • test/e2e/tls/tls_test.go
  • test/internal/tls/tls.go

📝 Walkthrough

Walkthrough

Adds cluster-TLS-derived helpers and a single-connection HTTPS metrics fetcher, then updates E2E tests to validate RTE pod /metrics endpoints negotiate at-or-above the cluster minimum TLS and reject below-minimum versions or disallowed ciphers when applicable.

Changes

RTE Metrics TLS Compliance Testing

Layer / File(s) Summary
Negative TLS compatibility test updates
test/e2e/tls/tls_test.go
Replace inline TLS-profile/min-version derivation with GetClusterTLSProfile and add conditional probing for a disallowed TLS 1.2 cipher in the negative compatibility test.
RTE metrics endpoint E2E context and discovery
test/e2e/tls/tls_test.go
Replace prior DaemonSet flag alignment test with an ordered “RTE Metrics end point TLS Compliance” context that loads cluster TLS settings, discovers RTE pods from NRO NodeGroups/DaemonSets, and runs positive/negative metrics TLS checks.
FetchMetrics and transport internals
test/internal/tls/tls.go
Add FetchMetrics which port-forwards to a pod metrics port and issues an HTTPS GET /metrics using a custom transport that disables keep-alives, skips verification, and enforces a single dial; reads and returns the full body and errors on non-200 statuses.
Cluster TLS profile helpers and wiring
test/internal/tls/tls.go
Add GetClusterTLSProfile, RTESettingsFromClusterProfile, CheckRTEDaemonSetTLSFlagsMatchClusterProfile, refactor CheckRTEDaemonSetTLSFlags to use getNROWithNodeGroups, and add helpers to convert TLSProfileSpec into *tls.Config using controller-runtime TLS utilities.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 35.71% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title directly and specifically describes the main change: adding e2e tests to verify RTE metrics endpoint TLS compliance, which aligns with the changeset's core objective.
Description check ✅ Passed The description is clearly related to the changeset, detailing the new e2e tests for RTE metrics TLS compliance, the FetchMetrics helper, and the specific validations performed.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/internal/tls/tls.go (1)

256-275: ⚡ Quick win

Add a request timeout to the metrics HTTP client.

The client and request (context.Background()) have no deadline, so httpCli.Do/io.ReadAll over the port-forward can block indefinitely if the server stalls. The table node has no SpecTimeout, so a hang would only be interrupted at suite level. A client-level timeout bounds this cheaply.

♻️ Bound the request with a timeout
 	return &http.Client{Transport: tr}
+	return &http.Client{Transport: tr, Timeout: 30 * time.Second}

Add the import:

 	"net/http"
 	"strings"
+	"time"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/internal/tls/tls.go` around lines 256 - 275, The metrics HTTP client
returned by buildMetricsHTTPClient lacks any deadline and can hang; set a
client-level timeout (e.g., assign the Timeout field on the returned
*http.Client) so all requests via that client are bounded, and add the time
import; update buildMetricsHTTPClient to construct &http.Client{Transport: tr,
Timeout: <reasonable duration>} so http.Client and buildMetricsHTTPClient (and
the transport tr) enforce a request timeout.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/internal/tls/tls.go`:
- Around line 256-275: The metrics HTTP client returned by
buildMetricsHTTPClient lacks any deadline and can hang; set a client-level
timeout (e.g., assign the Timeout field on the returned *http.Client) so all
requests via that client are bounded, and add the time import; update
buildMetricsHTTPClient to construct &http.Client{Transport: tr, Timeout:
<reasonable duration>} so http.Client and buildMetricsHTTPClient (and the
transport tr) enforce a request timeout.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: cf30fd60-4c84-4d28-8e74-4645014ce7fe

📥 Commits

Reviewing files that changed from the base of the PR and between 5934cef and 292c70e.

📒 Files selected for processing (2)
  • test/e2e/tls/tls_test.go
  • test/internal/tls/tls.go

Comment thread test/e2e/tls/tls_test.go Outdated
Comment on lines +168 to +173
const rteMetricsPort = "2112"
minVersion, profileSpec := getClusterTLSProfile(ctx)
rtePods := getRTEPodsFromNRO(ctx)
for i := range rtePods {
rtePod := &rtePods[i]
By(fmt.Sprintf("Validating TLS metrics behavior on pod %s/%s", rtePod.Namespace, rtePod.Name))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The amount of shared code between positive and negative is small, while verifyRTETLSPositive and verifyRTETLSVersionRejected+verifyRTETLSCipherRejected are having most of the verification.
I think the goal of describeTable is lost here and having It() spec per test would serve readability better. WDYT?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, addressed in latest commit

Comment thread test/e2e/tls/tls_test.go Outdated
Comment on lines +188 to +200
func getClusterTLSProfile(ctx context.Context) (uint16, configv1.TLSProfileSpec) {
GinkgoHelper()
profileSpec, err := ctrltls.FetchAPIServerTLSProfile(ctx, e2eclient.Client)
Expect(err).ToNot(HaveOccurred(), "unable to fetch TLS profile from API Server")
tlsConfigFn, unSupportedCiphers := ctrltls.NewTLSConfigFromProfile(profileSpec)
if len(unSupportedCiphers) > 0 {
klog.InfoS("TLS profile has unsupported ciphers", "ciphers", unSupportedCiphers)
}
cfg := &tls.Config{}
tlsConfigFn(cfg)
klog.InfoS("cluster TLS minimum version", "versxion", tls.VersionName(cfg.MinVersion))
return cfg.MinVersion, profileSpec
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haven't this been used before by the existing tests? if so, can we make it an test/internal/tls/tls.go function ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in latest commit

Comment thread test/e2e/tls/tls_test.go Outdated
Comment on lines +225 to +271
func verifyRTETLSPositive(rtePod *corev1.Pod, port string, minVersion uint16) {
GinkgoHelper()

By("Verifying TLS Handshake succeeds and negotiated version meets the cluster profile")
// fetch TLS Version and the cipher supported by TLS version
gotVersion, gotCipherId, err := intls.ProbeTLSSettings(e2eclient.K8sClient, rtePod, port)
Expect(err).ToNot(HaveOccurred(), "TLS Handshake failed on pod %q", rtePod.Name)
klog.InfoS("negotiated TLS settings", "version", tls.VersionName(gotVersion), "cipher", tls.CipherSuiteName(gotCipherId))
Expect(gotVersion).To(BeNumerically(">=", minVersion),
"negotiated version %s below minimum %s", tls.VersionName(gotVersion), tls.VersionName(minVersion))
By("Fetching /metrics over TLS via port-fortward")
body, err := intls.FetchMetrics(e2eclient.K8sClient, rtePod, port)
Expect(err).ToNot(HaveOccurred(), "failed to fetch metrics from pod %q", rtePod.Name)
Expect(body).ToNot(BeEmpty(), "metrics respose body should not be empty")
Expect(string(body)).To(ContainSubstring("rte_noderesourcetopology_writes_total"))
}

func verifyRTETLSVersionRejected(rtePod *corev1.Pod, port string, minVersion uint16) {
GinkgoHelper()

belowMinVersion, err := intls.TLSVersionBelow(minVersion)
Expect(err).ToNot(HaveOccurred(), "cannot determine version below %s", tls.VersionName(minVersion))

By(fmt.Sprintf("Verifying TLS %s is rejected", tls.VersionName(belowMinVersion)))
err = intls.ProbeMaxTLSVersion(e2eclient.K8sClient, rtePod, port, belowMinVersion)
Expect(err).To(HaveOccurred(), "should reject TLS %s", tls.VersionName(belowMinVersion))
Expect(errors.Is(err, intls.ErrTLSHandshakeRejected)).To(BeTrue(), "got: %v", err)
}

func verifyRTETLSCipherRejected(rtePod *corev1.Pod, port string, minVersion uint16, profileSpec configv1.TLSProfileSpec) {
GinkgoHelper()

if minVersion == tls.VersionTLS13 {
klog.InfoS("TLS 1.3 ciphers are not individually configurable, skipping cipher rejection test")
return
}

disallowedCipher := intls.FindDisallowedCipher(profileSpec.Ciphers)
if disallowedCipher == "" {
Skip("all known TLS 1.2 ciphers are in the allowed set, nothing to test")
}

By(fmt.Sprintf("Verifying cipher %s is rejected", disallowedCipher))
err := intls.ProbeTLSCipher(e2eclient.K8sClient, rtePod, port, disallowedCipher)
Expect(err).To(HaveOccurred(), "should reject cipher %s", disallowedCipher)
Expect(errors.Is(err, intls.ErrTLSHandshakeRejected)).To(BeTrue(), "got: %v", err)
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if these are used only once let's have them inlined please.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in latest commit

Move getClusterTLSProfile to internal test utils function as GetClusterTLSProfile
tls.Config logic moved to test/internal/tls/tls.go
scheduler negative test By now names the disallowed cipher (e.g. ECDHE-ECDSA-AES128-SHA256)

AI Attribution: AIA Human-AI blend, New content, Human-initiated, Reviewed, Cursor 3.0.16 v1.0

Signed-off-by: Niranjan M.R <mniranja@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants