E2E: verify RTE metrics endpoint TLS compliance#4259
Conversation
Add e2e tests that probe each RTE pod's metrics port directly over port-forward to confirm: - the TLS handshake succeeds and the negotiated version meets the cluster TLS profile minimum - the /metrics endpoint returns a valid response body - connections using a version or cipher incompatible with the cluster profile are rejected Add FetchMetrics helper in test/internal/tls to perform an HTTPS GET /metrics over a port-forwarded connection, reusing the existing port-forward and TLS probing infrastructure. AI Attribution: AIA Human-AI blend, New content, Human-initiated, Reviewed, Cursor 3.0.16 v1.0 Signed-off-by: Niranjan M.R <mniranja@redhat.com>
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mrniranjan The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds cluster-TLS-derived helpers and a single-connection HTTPS metrics fetcher, then updates E2E tests to validate RTE pod /metrics endpoints negotiate at-or-above the cluster minimum TLS and reject below-minimum versions or disallowed ciphers when applicable. ChangesRTE Metrics TLS Compliance Testing
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/internal/tls/tls.go (1)
256-275: ⚡ Quick winAdd a request timeout to the metrics HTTP client.
The client and request (
context.Background()) have no deadline, sohttpCli.Do/io.ReadAllover the port-forward can block indefinitely if the server stalls. The table node has noSpecTimeout, so a hang would only be interrupted at suite level. A client-level timeout bounds this cheaply.♻️ Bound the request with a timeout
return &http.Client{Transport: tr} + return &http.Client{Transport: tr, Timeout: 30 * time.Second}Add the import:
"net/http" "strings" + "time"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/internal/tls/tls.go` around lines 256 - 275, The metrics HTTP client returned by buildMetricsHTTPClient lacks any deadline and can hang; set a client-level timeout (e.g., assign the Timeout field on the returned *http.Client) so all requests via that client are bounded, and add the time import; update buildMetricsHTTPClient to construct &http.Client{Transport: tr, Timeout: <reasonable duration>} so http.Client and buildMetricsHTTPClient (and the transport tr) enforce a request timeout.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@test/internal/tls/tls.go`:
- Around line 256-275: The metrics HTTP client returned by
buildMetricsHTTPClient lacks any deadline and can hang; set a client-level
timeout (e.g., assign the Timeout field on the returned *http.Client) so all
requests via that client are bounded, and add the time import; update
buildMetricsHTTPClient to construct &http.Client{Transport: tr, Timeout:
<reasonable duration>} so http.Client and buildMetricsHTTPClient (and the
transport tr) enforce a request timeout.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: cf30fd60-4c84-4d28-8e74-4645014ce7fe
📒 Files selected for processing (2)
test/e2e/tls/tls_test.gotest/internal/tls/tls.go
| const rteMetricsPort = "2112" | ||
| minVersion, profileSpec := getClusterTLSProfile(ctx) | ||
| rtePods := getRTEPodsFromNRO(ctx) | ||
| for i := range rtePods { | ||
| rtePod := &rtePods[i] | ||
| By(fmt.Sprintf("Validating TLS metrics behavior on pod %s/%s", rtePod.Namespace, rtePod.Name)) |
There was a problem hiding this comment.
The amount of shared code between positive and negative is small, while verifyRTETLSPositive and verifyRTETLSVersionRejected+verifyRTETLSCipherRejected are having most of the verification.
I think the goal of describeTable is lost here and having It() spec per test would serve readability better. WDYT?
There was a problem hiding this comment.
agree, addressed in latest commit
| func getClusterTLSProfile(ctx context.Context) (uint16, configv1.TLSProfileSpec) { | ||
| GinkgoHelper() | ||
| profileSpec, err := ctrltls.FetchAPIServerTLSProfile(ctx, e2eclient.Client) | ||
| Expect(err).ToNot(HaveOccurred(), "unable to fetch TLS profile from API Server") | ||
| tlsConfigFn, unSupportedCiphers := ctrltls.NewTLSConfigFromProfile(profileSpec) | ||
| if len(unSupportedCiphers) > 0 { | ||
| klog.InfoS("TLS profile has unsupported ciphers", "ciphers", unSupportedCiphers) | ||
| } | ||
| cfg := &tls.Config{} | ||
| tlsConfigFn(cfg) | ||
| klog.InfoS("cluster TLS minimum version", "versxion", tls.VersionName(cfg.MinVersion)) | ||
| return cfg.MinVersion, profileSpec | ||
| } |
There was a problem hiding this comment.
haven't this been used before by the existing tests? if so, can we make it an test/internal/tls/tls.go function ?
There was a problem hiding this comment.
addressed in latest commit
| func verifyRTETLSPositive(rtePod *corev1.Pod, port string, minVersion uint16) { | ||
| GinkgoHelper() | ||
|
|
||
| By("Verifying TLS Handshake succeeds and negotiated version meets the cluster profile") | ||
| // fetch TLS Version and the cipher supported by TLS version | ||
| gotVersion, gotCipherId, err := intls.ProbeTLSSettings(e2eclient.K8sClient, rtePod, port) | ||
| Expect(err).ToNot(HaveOccurred(), "TLS Handshake failed on pod %q", rtePod.Name) | ||
| klog.InfoS("negotiated TLS settings", "version", tls.VersionName(gotVersion), "cipher", tls.CipherSuiteName(gotCipherId)) | ||
| Expect(gotVersion).To(BeNumerically(">=", minVersion), | ||
| "negotiated version %s below minimum %s", tls.VersionName(gotVersion), tls.VersionName(minVersion)) | ||
| By("Fetching /metrics over TLS via port-fortward") | ||
| body, err := intls.FetchMetrics(e2eclient.K8sClient, rtePod, port) | ||
| Expect(err).ToNot(HaveOccurred(), "failed to fetch metrics from pod %q", rtePod.Name) | ||
| Expect(body).ToNot(BeEmpty(), "metrics respose body should not be empty") | ||
| Expect(string(body)).To(ContainSubstring("rte_noderesourcetopology_writes_total")) | ||
| } | ||
|
|
||
| func verifyRTETLSVersionRejected(rtePod *corev1.Pod, port string, minVersion uint16) { | ||
| GinkgoHelper() | ||
|
|
||
| belowMinVersion, err := intls.TLSVersionBelow(minVersion) | ||
| Expect(err).ToNot(HaveOccurred(), "cannot determine version below %s", tls.VersionName(minVersion)) | ||
|
|
||
| By(fmt.Sprintf("Verifying TLS %s is rejected", tls.VersionName(belowMinVersion))) | ||
| err = intls.ProbeMaxTLSVersion(e2eclient.K8sClient, rtePod, port, belowMinVersion) | ||
| Expect(err).To(HaveOccurred(), "should reject TLS %s", tls.VersionName(belowMinVersion)) | ||
| Expect(errors.Is(err, intls.ErrTLSHandshakeRejected)).To(BeTrue(), "got: %v", err) | ||
| } | ||
|
|
||
| func verifyRTETLSCipherRejected(rtePod *corev1.Pod, port string, minVersion uint16, profileSpec configv1.TLSProfileSpec) { | ||
| GinkgoHelper() | ||
|
|
||
| if minVersion == tls.VersionTLS13 { | ||
| klog.InfoS("TLS 1.3 ciphers are not individually configurable, skipping cipher rejection test") | ||
| return | ||
| } | ||
|
|
||
| disallowedCipher := intls.FindDisallowedCipher(profileSpec.Ciphers) | ||
| if disallowedCipher == "" { | ||
| Skip("all known TLS 1.2 ciphers are in the allowed set, nothing to test") | ||
| } | ||
|
|
||
| By(fmt.Sprintf("Verifying cipher %s is rejected", disallowedCipher)) | ||
| err := intls.ProbeTLSCipher(e2eclient.K8sClient, rtePod, port, disallowedCipher) | ||
| Expect(err).To(HaveOccurred(), "should reject cipher %s", disallowedCipher) | ||
| Expect(errors.Is(err, intls.ErrTLSHandshakeRejected)).To(BeTrue(), "got: %v", err) | ||
| } |
There was a problem hiding this comment.
if these are used only once let's have them inlined please.
There was a problem hiding this comment.
addressed in latest commit
Move getClusterTLSProfile to internal test utils function as GetClusterTLSProfile tls.Config logic moved to test/internal/tls/tls.go scheduler negative test By now names the disallowed cipher (e.g. ECDHE-ECDSA-AES128-SHA256) AI Attribution: AIA Human-AI blend, New content, Human-initiated, Reviewed, Cursor 3.0.16 v1.0 Signed-off-by: Niranjan M.R <mniranja@redhat.com>
Add e2e tests that probe each RTE pod's metrics port directly over port-forward to confirm:
Add FetchMetrics helper in test/internal/tls to perform an HTTPS GET /metrics over a port-forwarded connection, reusing the existing port-forward and TLS probing infrastructure.
AI Attribution: AIA Human-AI blend, New content, Human-initiated, Reviewed, Cursor 3.0.16 v1.0