LOG-8967: Add TLS Scanner e2e test #3269
Conversation
@jcantrill: This pull request references LOG-8967, which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.8.0" version, but no target version was set.

In response to this: instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/hold
Walkthrough

Adds a TLS scanner testing framework and end-to-end tests: new types and a scanner helper that deploys a Kubernetes Job, collects and parses scan results, and validates TLS compliance; integrates the tests into the e2e suite and exposes an IMAGE_TLS_SCANNER Makefile variable.

Changes
- TLS scanner test infrastructure
- Manifests and small config updates (unrelated DAG)
Sequence Diagram

```mermaid
sequenceDiagram
    actor E2E_Test as E2E Test
    participant K8s as Kubernetes API
    participant Job as TLS Scanner Job
    participant Pod as Scanner Pod
    participant Parser as Result Parser
    participant Validator as Compliance Validator
    E2E_Test->>K8s: Create ServiceAccount / RBAC / Job
    K8s-->>E2E_Test: Job object created
    E2E_Test->>K8s: Watch Job status
    K8s->>Job: Schedule pod(s)
    Job->>Pod: Start initContainer (scanner) -> write /tmp/scan-results.json
    Pod->>Pod: results container cats file to logs
    K8s-->>E2E_Test: Job.Status.Succeeded
    E2E_Test->>K8s: Stream Pod logs (results container)
    K8s-->>Parser: Log stream (JSON)
    Parser-->>E2E_Test: []ScanResult
    E2E_Test->>Validator: ValidateCompliance(results, TLSProfile)
    Validator-->>E2E_Test: pass/fail
```
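The Parser step in the diagram above can be sketched as plain JSON decoding of the streamed pod logs. The `ScanResult` fields below are illustrative assumptions; the real type lives in test/framework/e2e/tls/types.go and may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ScanResult is a hypothetical shape for one scanned endpoint; the
// framework's actual struct (test/framework/e2e/tls/types.go) may differ.
type ScanResult struct {
	Component   string   `json:"component"`
	Pod         string   `json:"pod"`
	IP          string   `json:"ip"`
	Port        int      `json:"port"`
	TLSVersions []string `json:"tlsVersions"`
}

// parseResults decodes the JSON array that the results container cats to
// its logs after the initContainer writes /tmp/scan-results.json.
func parseResults(logs string) ([]ScanResult, error) {
	var results []ScanResult
	if err := json.Unmarshal([]byte(strings.TrimSpace(logs)), &results); err != nil {
		return nil, fmt.Errorf("failed to parse scan results: %w", err)
	}
	return results, nil
}

func main() {
	logs := `[{"component":"collector","pod":"collector-abc","ip":"10.0.0.5","port":24231,"tlsVersions":["TLSv1.2","TLSv1.3"]}]`
	results, err := parseResults(logs)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(results), results[0].Component, results[0].Port)
}
```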
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 8 passed | ❌ 4 failed (4 warnings)
Review Summary by Qodo

Add TLS Scanner e2e test for workload compliance validation

Description
- Add comprehensive e2e test suite for TLS Scanner integration
- Validate operator and operand TLS configurations match the cluster profile
- Detect and verify TLS endpoints for both operator and operands
- Implement TLS Scanner framework with job deployment and result parsing
- Export TLS Scanner image configuration in the Makefile for test execution

Diagram

```mermaid
flowchart LR
    A["E2E Test Suite"] -->|deploys| B["TLS Scanner Job"]
    B -->|scans| C["Operator & Operand Endpoints"]
    C -->|generates| D["CSV Results"]
    D -->|validates| E["Cluster TLS Profile Compliance"]
    F["Scanner Framework"] -->|manages| B
    G["Makefile Config"] -->|exports| F
```

File Changes
1. test/e2e/operator/tls/e2e_test.go
Code Review by Qodo

1.
[APPROVAL NOTIFIER] This PR is APPROVED. This pull request has been approved by: jcantrill. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing the approval command in a comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
test/e2e/operator/tls/e2e_test.go (1)
182-184: ⚡ Quick win — use `strings.HasPrefix` instead of manual slice indexing for pod name prefix matching

The guard `len(result.Pod) > len(prefix)` uses a strict `>`, which silently misses the edge case where a pod name is exactly as long as the prefix constant (unlikely in practice but logically wrong). `strings.HasPrefix` is simpler and handles all cases correctly.

♻️ Proposed refactor

```diff
-	if result.Component == constants.ClusterLoggingOperator ||
-		(result.Pod != "" && len(result.Pod) > len(constants.ClusterLoggingOperator) &&
-			result.Pod[:len(constants.ClusterLoggingOperator)] == constants.ClusterLoggingOperator) {
+	if result.Component == constants.ClusterLoggingOperator ||
+		strings.HasPrefix(result.Pod, constants.ClusterLoggingOperator) {
```

```diff
-	if result.Pod != "" && len(result.Pod) > len(forwarderName) &&
-		result.Pod[:len(forwarderName)] == forwarderName {
+	if strings.HasPrefix(result.Pod, forwarderName) {
```

Also applies to: 191-192
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `test/e2e/operator/tls/e2e_test.go` around lines 182-184: the pod name prefix check uses manual slicing and a strict length check, which misses the edge case where lengths are equal; replace the manual slice-based prefix test with `strings.HasPrefix(result.Pod, constants.ClusterLoggingOperator)` (and import "strings" if missing); make the same change for the analogous check at the other occurrence (the lines noted in the comment).

test/framework/e2e/tls/scanner.go (1)
153-155: 💤 Low value — Job creation does not handle AlreadyExists, unlike the SA and CRB setup above

The ServiceAccount and ClusterRoleBinding creation both skip `apierrors.IsAlreadyExists`. The Job creation at line 153 returns an error unconditionally if the job already exists. While today this is protected by each test creating a fresh `scannerNS`, if `Deploy` is ever called a second time in the same namespace the caller will get a confusing raw API error. A simple guard would keep it consistent.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/framework/e2e/tls/scanner.go` around lines 153 - 155, The Job creation call using s.KubeClient.BatchV1().Jobs(scannerNamespace).Create currently returns an error on any conflict; update the logic to detect and handle apierrors.IsAlreadyExists (like the ServiceAccount/ClusterRoleBinding code does): call the Create, and if it fails, check apierrors.IsAlreadyExists(err) and if so either retrieve the existing job or log/ignore and continue returning the existing job object; otherwise return the wrapped error as before. Ensure you reference the createdJob/error from the Create call and use apierrors.IsAlreadyExists to guard the fallback.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/operator/tls/e2e_test.go`:
- Around line 40-43: The test's skip is unreachable because
tlsscanner.GetImage() currently falls back to DefaultImage; update the GetImage
implementation in scanner.go so it returns an empty string when the environment
variable IMAGE_TLS_SCANNER is not set (rather than returning DefaultImage), so
tlsscanner.GetImage() can be used by the test to detect absence and trigger the
skip/AfterEach guard; ensure references to DefaultImage remain for explicit
defaulting elsewhere but not inside GetImage.
In `@test/framework/e2e/tls/scanner.go`:
- Around line 336-359: containsTLSVersion currently returns true if any
advertised version >= minVersion, which is wrong; update it so it iterates all
parsed versions (use parseTLSVersion) and returns false as soon as it finds any
version < minVersionNum, otherwise return true only if at least one version was
parsed and none were below the minimum; keep the existing behavior for empty
tlsVersions (return false) and for unparsable minVersion (minVersionNum == 0) if
you want to preserve the prior assumption.
- Around line 63-68: GetImage currently always returns a non-empty string by
falling back to DefaultImage, so the test's skip check (tlsscanner.GetImage() ==
"") is never true; fix by separating "configured vs. default" behavior: add a
new helper (e.g. IsScannerConfigured or GetConfiguredImage) that returns the raw
env var (empty if unset) and keep GetImage returning DefaultImage when env var
is empty; update Deploy to use GetImage() for the job image but change the
test's skip condition to call the new IsScannerConfigured/GetConfiguredImage
helper instead of GetImage(); reference symbols: GetImage, ImageEnvVar,
DefaultImage, Deploy, and the skip check in e2e_test.go.
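The `containsTLSVersion` semantics described above can be sketched in isolation: every advertised version must be at or above the minimum, not just one of them. The `parseTLSVersion` mapping here is an assumption about the real helper in scanner.go.

```go
package main

import "fmt"

// parseTLSVersion maps version strings to comparable numbers; a sketch of
// the helper the review references (the real one lives in scanner.go).
func parseTLSVersion(v string) int {
	switch v {
	case "TLSv1.0":
		return 10
	case "TLSv1.1":
		return 11
	case "TLSv1.2":
		return 12
	case "TLSv1.3":
		return 13
	}
	return 0 // unparsable
}

// containsTLSVersion implements the suggested semantics: return false as
// soon as any advertised version falls below the minimum; return true only
// if at least one version parsed and none were below it.
func containsTLSVersion(tlsVersions []string, minVersion string) bool {
	if len(tlsVersions) == 0 {
		return false // preserve existing behavior for empty input
	}
	minNum := parseTLSVersion(minVersion)
	parsedAny := false
	for _, v := range tlsVersions {
		n := parseTLSVersion(v)
		if n == 0 {
			continue // skip unparsable entries
		}
		parsedAny = true
		if minNum != 0 && n < minNum {
			return false // any version below the minimum fails compliance
		}
	}
	return parsedAny
}

func main() {
	fmt.Println(containsTLSVersion([]string{"TLSv1.2", "TLSv1.3"}, "TLSv1.2")) // true
	fmt.Println(containsTLSVersion([]string{"TLSv1.1", "TLSv1.3"}, "TLSv1.2")) // false: 1.1 advertised
	fmt.Println(containsTLSVersion(nil, "TLSv1.2"))                            // false: nothing advertised
}
```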
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b3f9556e-31a1-45b3-a939-e1388a5bdf1e
📒 Files selected for processing (4)
- Makefile
- test/e2e/operator/tls/e2e_test.go
- test/e2e/operator/tls/suite_test.go
- test/framework/e2e/tls/scanner.go
/label tide/merge-method-squash

/test functional/target

/retest
@jcantrill: The following tests failed, say

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Force-pushed: a136bf0 to d1e1b4d (Compare)
@rhmdnd pinging you here for input on whether this meets your expectations. I was able to successfully execute the test after some minor changes to the scanner. I've submitted a PR for that change, along with one in openshift/release to mirror the image for broader use. Your input would be appreciated.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
test/framework/e2e/framework.go (1)
195-217: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win — base cleanup and the return value on the final namespace object

The new visitor hook runs after `name` has already been captured for cleanup, logging, and the return value. If a visitor sets `namespace.Name` or `GenerateName`, this helper creates one namespace but returns/deletes another. Register cleanup and return `namespace.Name` after all visitors have run.

♻️ Suggested fix
```diff
 func (tc *E2ETestFramework) CreateNamespace(name string, visitors ...func(*corev1.Namespace)) string {
-	if value, found := os.LookupEnv("GENERATOR_NS"); found {
-		name = value
-	} else {
-		tc.AddCleanup(func() error {
-			opts := metav1.DeleteOptions{}
-			return tc.KubeClient.CoreV1().Namespaces().Delete(context.TODO(), name, opts)
-		})
-	}
 	namespace := &corev1.Namespace{
 		ObjectMeta: metav1.ObjectMeta{
 			Name: name,
 		},
 	}
 	for _, visitor := range visitors {
 		visitor(namespace)
 	}
+	if value, found := os.LookupEnv("GENERATOR_NS"); found {
+		namespace.Name = value
+	} else {
+		tc.AddCleanup(func() error {
+			opts := metav1.DeleteOptions{}
+			return tc.KubeClient.CoreV1().Namespaces().Delete(context.TODO(), namespace.Name, opts)
+		})
+	}
 	if err := tc.Test.Recreate(namespace); err != nil {
 		clolog.Error(err, "Error")
 		os.Exit(1)
 	}
-	clolog.V(1).Info("Created namespace", "namespace", name)
-	return name
+	clolog.V(1).Info("Created namespace", "namespace", namespace.Name)
+	return namespace.Name
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `test/framework/e2e/framework.go` around lines 195-217: the CreateNamespace helper registers cleanup and returns the original name before visitor functions can modify namespace.Name/GenerateName, so move the tc.AddCleanup(...) call to after the visitors run and use namespace.Name in both the cleanup Delete(...) call and as the function return value; i.e., run visitors first, then register cleanup that deletes namespace.Name, and finally return namespace.Name (not the local variable name).

olm_deploy/scripts/env.sh (1)
7-13: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win — avoid defaulting the LFME image to a mutable `:latest` tag

If this variable is left unset, the E2E tests start testing whatever image happens to be published as `log-file-metric-exporter:latest` at that moment, decoupling test behavior from the operator branch and risking flakiness when the tag moves. Other image versions (`LOGGING_VECTOR_VERSION`, `LOGGING_VERSION`) use pinned defaults. Please derive the default from `LOGGING_VERSION` or pin a branch-compatible tag instead.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@olm_deploy/scripts/env.sh` around lines 7 - 13, The LFME image default is set to mutable ":latest" via LOGGING_LOG_FILE_METRIC_EXPORTER_VERSION, which risks flaky tests; change the default to derive from the operator's LOGGING_VERSION (or a branch-compatible pinned tag) by replacing the default assignment for LOGGING_LOG_FILE_METRIC_EXPORTER_VERSION with one that uses LOGGING_VERSION (e.g. LOGGING_LOG_FILE_METRIC_EXPORTER_VERSION=${LOGGING_LOG_FILE_METRIC_EXPORTER_VERSION:-${LOGGING_VERSION}}) and keep IMAGE_LOG_FILE_METRIC_EXPORTER using that variable so the LFME image is pinned consistently like IMAGE_LOGGING_VECTOR and IMAGE_CLUSTER_LOGGING_OPERATOR.
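The suggested defaulting can be sketched as below. Variable names follow the review; the concrete version and registry path are placeholders, not the repository's actual values.

```shell
# Hypothetical sketch of the suggested env.sh defaulting; "6.4" and the
# quay.io path are illustrative placeholders.
LOGGING_VERSION=${LOGGING_VERSION:-6.4}

# Derive the LFME tag from LOGGING_VERSION instead of a mutable :latest.
LOGGING_LOG_FILE_METRIC_EXPORTER_VERSION=${LOGGING_LOG_FILE_METRIC_EXPORTER_VERSION:-${LOGGING_VERSION}}
IMAGE_LOG_FILE_METRIC_EXPORTER=${IMAGE_LOG_FILE_METRIC_EXPORTER:-quay.io/openshift-logging/log-file-metric-exporter:${LOGGING_LOG_FILE_METRIC_EXPORTER_VERSION}}

echo "${IMAGE_LOG_FILE_METRIC_EXPORTER}"
```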
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/framework/e2e/tls/scanner.go`:
- Around line 273-280: The ScanResult struct is missing Namespace and Pod values
so ValidateCompliance reports "Pod / port ..." without identifiers; set
result.Namespace and result.Pod when constructing ScanResult (e.g., populate
Namespace from ipResult.OpenShiftComponent.Namespace and Pod from ipResult.Pod
or ipResult.PodName depending on the scanner field name) or, if you prefer
changing validation, update ValidateCompliance to use existing fields
(Component, IP, Port) instead; adjust the code around ScanResult creation to
assign these namespace/pod fields consistently with the scanner output.
- Around line 32-63: The hard-coded fallback DefaultImage currently points to a
personal image; change DefaultImage to the project-owned image
"quay.io/openshift/tls-scanner:latest" (the commented-out value) so GetImage()
will return the project-owned default when os.Getenv(ImageEnvVar) is empty;
update the DefaultImage constant definition (symbol: DefaultImage) and leave
GetImage() and ImageEnvVar unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 9e66b57e-2996-4910-a6d1-77be8c66e1e3
📒 Files selected for processing (10)
- Makefile
- bundle/manifests/cluster-logging.clusterserviceversion.yaml
- config/manager/manager.yaml
- olm_deploy/scripts/env.sh
- test/e2e/operator/tls/e2e_test.go
- test/e2e/operator/tls/suite_test.go
- test/framework/e2e/framework.go
- test/framework/e2e/splunk.go
- test/framework/e2e/tls/scanner.go
- test/framework/e2e/tls/types.go
✅ Files skipped from review due to trivial changes (2)
- config/manager/manager.yaml
- test/framework/e2e/splunk.go
🚧 Files skipped from review as they are similar to previous changes (2)
- test/e2e/operator/tls/suite_test.go
- Makefile
```go
for {
	deleteFn := func() error {
		deletePolicy := metav1.DeletePropagationForeground
		return s.KubeClient.BatchV1().Jobs(scannerNamespace).Delete(context.TODO(), job.Name, metav1.DeleteOptions{
			PropagationPolicy: &deletePolicy,
		})
	}
	createdJob, err := s.KubeClient.BatchV1().Jobs(scannerNamespace).Create(context.TODO(), job, metav1.CreateOptions{})
	if err != nil && apierrors.IsAlreadyExists(err) {
		deleteFn()
		continue
	} else if err != nil {
		return nil, fmt.Errorf("failed to create TLS Scanner Job: %w", err)
	}
	s.addCleanup(deleteFn)
	return createdJob, nil
}
```
Bound the delete-and-recreate loop.
Foreground deletion is asynchronous, but this loop retries Create immediately with no wait or deadline. If the old Job lingers, Deploy can spin forever and hang the spec. Please wait for the Job to disappear or cap retries before recreating it.
♻️ Suggested direction
```diff
-	for {
+	deadline := time.Now().Add(30 * time.Second)
+	for time.Now().Before(deadline) {
 		deleteFn := func() error {
 			deletePolicy := metav1.DeletePropagationForeground
 			return s.KubeClient.BatchV1().Jobs(scannerNamespace).Delete(context.TODO(), job.Name, metav1.DeleteOptions{
 				PropagationPolicy: &deletePolicy,
 			})
 		}
 		createdJob, err := s.KubeClient.BatchV1().Jobs(scannerNamespace).Create(context.TODO(), job, metav1.CreateOptions{})
 		if err != nil && apierrors.IsAlreadyExists(err) {
-			deleteFn()
+			if err := deleteFn(); err != nil && !apierrors.IsNotFound(err) {
+				return nil, fmt.Errorf("failed to delete existing TLS Scanner Job: %w", err)
+			}
+			time.Sleep(time.Second)
 			continue
 		} else if err != nil {
 			return nil, fmt.Errorf("failed to create TLS Scanner Job: %w", err)
 		}
 		s.addCleanup(deleteFn)
 		return createdJob, nil
 	}
+	return nil, fmt.Errorf("timed out recreating TLS Scanner Job %q", job.Name)
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/framework/e2e/tls/scanner.go` around lines 171 - 187, The loop around
s.KubeClient.BatchV1().Jobs(scannerNamespace).Create retries immediately on
apierrors.IsAlreadyExists; change it to call deleteFn(), then wait for the old
Job to actually disappear (poll
s.KubeClient.BatchV1().Jobs(scannerNamespace).Get(job.Name, ...) until it
returns NotFound) with a bounded retry policy or context deadline (e.g.,
maxAttempts or timeout + small sleep/backoff between polls), and if the
timeout/retries are exceeded return an error instead of looping forever; keep
adding the cleanup via s.addCleanup only after a successful Create.
Description
This PR:
Links
Summary by CodeRabbit
Tests
Chores