OAPE-696: Add E2E coverage reporting with Codecov integration#111
OAPE-696: Add E2E coverage reporting with Codecov integration#111PillaiManish wants to merge 2 commits into
Conversation
Add coverage-instrumented build and collection scripts for E2E test coverage reporting. Uses a no-PVC approach: SIGUSR1 flushes coverage data from the running operator pod, which is then copied out via oc cp and uploaded to Codecov. Co-authored-by: Cursor <cursoragent@cursor.com>
WalkthroughAdded E2E coverage instrumentation infrastructure for the secrets-store-csi-driver-operator. Includes a coverage-instrumented container image build via a new Dockerfile, Makefile targets for building/pushing the image and collecting coverage artifacts, and a lifecycle script that patches the operator's CSV, orchestrates pod rollouts, flushes coverage, and optionally uploads coverage reports to Codecov. ChangesE2E Coverage Instrumentation and Collection
Sequence DiagramsequenceDiagram
participant User as User/CI
participant K8s as Kubernetes<br/>(CSV, Deployment)
participant Pod as Operator Pod<br/>(coverage-instrumented)
participant LocalFS as Local Filesystem<br/>(artifacts)
participant Codecov as Codecov Service
User->>K8s: setup: patch CSV<br/>replace image, set GOCOVERDIR
K8s->>Pod: rollout deployment<br/>new pod with coverage binary
Pod->>Pod: run operator<br/>collect coverage data
User->>Pod: collect: send SIGTERM<br/>flush coverage to disk
Pod->>Pod: restart (ready again)
Pod->>LocalFS: copy coverage artifacts<br/>from /tmp/e2e-cover
LocalFS->>LocalFS: convert via go tool covdata<br/>generate coverage-e2e.out
alt CODECOV_TOKEN present
LocalFS->>Codecov: download uploader<br/>(with SHA256/PGP verify)
LocalFS->>Codecov: upload coverage report<br/>(non-fatal on failure)
Codecov-->>LocalFS: report acknowledged
end
LocalFS->>User: coverage report ready<br/>in ARTIFACT_DIR
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 11 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: PillaiManish The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@PillaiManish: This pull request references OAPE-696 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cmd/secrets-store-csi-driver-operator/coverage_flush.go`:
- Around line 14-16: The GOCOVERDIR creation currently uses os.MkdirAll(dir,
0o777) and ignores errors; change this to create the directory with restrictive
permissions (e.g., 0o700 or 0o755 as appropriate) and handle the returned error
from os.MkdirAll instead of discarding it: call os.MkdirAll(dir, 0o700), check
the error, and propagate or log/fail fast (using the existing operator logger or
by returning the error) so coverage flush code won’t proceed when directory
creation fails; update the block that reads the GOCOVERDIR env var and the
os.MkdirAll call accordingly.
In `@hack/e2e-coverage.sh`:
- Around line 93-95: The current send of SIGUSR1 uses a hardcoded PID 1 (the
line invoking oc exec ... 'kill -USR1 1'), which can be wrong if the operator
isn't PID 1; update the script to locate the actual operator process inside the
container (using pidof or pgrep) and send SIGUSR1 to that PID, falling back to
PID 1 if no process is found. Inside the block that uses "${NAMESPACE}" and
"${pod}", run something like pidof <operator-binary-name> || pgrep -f
<operator-binary-name> to capture the PID(s) and then call kill -USR1 "$PID" (or
kill -USR1 1 if the lookup returns empty) so the script remains compatible with
single-binary containers while handling wrapper/init cases. Ensure the lookup
happens inside the oc exec command so the PID refers to the container namespace.
In `@Makefile`:
- Around line 80-85: The build-coverage target is generating invalid Go flag
syntax by naively concatenating "$(GO_BUILD_FLAGS),e2ecoverage" and the
suggested "-tags e2ecoverage" breaks FIPS by producing duplicate -tags; fix by
producing a single -tags value for the coverage build: detect if GO_BUILD_FLAGS
already contains a -tags clause and if so append ",e2ecoverage" to that existing
tag list, otherwise add a new "-tags e2ecoverage" flag; implement this logic in
the Makefile when constructing flags for the build-coverage target (use
GO_BUILD_FLAGS and create/compute a temporary coverage tags variable or a
modified FLAGS variable) so the final go build invocation has at most one -tags
flag and includes e2ecoverage.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f4068362-35f1-4771-9630-6d924234b1da
📒 Files selected for processing (4)
Dockerfile.coverageMakefilecmd/secrets-store-csi-driver-operator/coverage_flush.gohack/e2e-coverage.sh
| echo "Sending SIGUSR1 to flush coverage data..." | ||
| oc exec -n "${NAMESPACE}" "${pod}" -- /bin/sh -c 'kill -USR1 1' | ||
| sleep 3 |
There was a problem hiding this comment.
PID 1 assumption may not hold in all environments.
The script sends SIGUSR1 to PID 1, assuming the operator binary is the init process. While this is typical for single-binary containers, it may not be true if the container uses a wrapper script, an init system, or runs multiple processes. Consider using pgrep or pidof to find the actual operator process.
🔍 Proposed improvement
- echo "Sending SIGUSR1 to flush coverage data..."
- oc exec -n "${NAMESPACE}" "${pod}" -- /bin/sh -c 'kill -USR1 1'
- sleep 3
+ echo "Sending SIGUSR1 to flush coverage data..."
+ local pid
+ pid=$(oc exec -n "${NAMESPACE}" "${pod}" -- /bin/sh -c 'pidof secrets-store-csi-driver-operator || echo 1')
+ oc exec -n "${NAMESPACE}" "${pod}" -- /bin/sh -c "kill -USR1 ${pid}"
+ sleep 3This falls back to PID 1 if pidof doesn't find the process, maintaining backward compatibility.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| echo "Sending SIGUSR1 to flush coverage data..." | |
| oc exec -n "${NAMESPACE}" "${pod}" -- /bin/sh -c 'kill -USR1 1' | |
| sleep 3 | |
| echo "Sending SIGUSR1 to flush coverage data..." | |
| local pid | |
| pid=$(oc exec -n "${NAMESPACE}" "${pod}" -- /bin/sh -c 'pidof secrets-store-csi-driver-operator || echo 1') | |
| oc exec -n "${NAMESPACE}" "${pod}" -- /bin/sh -c "kill -USR1 ${pid}" | |
| sleep 3 |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@hack/e2e-coverage.sh` around lines 93 - 95, The current send of SIGUSR1 uses
a hardcoded PID 1 (the line invoking oc exec ... 'kill -USR1 1'), which can be
wrong if the operator isn't PID 1; update the script to locate the actual
operator process inside the container (using pidof or pgrep) and send SIGUSR1 to
that PID, falling back to PID 1 if no process is found. Inside the block that
uses "${NAMESPACE}" and "${pod}", run something like pidof
<operator-binary-name> || pgrep -f <operator-binary-name> to capture the PID(s)
and then call kill -USR1 "$PID" (or kill -USR1 1 if the lookup returns empty) so
the script remains compatible with single-binary containers while handling
wrapper/init cases. Ensure the lookup happens inside the oc exec command so the
PID refers to the container namespace.
| .PHONY: build-coverage | ||
| build-coverage: ## Build the operator binary with coverage instrumentation. | ||
| $(GO) build $(GO_MOD_FLAGS) $(GO_BUILD_FLAGS),e2ecoverage $(GO_LD_FLAGS) \ | ||
| -cover -covermode=atomic -coverpkg=./... \ | ||
| -o secrets-store-csi-driver-operator \ | ||
| ./cmd/secrets-store-csi-driver-operator |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Test build-coverage target syntax by checking GO_BUILD_FLAGS expansion
# Extract GO_BUILD_FLAGS definition and test both FIPS and non-FIPS cases
echo "=== Non-FIPS case ==="
echo "GO_BUILD_FLAGS = -trimpath"
echo "Expanded: go build -trimpath,e2ecoverage"
echo "This would fail with: unknown flag: -trimpath,e2ecoverage"
echo ""
echo "=== FIPS case ==="
echo "GO_BUILD_FLAGS = -trimpath -tags strictfipsruntime,openssl"
echo "Expanded: go build -trimpath -tags strictfipsruntime,openssl,e2ecoverage"
echo "This works but is fragile (assumes GO_BUILD_FLAGS ends with -tags)"Repository: openshift/secrets-store-csi-driver-operator
Length of output: 453
🏁 Script executed:
# Find the Makefile and examine the build-coverage target and GO_BUILD_FLAGS definition
find . -name "Makefile" -type f | head -20Repository: openshift/secrets-store-csi-driver-operator
Length of output: 1152
🏁 Script executed:
# Check the actual build-coverage target and surrounding context
head -n 85 ./Makefile | tail -n 20Repository: openshift/secrets-store-csi-driver-operator
Length of output: 574
🏁 Script executed:
# Find GO_BUILD_FLAGS definition in the Makefile
rg -n "GO_BUILD_FLAGS" ./Makefile -A 2 -B 2Repository: openshift/secrets-store-csi-driver-operator
Length of output: 840
🏁 Script executed:
# Verify the actual behavior - check if build-coverage would fail with the current syntax
# Let's understand what the expansion would be in both cases
echo "=== Current state in non-FIPS mode ==="
echo "GO_BUILD_FLAGS=-trimpath"
echo "Expanded command: go build -trimpath,e2ecoverage"
echo ""
echo "=== Current state in FIPS mode ==="
echo "GO_BUILD_FLAGS=-trimpath -tags strictfipsruntime,openssl"
echo "Expanded command: go build -trimpath -tags strictfipsruntime,openssl,e2ecoverage"
echo ""
echo "=== Testing proposed fix in FIPS mode ==="
echo "Proposed: GO_BUILD_FLAGS -tags e2ecoverage"
echo "FIPS expansion: go build -trimpath -tags strictfipsruntime,openssl -tags e2ecoverage"
echo "(This has TWO -tags flags which is invalid)"Repository: openshift/secrets-store-csi-driver-operator
Length of output: 589
Fix build-coverage flags syntax error—proposed fix is incorrect for FIPS mode.
The $(GO_BUILD_FLAGS),e2ecoverage concatenation produces invalid syntax in non-FIPS mode. When GO_BUILD_FLAGS is -trimpath, this expands to go build -trimpath,e2ecoverage, which is invalid Go build syntax.
However, the proposed fix ($(GO_BUILD_FLAGS) -tags e2ecoverage) breaks FIPS mode by creating duplicate -tags flags: go build -trimpath -tags strictfipsruntime,openssl -tags e2ecoverage. Go's build command does not accept multiple -tags flags.
The correct solution requires either:
- Appending to the tags list when
-tagsalready exists inGO_BUILD_FLAGS, or - Creating a separate tags variable for the coverage target that merges with existing FIPS tags
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Makefile` around lines 80 - 85, The build-coverage target is generating
invalid Go flag syntax by naively concatenating "$(GO_BUILD_FLAGS),e2ecoverage"
and the suggested "-tags e2ecoverage" breaks FIPS by producing duplicate -tags;
fix by producing a single -tags value for the coverage build: detect if
GO_BUILD_FLAGS already contains a -tags clause and if so append ",e2ecoverage"
to that existing tag list, otherwise add a new "-tags e2ecoverage" flag;
implement this logic in the Makefile when constructing flags for the
build-coverage target (use GO_BUILD_FLAGS and create/compute a temporary
coverage tags variable or a modified FLAGS variable) so the final go build
invocation has at most one -tags flag and includes e2ecoverage.
Remove coverage_flush.go and use SIGTERM to flush coverage data instead of SIGUSR1. Container restarts after SIGTERM, emptyDir preserves coverage files, and oc cp runs from the restarted container. Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
Dockerfile.coverage (1)
6-9: ⚡ Quick winDrop root default in the coverage runtime image.
The runtime stage has no explicit
USER, so it defaults to root. Please set a non-root user for defense-in-depth.Proposed hardening
FROM registry.ci.openshift.org/ocp/4.22:base-rhel9 COPY --from=builder /go/src/github.com/openshift/secrets-store-csi-driver-operator/secrets-store-csi-driver-operator /usr/bin/ ENV GOCOVERDIR=/tmp/e2e-cover +USER 65532 ENTRYPOINT ["/bin/sh", "-c", "mkdir -p /tmp/e2e-cover && exec /usr/bin/secrets-store-csi-driver-operator \"$@\"", "--"]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Dockerfile.coverage` around lines 6 - 9, The runtime stage in Dockerfile.coverage defaults to root; add a non-root user and switch to it before ENTRYPOINT to improve hardening: create a dedicated user/group (or use an existing non-root UID), ensure ownership/permissions of /tmp/e2e-cover and the binary at /usr/bin/secrets-store-csi-driver-operator are adjusted (chown/chmod) so the non-root user can write to GOCOVERDIR and execute the binary, then add a USER instruction before ENTRYPOINT to run the container as that non-root user.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@Dockerfile.coverage`:
- Around line 6-9: The runtime stage in Dockerfile.coverage defaults to root;
add a non-root user and switch to it before ENTRYPOINT to improve hardening:
create a dedicated user/group (or use an existing non-root UID), ensure
ownership/permissions of /tmp/e2e-cover and the binary at
/usr/bin/secrets-store-csi-driver-operator are adjusted (chown/chmod) so the
non-root user can write to GOCOVERDIR and execute the binary, then add a USER
instruction before ENTRYPOINT to run the container as that non-root user.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 67bd0df7-10b1-406d-8f94-fae8b631578c
📒 Files selected for processing (3)
Dockerfile.coverageMakefilehack/e2e-coverage.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- hack/e2e-coverage.sh
|
@PillaiManish: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
Dockerfile.coverage) that builds the operator with Go's-coverflagscoverage_flush.go) gated behind a//go:build e2ecoveragetag, excluded from production buildshack/e2e-coverage.shwithsetupandcollectsubcommands for CI and local useHow it works (no-PVC approach)
Dockerfile.coveragesetuppatches the live CSV to swap in the coverage image and setGOCOVERDIRcollectsends SIGUSR1 to flush coverage data, copies it from the running pod viaoc cp, converts to a Go profile, and uploads to CodecovNo PVC or extractor pod is needed -- coverage data is flushed on demand while the pod is still alive.
Tested
-cover -covermode=atomic -coverpkg=./...coverage_flush.goexcluded withoute2ecoveragetag)Follow-up (separate PR)
openshift/releaseCI config changes to wire up coverage steps in Prow jobsMade with Cursor
Summary by CodeRabbit