OCPBUGS-86473: Consolidate audit log must-gather tests to reduce parallel downloads and master node CPU pressure#31200
xueqzhan wants to merge 5 commits into
Pipeline controller notification
For optional jobs, comment
This repository is configured in: automatic mode
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID:
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Walkthrough
Consolidates the audit-logs validations into a single must-gather test: adds an explicit validation step, checks the apiserver audit directories and lock.log for sequential execution, validates that OAuth gzip audit files contain "auditID", removes the now-redundant standalone tests, and simplifies two must-gather invocations to omit the explicit gather command.
Changes
Audit Logs Test Refactoring
🎯 3 (Moderate) | ⏱️ ~20 minutes
Caution
Pre-merge checks failed. Please resolve all errors before merging. Addressing warnings is optional.
❌ Failed checks (2 errors, 3 warnings)
✅ Passed checks (10 passed)
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: xueqzhan
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/payload-job periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-rhcos9-techpreview

@xueqzhan: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/57d1de60-5473-11f1-8018-90fc888065f7-0

/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-upgrade-fips-rhcos9-techpreview

@xueqzhan: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/71dac150-5473-11f1-8c56-bf533e7f0877-0

Scheduling required tests:

/payload-aggregate periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-rhcos9-techpreview 5

@xueqzhan: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/91671ef0-5474-11f1-98a8-f4e80f2c9b13-0

/payload-aggregate periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-upgrade-fips-rhcos9-techpreview 4

@xueqzhan: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ab855ea0-5474-11f1-8777-eccceea9d0ed-0

/payload-aggregate periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-upgrade-fips-rhcos9-techpreview 6

@xueqzhan: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/a02bb390-54ad-11f1-84f9-af2b2ba73bbf-0

/payload-aggregate periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-upgrade-fips-rhcos9-techpreview 6

@xueqzhan: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/a6785b50-5556-11f1-8ae8-0bd1020826e7-0
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/extended/cli/mustgather.go`:
- Around line 294-295: The current code calls result.Flakef unconditionally when
lock.log exists, causing false flakes; update the logic around the
result.Flakef(apiserver, lockLog) call so it only reports a flake when the
captured lockLog actually contains content (e.g., check len(lockLog) > 0 or
strings.TrimSpace(string(lockLog)) != "" before calling result.Flakef). Locate
the call to result.Flakef and the lockLog variable in mustgather.go and wrap the
Flakef invocation in that conditional, returning nil (or continuing) when
lockLog is empty.
- Line 299: Replace the incorrect Gomega assertion
o.Expect(seen.HasAll(expectedAuditSubDirs...), o.BeTrue()) with the proper
matcher call by invoking .To(o.BeTrue()) on the Expect result (i.e.
o.Expect(seen.HasAll(expectedAuditSubDirs...)).To(o.BeTrue())), and update the
lock log handling so result.Flakef(...) is only called when lockLog actually
indicates the “lock was still held” condition (gate the flake report on a
specific substring/marker in lockLog such as the known "lock was still held"
message or other reliable indicator rather than any presence of lock.log).
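Both findings come down to gating the flake on the content of lock.log rather than on its mere existence. A minimal sketch of that gate follows; `shouldFlakeOnLockLog` is a hypothetical helper, and the marker string is taken from the review comment above and is an assumption that must be checked against the message the gather script actually writes.

```go
package main

import (
	"fmt"
	"strings"
)

// lockHeldMarker is an ASSUMED placeholder taken from the review comment;
// the real text must come from the gather script that writes lock.log.
const lockHeldMarker = "lock was still held"

// shouldFlakeOnLockLog reports whether lock.log content indicates that a
// previous gather still held the lock. An absent, empty, or whitespace-only
// lock.log can never contain the marker, so those cases return false.
func shouldFlakeOnLockLog(lockLog []byte) bool {
	return strings.Contains(string(lockLog), lockHeldMarker)
}

func main() {
	for _, content := range []string{"", "  \n", "lock was still held by another gather run"} {
		fmt.Printf("%q -> %v\n", content, shouldFlakeOnLockLog([]byte(content)))
	}
}
```

In the test, the existing `result.Flakef(...)` call would then sit inside `if shouldFlakeOnLockLog(lockLog) { ... }`, and the directory assertion would use the corrected Gomega form `o.Expect(seen.HasAll(expectedAuditSubDirs...)).To(o.BeTrue())`.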
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: f74e07b4-39e8-4915-b499-6b698bcc1dfc
📒 Files selected for processing (1)
test/extended/cli/mustgather.go
Scheduling required tests:
@xueqzhan: This pull request references Jira Issue OCPBUGS-86473, which is invalid:
Comment
The bug has been updated to refer to the pull request using the external bug tracker.
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

/payload-aggregate periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-upgrade-fips-rhcos9-techpreview 4

@xueqzhan: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/1643bc80-5936-11f1-95e8-b00b9e9fdb68-0

Scheduling required tests:
```diff
 defer os.RemoveAll(tempDir)

-err = oc.AsAdmin().WithoutNamespace().Run("adm").Args("must-gather", "--dest-dir="+tempDir, "--", "/usr/bin/gather_audit_logs").Execute()
+err = oc.AsAdmin().WithoutNamespace().Run("adm").Args("must-gather", "--dest-dir="+tempDir).Execute()
```
This runs https://github.com/openshift/must-gather/blob/main/collection-scripts/gather, but that script does not run `gather_audit_logs`, does it?
These two tests (here and at line 551) only check the `must-gather.logs` wrapper metadata; they don't need any gathered content. @xueqzhan, the default gather is heavier than `gather_audit_logs` was. Consider `-- /bin/true` instead: `gather.logs` is created unconditionally by the client regardless of script output, and `checkGatherLogsForImage` only checks that it exists.
```diff
-err = oc.AsAdmin().WithoutNamespace().Run("adm").Args("must-gather", "--dest-dir="+tempDir).Execute()
+err = oc.AsAdmin().WithoutNamespace().Run("adm").Args("must-gather", "--dest-dir="+tempDir, "--", "/bin/true").Execute()
```
One of the other tests in this file uses `-- /bin/bash -c "ls -l > /artifacts/ls.log"`, which would also work.
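Both suggestions rely on the same convention: arguments after `--` are run inside the must-gather pod in place of the image's default gather script. A small hypothetical helper makes the two argument shapes explicit; `mustGatherArgs` is not part of the test file, just an illustration.

```go
package main

import "fmt"

// mustGatherArgs builds the arguments passed to `oc adm`. With no command,
// the image's default gather script runs; otherwise the command follows "--"
// and replaces it. Hypothetical helper for illustration only.
func mustGatherArgs(destDir string, command ...string) []string {
	args := []string{"must-gather", "--dest-dir=" + destDir}
	if len(command) > 0 {
		args = append(args, "--")
		args = append(args, command...)
	}
	return args
}

func main() {
	fmt.Println(mustGatherArgs("/tmp/mg"))
	fmt.Println(mustGatherArgs("/tmp/mg", "/bin/true"))
	// Each shell word is a separate argument; the redirection stays inside -c.
	fmt.Println(mustGatherArgs("/tmp/mg", "/bin/bash", "-c", "ls -l > /artifacts/ls.log"))
}
```

Assuming the test helper's `Args` is variadic, as the existing calls suggest, this would be used as `Args(mustGatherArgs(tempDir, "/bin/true")...)`.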
Good suggestion:
- Applied `-- /bin/true` to the "Verify version" test, since it only checks `must-gather.logs`, which is created by the oc client wrapper.
- Left the "Verify logs generated" test using the default must-gather (not `/bin/true`), because it calls `checkGatherLogsForImage`, which expects `gather.logs` to exist in the image subdirectories, and I'm not confident `/bin/true` would create those.
```go
	}
})

g.When("looking at the audit logs [apigroup:config.openshift.io]", func() {
```
Doesn't this rename the test, which would affect the CR dashboard? Do we really need to rename it?
The primary test name is preserved. The two removed tests will disappear from the dashboard, which is expected, since their validations still run within the consolidated test. No existing passing test is being renamed; two tests are simply being absorbed into an existing one.
```diff
 defer os.RemoveAll(tempDir)

-err = oc.AsAdmin().WithoutNamespace().Run("adm").Args("must-gather", "--dest-dir="+tempDir, "--", "/usr/bin/gather_audit_logs").Execute()
+err = oc.AsAdmin().WithoutNamespace().Run("adm").Args("must-gather", "--dest-dir="+tempDir).Execute()
```
These two tests (here and at line 551) only check the `must-gather.logs` wrapper metadata; they don't need any gathered content. @xueqzhan, the default gather is heavier than `gather_audit_logs` was. Consider `-- /bin/true` instead: `gather.logs` is created unconditionally by the client regardless of script output, and `checkGatherLogsForImage` only checks that it exists.
```diff
-err = oc.AsAdmin().WithoutNamespace().Run("adm").Args("must-gather", "--dest-dir="+tempDir).Execute()
+err = oc.AsAdmin().WithoutNamespace().Run("adm").Args("must-gather", "--dest-dir="+tempDir, "--", "/bin/true").Execute()
```
```go
for _, file := range oauthAuditFiles {
	f, err := os.Open(file)
	o.Expect(err).NotTo(o.HaveOccurred())
	defer f.Close()
```
Nit: the file remains open for the entire test. Maybe wrap each iteration in a closure:
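```go
for _, file := range oauthAuditFiles {
	func() {
		f, err := os.Open(file)
		o.Expect(err).NotTo(o.HaveOccurred())
		defer f.Close() // now runs at the end of each iteration
		// ...
	}()
}
```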
@xueqzhan: This pull request references Jira Issue OCPBUGS-86473, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Summary by CodeRabbit