Skip to content

OCPBUGS-86473: Consolidate audit log must-gather tests to reduce parallel downloads and master node CPU pressure#31200

Open
xueqzhan wants to merge 5 commits into
openshift:mainfrom
xueqzhan:tp-cpu
Open

OCPBUGS-86473: Consolidate audit log must-gather tests to reduce parallel downloads and master node CPU pressure#31200
xueqzhan wants to merge 5 commits into
openshift:mainfrom
xueqzhan:tp-cpu

Conversation

@xueqzhan
Copy link
Copy Markdown
Contributor

@xueqzhan xueqzhan commented May 20, 2026

Summary by CodeRabbit

  • Tests
    • Consolidated and clarified the must-gather audit-logs test with explicit validation steps.
    • Now verifies apiserver audit directories were processed strictly sequentially and confirms OAuth audit entries include audit IDs.
    • Removed redundant standalone tests now covered by the consolidated audit-logs flow.
    • Adjusted related must-gather invocations to rely on default gather behavior while preserving existing log assertions.

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 167dc8ab-3a8a-4615-863d-e368235abe27

📥 Commits

Reviewing files that changed from the base of the PR and between de95822 and 2477c1b.

📒 Files selected for processing (1)
  • test/extended/cli/mustgather.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/extended/cli/mustgather.go

Walkthrough

Consolidates audit-logs validations into a single must-gather test: adds an explicit validation step, checks apiserver audit directories/lock.log for sequential execution, validates OAuth gzip audit files contain "auditID", removes now-redundant standalone tests, and simplifies two must-gather invocations to omit the explicit gather command.

Changes

Audit Logs Test Refactoring

Layer / File(s) Summary
Audit-logs validation and added checks
test/extended/cli/mustgather.go
Adds an explicit g.By step label for audit validation, integrates sequential apiserver checks by walking apiserver audit subdirectories and inspecting lock.log, and validates OAuth audit gzip files by asserting the presence of auditID.
Removed redundant must-gather tests
test/extended/cli/mustgather.go
Removes the separate tests that previously validated sequential apiserver execution and login-attempt OAuth audit logs; those checks are moved into the consolidated audit-logs test.
Simplified must-gather invocations in other tests
test/extended/cli/mustgather.go
Update two tests to call oc adm must-gather with only --dest-dir (remove explicit -- /usr/bin/gather_audit_logs) while keeping existing must-gather.logs/output assertions.

🎯 3 (Moderate) | ⏱️ ~20 minutes


Caution

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

  • Ignore

❌ Failed checks (2 errors, 3 warnings)

Check name Status Explanation Resolution
Container-Privileges ❌ Error PR adds deployment manifests with privileged: true, hostNetwork: true, and runAsUser: 0 in test infrastructure without documented security justification. Document security justification for privileged containers in manifests or refactor to use less privileged alternatives.
No-Sensitive-Data-In-Logs ❌ Error Line 521 logs the entire must-gather.logs file content via e2e.Logf(), which may expose sensitive data like tokens, passwords, or internal hostnames from system output. Remove the e2e.Logf("headContent is %s", headContent) call on line 521, or log only specific required substrings rather than the entire file content.
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ⚠️ Warning The "audit logs" test violates single responsibility by testing multiple unrelated behaviors. Many assertions lack meaningful failure messages. OAuth token resources created in test lack cleanup. Split test into focused cases; add error messages to assertions; add cleanup for OAuth tokens created during test setup.
Ipv6 And Disconnected Network Test Compatibility ⚠️ Warning Test contains hardcoded IPv4 localhost (127.0.0.1) on lines 150 and 162 in RedirectURI fields, incompatible with IPv6-only disconnected environments. Replace hardcoded "https://127.0.0.1:12000/oauth/token/implicit" with IPv6-compatible address (::1) or use net.JoinHostPort() for dynamic host:port construction.
✅ Passed checks (10 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly references OCPBUGS-86473 and accurately summarizes the main change: consolidating audit log must-gather tests to reduce parallel downloads and master node CPU pressure, which aligns with the core objectives described in the PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All Ginkgo test titles (g.It) in mustgather.go are stable, deterministic strings with no dynamic content like pod names, timestamps, UUIDs, node names, or IP addresses.
Microshift Test Compatibility ✅ Passed All 9 new tests use APIs unavailable on MicroShift (config.openshift.io, oauth.openshift.io) but are properly protected with [apigroup:...] tags, which are automatically skipped by MicroShift CI jobs.
Single Node Openshift (Sno) Test Compatibility ✅ Passed No new Ginkgo tests are added in this PR. Changes consolidate existing must-gather tests; the audit logs test doesn't assume multiple nodes as all apiservers run on SNO's single node.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies test file (test/extended/cli/mustgather.go), not deployment manifests, operator code, or controllers. Custom check applies only when those are modified.
Ote Binary Stdout Contract ✅ Passed File test/extended/cli/mustgather.go contains no process-level stdout writes; all code runs within test blocks, uses only safe fmt functions (Errorf, Sprintf), and no klog/logging violations detected.
No-Weak-Crypto ✅ Passed No weak cryptographic patterns found. The PR only uses SHA256 (strong), gzip (not crypto), and UUID. No MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB, custom crypto, or unsafe comparisons detected.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from p0lyn0mial and sjenning May 20, 2026 17:30
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 20, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: xueqzhan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 20, 2026
@openshift-ci openshift-ci Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label May 20, 2026
@xueqzhan
Copy link
Copy Markdown
Contributor Author

/payload-job periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-rhcos9-techpreview

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 20, 2026

@xueqzhan: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-rhcos9-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/57d1de60-5473-11f1-8018-90fc888065f7-0

@xueqzhan
Copy link
Copy Markdown
Contributor Author

/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-upgrade-fips-rhcos9-techpreview

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 20, 2026

@xueqzhan: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-upgrade-fips-rhcos9-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/71dac150-5473-11f1-8c56-bf533e7f0877-0

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@xueqzhan
Copy link
Copy Markdown
Contributor Author

/payload-aggregate periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-rhcos9-techpreview 5

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 20, 2026

@xueqzhan: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn-rhcos9-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/91671ef0-5474-11f1-98a8-f4e80f2c9b13-0

@xueqzhan
Copy link
Copy Markdown
Contributor Author

/payload-aggregate periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-upgrade-fips-rhcos9-techpreview 4

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 20, 2026

@xueqzhan: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-upgrade-fips-rhcos9-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ab855ea0-5474-11f1-8777-eccceea9d0ed-0

@xueqzhan
Copy link
Copy Markdown
Contributor Author

/payload-aggregate periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-upgrade-fips-rhcos9-techpreview 6

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 21, 2026

@xueqzhan: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-upgrade-fips-rhcos9-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/a02bb390-54ad-11f1-84f9-af2b2ba73bbf-0

@xueqzhan
Copy link
Copy Markdown
Contributor Author

/payload-aggregate periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-upgrade-fips-rhcos9-techpreview 6

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 21, 2026

@xueqzhan: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-upgrade-fips-rhcos9-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/a6785b50-5556-11f1-8ae8-0bd1020826e7-0

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/extended/cli/mustgather.go`:
- Around line 294-295: The current code calls result.Flakef unconditionally when
lock.log exists, causing false flakes; update the logic around the
result.Flakef(apiserver, lockLog) call so it only reports a flake when the
captured lockLog actually contains content (e.g., check len(lockLog) > 0 or
strings.TrimSpace(string(lockLog)) != "" before calling result.Flakef). Locate
the call to result.Flakef and the lockLog variable in mustgather.go and wrap the
Flakef invocation in that conditional, returning nil (or continuing) when
lockLog is empty.
- Line 299: Replace the incorrect Gomega assertion
o.Expect(seen.HasAll(expectedAuditSubDirs...), o.BeTrue()) with the proper
matcher call by invoking .To(o.BeTrue()) on the Expect result (i.e.
o.Expect(seen.HasAll(expectedAuditSubDirs...)).To(o.BeTrue())), and update the
lock log handling so result.Flakef(...) is only called when lockLog actually
indicates the “lock was still held” condition (gate the flake report on a
specific substring/marker in lockLog such as the known "lock was still held"
message or other reliable indicator rather than any presence of lock.log).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: f74e07b4-39e8-4915-b499-6b698bcc1dfc

📥 Commits

Reviewing files that changed from the base of the PR and between f1b4d90 and f3601f5.

📒 Files selected for processing (1)
  • test/extended/cli/mustgather.go

Comment thread test/extended/cli/mustgather.go Outdated
Comment thread test/extended/cli/mustgather.go Outdated
@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@xueqzhan xueqzhan changed the title Try to remove a crashing pod to test cpu situation in tp jobs OCPBUGS-86473: Consolidate audit log must-gather tests to reduce parallel downloads and master node CPU pressure May 26, 2026
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 26, 2026
@openshift-ci-robot
Copy link
Copy Markdown

@xueqzhan: This pull request references Jira Issue OCPBUGS-86473, which is invalid:

  • expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Summary by CodeRabbit

  • Tests
  • Enhanced audit-logs must-gather test with clearer validation steps and consolidated coverage.
  • Now verifies apiserver audit directories were processed strictly sequentially and validates OAuth audit entries include audit IDs.
  • Removed redundant standalone tests whose checks are now covered in the consolidated flow.
  • Adjusted must-gather invocations in related tests to rely on default gather behavior while preserving output assertions.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@xueqzhan
Copy link
Copy Markdown
Contributor Author

/payload-aggregate periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-upgrade-fips-rhcos9-techpreview 4

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 26, 2026

@xueqzhan: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-upgrade-fips-rhcos9-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/1643bc80-5936-11f1-95e8-b00b9e9fdb68-0

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

Comment thread test/extended/cli/mustgather.go Outdated
defer os.RemoveAll(tempDir)

err = oc.AsAdmin().WithoutNamespace().Run("adm").Args("must-gather", "--dest-dir="+tempDir, "--", "/usr/bin/gather_audit_logs").Execute()
err = oc.AsAdmin().WithoutNamespace().Run("adm").Args("must-gather", "--dest-dir="+tempDir).Execute()
Copy link
Copy Markdown
Member

@ardaguclu ardaguclu May 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This runs https://github.com/openshift/must-gather/blob/main/collection-scripts/gather but this does not run gather_audit_logs?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two tests (here and line 551) only check must-gather.logs wrapper metadata — they don't need any gathered content. @xueqzhan, Default gather is heavier than gather_audit_logs was. Consider -- /bin/true instead as gather.logs is created unconditionally by the client regardless of script output, and checkGatherLogsForImage only checks existence.

Suggested change
err = oc.AsAdmin().WithoutNamespace().Run("adm").Args("must-gather", "--dest-dir="+tempDir).Execute()
err = oc.AsAdmin().WithoutNamespace().Run("adm").Args("must-gather", "--dest-dir="+tempDir, "--", "/bin/true").Execute()

Copy link
Copy Markdown
Contributor

@sanchezl sanchezl May 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one of the other tests in this file uses -- /bin/bash -c "ls -l > /artifacts/ls.log" which would work also.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion

  • applied -- /bin/true to the "Verify version" test since it only checks must-gather.logs which is created by the oc client wrapper.
  • Left the "Verify logs generated" test using default must-gather (not /bin/true) because it calls checkGatherLogsForImage which expects gather.logs to exist in image subdirectories, and I'm not confident /bin/true would create those.

}
})

g.When("looking at the audit logs [apigroup:config.openshift.io]", func() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this rename the test which has an impact on CR dashboard?. Do we really need to rename the test?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The primary test name is preserved. The two removed tests will disappear from the dashboard, which is expected since their validations still execute within the consolidated test. No existing passing test is being renamed — just two tests are being absorbed into an existing one.

Comment thread test/extended/cli/mustgather.go Outdated
defer os.RemoveAll(tempDir)

err = oc.AsAdmin().WithoutNamespace().Run("adm").Args("must-gather", "--dest-dir="+tempDir, "--", "/usr/bin/gather_audit_logs").Execute()
err = oc.AsAdmin().WithoutNamespace().Run("adm").Args("must-gather", "--dest-dir="+tempDir).Execute()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two tests (here and line 551) only check must-gather.logs wrapper metadata — they don't need any gathered content. @xueqzhan, Default gather is heavier than gather_audit_logs was. Consider -- /bin/true instead as gather.logs is created unconditionally by the client regardless of script output, and checkGatherLogsForImage only checks existence.

Suggested change
err = oc.AsAdmin().WithoutNamespace().Run("adm").Args("must-gather", "--dest-dir="+tempDir).Execute()
err = oc.AsAdmin().WithoutNamespace().Run("adm").Args("must-gather", "--dest-dir="+tempDir, "--", "/bin/true").Execute()

Comment thread test/extended/cli/mustgather.go Outdated
for _, file := range oauthAuditFiles {
f, err := os.Open(file)
o.Expect(err).NotTo(o.HaveOccurred())
defer f.Close()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the file remains open during the entire test. maybe wrap iteration in a closure:

for _, file := range oauthAuditFiles {
      func() {
          f, err := os.Open(file)
          ...
      }()
  }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 27, 2026
@openshift-ci-robot
Copy link
Copy Markdown

@xueqzhan: This pull request references Jira Issue OCPBUGS-86473, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
Details

In response to this:

Summary by CodeRabbit

  • Tests
  • Consolidated and clarified the must-gather audit-logs test with explicit validation steps.
  • Now verifies apiserver audit directories were processed strictly sequentially and confirms OAuth audit entries include audit IDs.
  • Removed redundant standalone tests now covered by the consolidated audit-logs flow.
  • Adjusted related must-gather invocations to rely on default gather behavior while preserving existing log assertions.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants