Skip to content

WIP: feat(must-gather): Refactor to golang to remove oc dependency. LOG-9008:#3315

Open
jcantrill wants to merge 22 commits into
openshift:masterfrom
jcantrill:log9008
Open

WIP: feat(must-gather): Refactor to golang to remove oc dependency. LOG-9008:#3315
jcantrill wants to merge 22 commits into
openshift:masterfrom
jcantrill:log9008

Conversation

@jcantrill

@jcantrill jcantrill commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Description

This PR:

  1. Refactors the must-gather scripts to golang to remove oc dependency
  2. Attempts to preserve directory structure
  3. Remove "cluster-logging" which duplicates other artifacts
  4. Remove obsolete ELK artifacts
  5. Group pod logs to core/pods for consistency with oc adm inspect
  6. Remove openshift types not in use

Links

LOG-9007

cc @vparfonov @JoaoBraveCoding @xperimental

Summary by CodeRabbit

  • New Features
    • Must-gather is now a compiled executable with CLI options for output destination, logging namespace, and debug logging.
    • Collection runs via built-in concurrent collectors for cluster-scoped data, per-namespace resources and pod logs, UI plugin artifacts, Prometheus rules, and LokiStack.
    • Secret values in exported YAML are redacted.
  • Chores
    • Container runtime image was slimmed down and now ships only the compiled must-gather artifacts; gather is provided as a symlink to the executable.
  • Tests
    • The e2e “Forward to Lokistack” test is forced to fail (temporary).

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 18, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 18, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 18, 2026

Copy link
Copy Markdown

@jcantrill: This pull request references LOG-9008 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.8.0" version, but no target version was set.

Details

In response to this:

Description

This PR:

  1. Refactors the must-gather scripts to golang to remove oc dependency
  2. Attempts to preserve directory structure
  3. Remove "cluster-logging" which duplicates other artifacts
  4. Remove obsolete ELK artifacts
  5. Group pod logs to core/pods for consistency with oc adm inspect
  6. Remove openshift types not in use

Links

LOG-9007

cc @vparfonov @JoaoBraveCoding @xperimental

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jcantrill

Copy link
Copy Markdown
Contributor Author

/hold

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 18, 2026
@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

Replaces shell-based must-gather collection with a Go binary, new shared API types, a Kubernetes client wrapper, multiple collectors, a concurrent orchestrator, and a CLI entrypoint. Updates the Makefile and Dockerfiles to build and package the binary.

Changes

Must-gather binary migration

Layer / File(s) Summary
API contracts and client
must-gather/internal/api/types.go, must-gather/internal/api/path.go, must-gather/internal/api/logger.go, must-gather/internal/api/path_test.go, must-gather/internal/api/suite_test.go, must-gather/internal/client/client.go
Defines the shared logger, path, config, collector, and result types; adds path helpers and tests; and introduces the Kubernetes client wrapper for list, exec, and YAML write operations.
Core collectors
must-gather/internal/cluster/collector.go, must-gather/internal/namespace/collector.go, must-gather/internal/collection/collection_collector.go
Implements cluster-scoped, namespace-scoped, and CLO-specific collection flows, including namespace discovery, pod log capture, and operator environment/build-info extraction.
Metrics, LokiStack, UIPlugin, and script removals
must-gather/internal/metrics/collector.go, must-gather/internal/logstore/lokistack/collector.go, must-gather/internal/ui/uiplugin_collector.go, must-gather/collection-scripts/*
Adds collectors for Prometheus rules, LokiStack resources, and UIPlugin/console data, while removing the shell scripts and helper functions they replace.
Orchestrator, CLI, and packaging
must-gather/gather.go, must-gather/cmd/main.go, Makefile, Dockerfile, Dockerfile.art, test/e2e/logforwarding/lokistack/forward_to_lokistack_test.go
Wires the concurrent gather orchestrator, adds the CLI entrypoint, updates build targets, changes container packaging to the Go binary, and adds the test failure marker.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐇 Bash hops out, Go hops in with style,
Collectors march together, mile by mile.
A Path, a Logger, and clients on the run,
The gatherer hums—its work is done!
From Dockerfile to runtime, bright and clean,
A must-gather ballet on a greener machine.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive Description covers main objectives and references a JIRA issue, but lacks reviewer/approver assignments and proper formatting that the template requires. Add /cc and /assign directives with specific reviewers/approvers from the OWNERS file to meet template requirements.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed Title clearly indicates the main change: refactoring must-gather from bash/oc to Go, which is the core objective of this PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@openshift-ci openshift-ci Bot requested review from Clee2691 and vparfonov June 18, 2026 21:06
@openshift-ci

openshift-ci Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcantrill

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added midstream/Dockerfile A Dockerfile.in sync is needed with midstream approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jun 18, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

🧹 Nitpick comments (1)
must-gather/cmd/main.go (1)

51-55: ⚖️ Poor tradeoff

Consider adding signal handling for graceful cancellation.

Using context.Background() means the gather operation cannot be cancelled via SIGINT/SIGTERM. For a potentially long-running collection process, signal-based cancellation would improve the user experience.

This is optional for initial implementation but worth considering.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@must-gather/cmd/main.go` around lines 51 - 55, The gather.Run() call in main
uses context.Background() which cannot be cancelled, preventing graceful
shutdown on SIGINT/SIGTERM signals. Replace the context.Background() call with a
context created via signal.NotifyContext() that listens for os.Interrupt and
syscall.SIGTERM signals. This will allow the gather operation to detect
cancellation requests and shut down gracefully when the user sends those
signals.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@must-gather/cmd/main.go`:
- Around line 18-23: The `logFileName` variable is set from the `--log-file`
flag but the value is not being passed to the `NewGather` function, which
hardcodes the LogFileName field to "gather-debug.log". Pass the `logFileName`
variable as a parameter to the `NewGather` function call (located after the flag
parsing in main.go), and update the `NewGather` function signature in
`must-gather/gather.go` to accept and use this parameter value instead of
hardcoding it to "gather-debug.log".
- Line 39: The logFile.Close() call in the defer statement is not checking for
potential errors. Since Close() can fail when flushing buffered data to a file,
you need to handle the error it returns. Modify the defer statement to capture
and check the error returned by logFile.Close(), and handle it appropriately by
logging the error or returning it to the caller so failures are not silently
ignored.

In `@must-gather/gather.go`:
- Around line 82-85: The Run method currently returns nil unconditionally after
calling g.logResults(results), preventing callers from detecting when collectors
have failed. Instead of returning nil, you need to check the results for any
collector failures after logging, aggregate those errors into a single error
value, and return that aggregated error if any failures occurred, or nil if all
collectors succeeded. This allows callers to detect when the gather operation
was unsuccessful.

In `@must-gather/internal/api/logger.go`:
- Around line 20-24: The fmt.Fprintf call in the Log method of DefaultLogger is
not checking its error return value, which causes write failures to be silently
ignored and fails errcheck linting. Capture the error return value from the
fmt.Fprintf call and handle it explicitly by either returning the error from the
Log method signature, panicking on write failure, or logging the error to an
appropriate fallback output.

In `@must-gather/internal/client/client.go`:
- Around line 167-180: The ListResources function currently logs write errors
from WriteResourceToFile but continues processing and returns nil regardless of
whether any writes failed. Track whether any write errors occurred during the
loop by introducing an error variable that gets set when WriteResourceToFile
fails. Keep the existing warning log for debugging purposes, but after the loop
completes over listObj.Items, check if any errors were encountered and return an
error value instead of always returning nil to properly signal failure to the
caller.

In `@must-gather/internal/logstore/lokistack/collector.go`:
- Around line 47-50: The error handling in the LokiStack collector is
incorrectly treating all errors from DynamicClient.Resource(lokiGVR).List() as a
missing CRD and silently returning nil. Instead of catching all errors with a
generic message, examine the specific error type to determine if it truly
represents a missing CRD (like a not found or NoMatchError) versus other
failures like RBAC denials or transient API issues. For CRD-not-found errors,
continue with the current skip behavior, but for other errors, log them
appropriately and return the error to avoid silently dropping LokiStack
artifacts.

In `@must-gather/internal/metrics/collector.go`:
- Line 82: The WriteFile method call on line 82 returns an error that is
currently being ignored when writing to error.log. Capture the error return
value from the WriteFile call (in the line where
resultPath.Add("error.log").WriteFile is invoked) and handle it appropriately by
checking if the error is non-nil and taking appropriate action such as logging
or returning the error, rather than silently dropping the write failure.

In `@must-gather/internal/namespace/collector.go`:
- Around line 111-154: The collector creates unbounded goroutines in nested
loops: one for each namespace (starting at the outer loop around line 111) and
another for each resource within each namespace (in the inner loop around line
123). On large clusters, this causes runaway goroutine creation leading to API
saturation. Implement concurrency limiting using a semaphore pattern (such as a
buffered channel) to bound the number of concurrent goroutines for both the
namespace iteration loop and the resource iteration loop within it. Similarly,
the collectPodLogs method (called around line 151) also needs to be reviewed and
bounded if it creates additional unbounded goroutines. Apply the same semaphore
limiting pattern throughout to cap concurrent operations to a reasonable limit.
- Line 233: The defer statement for logs.Close() is currently ignoring the error
return value, which causes the errcheck static analysis tool to fail. Modify the
defer statement to capture and handle the error returned by logs.Close(), either
by logging the error or returning it from the function. This will satisfy the
errcheck linter requirement and ensure proper error handling during resource
cleanup.
- Around line 235-250: Replace the memory-buffering approach using io.ReadAll
with direct streaming to avoid memory pressure on large pod logs. Instead of
reading all log data into memory first and then writing it, create the
destination directory and file first, then stream the log data directly from the
logs reader to the file using io.Copy or similar streaming mechanism. This
eliminates the intermediate buffering step in the section where logData is
captured and later passed to logFile.WriteFile.

In `@must-gather/internal/ui/uiplugin_collector.go`:
- Around line 48-52: The error handling in the UIPlugin list operation treats
all errors as if the CRD is not available, which masks permission errors and
transient API failures. Modify the error handling block around the uiPluginList
list operation to check if the error specifically indicates a resource not found
condition (CRD absence) versus other types of failures. If the error is a
resource-not-found error, log it as CRD not available and return nil. For other
errors (permission denied, transient failures, etc.), either return the error or
log it at a warning/error level without skipping collection. Use appropriate
Kubernetes error checking utilities or type assertions to differentiate between
a missing CRD and other API/permission failures.

---

Nitpick comments:
In `@must-gather/cmd/main.go`:
- Around line 51-55: The gather.Run() call in main uses context.Background()
which cannot be cancelled, preventing graceful shutdown on SIGINT/SIGTERM
signals. Replace the context.Background() call with a context created via
signal.NotifyContext() that listens for os.Interrupt and syscall.SIGTERM
signals. This will allow the gather operation to detect cancellation requests
and shut down gracefully when the user sends those signals.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e3b574ac-ae00-4437-8647-e8451f03e21e

📥 Commits

Reviewing files that changed from the base of the PR and between d2691e4 and 25e1956.

📒 Files selected for processing (24)
  • Dockerfile
  • Dockerfile.art
  • Makefile
  • must-gather/cmd/main.go
  • must-gather/collection-scripts/common
  • must-gather/collection-scripts/gather
  • must-gather/collection-scripts/gather_cluster_logging_operator_resources
  • must-gather/collection-scripts/gather_collection_resources
  • must-gather/collection-scripts/gather_logstore_resources
  • must-gather/collection-scripts/gather_monitoring
  • must-gather/collection-scripts/monitoring_common.sh
  • must-gather/gather.go
  • must-gather/internal/api/logger.go
  • must-gather/internal/api/path.go
  • must-gather/internal/api/path_test.go
  • must-gather/internal/api/suite_test.go
  • must-gather/internal/api/types.go
  • must-gather/internal/client/client.go
  • must-gather/internal/cluster/collector.go
  • must-gather/internal/collection/collection_collector.go
  • must-gather/internal/logstore/lokistack/collector.go
  • must-gather/internal/metrics/collector.go
  • must-gather/internal/namespace/collector.go
  • must-gather/internal/ui/uiplugin_collector.go
💤 Files with no reviewable changes (7)
  • must-gather/collection-scripts/gather_cluster_logging_operator_resources
  • must-gather/collection-scripts/gather_collection_resources
  • must-gather/collection-scripts/gather_monitoring
  • must-gather/collection-scripts/monitoring_common.sh
  • must-gather/collection-scripts/gather
  • must-gather/collection-scripts/common
  • must-gather/collection-scripts/gather_logstore_resources

Comment thread must-gather/cmd/main.go
Comment thread must-gather/cmd/main.go Outdated
Comment thread must-gather/gather.go
Comment thread must-gather/internal/api/logger.go
Comment thread must-gather/internal/client/client.go
Comment thread must-gather/internal/metrics/collector.go Outdated
Comment thread must-gather/internal/namespace/collector.go
Comment thread must-gather/internal/namespace/collector.go Outdated
Comment thread must-gather/internal/namespace/collector.go Outdated
Comment thread must-gather/internal/ui/uiplugin_collector.go

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/logforwarding/lokistack/forward_to_lokistack_test.go`:
- Line 136: The unconditional Fail(">>>>> REMOVE ME <<< TESTING MUST-GATHER")
call in the It block prevents the test from ever passing, even when the
forwarding functionality works correctly. Remove this Fail statement entirely
from the test flow, or if temporary debug output is needed for development
purposes, move it outside of the mergeable test code path behind explicit
debug-only conditional logic that would not execute in normal test runs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3ccd9a29-6c81-43f9-a46b-c462875645ee

📥 Commits

Reviewing files that changed from the base of the PR and between 6c3b905 and 4db50b6.

📒 Files selected for processing (9)
  • must-gather/cmd/main.go
  • must-gather/gather.go
  • must-gather/internal/api/logger.go
  • must-gather/internal/client/client.go
  • must-gather/internal/logstore/lokistack/collector.go
  • must-gather/internal/metrics/collector.go
  • must-gather/internal/namespace/collector.go
  • must-gather/internal/ui/uiplugin_collector.go
  • test/e2e/logforwarding/lokistack/forward_to_lokistack_test.go
🚧 Files skipped from review as they are similar to previous changes (7)
  • must-gather/cmd/main.go
  • must-gather/internal/api/logger.go
  • must-gather/internal/logstore/lokistack/collector.go
  • must-gather/internal/ui/uiplugin_collector.go
  • must-gather/internal/namespace/collector.go
  • must-gather/internal/metrics/collector.go
  • must-gather/internal/client/client.go

found, err = lokistackReceiver.HasInfrastructureLogs(serviceAccount.Name, framework.DefaultWaitForLogsTimeout)
Expect(err).To(BeNil())
Expect(found).To(BeTrue())
Fail(">>>>> REMOVE ME <<< TESTING MUST-GATHER")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove unconditional failure in test path

Line 136 unconditionally fails the It block, so this scenario can never pass even when forwarding works. This should be removed (or guarded behind explicit debug-only logic outside mergeable test flow).

Suggested fix
-        Fail(">>>>> REMOVE ME <<< TESTING MUST-GATHER")

As per coding guidelines, "Review Ginkgo test code for quality..." requires production-quality test behavior, and this hard-stop violates that expectation.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Fail(">>>>> REMOVE ME <<< TESTING MUST-GATHER")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/logforwarding/lokistack/forward_to_lokistack_test.go` at line 136,
The unconditional Fail(">>>>> REMOVE ME <<< TESTING MUST-GATHER") call in the It
block prevents the test from ever passing, even when the forwarding
functionality works correctly. Remove this Fail statement entirely from the test
flow, or if temporary debug output is needed for development purposes, move it
outside of the mergeable test code path behind explicit debug-only conditional
logic that would not execute in normal test runs.

Source: Coding guidelines

@jcantrill

Copy link
Copy Markdown
Contributor Author

/test images

@jcantrill

Copy link
Copy Markdown
Contributor Author

/test e2e-target

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@must-gather/internal/client/client.go`:
- Around line 190-193: Fail closed in WriteResourceToFile by propagating Secret
redaction failures instead of silently continuing. Update sanitizeResource to
return (runtime.Object, error), handle and return any errors from
unstructured.NestedMap and unstructured.SetNestedMap, and make
WriteResourceToFile stop and bubble up the error before marshaling when
redaction cannot be completed. Use the existing Client methods sanitizeResource
and WriteResourceToFile to locate and update the flow.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 65f1a653-1489-4578-a7ef-7017c1435f62

📥 Commits

Reviewing files that changed from the base of the PR and between 4db50b6 and a329c5c.

📒 Files selected for processing (1)
  • must-gather/internal/client/client.go

Comment on lines +190 to +193
func (c *Client) WriteResourceToFile(resource runtime.Object, destPath api.Path) error {
// Sanitize secrets before writing
sanitizedResource := c.sanitizeResource(resource)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Verify Secret sanitization does not discard NestedMap/SetNestedMap errors.
# Expected after the fix:
# - No `NestedMap(..., _)` discarded-error pattern in sanitizeResource.
# - Every `SetNestedMap` call is guarded by error handling.

sed -n '190,241p' must-gather/internal/client/client.go

rg -n 'NestedMap\(.*_, found, _|SetNestedMap\(' must-gather/internal/client/client.go

Repository: openshift/cluster-logging-operator

Length of output: 1987


Fail closed when Secret redaction cannot be applied.

Lines 213 and 225 discard unstructured.NestedMap accessor errors, and lines 221 and 233 ignore unstructured.SetNestedMap return values. If either fails, WriteResourceToFile still marshals the deep copy with original Secret data intact. Change sanitizeResource to return (runtime.Object, error) and propagate errors up the call stack.

🛡️ Proposed fail-closed fix
 func (c *Client) WriteResourceToFile(resource runtime.Object, destPath api.Path) error {
 	// Sanitize secrets before writing
-	sanitizedResource := c.sanitizeResource(resource)
+	sanitizedResource, err := c.sanitizeResource(resource)
+	if err != nil {
+		return fmt.Errorf("failed to sanitize resource: %w", err)
+	}
 
 	// Marshal to YAML
 	yamlBytes, err := yaml.Marshal(sanitizedResource)
 	if err != nil {
 		return fmt.Errorf("failed to marshal resource to YAML: %w", err)
 	}
 
 	// Write to file (WriteFile creates parent directory automatically)
 	return destPath.WriteFile(yamlBytes)
 }
 
 // sanitizeResource redacts secret data values
-func (c *Client) sanitizeResource(resource runtime.Object) runtime.Object {
+func (c *Client) sanitizeResource(resource runtime.Object) (runtime.Object, error) {
 	// Check if this is a Secret
 	if unstructuredObj, ok := resource.(*unstructured.Unstructured); ok {
 		if unstructuredObj.GetKind() == "Secret" {
 			// Make a deep copy to avoid modifying the original
 			sanitized := unstructuredObj.DeepCopy()
 
 			// Redact all data values (base64 encoded)
-			if data, found, _ := unstructured.NestedMap(sanitized.Object, "data"); found {
+			if data, found, err := unstructured.NestedMap(sanitized.Object, "data"); err != nil {
+				return nil, fmt.Errorf("failed to read Secret data: %w", err)
+			} else if found {
 				for key, val := range data {
 					length := 0
 					if strVal, ok := val.(string); ok {
 						length = len(strVal)
 					}
 					data[key] = fmt.Sprintf("REDACTED length=%d", length)
 				}
-				unstructured.SetNestedMap(sanitized.Object, data, "data")
+				if err := unstructured.SetNestedMap(sanitized.Object, data, "data"); err != nil {
+					return nil, fmt.Errorf("failed to redact Secret data: %w", err)
+				}
 			}
 
 			// Redact all stringData values (plain text)
-			if stringData, found, _ := unstructured.NestedMap(sanitized.Object, "stringData"); found {
+			if stringData, found, err := unstructured.NestedMap(sanitized.Object, "stringData"); err != nil {
+				return nil, fmt.Errorf("failed to read Secret stringData: %w", err)
+			} else if found {
 				for key, val := range stringData {
 					length := 0
 					if strVal, ok := val.(string); ok {
 						length = len(strVal)
 					}
 					stringData[key] = fmt.Sprintf("REDACTED length=%d", length)
 				}
-				unstructured.SetNestedMap(sanitized.Object, stringData, "stringData")
+				if err := unstructured.SetNestedMap(sanitized.Object, stringData, "stringData"); err != nil {
+					return nil, fmt.Errorf("failed to redact Secret stringData: %w", err)
+				}
 			}
 
-			return sanitized
+			return sanitized, nil
 		}
 	}
 
-	return resource
+	return resource, nil
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (c *Client) WriteResourceToFile(resource runtime.Object, destPath api.Path) error {
// Sanitize secrets before writing
sanitizedResource := c.sanitizeResource(resource)
func (c *Client) WriteResourceToFile(resource runtime.Object, destPath api.Path) error {
// Sanitize secrets before writing
sanitizedResource, err := c.sanitizeResource(resource)
if err != nil {
return fmt.Errorf("failed to sanitize resource: %w", err)
}
// Marshal to YAML
yamlBytes, err := yaml.Marshal(sanitizedResource)
if err != nil {
return fmt.Errorf("failed to marshal resource to YAML: %w", err)
}
// Write to file (WriteFile creates parent directory automatically)
return destPath.WriteFile(yamlBytes)
}
// sanitizeResource redacts secret data values
func (c *Client) sanitizeResource(resource runtime.Object) (runtime.Object, error) {
// Check if this is a Secret
if unstructuredObj, ok := resource.(*unstructured.Unstructured); ok {
if unstructuredObj.GetKind() == "Secret" {
// Make a deep copy to avoid modifying the original
sanitized := unstructuredObj.DeepCopy()
// Redact all data values (base64 encoded)
if data, found, err := unstructured.NestedMap(sanitized.Object, "data"); err != nil {
return nil, fmt.Errorf("failed to read Secret data: %w", err)
} else if found {
for key, val := range data {
length := 0
if strVal, ok := val.(string); ok {
length = len(strVal)
}
data[key] = fmt.Sprintf("REDACTED length=%d", length)
}
if err := unstructured.SetNestedMap(sanitized.Object, data, "data"); err != nil {
return nil, fmt.Errorf("failed to redact Secret data: %w", err)
}
}
// Redact all stringData values (plain text)
if stringData, found, err := unstructured.NestedMap(sanitized.Object, "stringData"); err != nil {
return nil, fmt.Errorf("failed to read Secret stringData: %w", err)
} else if found {
for key, val := range stringData {
length := 0
if strVal, ok := val.(string); ok {
length = len(strVal)
}
stringData[key] = fmt.Sprintf("REDACTED length=%d", length)
}
if err := unstructured.SetNestedMap(sanitized.Object, stringData, "stringData"); err != nil {
return nil, fmt.Errorf("failed to redact Secret stringData: %w", err)
}
}
return sanitized, nil
}
}
return resource, nil
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@must-gather/internal/client/client.go` around lines 190 - 193, Fail closed in
WriteResourceToFile by propagating Secret redaction failures instead of silently
continuing. Update sanitizeResource to return (runtime.Object, error), handle
and return any errors from unstructured.NestedMap and unstructured.SetNestedMap,
and make WriteResourceToFile stop and bubble up the error before marshaling when
redaction cannot be completed. Use the existing Client methods sanitizeResource
and WriteResourceToFile to locate and update the flow.

Source: Linters/SAST tools

@openshift-ci

openshift-ci Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

@jcantrill: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-using-bundle a329c5c link false /test e2e-using-bundle
ci/prow/e2e-target a329c5c link true /test e2e-target
ci/prow/functional-target a329c5c link true /test functional-target
ci/prow/lint a329c5c link true /test lint

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. midstream/Dockerfile A Dockerfile.in sync is needed with midstream

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants