Skip to content

fix: log sanitized error details when cluster secret unmarshal fails (#27148)#27159

Open
officialasishkumar wants to merge 2 commits into
argoproj:masterfrom
officialasishkumar:fix/log-cluster-secret-unmarshal-error
Open

fix: log sanitized error details when cluster secret unmarshal fails (#27148)#27159
officialasishkumar wants to merge 2 commits into
argoproj:masterfrom
officialasishkumar:fix/log-cluster-secret-unmarshal-error

Conversation

@officialasishkumar
Copy link
Copy Markdown

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • The title of the PR conforms to the Title of the PR
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • My new feature complies with the feature status guidelines.
  • I have added a brief description of why this PR is necessary and/or what this PR solves.
  • Optional. My organization is added to USERS.md.
  • Optional. For bug fixes, I've indicated what older releases this fix should be cherry-picked into (this may or may not happen depending on risk/complexity).

Summary

When SecretToCluster fails, the error was logged without any diagnostic details, making it difficult for operators to debug why a cluster secret could not be parsed.

A prior attempt (#27151) to fix this by logging the raw error was correctly rejected because json.Unmarshal errors can contain fragments of the data being parsed, which in this case is sensitive cluster configuration (TLS certs, tokens, credentials).

This PR introduces sanitizeUnmarshalError, which uses errors.As to extract only safe diagnostic information:

  • *json.SyntaxError: logs the byte offset (e.g., json syntax error at byte offset 42)
  • *json.UnmarshalTypeError: logs the field name (e.g., cannot unmarshal into field "tlsClientConfig.insecure")
  • Unknown error types: logs a generic message (invalid cluster secret data)

This gives operators enough context to diagnose the issue (e.g., "the JSON is malformed at byte 42" or "the field type doesn't match") without any risk of leaking secret data into application logs.

All five call sites in cluster.go that previously discarded the error have been updated.

Example log output

Before:

level=error msg="could not unmarshal cluster secret my-cluster"

After:

level=error msg="could not unmarshal cluster secret my-cluster: json syntax error at byte offset 2"

Validation

  • New unit tests for sanitizeUnmarshalError covering all three error classification paths (syntax error, type error, unknown) plus wrapped errors.
  • New integration test for ListClusters verifying that invalid secrets are skipped while valid ones are still returned.
  • Full util/db test suite passes.

Closes #27148

When SecretToCluster fails, the error was logged without any diagnostic
details, making it difficult for operators to debug why a cluster secret
could not be parsed. A prior attempt to log the raw error was rejected
because json.Unmarshal errors can contain fragments of the data being
parsed, which in this case is sensitive cluster configuration.

This change introduces sanitizeUnmarshalError which extracts only safe
diagnostic information from the error: byte offset for syntax errors,
and field name for type mismatch errors. This gives operators enough
context to diagnose the issue without risk of leaking secret data into
application logs.

All five call sites that previously discarded the error details now
include the sanitized message.

Closes argoproj#27148

Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
@officialasishkumar officialasishkumar requested a review from a team as a code owner April 4, 2026 09:39
Copilot AI review requested due to automatic review settings April 4, 2026 09:39
@bunnyshell
Copy link
Copy Markdown

bunnyshell Bot commented Apr 4, 2026

❌ Preview Environment undeployed from Bunnyshell

Available commands (reply to this comment):

  • 🚀 /bns:deploy to deploy the environment

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds safe diagnostic logging for cluster secret JSON unmarshal failures so operators can troubleshoot parsing issues without risking leakage of sensitive secret contents.

Changes:

  • Introduces sanitizeUnmarshalError(err) to extract safe details from encoding/json unmarshal errors (syntax offset / type error field / generic fallback).
  • Updates cluster secret unmarshal error logs in ListClusters, WatchClusters, and UpdateCluster to include sanitized details.
  • Adds unit tests for sanitizeUnmarshalError and a test ensuring ListClusters skips invalid cluster secrets while returning valid ones.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
util/db/cluster.go Adds sanitization helper and uses it in cluster secret unmarshal error logs.
util/db/cluster_test.go Adds tests for sanitization behavior and for skipping invalid cluster secrets in ListClusters.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread util/db/cluster_test.go
Comment on lines +182 to +186
t.Run("json type error", func(t *testing.T) {
err := &json.UnmarshalTypeError{Field: "tlsClientConfig.insecure"}
msg := sanitizeUnmarshalError(err)
assert.Equal(t, `cannot unmarshal into field "tlsClientConfig.insecure"`, msg)
})
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expected string in this assertion is a raw string literal (backticks), so the " sequences are not unescaped. This currently asserts for backslashes that won’t be present in sanitizeUnmarshalError output (which uses %q and produces plain double quotes). Use a normal string literal with escaped quotes, or keep the raw string but remove the backslashes so the expected value matches the actual message.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on the string literal concern. Switched from a raw string literal to a regular double-quoted string with escaped quotes so the intent is unambiguous and consistent with the other test cases in this block.

Changed the test expectation from a raw string literal with escaped
quotes to a regular string literal, so the assertion matches the
actual output from sanitizeUnmarshalError which uses fmt.Sprintf %q.

Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 4, 2026

Codecov Report

❌ Patch coverage is 73.33333% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.36%. Comparing base (68cbd05) to head (4f513f1).
⚠️ Report is 285 commits behind head on master.

Files with missing lines Patch % Lines
util/db/cluster.go 73.33% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #27159      +/-   ##
==========================================
- Coverage   63.36%   63.36%   -0.01%     
==========================================
  Files         415      415              
  Lines       56551    56561      +10     
==========================================
+ Hits        35836    35842       +6     
- Misses      17336    17344       +8     
+ Partials     3379     3375       -4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Log error when unmarshaling cluster secret fails

2 participants