
Commit f780a7b

co-cydwoz authored and committed
Log exception class+message on token-read errors, traceback at DEBUG
Address review feedback from @bdrx312 on PR #69074: the original `log.warning` calls in the SaltDeserializationError and OSError handlers in `LoadAuth.get_tok` did not capture the exception at all, making troubleshooting harder.

Two combined changes:

1. Add the exception class+message inline via `%r` so each occurrence is one greppable line that names which subclass (e.g. `ConnectionResetError`) and its message produced the warning -- enough to triage without a full traceback.

2. Add a companion `log.debug(..., exc_info=True)` per @bdrx312's suggestion. Operators who need the full traceback for a specific intermittent failure can raise the level to DEBUG and see it. Costs nothing at the default WARNING level because logging skips exc_info formatting when the level is disabled.

This avoids GBs/hour of stack frames during a flapping Redis or NFS outage (the original `exc_info=True` approach considered earlier) while keeping full diagnostic depth one log-level change away.

Refs: #69073
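The two-level pattern described in the message can be sketched in isolation. This is a minimal illustration using only the standard `logging` module, not the Salt code itself: `read_token`, `capture`, the logger names, and the token id `"abc123"` are hypothetical names introduced here. It shows that the same handler emits a single line at WARNING and adds the traceback only once the logger is set to DEBUG.

```python
import io
import logging


def read_token(log, fail_with):
    """Hypothetical stand-in for a token read that raises ``fail_with``."""
    try:
        raise fail_with
    except OSError as exc:
        # One greppable line naming the exception class and its message.
        log.warning("Token store transient error reading %r (%r)", "abc123", exc)
        # The traceback is only formatted when DEBUG is enabled on the logger.
        log.debug("Token store transient-error traceback:", exc_info=True)
        return {}


def capture(level):
    """Run read_token with a logger set to ``level``; return the emitted text."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    log = logging.getLogger(f"demo.{level}")
    log.handlers = [handler]
    log.propagate = False
    log.setLevel(level)
    read_token(log, ConnectionResetError(104, "Connection reset by peer"))
    return stream.getvalue()


warning_output = capture(logging.WARNING)
debug_output = capture(logging.DEBUG)
print(warning_output)
print(debug_output)
```

At WARNING the output is one line containing the `ConnectionResetError` repr; at DEBUG the same line is followed by the companion message and the traceback.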
1 parent 7c4d9a1 commit f780a7b

1 file changed

Lines changed: 17 additions & 5 deletions


salt/auth/__init__.py

@@ -243,31 +243,43 @@ def get_tok(self, tok):
             tdata = self.tokens["{}.get_token".format(self.opts["eauth_tokens"])](
                 self.opts, tok
             )
-        except salt.exceptions.SaltDeserializationError:
+        except salt.exceptions.SaltDeserializationError as exc:
             # The on-disk / in-store token blob is corrupt and cannot
             # be parsed. Removing it is the right call -- a corrupt
             # token can never authenticate anyway, and leaving it
             # around makes every subsequent ``get_tok`` for the same
-            # id keep failing.
+            # id keep failing. ``%r`` on the exception gives the
+            # operator the class and message inline (e.g. msgpack
+            # format error, truncated file) without spamming a full
+            # traceback into a hot-path WARNING; the full traceback is
+            # available via the companion ``log.debug`` for deeper
+            # investigation.
             log.warning(
-                "Token %r could not be deserialized; removing it from the store.",
+                "Token %r could not be deserialized (%r); removing it from the store.",
                 tok,
+                exc,
             )
+            log.debug("Token deserialization traceback:", exc_info=True)
             self.rm_token(tok)
             return {}
-        except OSError:
+        except OSError as exc:
             # Transient backend error (Redis connection blip, NFS hang,
             # hung disk). The token itself is fine; do NOT delete it --
             # that would log every authenticated user out on every
             # backend hiccup. Return an empty dict so the caller treats
             # this request as not-authenticated; the next request will
             # retry against the backend and succeed once it recovers.
+            # Same logging pattern as above -- exception class + message
+            # at WARNING, full traceback at DEBUG so a flapping deploy
+            # stays diagnoseable without GB/hour of stack frames.
             log.warning(
-                "Token store transient error reading %r; treating as "
+                "Token store transient error reading %r (%r); treating as "
                 "not-authenticated for this request without removing the "
                 "token from the store.",
                 tok,
+                exc,
             )
+            log.debug("Token store transient-error traceback:", exc_info=True)
             return {}
 
         if not tdata:
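
The comments in the two handlers describe different recovery policies: a corrupt token is removed, while a transient backend error leaves the token in place. A minimal sketch of that split follows, assuming a hypothetical in-memory store. `CorruptTokenError` stands in for `salt.exceptions.SaltDeserializationError`, and `DemoStore` and this free-standing `get_tok` are illustrative names, not Salt's API.

```python
class CorruptTokenError(Exception):
    """Hypothetical stand-in for salt.exceptions.SaltDeserializationError."""


class DemoStore:
    """Hypothetical in-memory token store.

    Each value is either token data or an exception instance to raise on read.
    """

    def __init__(self, entries):
        self.entries = dict(entries)

    def get(self, tok):
        value = self.entries.get(tok, {})
        if isinstance(value, Exception):
            raise value
        return value

    def remove(self, tok):
        self.entries.pop(tok, None)


def get_tok(store, tok):
    """Return token data, or {} when the token cannot be read."""
    try:
        return store.get(tok)
    except CorruptTokenError:
        # A corrupt blob can never authenticate, so drop it.
        store.remove(tok)
        return {}
    except OSError:
        # Transient backend failure: keep the token so a later read can succeed.
        return {}


store = DemoStore(
    {
        "good": {"name": "alice"},
        "corrupt": CorruptTokenError("truncated blob"),
        "flaky": TimeoutError("backend timed out"),
    }
)
results = {tok: get_tok(store, tok) for tok in ("good", "corrupt", "flaky")}
remaining = sorted(store.entries)
print(results)
print(remaining)
```

After the three reads, the corrupt entry is gone while the flaky one is still present, so a retry after the backend recovers would still find the token.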
