Diagnose log injection smoke test flakiness instead of masking it
The `check raw file injection` test flakes across 11+ logging backend
variants. CI Visibility data shows the failure is bimodal: successful
runs complete in 3-9s, while failures sit at exactly 30s (the
PollingConditions timeout) with traceCount=0, and nothing in between.
The process either works or is completely broken, so increasing the
timeout will not help.
The current test is blind during the 30s wait: it only polls
traceCount and reports no diagnostics when the process crashes or
hangs.
Changes:
- Add `waitForTraceCountAlive` that checks process liveness on every
poll iteration. If the process dies, it fails immediately with the
exit code, RC poll count, and last 20 lines of process output.
- On timeout, enrich the error with diagnostic state (process alive?,
traceCount, RC polls received, last 30 lines of output) so the next
CI failure tells us whether it's a crash, a hang, or a connectivity
issue.
- Reorder `waitForTraceCount(4)` before `waitFor` to confirm all
traces are delivered while the process is still alive.
- Assert `waitFor` return value for a clear error if the process hangs.
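
For illustration only, a minimal Java sketch of the liveness-aware polling
described above. This is not the code in this commit: the signature, the
100 ms poll interval, and the `tail` helper are assumptions, and the RC poll
count is omitted for brevity.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.IntSupplier;
import java.util.function.Supplier;

final class LivenessPolling {

  /**
   * Polls until traceCount reaches expected. Fails immediately if the
   * process has exited, and fails with diagnostic state on timeout.
   */
  static void waitForTraceCountAlive(
      Process process,
      IntSupplier traceCount,
      int expected,
      Duration timeout,
      Supplier<List<String>> output) throws InterruptedException {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      int seen = traceCount.getAsInt();
      if (seen >= expected) {
        return; // all expected traces were delivered
      }
      if (!process.isAlive()) {
        // Fail fast instead of waiting out the full timeout.
        throw new AssertionError(
            "Process died with exit code " + process.exitValue()
                + ", traceCount=" + seen
                + "\nLast 20 lines of output:\n" + tail(output.get(), 20));
      }
      if (System.nanoTime() - deadline >= 0) {
        // Process is still alive here, so this is a hang or a
        // connectivity issue rather than a crash.
        throw new AssertionError(
            "Timed out after " + timeout
                + ": process alive=true, traceCount=" + seen
                + "\nLast 30 lines of output:\n" + tail(output.get(), 30));
      }
      Thread.sleep(100); // assumed poll interval
    }
  }

  /** Returns the last n lines joined with newlines. */
  static String tail(List<String> lines, int n) {
    int from = Math.max(0, lines.size() - n);
    return String.join("\n", lines.subList(from, lines.size()));
  }
}
```

The sketch checks the trace count before liveness, so a process that exits
right after delivering every trace still passes. Whether the real helper
orders the checks this way is not stated here.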
tag: no release notes
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>