
Commit 59d9551

[AGENTRUN-1240] fix(loader): retry unix.Poll on EINTR instead of starting trace-agent (#49943)
## What does this PR do?

When `unix.Poll` is interrupted by a signal, it returns `EINTR` (interrupted system call). This is not a real error: the signal did not kill the process, so it is safe and correct to retry the poll.

Previously, any `EINTR` from `poll(2)` caused the loader to break out of its poll loop and immediately exec the trace-agent, which spiked PSS (proportional set size) even when no traces were arriving. This was the root cause of the quality gate PSS failure in incident-53566.

## Changes

- On `EINTR`, log a warning and retry the poll loop
- Promote the non-`EINTR` poll error from `Warn` to `Error` (it is an unexpected failure)

## Testing

Verified that the binary builds cleanly. The fix mirrors the `SA_RESTART` convention for signal-interrupted syscalls; `poll(2)` is never restarted automatically, even when the signal handler uses `SA_RESTART`, so the retry has to happen in the caller.

Co-authored-by: pierre.gimalac <pierre.gimalac@datadoghq.com>
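A minimal sketch of the retry rule in isolation, assuming a hypothetical helper `retryOnEINTR` that is not part of the agent codebase. The `simulate` function replaces a real `unix.Poll` call with a fake that reports `EINTR` a given number of times before one descriptor becomes ready. Because `unix.EINTR` in `golang.org/x/sys/unix` is a `syscall.Errno`, checking `errors.Is(err, syscall.EINTR)` matches it, and `errors.Is` also sees through wrapped errors.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// retryOnEINTR calls fn until it returns anything other than EINTR.
// Hypothetical helper: it mirrors the `continue` added to the loader's
// poll loop, but is not code from the agent repository.
func retryOnEINTR(fn func() (int, error)) (int, error) {
	for {
		n, err := fn()
		if errors.Is(err, syscall.EINTR) {
			// A signal interrupted the call; nothing failed, so try again.
			continue
		}
		return n, err
	}
}

// simulate stands in for unix.Poll: the fake reports EINTR `interrupts`
// times, then reports one ready descriptor. It also returns how many
// calls were made in total.
func simulate(interrupts int) (n int, calls int, err error) {
	n, err = retryOnEINTR(func() (int, error) {
		calls++
		if calls <= interrupts {
			return -1, syscall.EINTR
		}
		return 1, nil
	})
	return n, calls, err
}

func main() {
	n, calls, err := simulate(3)
	fmt.Println(n, calls, err) // 1 4 <nil>
}
```

The loop has no retry cap, matching the loader's behavior: each `EINTR` means the process is still alive, so there is nothing to give up on.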
1 parent 48038ec commit 59d9551

1 file changed

Lines changed: 6 additions & 1 deletion


cmd/loader/main_nix.go

```diff
@@ -136,7 +136,12 @@ func main() {
 	for {
 		n, err := unix.Poll(pollfds, -1)
 		if err != nil {
-			log.Warnf("error while polling: %v", err)
+			if err == unix.EINTR {
+				// EINTR means a signal interrupted the syscall; this is not an error, retry
+				log.Warnf("Polling interrupted by signal, retrying...")
+				continue
+			}
+			log.Errorf("error while polling: %v", err)
 			break
 		}
 
```
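The loader passes `-1` as the `unix.Poll` timeout, so a bare `continue` is enough: an interrupted infinite wait is simply started again. With a finite timeout, restarting with the same value after every `EINTR` could stretch the total wait indefinitely. The sketch below shows a hypothetical deadline-aware variant for comparison only; `pollUntil` and `timeoutsSeen` are illustrative names, not agent code, and the fake `pollFn` stands in for a real `unix.Poll(fds, timeoutMs)` call.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
	"time"
)

// pollUntil retries pollFn after EINTR, passing only the time left before
// the deadline so that interruptions cannot extend the total wait.
// Hypothetical helper: pollFn stands in for unix.Poll(fds, timeoutMs).
func pollUntil(pollFn func(timeoutMs int) (int, error), timeout time.Duration) (int, error) {
	deadline := time.Now().Add(timeout)
	for {
		remaining := time.Until(deadline)
		if remaining < 0 {
			remaining = 0 // deadline passed: poll once more without blocking
		}
		n, err := pollFn(int(remaining.Milliseconds()))
		if errors.Is(err, syscall.EINTR) {
			continue
		}
		return n, err
	}
}

// timeoutsSeen runs pollUntil against a fake poll that is interrupted
// `interrupts` times and then reports a timeout (0 ready descriptors).
// It returns the timeout, in milliseconds, passed to each fake call.
func timeoutsSeen(interrupts int, timeoutMs int) []int {
	var seen []int
	pollUntil(func(ms int) (int, error) {
		seen = append(seen, ms)
		if len(seen) <= interrupts {
			time.Sleep(10 * time.Millisecond) // time spent before the signal arrives
			return -1, syscall.EINTR
		}
		return 0, nil
	}, time.Duration(timeoutMs)*time.Millisecond)
	return seen
}

func main() {
	// Each retry receives a smaller timeout than the previous call.
	fmt.Println(timeoutsSeen(2, 200))
}
```

The commit does not need this extra bookkeeping because its timeout is infinite; the variant only illustrates what the simple `continue` would have to become under a finite timeout.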
