Stale UDS socket and SHM region recovery was gated on run-directory
permissions: automatic unlink was refused unless the run dir was owned
by the process euid and not group/other-writable. netdata's systemd
unit ships RuntimeDirectoryMode=0775, so on every standard install a
crashed server could never reclaim its endpoint on restart: the
service kept failing with AddrInUse until manual cleanup or reboot.
The guard came from a CodeQL TOCTOU alert and assumed hostile writers
in a shared directory; the run dir is the embedding service's private
runtime directory, so that threat model does not apply.
Liveness is now the only criterion, in C, Rust, and Go, for both
transports: a live server's endpoint (SHM owner PID alive with
non-zero generation; UDS connect probe succeeds) is never deleted and
a second server gets address-in-use. Everything else found at an
endpoint name — dead server leftovers, foreign files, symlinks,
unreadable entries, empty directories — is silently reclaimed so the
service starts. The one exception: fd exhaustion (EMFILE/ENFILE)
during the liveness check keeps the entry, since liveness could not
be evaluated and deleting could remove a live endpoint.
Descriptor-relative fstatat/unlinkat cleanup mechanics are kept.
Tests updated across all three languages, with new coverage for
crash recovery in 0775 run dirs and foreign-file reclaim; the public
specs and the integrator skill describe the new contract. Tracked as
SOW-0018; supersedes the SOW-0017 test-side umask workaround.