Skip to content

Commit d442722

Browse files
authored
feat(sidecar): add thread mode as fallback connection for restricted environments (#3573)
* feat(tracer): add configuration for connection mode Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * feat(tracer): add sidecar thread listener module Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * feat(tracer): implement threaded connection fallback Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * fix: compilation error Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * fix(tracer): remove debug logs Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * feat(tracer): add connection mode and function thread mode tests Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * feat(tracer): fix fork with thread mode Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * apply feedbacks Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * feat(sidecar): support threaded connection for windows Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * chore: apply feedbacks Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * chore: regenerate supported-configurations.json for DD_TRACE_SIDECAR_CONNECTION_MODE * fix(tests): remove diagnostics-with-body from replayDebuggerData() filter With newer libdatadog (29e00628b), debugger logs are routed to /debugger/v2/input instead of /debugger/v1/diagnostics. The filter in replayDebuggerData() still included /debugger/v1/diagnostics with body (added in c3be656 for older routing), which caused it to return a diagnostics response before the actual /debugger/v2/input snapshot, breaking debugger_span_decoration_probe.phpt. Remove the diagnostics clause — only /debugger/v1/input and /debugger/v2/input are snapshot/log endpoints. * fix(tests): update debugger_span_decoration_probe for libdatadog fallback routing libdatadog 29e00628b routes logs to /debugger/v2/input but falls back to /debugger/v1/diagnostics when the agent doesn't support v2 (as is the case with the request replayer in tests). Update the expected URI in debugger_span_decoration_probe.phpt accordingly. Also restore the /debugger/v1/diagnostics filter in replayDebuggerData() which was incorrectly removed in the previous commit. * fix(sidecar): fallback to own master when parent thread listener unavailable In ddtrace_sidecar_setup_thread_mode, a forked child detecting is_child_process=true would try to connect to the parent's thread listener. If the parent used subprocess mode (no thread listener), the connect would fail and the child returned with no sidecar and no fallback. Mirror the existing fallback logic from ddtrace_sidecar_handle_fork: when the parent's listener is unavailable, reset ddtrace_sidecar_master_pid to the current process and fall through to start a new master listener in this process. * fix(sidecar): promote orphaned child to master in thread mode reconnect path When a parent process initializes the sidecar in thread mode, forks, and then exits, the child inherits a broken transport (parent's listener thread is dead). In dd_sidecar_connect(), if ddog_sidecar_connect_worker() fails and current_pid != master_pid, promote the child to master so it can still submit traces. The existing fallback in ddtrace_sidecar_setup_thread_mode covers the initial-setup path, but the reconnect path (ddtrace_sidecar_connect_callback -> dd_sidecar_connect) had no equivalent fallback, causing a silent failure for orphaned children that already had an inherited transport. Add a .phpt test that verifies the orphaned child can create and submit spans after the parent exits. * fix(sidecar): encode master uid in thread-mode socket name for setuid compatibility Thread-mode sockets now include the master's effective uid in the filename (libdd.<ver>@<uid>-<pid>.sock in /tmp/libdatadog/). A worker process that later drops privileges via setuid() (e.g. www-data under PHP-FPM) still computes the same socket path as the master listener, and ensure_dir_exists now best-effort chmods the directory world-writable to allow socket creation by any user. Also fixes a double-dot bug in socket/lock path construction (Rust >=1.87 no longer strips leading dots from with_extension arguments). Adds test: sidecar_thread_mode_permissions.phpt verifies the socket is created with the correct uid-pid encoding. * chore: update libdatadog submodule (fix Windows compilation) * fix(sidecar): thread mode uses abstract socket, fix ASAN warnings Update libdatadog submodule: thread mode sidecar now uses abstract Unix sockets on Linux (no filesystem permissions needed, any user can connect) and a single-threaded Tokio runtime (no extra OS threads, fixes LSan "Running thread was not suspended" ASAN warnings at process exit). Update sidecar_thread_mode_permissions.phpt to verify abstract socket usage (no filesystem socket created) instead of checking file permissions. * chore: bump cbindgen Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> * fix(test): suppress LSan leak detection in orphan fork test After fork, the child inherits the parent's heap including the Tokio current_thread runtime allocations and Rust stdlib once-cell inits from the sidecar thread. Since threads don't survive fork, these are orphaned allocations that LSan incorrectly reports as leaks. This is the same reason daemonize() already sets LSAN_OPTIONS=detect_leaks=0 for the subprocess sidecar. * fix(sidecar): set SHM open mode to 0644 when master runs as root When the PHP-FPM master process runs as root, the sidecar thread (thread mode) creates named SHM objects with 0600 by default, making them inaccessible to worker processes running as www-data. Call ddog_sidecar_set_shm_open_mode(0644) before starting the master sidecar listener when geteuid()==0, so workers can open the SHM regions read-only. The mode is set in both ddtrace_sidecar_setup_thread_mode() and ddtrace_sidecar_minit() to cover all code paths. * revert: remove --allow-to-run-as-root from php-fpm test helper * fix(sidecar): remove set_shm_open_mode, use fchown approach for cross-user SHM Replace the incorrect set_shm_open_mode(0644) workaround with the proper fchown-based fix implemented in the libdatadog submodule. The SHM ownership is now transferred to the worker's UID (obtained via SO_PEERCRED on first connection) rather than relaxing file permissions. - Remove ddog_sidecar_set_shm_open_mode() calls from thread mode setup - Remove declaration from components-rs/sidecar.h - Update libdatadog submodule to pick up the fchown implementation * fix(sidecar): prevent worker processes from starting their own listener thread In thread mode with PHP-FPM, dd_activate_once (called via pthread_once on first RINIT) runs independently in each worker process since the master never serves requests. When subprocess mode fails and thread mode is attempted, each worker hit the fallback path in ddtrace_sidecar_setup_thread_mode() that called ddog_sidecar_connect_master() — starting a new listener thread per worker. The master listener must only be started in MINIT (via ddtrace_sidecar_minit()) in the master process, so it survives PHP-FPM forking. When a worker cannot connect to the master's listener, it now logs a warning and runs without the sidecar instead of spawning its own thread. Non-child processes (master, CLI) retain the ability to start a new listener as a fallback, preserving the behavior requested in earlier review feedback. * test(sidecar): add FPM root+user-switch test for thread mode cross-user SHM Adds a test that exercises the fchown() cross-user SHM path: PHP-FPM master runs as root, workers switch to an unprivileged user (www-data/daemon/nobody), and thread mode is used. This is the exact scenario that motivated the SO_PEERCRED + fchown() fix. Infrastructure changes: - www.conf: add {{user_group}} placeholder for optional user/group directives - PhpFpm.php: accept $fpmUser/$fpmGroup constructor params - WebServer.php: add setPhpFpmUser() and pass it through to PhpFpm - WebFrameworkTestCase.php: add configureWebServer() hook (called before start()) so subclasses can apply extra server config without reimplementing the full setUpWebServer() logic New test (SidecarThreadModeRootTest): - Skips if not root, not fpm-fcgi SAPI, or no unprivileged user available - testTracesAreSubmittedWithRootMasterAndUnprivilegedWorker: a single request through a root-master/www-data-worker FPM pool must produce traces — failure means the worker cannot access the SHM after fchown() - testMultipleWorkersShareSingleMasterListenerThread: 3 requests with multiple workers must all succeed, ensuring the per-worker-thread regression is caught * fix(tests): remove void return type for PHP 7.0 compatibility `void` return type hint was introduced in PHP 7.1; PHP 7.0 CI jobs were failing with TypeError on every WebFrameworkTestCase subclass. * test(sidecar): force fpm-fcgi mode in root test to run in all CI jobs Remove the DD_TRACE_TEST_SAPI=fpm-fcgi skip condition and instead call putenv() to force FPM mode before the parent sets up the web server. This allows the test to run in every test_web_custom matrix entry (cli-server, cgi-fcgi, apache2handler) rather than only when the CI job explicitly sets fpm-fcgi SAPI. * test(sidecar): fix flaky multi-worker trace assertion Use untilNumberOfTraces(3) so the test agent waits for all 3 traces before collecting, instead of returning early after the first one. Also restore the >= 3 assertion which is now reliable. * test(sidecar): allow root FPM test to run via sudo in CI Add $runAsSudo param to PhpFpm and setPhpFpmSudo() to WebServer so php-fpm can be started as root even when the test runner is non-root. The test now skips only if neither root nor passwordless sudo is available, allowing it to run in CI where circleci has NOPASSWD sudo. * fix(tests): avoid global env pollution from putenv in SidecarThreadModeRootTest Replace putenv('DD_TRACE_TEST_SAPI=fpm-fcgi') with a new WebServer::setForceSapi() method that overrides the SAPI for a single WebServer instance only, without affecting the global process environment used by all subsequent test classes. * test(sidecar): skip root FPM test under apache2handler SAPI Apache2handler uses a static shared instance on the same port as our FPM+Nginx setup, causing port conflicts. Skip the test in that case; it still runs under cli-server and cgi-fcgi CI jobs. * chore: update one pipeline Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com> --------- Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com>
1 parent 17c83a1 commit d442722

26 files changed

Lines changed: 1036 additions & 635 deletions

.gitlab/generate-tracer.php

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -560,6 +560,9 @@ function before_script_steps($with_docker_auth = false) {
560560
foreach ($type_jobs as $target => $versions):
561561
foreach ($versions as $major_minor):
562562
$sapis = $type == "web" && version_compare($major_minor, "7.2", ">=") ? ["cli-server", "cgi-fcgi", "apache2handler"] : [""];
563+
if ($target == "test_web_custom" && in_array("cli-server", $sapis)) {
564+
$sapis[] = "fpm-fcgi";
565+
}
563566
foreach ($sapis as $sapi):
564567
?>
565568
"<?= $target ?>: [<?= $major_minor, $sapi ? ", $sapi" : "" ?>]":

.gitlab/one-pipeline.locked.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
11
# DO NOT EDIT THIS FILE MANUALLY
22
# This file is auto-generated by automation.
33
include:
4-
- remote: https://gitlab-templates.ddbuild.io/libdatadog/one-pipeline/ca/fbfa24e9dd887ed24ce65e71f2e41562c809f40cfc26489705b32406de7e096f/one-pipeline.yml
4+
- remote: https://gitlab-templates.ddbuild.io/libdatadog/one-pipeline/ca/71efdd408f3ca243b65866e112903c3b1f8dd49449a59b348440da6c20b1c3c0/one-pipeline.yml

0 commit comments

Comments
 (0)