ExecSolib strategy: make ddtrace.so directly executable #3711

Open
cataphract wants to merge 25 commits into master from glopes/exec-solib

@cataphract cataphract commented Mar 19, 2026

Introduce the ExecSolib spawn strategy by embedding an ELF entry point (_dd_solib_start) into ddtrace.so itself.

ddtrace.so becomes a PIE executable and starts up without the dynamic linker. After self-relocation:

  • it loads the trampoline into memory, but doesn't execute it yet
  • it copies ddtrace.so (/proc/self/exe) into a memfd and adjusts the copy so that the dynamic linker can load it outside of PHP without failing
  • it replaces every occurrence of ddtrace.so on the command line with /proc/self/fd/, so that dlopen from the trampoline uses the adjusted copy
  • it loads the dynamic linker into memory
  • it adjusts the auxiliary vector so that ld.so executes the loaded trampoline, then jumps into the dynamic linker
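
For readers unfamiliar with the memfd step, here is a minimal sketch of copying a file into a memfd and pointing command-line arguments at it. It is not the code in this PR: `copy_into_memfd` and `rewrite_args` are hypothetical names, the sketch matches whole arguments only (the real code may also handle ddtrace.so appearing inside a longer argument), and it skips the ELF adjustments entirely.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy the file at `path` into a fresh memfd.
 * Returns the memfd, or -1 on failure. */
static int copy_into_memfd(const char *path)
{
    int src = open(path, O_RDONLY | O_CLOEXEC);
    if (src < 0) return -1;

    /* Raw syscall so the sketch does not depend on a libc wrapper.
     * Flags are 0 (no close-on-exec) so /proc/self/fd/N stays usable. */
    int dst = (int)syscall(SYS_memfd_create, "solib-copy", 0u);
    if (dst < 0) { close(src); return -1; }

    char buf[65536];
    ssize_t n;
    while ((n = read(src, buf, sizeof buf)) > 0) {
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(dst, buf + off, (size_t)(n - off));
            if (w < 0) { close(src); close(dst); return -1; }
            off += w;
        }
    }
    close(src);
    if (n < 0) { close(dst); return -1; }
    return dst;
}

/* Hypothetical helper: make every argv entry equal to `needle` point at
 * `replacement` instead. Returns the number of entries rewritten. */
static int rewrite_args(char **argv, int argc, const char *needle,
                        char *replacement)
{
    int hits = 0;
    for (int i = 0; i < argc; i++) {
        if (strcmp(argv[i], needle) == 0) {
            argv[i] = replacement;
            hits++;
        }
    }
    return hits;
}
```

A caller would format the replacement path with `snprintf(path, sizeof path, "/proc/self/fd/%d", fd)` before calling `rewrite_args`.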

Tested on glibc and musl on Linux aarch64.
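
As an illustration of the auxiliary-vector step in the list above, the following sketch rewrites the entries that a dynamic linker conventionally reads to find the program it should run (AT_PHDR, AT_PHNUM, AT_ENTRY) and its own load base (AT_BASE). The struct and function names are hypothetical, the sketch assumes a 64-bit target, and the actual set of entries touched by the PR may differ.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>

/* Hypothetical summary of the two images that were mapped by hand. */
struct loaded_images {
    uint64_t prog_phdr;   /* address of the trampoline's program headers */
    uint64_t prog_phnum;  /* number of program headers */
    uint64_t prog_entry;  /* trampoline entry point */
    uint64_t interp_base; /* load base of the dynamic linker */
};

/* Walk the auxv in place and overwrite the entries ld.so consults.
 * Returns how many entries were patched. */
static int patch_auxv(Elf64_auxv_t *auxv, const struct loaded_images *img)
{
    int patched = 0;
    for (Elf64_auxv_t *e = auxv; e->a_type != AT_NULL; e++) {
        switch (e->a_type) {
        case AT_PHDR:  e->a_un.a_val = img->prog_phdr;   patched++; break;
        case AT_PHNUM: e->a_un.a_val = img->prog_phnum;  patched++; break;
        case AT_ENTRY: e->a_un.a_val = img->prog_entry;  patched++; break;
        case AT_BASE:  e->a_un.a_val = img->interp_base; patched++; break;
        default: break; /* leave AT_PAGESZ, AT_RANDOM, etc. untouched */
        }
    }
    return patched;
}
```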

Description

Reviewer checklist

  • Test coverage seems ok.
  • Appropriate labels assigned.

@cataphract cataphract requested review from a team as code owners March 19, 2026 17:44
datadog-datadog-prod-us1 bot commented Mar 20, 2026

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 60.65% (-0.03%)

🔗 Commit SHA: 574e9ce

codecov-commenter commented Mar 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.78%. Comparing base (10064f5) to head (3442a78).
⚠️ Report is 51 commits behind head on master.

@@           Coverage Diff           @@
##           master    #3711   +/-   ##
=======================================
  Coverage   68.78%   68.78%           
=======================================
  Files         166      166           
  Lines       19015    19015           
  Branches     1792     1792           
=======================================
  Hits        13079    13079           
  Misses       5124     5124           
  Partials      812      812           
Flag Coverage Δ
helper-rust-integration 78.82% <ø> (ø)
helper-rust-unit 49.36% <ø> (ø)

Introduce the ExecSolib spawn strategy by embedding an ELF entry point
(_dd_solib_start) into ddtrace.so itself.
But the import can't be declared hidden. In the end the symbol will be
in the GOT, but the linker emits a RELATIVE reloc (not GLOB_DAT), so it
should work with our self-relocation.
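
To make the RELATIVE versus GLOB_DAT distinction concrete: a RELATIVE reloc needs only the load base and the addend, with no symbol lookup, which is why a self-relocating entry point can process it. The sketch below is an assumption about the general technique, not the PR's code; `apply_relative_relocs` is a hypothetical name, and the relocation type is passed in so the same loop covers R_X86_64_RELATIVE and R_AARCH64_RELATIVE.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Apply every relocation of type `relative_type` in a RELA table:
 *     *(base + r_offset) = base + r_addend
 * Entries of any other type (for example GLOB_DAT, which needs a symbol
 * lookup) are skipped. Returns the number of relocations applied. */
static size_t apply_relative_relocs(uintptr_t base, const Elf64_Rela *rela,
                                    size_t count, uint32_t relative_type)
{
    size_t applied = 0;
    for (size_t i = 0; i < count; i++) {
        if (ELF64_R_TYPE(rela[i].r_info) != relative_type)
            continue;
        uint64_t *slot = (uint64_t *)(base + (uintptr_t)rela[i].r_offset);
        *slot = (uint64_t)base + (uint64_t)rela[i].r_addend;
        applied++;
    }
    return applied;
}
```

In a real entry point the table address and count would come from the DT_RELA and DT_RELASZ dynamic entries; they are plain parameters here to keep the sketch testable.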
cataphract and others added 2 commits March 20, 2026 12:29
The x86-64 inline asm restoring the kernel stack and jumping to ld.so:

    "mov %[sp], %%rsp\n"
    "xor %%edx, %%edx\n"   // required: rdx = 0 for ld.so startup ABI
    "jmpq *%[entry]\n"

GCC at -O0 allocated %[entry] (ldso_entry) to rdx, causing the xor to
zero the jump target before the jmpq executed → SIGSEGV at address 0x0
on every x86-64 ExecSolib launch.

The fix is to pin ldso_entry to rax via the "a" constraint.  Using the
"rdx" clobber alone is not sufficient: GCC is permitted to allocate
input operands into clobbered registers because inputs are consumed
before the asm fires.  A specific register constraint ("a" = rax) is
the correct and optimization-safe solution.

With the fix, GCC emits:
    mov  %rcx, %rsp    ; stack_top in rcx (or any non-rax "r")
    xor  %edx, %edx    ; zero rdx (harmless: entry is in rax)
    jmpq *%rax         ; jump to ldso_entry

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
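
A small, harmless stand-in for the pattern described in this commit message: the input operand is pinned to rax with the "a" constraint, so zeroing rdx inside the asm cannot destroy it. This is a sketch only. It copies the value out instead of switching stacks and jumping, `zero_rdx_then_read` is a hypothetical name, and on non-x86-64 targets it falls back to plain C.

```c
#include <assert.h>
#include <stdint.h>

/* Zero rdx inside the asm, then read back the operand that was pinned
 * to rax. If `entry` lived in rdx, the xor would wipe it first. */
static uint64_t zero_rdx_then_read(uint64_t entry)
{
    uint64_t out;
#if defined(__x86_64__)
    __asm__ volatile(
        "xor %%edx, %%edx\n\t"     /* rdx = 0, as the ld.so startup ABI expects */
        "mov %[entry], %[out]\n\t" /* entry is still intact in rax */
        : [out] "=r"(out)
        : [entry] "a"(entry)       /* "a" pins the operand to rax */
        : "rdx", "cc");
#else
    out = entry; /* no inline asm on other architectures in this sketch */
#endif
    return out;
}
```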
pr-commenter bot commented Mar 20, 2026

Benchmarks [ tracer ]

Benchmark execution time: 2026-04-09 17:16:43

Comparing candidate commit 574e9ce in PR branch glopes/exec-solib with baseline commit 14dabf8 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 192 metrics, 2 unstable metrics.

@cataphract cataphract force-pushed the glopes/exec-solib branch 2 times, most recently from 76bb66d to 4e0b02e Compare March 21, 2026 12:44
@cataphract cataphract force-pushed the glopes/exec-solib branch 2 times, most recently from 4cffacf to 9293858 Compare March 22, 2026 02:16
@cataphract cataphract force-pushed the glopes/exec-solib branch 2 times, most recently from 6e1977d to 09acb37 Compare March 24, 2026 16:55
cataphract and others added 7 commits March 25, 2026 02:19
…n detection

Update libdatadog submodule to fix container ID extraction when running under
Podman with cgroupns=host. The container cgroup path includes a /container
subdirectory after the .scope suffix (e.g.
0::/machine.slice/libpod-HEXID.scope/container), which the previous regex
did not handle. This caused origin detection to fail: no entity ID was sent
to the agent, so container tags were missing from APM traces.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
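
To illustrate the path shape involved, here is a hand-rolled sketch that accepts a `libpod-HEXID.scope` component with or without the trailing `/container` subdirectory. It is not libdatadog's regex, which is not shown in this PR; `libpod_container_id` is a hypothetical name, and the sketch accepts any non-empty run of hex digits rather than enforcing an ID length.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Extract HEXID from a cgroup line such as
 *   0::/machine.slice/libpod-HEXID.scope
 *   0::/machine.slice/libpod-HEXID.scope/container
 * Returns 1 and writes a NUL-terminated ID to `out` on success, else 0. */
static int libpod_container_id(const char *line, char *out, size_t out_len)
{
    static const char prefix[] = "libpod-";
    static const char suffix[] = ".scope";

    const char *p = strstr(line, prefix);
    if (p == NULL) return 0;
    p += sizeof prefix - 1;

    size_t n = 0;
    while (isxdigit((unsigned char)p[n])) n++;
    if (n == 0 || n >= out_len) return 0;

    if (strncmp(p + n, suffix, sizeof suffix - 1) != 0) return 0;
    const char *rest = p + n + (sizeof suffix - 1);

    /* Allow the optional "/container" subdirectory seen under Podman
     * with cgroupns=host, then require end of line. */
    if (strncmp(rest, "/container", 10) == 0) rest += 10;
    if (*rest != '\0' && *rest != '\n') return 0;

    memcpy(out, p, n);
    out[n] = '\0';
    return 1;
}
```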
@cataphract cataphract requested a review from a team as a code owner April 9, 2026 09:56
cataphract and others added 2 commits April 9, 2026 13:35
The trampoline binary embedded in ddtrace.so was produced as ET_EXEC
(non-PIE) by toolchains that don't default to -fPIE (e.g. devtoolset-7
on CentOS 7).  elf_load_trampoline accepted only ET_DYN and used
mmap(NULL) to pick a random load base — an ET_EXEC binary loaded that
way crashes because its absolute virtual addresses no longer match.

Two-pronged fix:
1. libdatadog/spawn_worker/build.rs: add -fPIE/-pie on Linux so the
   trampoline is always ET_DYN, matching the original design intent.
2. solib_bootstrap.c: add a __builtin_trap() guard after the ET_DYN
   check so a mis-built ET_EXEC trampoline aborts loudly instead of
   silently misbehaving.

Fixes "failed to map trampoline" (exit 121) on bookworm-slim for
PHP 8.3-8.5.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
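
The ET_DYN versus ET_EXEC distinction can be checked from the ELF header alone. The sketch below classifies an in-memory image by `e_type`; it is an assumption about the shape of such a check, not the code in solib_bootstrap.c, and `classify_trampoline` and the enum are hypothetical names. It assumes a 64-bit ELF image.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

enum tramp_kind {
    TRAMP_INVALID, /* too short, or not an ELF file, or another e_type */
    TRAMP_PIE,     /* ET_DYN: can be mapped at any base */
    TRAMP_FIXED    /* ET_EXEC: absolute addresses, breaks under mmap(NULL) */
};

/* Inspect only the ELF header of `image` and report how it was linked. */
static enum tramp_kind classify_trampoline(const void *image, size_t len)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh) return TRAMP_INVALID;
    memcpy(&eh, image, sizeof eh); /* avoid unaligned access to the buffer */

    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0) return TRAMP_INVALID;
    if (eh.e_type == ET_DYN) return TRAMP_PIE;
    if (eh.e_type == ET_EXEC) return TRAMP_FIXED;
    return TRAMP_INVALID;
}
```

A loader following the approach in this commit could call `__builtin_trap()` on anything other than `TRAMP_PIE`, so a mis-built trampoline fails at the check rather than later at a wrong address.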