fix(shared-runtime): guard shutdown() against Tokio TLS destruction #2169

Open
rachelyangdog wants to merge 3 commits into main from rachel.yang/fix-shared-runtime-tls-shutdown-panic

@rachelyangdog

Contributor

During CPython interpreter finalization, thread-local storage (TLS) is destroyed before atexit handlers fire. SharedRuntime::shutdown() calls runtime.block_on(), which internally calls context::enter() to set up Tokio's CONTEXT thread-local. If that TLS slot has already been destroyed, context::enter() panics with "The Tokio context thread-local variable has been destroyed", and PyO3 converts the panic into a pyo3_runtime.PanicException. The result is a crash on every uWSGI worker shutdown when using ddtrace >=4.9.x.
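The failure mode can be reproduced in miniature with the standard library alone. The sketch below is an analogy, not Tokio's code: `CONTEXT`, `Probe`, and `observed_destroyed_during_teardown` are hypothetical names chosen for illustration. It relies on the documented behaviour that `LocalKey::try_with` returns an `AccessError` once the key's destructor is running, whereas `LocalKey::with` panics in the same situation.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

// Set by the destructor when it observes that the thread-local is unusable.
static SAW_DESTROYED: AtomicBool = AtomicBool::new(false);

// Hypothetical stand-in for a runtime context kept in thread-local storage.
struct Probe;

impl Drop for Probe {
    fn drop(&mut self) {
        // While this destructor runs, the key counts as destroyed, so
        // try_with reports an AccessError. CONTEXT.with(..) would panic here.
        if CONTEXT.try_with(|_| ()).is_err() {
            SAW_DESTROYED.store(true, Ordering::SeqCst);
        }
    }
}

thread_local! {
    static CONTEXT: Probe = Probe;
}

// Spawns a thread that touches CONTEXT, waits for the thread to exit, and
// reports whether the destructor saw the slot as destroyed.
fn observed_destroyed_during_teardown() -> bool {
    SAW_DESTROYED.store(false, Ordering::SeqCst);
    thread::spawn(|| {
        // First access initializes the slot and registers its destructor.
        CONTEXT.with(|_| ());
    })
    .join()
    .expect("worker thread panicked");
    SAW_DESTROYED.load(Ordering::SeqCst)
}
```

On platforms that run TLS destructors for spawned threads before `join` returns (Linux, for example), `observed_destroyed_during_teardown()` is expected to return `true`. Code that runs this late in a thread's life has to use the fallible accessor, which is the property the shutdown path in this PR depends on.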

Fix: before calling block_on(), call Handle::try_current() and check whether the returned error reports is_thread_local_destroyed(). If TLS is gone, return Ok(()) early; the OS will clean up the remaining Tokio threads on process exit. This eliminates both the panic and the subsequent 60s hang/SIGKILL.

Reproducer: a uWSGI app with lazy-apps=true, ddtrace imported via the uWSGI import= option, and 4 workers. SIGTERM triggers the panic on every worker.

…uring CPython finalization

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@rachelyangdog rachelyangdog requested a review from a team as a code owner June 26, 2026 16:14

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e5b9a9050b

Comment thread libdd-shared-runtime/src/shared_runtime/mod.rs Outdated
@dd-octo-sts

dd-octo-sts Bot commented Jun 26, 2026

Contributor

Artifact Size Benchmark Report

aarch64-alpine-linux-musl

| Artifact | Baseline | Commit | Change |
|---|---|---|---|
| /aarch64-alpine-linux-musl/lib/libdatadog_profiling.a | 85.14 MB | 85.14 MB | +0% (+5.03 KB) 👌 |
| /aarch64-alpine-linux-musl/lib/libdatadog_profiling.so | 7.82 MB | 7.82 MB | +0% (+24 B) 👌 |

aarch64-unknown-linux-gnu

| Artifact | Baseline | Commit | Change |
|---|---|---|---|
| /aarch64-unknown-linux-gnu/lib/libdatadog_profiling.so | 10.51 MB | 10.51 MB | +0% (+528 B) 👌 |
| /aarch64-unknown-linux-gnu/lib/libdatadog_profiling.a | 96.27 MB | 96.28 MB | +0% (+4.59 KB) 👌 |

libdatadog-x64-windows

| Artifact | Baseline | Commit | Change |
|---|---|---|---|
| /libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.dll | 25.14 MB | 25.14 MB | +0% (+1.00 KB) 👌 |
| /libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.lib | 88.04 KB | 88.04 KB | 0% (0 B) 👌 |
| /libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.pdb | 183.36 MB | 183.37 MB | +0% (+8.00 KB) 👌 |
| /libdatadog-x64-windows/debug/static/datadog_profiling_ffi.lib | 938.86 MB | 938.87 MB | +0% (+12.33 KB) 👌 |
| /libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.dll | 8.22 MB | 8.23 MB | +.05% (+4.50 KB) 🔍 |
| /libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.lib | 88.04 KB | 88.04 KB | 0% (0 B) 👌 |
| /libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.pdb | 24.30 MB | 24.31 MB | +.03% (+8.00 KB) 🔍 |
| /libdatadog-x64-windows/release/static/datadog_profiling_ffi.lib | 48.47 MB | 48.47 MB | +.01% (+6.58 KB) 🔍 |

libdatadog-x86-windows

| Artifact | Baseline | Commit | Change |
|---|---|---|---|
| /libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.dll | 21.79 MB | 21.79 MB | +0% (+1.00 KB) 👌 |
| /libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.lib | 89.42 KB | 89.42 KB | 0% (0 B) 👌 |
| /libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.pdb | 187.40 MB | 187.39 MB | -0% (-8.00 KB) 👌 |
| /libdatadog-x86-windows/debug/static/datadog_profiling_ffi.lib | 927.44 MB | 927.45 MB | +0% (+11.93 KB) 👌 |
| /libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.dll | 6.35 MB | 6.35 MB | 0% (0 B) 👌 |
| /libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.lib | 89.42 KB | 89.42 KB | 0% (0 B) 👌 |
| /libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.pdb | 26.09 MB | 26.09 MB | 0% (0 B) 👌 |
| /libdatadog-x86-windows/release/static/datadog_profiling_ffi.lib | 46.11 MB | 46.12 MB | +.01% (+6.39 KB) 🔍 |

x86_64-alpine-linux-musl

| Artifact | Baseline | Commit | Change |
|---|---|---|---|
| /x86_64-alpine-linux-musl/lib/libdatadog_profiling.a | 75.88 MB | 75.89 MB | +0% (+4.41 KB) 👌 |
| /x86_64-alpine-linux-musl/lib/libdatadog_profiling.so | 8.70 MB | 8.70 MB | +0% (+32 B) 👌 |

x86_64-unknown-linux-gnu

| Artifact | Baseline | Commit | Change |
|---|---|---|---|
| /x86_64-unknown-linux-gnu/lib/libdatadog_profiling.a | 91.34 MB | 91.35 MB | +0% (+4.53 KB) 👌 |
| /x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so | 10.59 MB | 10.59 MB | +.04% (+4.39 KB) 🔍 |

… refactor

Resolves merge conflict with main, which refactored SharedRuntime into
separate ForkSafeRuntime, BasicRuntime, and LocalRuntime types. The TLS
destruction guard, which checks the error from Handle::try_current() with
is_thread_local_destroyed(), is now applied to ForkSafeRuntime::shutdown()
in fork_safe.rs, where the block_on call lives after the refactor.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 29, 2026

Contributor

📚 Documentation Check Results

⚠️ 188 documentation warning(s) found

📦 libdd-shared-runtime - 188 warning(s)


Updated: 2026-06-29 15:47:14 UTC | Commit: a61095f | missing-docs job results

@github-actions

Contributor

Clippy Allow Annotation Report

Comparing clippy allow annotations between branches:

  • Base Branch: origin/main
  • PR Branch: origin/rachel.yang/fix-shared-runtime-tls-shutdown-panic

Summary by Rule

Rule Base Branch PR Branch Change

Annotation Counts by File

File Base Branch PR Branch Change

Annotation Stats by Crate

| Crate | Base Branch | PR Branch | Change |
|---|---|---|---|
| clippy-annotation-reporter | 5 | 5 | No change (0%) |
| datadog-ffe-ffi | 1 | 1 | No change (0%) |
| datadog-ipc | 22 | 22 | No change (0%) |
| datadog-live-debugger | 4 | 4 | No change (0%) |
| datadog-live-debugger-ffi | 10 | 10 | No change (0%) |
| datadog-profiling-replayer | 4 | 4 | No change (0%) |
| datadog-sidecar | 45 | 45 | No change (0%) |
| libdd-common | 13 | 13 | No change (0%) |
| libdd-common-ffi | 12 | 12 | No change (0%) |
| libdd-data-pipeline | 6 | 6 | No change (0%) |
| libdd-ddsketch | 2 | 2 | No change (0%) |
| libdd-dogstatsd-client | 1 | 1 | No change (0%) |
| libdd-profiling | 13 | 13 | No change (0%) |
| libdd-remote-config | 3 | 3 | No change (0%) |
| libdd-telemetry | 20 | 20 | No change (0%) |
| libdd-tinybytes | 4 | 4 | No change (0%) |
| libdd-trace-normalization | 2 | 2 | No change (0%) |
| libdd-trace-obfuscation | 3 | 3 | No change (0%) |
| libdd-trace-stats | 1 | 1 | No change (0%) |
| libdd-trace-utils | 11 | 11 | No change (0%) |
| Total | 182 | 182 | No change (0%) |

About This Report

This report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@datadog-official

datadog-official Bot commented Jun 29, 2026


⚠️ Warnings

🚦 1 Pipeline job failed

pr-name | pr_name_lint   View in Datadog   GitHub Actions

ℹ️ Info

No other issues found (see more)

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
Patch Coverage: 82.35%
Overall Coverage: 73.96% (-0.04%)

🔗 Commit SHA: 1742952

@github-actions

github-actions Bot commented Jun 29, 2026

Contributor

🔒 Cargo Deny Results

⚠️ 5 issue(s) found, showing only errors (advisories, bans, sources)

📦 libdd-shared-runtime - 5 error(s)

error[unsound]: Unsoundness in `Error::downcast_mut()`
  ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:2:1
  │
2 │ anyhow 1.0.93 registry+https://github.com/rust-lang/crates.io-index
  │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ unsound advisory detected
  │
  ├ ID: RUSTSEC-2026-0190
  ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0190
  ├ Affected versions of this crate violate borrow rules, resulting in undefined behavior, when the user adds context to an error via `Error::context` and then later calls `Error::downcast_mut` on the returned `Error`.
    
    The flaw was corrected in commit `6e8c000` by revising how the mutable reference is constructed, avoiding inclusion of a shared reference in the resulting borrow chain.
    
    ## Example
    
    ```rust
    use anyhow::Error;
    use std::fmt;
    
    #[derive(Debug)]
    struct ErrorContext(&'static str);
    
    impl fmt::Display for ErrorContext {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            fmt::Display::fmt(&self.0, f)
        }
    }
    
    fn main() {
        let mut error = Error::msg("inner error").context(ErrorContext("old context"));
        let context: &mut ErrorContext = error.downcast_mut().unwrap();
        context.0 = "new context";
        println!("{:?}", error);
    }
    ```
    
    ## Miri output
    
    ```
    error: Undefined Behavior: trying to retag from <1538> for Unique permission at alloc602[0x38], but that tag only grants SharedReadOnly permission for this location
       --> src/ptr.rs:170:18
        |
    170 |         unsafe { &mut *self.ptr.as_ptr() }
        |                  ^^^^^^^^^^^^^^^^^^^^^^^ this error occurs as part of retag at alloc602[0x38..0x48]
        |
        = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
        = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
    help: <1538> was created by a SharedReadOnly retag at offsets [0x38..0x48]
       --> src/ptr.rs:89:18
        |
     89 |             ptr: NonNull::from(ptr),
        |                  ^^^^^^^^^^^^^^^^^^
        = note: stack backtrace:
                0: anyhow::ptr::Mut::<'_, ErrorContext>::deref_mut
                    at src/ptr.rs:170:18: 170:41
                1: anyhow::error::<impl anyhow::Error>::downcast_mut::<ErrorContext>
                    at src/error.rs:560:18: 560:46
                2: main
                    at examples/downcast_mut.rs:15:38: 15:58
    ```
  ├ Announcement: https://github.com/dtolnay/anyhow/issues/451
  ├ Solution: Upgrade to >=1.0.103 (try `cargo update -p anyhow`)
  ├ anyhow v1.0.93
    ├── libdd-capabilities v2.0.0
    │   ├── libdd-capabilities-impl v2.0.0
    │   │   └── libdd-shared-runtime v1.0.0
    │   └── libdd-shared-runtime v1.0.0 (*)
    └── libdd-common v5.0.0
        ├── libdd-capabilities-impl v2.0.0 (*)
        └── libdd-shared-runtime v1.0.0 (*)

error[unsound]: Rand is unsound with a custom logger using `rand::rng()`
   ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:73:1
   │
73 │ rand 0.8.5 registry+https://github.com/rust-lang/crates.io-index
   │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ unsound advisory detected
   │
   ├ ID: RUSTSEC-2026-0097
   ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0097
   ├ It has been reported (by @lopopolo) that the `rand` library is [unsound](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#soundness-of-code--of-a-library) (i.e. that safe code using the public API can cause Undefined Behaviour) when all the following conditions are met:
     
     - The `log` and `thread_rng` features are enabled
     - A [custom logger](https://docs.rs/log/latest/log/#implementing-a-logger) is defined
     - The custom logger accesses `rand::rng()` (previously `rand::thread_rng()`) and calls any `TryRng` (previously `RngCore`) methods on `ThreadRng`
     - The `ThreadRng` (attempts to) reseed while called from the custom logger (this happens every 64 kB of generated data)
     - Trace-level logging is enabled or warn-level logging is enabled and the random source (the `getrandom` crate) is unable to provide a new seed
     
     `TryRng` (previously `RngCore`) methods for `ThreadRng` use `unsafe` code to cast `*mut BlockRng<ReseedingCore>` to `&mut BlockRng<ReseedingCore>`. When all the above conditions are met this results in an aliased mutable reference, violating the Stacked Borrows rules. Miri is able to detect this violation in sample code. Since construction of [aliased mutable references is Undefined Behaviour](https://doc.rust-lang.org/stable/nomicon/references.html), the behaviour of optimized builds is hard to predict.
   ├ Announcement: https://github.com/rust-random/rand/pull/1763
   ├ Solution: Upgrade to >=0.10.1 OR <0.10.0, >=0.9.3 OR <0.9.0, >=0.8.6 (try `cargo update -p rand`)
   ├ rand v0.8.5
     └── (dev) libdd-common v5.0.0
         ├── libdd-capabilities-impl v2.0.0
         │   └── libdd-shared-runtime v1.0.0
         └── libdd-shared-runtime v1.0.0 (*)

error[vulnerability]: Name constraints for URI names were incorrectly accepted
   ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:86:1
   │
86 │ rustls-webpki 0.103.10 registry+https://github.com/rust-lang/crates.io-index
   │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
   │
   ├ ID: RUSTSEC-2026-0098
   ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0098
   ├ Name constraints for URI names were ignored and therefore accepted.
     
     Note this library does not provide an API for asserting URI names, and URI name constraints are otherwise not implemented.  URI name constraints are now rejected unconditionally.
     
     Since name constraints are restrictions on otherwise properly-issued certificates, this bug is reachable only after signature verification and requires misissuance to exploit.
     
     This vulnerability is identified as [GHSA-965h-392x-2mh5](https://github.com/rustls/webpki/security/advisories/GHSA-965h-392x-2mh5). Thank you to @1seal for the report.
   ├ Solution: Upgrade to >=0.103.12, <0.104.0-alpha.1 OR >=0.104.0-alpha.6 (try `cargo update -p rustls-webpki`)
   ├ rustls-webpki v0.103.10
     ├── rustls v0.23.37
     │   ├── hyper-rustls v0.27.7
     │   │   └── libdd-common v5.0.0
     │   │       ├── libdd-capabilities-impl v2.0.0
     │   │       │   └── libdd-shared-runtime v1.0.0
     │   │       └── libdd-shared-runtime v1.0.0 (*)
     │   ├── libdd-common v5.0.0 (*)
     │   ├── rustls-platform-verifier v0.6.2
     │   │   └── libdd-common v5.0.0 (*)
     │   └── tokio-rustls v0.26.0
     │       ├── hyper-rustls v0.27.7 (*)
     │       └── libdd-common v5.0.0 (*)
     └── rustls-platform-verifier v0.6.2 (*)

error[vulnerability]: Name constraints were accepted for certificates asserting a wildcard name
   ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:86:1
   │
86 │ rustls-webpki 0.103.10 registry+https://github.com/rust-lang/crates.io-index
   │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
   │
   ├ ID: RUSTSEC-2026-0099
   ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0099
   ├ Permitted subtree name constraints for DNS names were accepted for certificates asserting a wildcard name.
     
     This was incorrect because, given a name constraint of `accept.example.com`, `*.example.com` could feasibly allow a name of `reject.example.com` which is outside the constraint.
     This is very similar to [CVE-2025-61727](https://go.dev/issue/76442).
     
     Since name constraints are restrictions on otherwise properly-issued certificates, this bug is reachable only after signature verification and requires misissuance to exploit.
     
     This vulnerability is identified as [GHSA-xgp8-3hg3-c2mh](https://github.com/rustls/webpki/security/advisories/GHSA-xgp8-3hg3-c2mh). Thank you to @1seal for the report.
   ├ Solution: Upgrade to >=0.103.12, <0.104.0-alpha.1 OR >=0.104.0-alpha.6 (try `cargo update -p rustls-webpki`)
   ├ rustls-webpki v0.103.10
     ├── rustls v0.23.37
     │   ├── hyper-rustls v0.27.7
     │   │   └── libdd-common v5.0.0
     │   │       ├── libdd-capabilities-impl v2.0.0
     │   │       │   └── libdd-shared-runtime v1.0.0
     │   │       └── libdd-shared-runtime v1.0.0 (*)
     │   ├── libdd-common v5.0.0 (*)
     │   ├── rustls-platform-verifier v0.6.2
     │   │   └── libdd-common v5.0.0 (*)
     │   └── tokio-rustls v0.26.0
     │       ├── hyper-rustls v0.27.7 (*)
     │       └── libdd-common v5.0.0 (*)
     └── rustls-platform-verifier v0.6.2 (*)

error[vulnerability]: Reachable panic in certificate revocation list parsing
   ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:86:1
   │
86 │ rustls-webpki 0.103.10 registry+https://github.com/rust-lang/crates.io-index
   │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
   │
   ├ ID: RUSTSEC-2026-0104
   ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0104
   ├ A panic was reachable when parsing certificate revocation lists via [`BorrowedCertRevocationList::from_der`]
     or [`OwnedCertRevocationList::from_der`].  This was the result of mishandling a syntactically valid empty
     `BIT STRING` appearing in the `onlySomeReasons` element of a `IssuingDistributionPoint` CRL extension.
     
     This panic is reachable prior to a CRL's signature being verified.
     
     Applications that do not use CRLs are not affected.
     
     Thank you to @tynus3 for the report.
   ├ Solution: Upgrade to >=0.103.13, <0.104.0-alpha.1 OR >=0.104.0-alpha.7 (try `cargo update -p rustls-webpki`)
   ├ rustls-webpki v0.103.10
     ├── rustls v0.23.37
     │   ├── hyper-rustls v0.27.7
     │   │   └── libdd-common v5.0.0
     │   │       ├── libdd-capabilities-impl v2.0.0
     │   │       │   └── libdd-shared-runtime v1.0.0
     │   │       └── libdd-shared-runtime v1.0.0 (*)
     │   ├── libdd-common v5.0.0 (*)
     │   ├── rustls-platform-verifier v0.6.2
     │   │   └── libdd-common v5.0.0 (*)
     │   └── tokio-rustls v0.26.0
     │       ├── hyper-rustls v0.27.7 (*)
     │       └── libdd-common v5.0.0 (*)
     └── rustls-platform-verifier v0.6.2 (*)

advisories FAILED, bans ok, sources ok

Updated: 2026-06-29 15:47:13 UTC | Commit: a61095f | dependency-check job results

@rachelyangdog rachelyangdog changed the title fix(shared-runtime): guard shutdown() against Tokio TLS destruction fix(shared-runtime): guard shutdown() against Tokio TLS destruction Jun 29, 2026