fix(profiling): clean up containers post-fork [backport 4.5]#17726

Open
KowalskiThomas wants to merge 1 commit into 4.5 from kowalski/backport-17042-to-4.5

@KowalskiThomas
Contributor

Backport of PR #17042 (commit 39a5f23) to branch 4.5.

Description

Fixes a crash in the profiler (~100 per week) that occurs when fork is called while the sampling thread is modifying the frame LRUCache.

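The root cause: postfork_child called frame_cache_.clear(), which traverses the std::list to free its nodes. If the sampling thread was in the middle of an operation at fork time (splice in lookup, emplace_front/pop_back in store), the child inherits the list in that half-updated state. Only the forking thread exists in the child, so the sampling thread never finishes the operation. The list's internal pointers can therefore stay inconsistent, and clear() crashes while walking them.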
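To make the failure mode concrete, here is a minimal sketch of an LRU cache built from a std::list plus an index map, exposing the lookup/store operations the description mentions. The class name LruCacheSketch and its signatures are hypothetical and are not the actual echion cache.h API. The comments mark the list mutations that a fork can interrupt.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

// Hypothetical LRU cache (not the real echion cache.h): a std::list keeps
// recency order, an unordered_map indexes list nodes by key.
template <typename K, typename V>
class LruCacheSketch {
public:
    explicit LruCacheSketch(std::size_t capacity) : capacity_(capacity) {}

    // lookup: on a hit, splice the node to the front (most recently used).
    // splice() relinks node pointers; a fork in the middle of it leaves
    // those links half-updated in the child.
    std::optional<V> lookup(const K& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        items_.splice(items_.begin(), items_, it->second);
        return it->second->second;
    }

    // store: insert at the front and evict from the back when over capacity.
    // emplace_front() and pop_back() also rewrite the list's links.
    void store(const K& key, V value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = std::move(value);
            items_.splice(items_.begin(), items_, it->second);
            return;
        }
        items_.emplace_front(key, std::move(value));
        index_[key] = items_.begin();
        if (items_.size() > capacity_) {
            index_.erase(items_.back().first);
            items_.pop_back();
        }
    }

    std::size_t size() const { return items_.size(); }

private:
    using Item = std::pair<K, V>;
    std::size_t capacity_;
    std::list<Item> items_;  // front = most recently used
    std::unordered_map<K, typename std::list<Item>::iterator> index_;
};
```

Every operation on the hot path relinks list nodes, which is why a fork at an arbitrary instant can catch the list mid-update.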

Fix: replace clear() with postfork_child(), which uses placement new to construct fresh, empty containers over the old ones without traversing them. The old data is abandoned as an intentional one-time leak.
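A minimal sketch of the placement-new reset follows. It assumes a cache that holds a std::list and a std::unordered_map. The struct name FrameCacheSketch and its members are hypothetical and are not the real cache.h code. The point is that neither clear() nor the old containers' destructors ever run, so no possibly corrupted node is touched.

```cpp
#include <cassert>
#include <list>
#include <new>
#include <unordered_map>

// Hypothetical container pair shaped like an LRU cache (illustrative only).
struct FrameCacheSketch {
    std::list<int> items;
    std::unordered_map<int, std::list<int>::iterator> index;

    // Called in the child after fork(). items.clear() would walk node
    // pointers that the vanished sampling thread may have left half-updated.
    // Instead, construct fresh empty containers in the same storage WITHOUT
    // running the old destructors, so the old nodes are never traversed.
    // They are leaked once, on purpose.
    void postfork_child() {
        new (&items) std::list<int>();
        new (&index) std::unordered_map<int, std::list<int>::iterator>();
    }
};
```

The C++ standard allows reusing the storage of an object with a non-trivial destructor without calling that destructor, provided the program does not depend on the destructor's side effects. Here the only skipped side effect is freeing the old nodes, which is exactly the intended leak.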

It also adds prefork()/postfork_parent() atfork handlers, registered via pthread_atfork, so that restart_after_fork uses a was_running_at_fork_ flag saved before the fork instead of relying on thread_seq_num parity (which prefork itself changes).
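The ordering problem can be sketched as follows. For illustration only, this assumes that the sampler encodes "running" as an odd thread_seq_num and that prefork() stops the sampling thread. SamplerSketch, the handler functions and register_atfork_handlers() are hypothetical names. Once prefork() has stopped the thread, parity always reads "stopped", so the restart decision has to come from a flag captured first.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical sampler. Only prefork, postfork_parent, restart_after_fork,
// thread_seq_num and was_running_at_fork_ are names from the PR text; the
// odd/even encoding of "running" is an assumption for illustration.
class SamplerSketch {
public:
    // Assumed encoding: an odd thread_seq_num_ means the thread is running.
    bool running() const { return thread_seq_num_ % 2 == 1; }
    void start() { if (!running()) ++thread_seq_num_; }
    void stop() { if (running()) ++thread_seq_num_; }

    // Parent, before fork(): capture the state FIRST, then stop the thread.
    // After stop(), running() is always false, so parity can no longer say
    // whether sampling was active.
    void prefork() {
        was_running_at_fork_ = running();
        stop();
    }

    // Parent, after fork(): in this sketch, simply defer to the saved flag.
    void postfork_parent() { restart_after_fork(); }

    // The decision uses the saved flag, not thread_seq_num_ parity.
    void restart_after_fork() {
        if (was_running_at_fork_) start();
    }

private:
    unsigned long thread_seq_num_ = 0;
    bool was_running_at_fork_ = false;
};

// Hypothetical global instance and plain function callbacks for pthread_atfork.
static SamplerSketch g_sampler;

static void atfork_prepare() { g_sampler.prefork(); }
static void atfork_parent() { g_sampler.postfork_parent(); }

// Registers prepare/parent handlers (no child handler in this sketch).
// Returns 0 on success, as pthread_atfork does.
int register_atfork_handlers() {
    return pthread_atfork(atfork_prepare, atfork_parent, nullptr);
}
```

With parity alone, restart_after_fork would see an even thread_seq_num after every prefork() and never restart. Saving the flag before stopping avoids that.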

Notes on conflict resolution

The 4.5 branch uses a _stack_ function-name prefix (with a leading underscore) for its internal atfork helpers, unlike main, 4.6 and 4.7, which use stack_. The new _stack_atfork_prepare and _stack_atfork_parent functions follow the 4.5 _stack_ convention.

Fixes a crash in the Profiler when fork is called while the Sampling Thread
is actively modifying the LRUCache for Frames. Instead of calling
std::list::clear() post-fork (which can crash on a corrupted list), use
placement new to reinitialise the containers, abandoning the old data as an
intentional one-time leak. Also adds prefork/postfork_parent handlers via
pthread_atfork so that restart_after_fork uses a pre-saved flag rather than
relying on thread_seq_num parity (which prefork itself changes).

Backport of PR #17042 to branch 4.5.
@KowalskiThomas requested review from a team as code owners April 24, 2026 15:16
@cit-pr-commenter-54b7da

Codeowners resolved as

ddtrace/internal/datadog/profiling/stack/echion/echion/cache.h          @DataDog/profiling-python
ddtrace/internal/datadog/profiling/stack/echion/echion/echion_sampler.h  @DataDog/profiling-python
ddtrace/internal/datadog/profiling/stack/include/sampler.hpp            @DataDog/profiling-python
ddtrace/internal/datadog/profiling/stack/src/sampler.cpp                @DataDog/profiling-python
releasenotes/notes/fix-profiling-fork-crash-lru-cache-b80e6574fc304037.yaml  @DataDog/apm-python

@KowalskiThomas added the Profiling Continous Profling label Apr 24, 2026

@vlad-scherbich left a comment

@KowalskiThomas enabled auto-merge (squash) April 24, 2026 15:57

Labels

Profiling Continous Profling

3 participants