Combine python and native unwinder into single loop #210
Merged
2bb7f1b to
31b0bd5
Compare
Move defines (STACK_DELTA_INVALID, STACK_DELTA_STOP, NATIVE_FRAMES_PER_PROGRAM) and functions (push_native, bsearch_step, get_stack_delta_map, get_stack_delta, unwind_register_address, unwind_one_frame) from native_stack_trace.ebpf.c into native_stack_trace.h so they can be reused by other eBPF programs. Zero functional changes: stripping BTF metadata from the before/after blobs produces identical binaries, confirming no generated code changed.
Python programs, especially PyTorch programs, can exhaust the tail call limit by switching between the Python and native unwinders more than 29 times. This happens because of eval/delegation patterns where one Python frame is accompanied by a couple of native frames. To unwind these stacks successfully, fold the native unwinder into the Python unwinder so that at each step either a Python or a native frame can be unwound.

Replace the separate walk_python_stack inner loop and outer transition loop with a single switch-in-loop structure using step_python and step_native helper functions. This reduces tail call usage from one per batch to one per loop budget exhaustion (PYTHON_NATIVE_LOOP_ITERS=9 iterations).

Move native unwinder map externs (exe_id_to_*_stack_deltas, stack_delta_page_to_info, unwind_info_array) out of the TESTING_COREDUMP guard in extmaps.h so python_tracer.ebpf.c can include native_stack_trace.h.

- PYTHON_NATIVE_LOOP_ITERS=9 chosen to pass the BPF verifier on 5.4 kernels (ITERS=10 times out the verifier at >300s)
- On a failed PyCodeObject read, push a frame with the code object address so the agent can try via /proc/pid/mem
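The switch-in-loop structure described above can be sketched in plain userspace C. This is a minimal model, not the PR's eBPF code: `struct stack_model`, `enum unwinder`, `next_kind`, `unwind_batch`, and `count_tail_calls` are hypothetical names introduced for illustration, and the `step_python`/`step_native` bodies here only advance a cursor instead of reading process memory. Only the names `step_python`, `step_native`, the budget of 9, and the 256-frame cap come from the PR text.

```c
#include <assert.h>
#include <stddef.h>

#define LOOP_ITERS 9   /* mirrors PYTHON_NATIVE_LOOP_ITERS from the PR text */
#define MAX_FRAMES 256 /* mirrors the MAX_FRAMES limit from the PR text */

/* Hypothetical unwinder states; the real code may name these differently. */
enum unwinder { UNW_PYTHON, UNW_NATIVE, UNW_DONE };

/* Hypothetical stack model: kinds[i] says which unwinder frame i needs. */
struct stack_model {
    const enum unwinder *kinds;
    size_t depth; /* total frames on the modeled stack */
    size_t pos;   /* frames unwound so far */
};

/* Which unwinder the next frame needs, or UNW_DONE at the end or at the cap. */
static enum unwinder next_kind(const struct stack_model *s) {
    return (s->pos < s->depth && s->pos < MAX_FRAMES) ? s->kinds[s->pos] : UNW_DONE;
}

/* Stand-ins for the real helpers: consume one frame, report what comes next. */
static enum unwinder step_python(struct stack_model *s) { s->pos++; return next_kind(s); }
static enum unwinder step_native(struct stack_model *s) { s->pos++; return next_kind(s); }

/* One program invocation: up to LOOP_ITERS frames of either kind. */
static enum unwinder unwind_batch(struct stack_model *s, enum unwinder state) {
    assert(s->pos <= s->depth);
    for (int i = 0; i < LOOP_ITERS; i++) {
        switch (state) {
        case UNW_PYTHON: state = step_python(s); break;
        case UNW_NATIVE: state = step_native(s); break;
        case UNW_DONE:   return UNW_DONE;
        }
    }
    return state; /* budget exhausted: the caller would tail call here */
}

/* Counts how many tail calls the modeled stack needs in total. */
static int count_tail_calls(struct stack_model *s) {
    int tail_calls = 0;
    enum unwinder state = next_kind(s);
    while ((state = unwind_batch(s, state)) != UNW_DONE)
        tail_calls++;
    return tail_calls;
}
```

In this model a tail call is spent only when the loop budget runs out, so the number of Python/native switches inside a batch no longer matters.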
31b0bd5 to
a76177b
Compare
Pull request overview
This PR combines the Python and native stack unwinders into a single loop to address tail call limit exhaustion in Python programs (particularly PyTorch). Previously, interleaved Python/native frames could exceed the limit of 29 tail calls because of frequent switches between unwinders. The new approach processes up to 9 frames (Python or native) per invocation, allowing up to 261 interleaved frames, which is more than the MAX_FRAMES limit of 256.
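To illustrate why the earlier design ran out of tail calls, the following sketch counts unwinder switches in a modeled stack. It assumes, as the description suggests, that each Python/native switch cost one tail call before this change. `enum frame_kind`, `count_unwinder_switches`, and `exceeds_tail_call_limit` are hypothetical names, not identifiers from the repository.

```c
#include <assert.h>
#include <stddef.h>

#define TAIL_CALL_LIMIT 29 /* limit quoted in the PR text */

/* Hypothetical frame classification for the model. */
enum frame_kind { FRAME_PYTHON, FRAME_NATIVE };

/* Counts positions where the unwinder kind changes between adjacent frames. */
static int count_unwinder_switches(const enum frame_kind *frames, size_t n) {
    int switches = 0;
    for (size_t i = 1; i < n; i++) {
        if (frames[i] != frames[i - 1])
            switches++;
    }
    return switches;
}

/* Nonzero if a one-tail-call-per-switch design could not finish this stack. */
static int exceeds_tail_call_limit(const enum frame_kind *frames, size_t n) {
    assert(frames != NULL || n == 0);
    return count_unwinder_switches(frames, n) > TAIL_CALL_LIMIT;
}
```

For a modeled stack of 90 frames in which each Python frame is followed by two native frames, this counts 59 switches, well over the limit even though the stack is far shallower than 256 frames.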
Changes:
- Merged Python and native unwinding into a single loop within unwind_python, eliminating frequent tail calls
- Refactored native unwinding code from native_stack_trace.ebpf.c into native_stack_trace.h for reuse
- Enhanced debug output to include error codes for better diagnostics
Reviewed changes
Copilot reviewed 4 out of 6 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| support/ebpf/python_tracer.ebpf.c | Removed walk_python_stack, added step_python and step_native functions, implemented combined unwinding loop with 9 iterations per invocation, improved debug logging for PyCodeObject read failures |
| support/ebpf/native_stack_trace.h | New header file containing native unwinding functions (push_native, get_stack_delta, unwind_one_frame, etc.) moved from .c file for reuse by Python tracer |
| support/ebpf/native_stack_trace.ebpf.c | Replaced inline native unwinding code with include of native_stack_trace.h |
| support/ebpf/extmaps.h | Moved stack delta and unwind info map declarations outside TESTING_COREDUMP conditional since they're now used by python_tracer.ebpf.c |
umanwizard
approved these changes
Feb 24, 2026
Python programs, especially PyTorch programs, can exhaust the tail call limit
by switching between the Python and native unwinders more than 29 times.
This happens because of eval/delegation/module patterns where one Python
frame is accompanied by a couple of native frames.
Here's a snippet:
To unwind these stacks successfully, fold the native unwinder
into the Python unwinder so that at each step either a Python or a native
frame can be unwound.
With 9 frames per loop we can do up to 261 frames of interleaved python/native which exceeds our 256 MAX_FRAMES limit.
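The 261 figure is consistent with 29 tail calls times 9 frames per loop. The sketch below works through that arithmetic under the assumption that every permitted tail call runs a full batch; the helper names are hypothetical and the constants are taken from the text above.

```c
#include <assert.h>

#define TAIL_CALL_LIMIT 29 /* tail call limit quoted in the PR text */
#define MAX_FRAMES 256     /* frame limit quoted in the PR text */

/* Frames reachable if each permitted tail call unwinds a full batch. */
static int max_interleaved_frames(int loop_iters) {
    assert(loop_iters > 0);
    return TAIL_CALL_LIMIT * loop_iters;
}

/* Smallest per-loop budget whose total reach covers max_frames (ceiling division). */
static int min_loop_iters(int max_frames) {
    assert(max_frames > 0);
    return (max_frames + TAIL_CALL_LIMIT - 1) / TAIL_CALL_LIMIT;
}
```

Under this assumption, `max_interleaved_frames(9)` is 261 and `max_interleaved_frames(8)` is 232, so 9 is the smallest budget that covers 256 frames; the commit message separately notes that 10 times out the verifier on 5.4 kernels.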