
Combine python and native unwinder into single loop #210

Merged
gnurizen merged 3 commits into main from py-native-2 on Feb 24, 2026

@gnurizen (Collaborator) commented Feb 20, 2026

Python programs, especially PyTorch programs, can exhaust the tail call limit
by switching from the Python to the native unwinder more than 29 times.
This happens because of eval/delegation/module patterns in which one Python
frame is wrapped by a couple of native frames.

Here's a snippet:

python_function
do_call_core
_PyEval_EvalFrame
_PyEval_EvalFrame
method_vectorcall
_PyEval_EvalFrameDefault
_PyEval_Vector
_PyEval_Vector
_PyVectorcall_Call
_PyEval_EvalFrame
_PyFunction_Vectorcall
_PyFunction_Vectorcall
_PyObject_Call
_PyEval_Vector
_PyObject_VectorcallTstate
_PyObject_VectorcallTstate
PyObject_Call
_PyFunction_Vectorcall
method_vectorcall
method_vectorcall
python_function2
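
To see why such stacks hit the limit, here is a minimal user-space sketch. It assumes, as a simplification, that the old design spends exactly one tail call each time unwinding switches between the Python and native unwinders. The names `enum frame_kind`, `TAIL_CALL_LIMIT`, `count_unwinder_switches` and `fits_old_design` are hypothetical and are not part of the eBPF code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not the real tracer: each frame is either a Python
 * frame or a native frame, and every change of kind costs one tail call. */
enum frame_kind { FRAME_PYTHON, FRAME_NATIVE };

#define TAIL_CALL_LIMIT 29 /* budget quoted in the PR description */

/* Count how many times unwinding must switch unwinders for this stack. */
static size_t count_unwinder_switches(const enum frame_kind *frames, size_t n)
{
    size_t switches = 0;
    for (size_t i = 1; i < n; i++) {
        if (frames[i] != frames[i - 1])
            switches++;
    }
    return switches;
}

/* True if a one-tail-call-per-switch design could finish this stack. */
static bool fits_old_design(const enum frame_kind *frames, size_t n)
{
    return count_unwinder_switches(frames, n) <= TAIL_CALL_LIMIT;
}
```

Under this model, 32 strictly alternating Python/native frames need 31 switches and do not fit in the 29-call budget. With 30 alternating frames (29 switches), the stack just fits.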

To unwind these stacks successfully, fold the native unwinder
into the Python unwinder so that at each step either a Python or a native frame
can be unwound.

With 9 frames per loop invocation and 29 tail calls, we can unwind up to 9 × 29 = 261 frames of interleaved Python/native code, which exceeds our MAX_FRAMES limit of 256.
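
The 261 figure is the per-invocation frame budget multiplied by the number of tail calls. Below is a small sketch of that arithmetic, assuming one frame is unwound per loop iteration. `TAIL_CALL_BUDGET` and `max_reachable_frames` are illustrative names. The values of `PYTHON_NATIVE_LOOP_ITERS` and `MAX_FRAMES` only mirror the numbers quoted in this PR.

```c
#include <assert.h>

/* Values mirror the numbers quoted in the PR; TAIL_CALL_BUDGET is a
 * hypothetical name for the 29 tail calls available to the unwinder. */
#define PYTHON_NATIVE_LOOP_ITERS 9 /* frames unwound per program invocation */
#define TAIL_CALL_BUDGET 29
#define MAX_FRAMES 256

/* Upper bound on frames reachable by the combined loop, assuming every
 * loop iteration unwinds exactly one frame (Python or native). */
static int max_reachable_frames(int iters_per_call, int tail_calls)
{
    return iters_per_call * tail_calls;
}

/* Compile-time check that the loop budget covers a full trace. */
_Static_assert(PYTHON_NATIVE_LOOP_ITERS * TAIL_CALL_BUDGET >= MAX_FRAMES,
               "loop budget must cover MAX_FRAMES");
```

By the same arithmetic, 8 iterations per invocation would reach only 8 × 29 = 232 frames, which falls short of MAX_FRAMES.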

@gnurizen marked this pull request as draft on February 20, 2026 12:49
@gnurizen force-pushed the py-native-2 branch 6 times, most recently from 2bb7f1b to 31b0bd5 on February 21, 2026 10:41

Move defines (STACK_DELTA_INVALID, STACK_DELTA_STOP,
NATIVE_FRAMES_PER_PROGRAM) and functions (push_native, bsearch_step,
get_stack_delta_map, get_stack_delta, unwind_register_address,
unwind_one_frame) from native_stack_trace.ebpf.c into
native_stack_trace.h so they can be reused by other eBPF programs.

Zero functional changes: stripping BTF metadata from the before/after
blobs produces identical binaries, confirming no generated code changed.

Python programs, especially PyTorch programs, can exhaust the tail call limit
by switching from the Python to the native unwinder more than 29 times.
This happens because of eval/delegation patterns in which one Python
frame is wrapped by a couple of native frames.

To unwind these stacks successfully, fold the native unwinder
into the Python unwinder so that at each step either a Python or a native frame
can be unwound.

Replace the separate walk_python_stack inner loop and outer
transition loop with a single switch-in-loop structure using
the step_python and step_native helper functions. This reduces
tail call usage from one per batch to one each time the loop
budget (PYTHON_NATIVE_LOOP_ITERS=9 iterations) is exhausted.
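
The switch-in-loop structure can be sketched in plain C. This is a hypothetical user-space model, not the eBPF code. The stub step_python and step_native only advance through a pre-built array of frame kinds. `unwind_state`, `unwind_invocation` and `unwind_all` are illustrative names. Each call to `unwind_invocation` stands for one program invocation, and each re-entry stands for one tail call.

```c
#include <assert.h>

/* Hypothetical model of the combined loop; the real helpers read process
 * memory and stack deltas, these stubs just walk an array of frame kinds. */
#define PYTHON_NATIVE_LOOP_ITERS 9

enum unwinder { UNWIND_PYTHON, UNWIND_NATIVE, UNWIND_DONE };

struct unwind_state {
    const enum unwinder *frames; /* kind of each frame, innermost first */
    int n_frames;
    int pos;                     /* index of the frame being unwound */
    int tail_calls;              /* re-entries after budget exhaustion */
};

/* Unwind one frame and report which unwinder the next frame needs. */
static enum unwinder next_kind(struct unwind_state *s)
{
    s->pos++;
    return s->pos < s->n_frames ? s->frames[s->pos] : UNWIND_DONE;
}

static enum unwinder step_python(struct unwind_state *s) { return next_kind(s); }
static enum unwinder step_native(struct unwind_state *s) { return next_kind(s); }

/* One invocation: up to PYTHON_NATIVE_LOOP_ITERS frames of either kind.
 * Switching between Python and native costs no tail call here. */
static enum unwinder unwind_invocation(struct unwind_state *s, enum unwinder cur)
{
    for (int i = 0; i < PYTHON_NATIVE_LOOP_ITERS && cur != UNWIND_DONE; i++) {
        switch (cur) {
        case UNWIND_PYTHON: cur = step_python(s); break;
        case UNWIND_NATIVE: cur = step_native(s); break;
        default: break;
        }
    }
    return cur;
}

/* Re-enter until done; each re-entry models one tail call. */
static int unwind_all(struct unwind_state *s)
{
    enum unwinder cur = s->n_frames > 0 ? s->frames[0] : UNWIND_DONE;
    s->pos = 0;
    s->tail_calls = 0;
    for (;;) {
        cur = unwind_invocation(s, cur);
        if (cur == UNWIND_DONE)
            break;
        s->tail_calls++;
    }
    return s->tail_calls;
}
```

With 32 strictly alternating frames, this model finishes in 4 invocations (3 tail calls) rather than spending one tail call per Python/native switch.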

Move native unwinder map externs (exe_id_to_*_stack_deltas,
stack_delta_page_to_info, unwind_info_array) out of the
TESTING_COREDUMP guard in extmaps.h so python_tracer.ebpf.c
can include native_stack_trace.h.

- PYTHON_NATIVE_LOOP_ITERS=9 was chosen so the program passes the BPF
  verifier on 5.4 kernels (with ITERS=10 the verifier times out at >300s)
- On a failed PyCodeObject read, push a frame carrying the code object
  address so the agent can retry the read via /proc/pid/mem
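
The fallback for a failed PyCodeObject read can be sketched under two assumptions. The frame record carries the raw code object address, plus a flag telling the agent to retry the read from user space. `struct py_frame`, `make_python_frame`, `read_code_fn` and the example readers are hypothetical. In the real tracer the read happens in eBPF and the retry happens in the agent via /proc/pid/mem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical frame record; field and flag names are illustrative only. */
enum frame_flags { FRAME_RESOLVED = 0, FRAME_NEEDS_AGENT_READ = 1 };

struct py_frame {
    uint64_t code_addr;     /* address of the PyCodeObject */
    uint32_t code_id;       /* data read from the object, if the read worked */
    enum frame_flags flags;
};

/* Stand-in for reading the PyCodeObject; returns false on failure. */
typedef bool (*read_code_fn)(uint64_t addr, uint32_t *out);

/* Always produce a frame: on read failure, keep the raw address and flag
 * the frame so user space can retry the read later. */
static struct py_frame make_python_frame(uint64_t code_addr, read_code_fn read_code)
{
    struct py_frame f = { .code_addr = code_addr, .code_id = 0,
                          .flags = FRAME_RESOLVED };
    if (!read_code(code_addr, &f.code_id))
        f.flags = FRAME_NEEDS_AGENT_READ;
    return f;
}

/* Example readers: one that succeeds, one that simulates an unreadable page. */
static bool read_ok(uint64_t addr, uint32_t *out)
{
    *out = (uint32_t)(addr & 0xffff);
    return true;
}

static bool read_fail(uint64_t addr, uint32_t *out)
{
    (void)addr;
    (void)out;
    return false;
}
```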

Copilot AI left a comment

Pull request overview

This PR combines the Python and native stack unwinders into a single loop to address tail call limit exhaustion in Python programs (particularly PyTorch). Previously, interleaved Python/native frames could exceed the 29-tail-call limit because every switch between unwinders cost a tail call. The new approach processes up to 9 frames (Python or native) per program invocation, so 29 tail calls can cover up to 261 interleaved frames, more than the 256-frame MAX_FRAMES limit.

Changes:

  • Merged Python and native unwinding into a single loop within unwind_python, eliminating frequent tail calls
  • Refactored native unwinding code from native_stack_trace.ebpf.c into native_stack_trace.h for reuse
  • Enhanced debug output to include error codes for better diagnostics

Reviewed changes

Copilot reviewed 4 out of 6 changed files in this pull request and generated no comments.

File descriptions:

  • support/ebpf/python_tracer.ebpf.c: Removed walk_python_stack, added step_python and step_native functions, implemented the combined unwinding loop with 9 iterations per invocation, improved debug logging for PyCodeObject read failures
  • support/ebpf/native_stack_trace.h: New header file containing native unwinding functions (push_native, get_stack_delta, unwind_one_frame, etc.) moved from the .c file for reuse by the Python tracer
  • support/ebpf/native_stack_trace.ebpf.c: Replaced inline native unwinding code with an include of native_stack_trace.h
  • support/ebpf/extmaps.h: Moved stack delta and unwind info map declarations outside the TESTING_COREDUMP conditional since they're now used by python_tracer.ebpf.c

@gnurizen merged commit 9e5a697 into main on Feb 24, 2026 (43 of 46 checks passed)
@gnurizen deleted the py-native-2 branch on February 24, 2026 21:28