Skip to content

Commit c68855d

Browse files
committed
pythongh-149202: Fix frame pointer unwinding on s390x and ARM
-fno-omit-frame-pointer is not enough to make every target walkable by the simple manual frame pointer unwinder. The helper used by test_frame_pointer_unwind used to assume the frame pointer named a two-word record where fp[0] was the previous frame pointer and fp[1] was the return address. That is only the generic layout used by some targets. This patch keeps that default, but moves the slots behind named offsets so architecture-specific layouts can describe where the backchain and return address really live. On s390x, GCC and Clang do not emit a usable backchain unless -mbackchain is also enabled. Without it, the unwinder stops at the current C frame and the test reports no Python frames. Once backchains are present, the helper must also stop at the current thread's known C stack bounds; otherwise it can follow the final backchain far enough to dereference an invalid frame and segfault. For Linux s390x backchain frames, the documented z/Architecture stack-frame layout saves r14, the return-address register, at byte offset 112 from the frame pointer, so read the return address from that named slot instead of fp[1]. The 112-byte offset comes from Linux's s390 debugging documentation: its Stack Frame Layout table shows z/Architecture backchain frames with the backchain at offset 0 and saved r14 of the caller function at offset 112: https://www.kernel.org/doc/html/v5.3/s390/debugging390.html#stack-frame-layout This helper remains scoped to Linux s390x backchain frames. GNU SFrame's s390x notes state that the s390x ELF ABI does not generally mandate where RA and FP are saved, or whether they are saved at all: https://sourceware.org/binutils/docs/sframe-spec.html#s390x On 32-bit ARM, GCC defaults to Thumb mode on common armhf toolchains. The Thumb prologue keeps the saved frame pointer and link register at offsets that depend on the generated frame, which breaks the fp[0]/fp[1] walk used by the helper. Use -marm when it is supported for frame-pointer builds, and teach the helper the GCC ARM-mode slots where the previous frame pointer is at fp[-1] and the saved LR return address is at fp[0].
1 parent 3efd2f4 commit c68855d

7 files changed

Lines changed: 257 additions & 24 deletions

File tree

Doc/howto/perf_profiling.rst

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -219,7 +219,9 @@ How to obtain the best results
219219

220220
For best results, keep frame pointers enabled. On supported GCC-compatible
221221
toolchains, CPython builds itself with ``-fno-omit-frame-pointer`` and, when
222-
available, ``-mno-omit-leaf-frame-pointer`` by default. These flags allow
222+
available, ``-mno-omit-leaf-frame-pointer`` by default. On 32-bit ARM,
223+
CPython also adds ``-marm`` when supported. On s390 platforms, CPython also
224+
adds ``-mbackchain`` when supported. These flags allow
223225
profilers to unwind using only the frame pointer and not on DWARF debug
224226
information. This is because as the code that is interposed to allow ``perf``
225227
support is dynamically generated it doesn't have any DWARF debugging information

Doc/using/configure.rst

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -784,9 +784,11 @@ also be used to improve performance.
784784

785785
Disable frame pointers, which are enabled by default (see :pep:`831`).
786786

787-
By default, the build appends ``-fno-omit-frame-pointer`` (and
788-
``-mno-omit-leaf-frame-pointer`` when the compiler supports it) to
789-
``BASECFLAGS`` so profilers, debuggers, and system tracing tools
787+
By default, the build appends ``-fno-omit-frame-pointer``,
788+
``-mno-omit-leaf-frame-pointer`` when the compiler supports it,
789+
``-marm`` on 32-bit ARM when supported, and ``-mbackchain`` on s390
790+
platforms when supported, to ``BASECFLAGS`` so
791+
profilers, debuggers, and system tracing tools
790792
(``perf``, ``eBPF``, ``dtrace``, ``gdb``) can walk the C call stack
791793
without DWARF metadata. The flags propagate to third-party C
792794
extensions through :mod:`sysconfig`. On compilers that do not

Doc/whatsnew/3.15.rst

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2305,8 +2305,9 @@ Build changes
23052305
(:pep:`831`). Pass :option:`--without-frame-pointers` to opt out.
23062306
Authors of C extensions and native libraries built with custom build
23072307
systems should add ``-fno-omit-frame-pointer`` and
2308-
``-mno-omit-leaf-frame-pointer`` to their own ``CFLAGS`` to keep the
2309-
unwind chain intact.
2308+
``-mno-omit-leaf-frame-pointer`` to their own ``CFLAGS``,
2309+
``-marm`` on 32-bit ARM, and ``-mbackchain`` on s390 platforms,
2310+
to keep the unwind chain intact.
23102311
(Contributed by Pablo Galindo Salgado and Savannah Ostrowski in :gh:`149201`.)
23112312

23122313
.. _whatsnew315-windows-tail-calling-interpreter:
Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,5 @@
11
Enable frame pointers by default for GCC-compatible CPython builds, including
2-
``-mno-omit-leaf-frame-pointer`` when the compiler supports it, so profilers
3-
and debuggers can unwind native interpreter frames more reliably. Users can pass
2+
``-mno-omit-leaf-frame-pointer``, ``-marm`` on 32-bit ARM, and ``-mbackchain``
3+
on s390 platforms when the compiler supports them, so profilers and debuggers
4+
can unwind native interpreter frames more reliably. Users can pass
45
``--without-frame-pointers`` to opt out.

Modules/_testinternalcapi.c

Lines changed: 133 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -59,6 +59,40 @@
5959

6060
static const uintptr_t min_frame_pointer_addr = 0x1000;
6161

62+
#ifdef __s390x__
63+
// Linux's s390 "Stack Frame Layout" table documents that z/Architecture
64+
// backchain frames start with the backchain at offset 0 and store "saved r14
65+
// of caller function" at offset 112. The same document's register table
66+
// identifies r14 as the return-address register, so this backchain unwinder
67+
// reads the return address from fp + 112.
68+
// https://www.kernel.org/doc/html/v5.3/s390/debugging390.html#stack-frame-layout
69+
//
70+
// This is only for Linux s390x backchain frames. The s390x ELF ABI does not
71+
// generally mandate where RA and FP are saved, or whether they are saved at all.
72+
// https://sourceware.org/binutils/docs/sframe-spec.html#s390x
73+
# define S390X_FRAME_RETURN_ADDRESS_OFFSET 112
74+
#endif
75+
76+
// The generic manual unwinder treats the frame pointer as a two-word record:
77+
// fp[0] is the previous frame pointer and fp[1] is the return address. That is
78+
// not true for every architecture, even with frame pointers enabled, so these
79+
// offsets describe the actual slots used by each supported frame layout.
80+
#if defined(__arm__) && !defined(__thumb__) && !defined(__clang__)
81+
// GCC ARM mode keeps the caller's fp one word below fp and the saved LR at
82+
// fp[0], so the return address is not in the generic fp[1] slot.
83+
# define FRAME_POINTER_NEXT_OFFSET (-1)
84+
# define FRAME_POINTER_RETURN_OFFSET 0
85+
#elif defined(__s390x__)
86+
// s390x backchain frames keep the previous frame pointer at fp[0], but save the
87+
// return-address register in the ABI register save area rather than fp[1].
88+
# define FRAME_POINTER_NEXT_OFFSET 0
89+
# define FRAME_POINTER_RETURN_OFFSET \
90+
(S390X_FRAME_RETURN_ADDRESS_OFFSET / (Py_ssize_t)sizeof(uintptr_t))
91+
#else
92+
# define FRAME_POINTER_NEXT_OFFSET 0
93+
# define FRAME_POINTER_RETURN_OFFSET 1
94+
#endif
95+
6296

6397
static PyObject *
6498
_get_current_module(void)
@@ -325,16 +359,97 @@ get_jit_backend(PyObject *self, PyObject *Py_UNUSED(args))
325359
#endif
326360
}
327361

362+
static int
363+
stack_address_is_valid(uintptr_t addr, uintptr_t stack_min, uintptr_t stack_max)
364+
{
365+
if (addr < min_frame_pointer_addr) {
366+
return 0;
367+
}
368+
if (stack_min != 0 && (addr < stack_min || addr >= stack_max)) {
369+
return 0;
370+
}
371+
return 1;
372+
}
373+
374+
static int
375+
frame_pointer_slot_is_valid(uintptr_t *frame_pointer, Py_ssize_t offset,
376+
uintptr_t stack_min, uintptr_t stack_max)
377+
{
378+
uintptr_t fp_addr = (uintptr_t)frame_pointer;
379+
uintptr_t slot_addr;
380+
uintptr_t delta = (uintptr_t)Py_ABS(offset) * sizeof(uintptr_t);
381+
if (offset < 0) {
382+
if (fp_addr < delta) {
383+
return 0;
384+
}
385+
slot_addr = fp_addr - delta;
386+
}
387+
else {
388+
if (fp_addr > UINTPTR_MAX - delta) {
389+
return 0;
390+
}
391+
slot_addr = fp_addr + delta;
392+
}
393+
if (!stack_address_is_valid(slot_addr, stack_min, stack_max)) {
394+
return 0;
395+
}
396+
if (stack_max != 0) {
397+
if (slot_addr > UINTPTR_MAX - sizeof(uintptr_t)) {
398+
return 0;
399+
}
400+
if (slot_addr + sizeof(uintptr_t) > stack_max) {
401+
return 0;
402+
}
403+
}
404+
return 1;
405+
}
406+
407+
static int
408+
next_frame_pointer_is_valid(uintptr_t *frame_pointer, uintptr_t *next_fp,
409+
uintptr_t stack_min, uintptr_t stack_max)
410+
{
411+
uintptr_t fp_addr = (uintptr_t)frame_pointer;
412+
uintptr_t next_addr = (uintptr_t)next_fp;
413+
if (!stack_address_is_valid(next_addr, stack_min, stack_max)) {
414+
return 0;
415+
}
416+
if ((next_addr % sizeof(uintptr_t)) != 0) {
417+
return 0;
418+
}
419+
#if _Py_STACK_GROWS_DOWN
420+
return next_addr > fp_addr;
421+
#else
422+
return next_addr < fp_addr;
423+
#endif
424+
}
425+
328426
static PyObject *
329427
manual_unwind_from_fp(uintptr_t *frame_pointer)
330428
{
331429
Py_ssize_t max_depth = 200;
332-
int stack_grows_down = _Py_STACK_GROWS_DOWN;
430+
uintptr_t stack_min = 0;
431+
uintptr_t stack_max = 0;
432+
433+
#ifdef __s390x__
434+
Py_BUILD_ASSERT(S390X_FRAME_RETURN_ADDRESS_OFFSET % sizeof(uintptr_t) == 0);
435+
#endif
333436

334437
if (frame_pointer == NULL) {
335438
return PyList_New(0);
336439
}
337440

441+
PyThreadState *tstate = _PyThreadState_GET();
442+
if (tstate != NULL) {
443+
_PyThreadStateImpl *tstate_impl = (_PyThreadStateImpl *)tstate;
444+
#if _Py_STACK_GROWS_DOWN
445+
stack_min = tstate_impl->c_stack_hard_limit;
446+
stack_max = tstate_impl->c_stack_top;
447+
#else
448+
stack_min = tstate_impl->c_stack_top;
449+
stack_max = tstate_impl->c_stack_hard_limit;
450+
#endif
451+
}
452+
338453
PyObject *result = PyList_New(0);
339454
if (result == NULL) {
340455
return NULL;
@@ -348,7 +463,21 @@ manual_unwind_from_fp(uintptr_t *frame_pointer)
348463
if ((fp_addr % sizeof(uintptr_t)) != 0) {
349464
break;
350465
}
351-
uintptr_t return_addr = frame_pointer[1];
466+
if (!stack_address_is_valid(fp_addr, stack_min, stack_max)) {
467+
break;
468+
}
469+
if (!frame_pointer_slot_is_valid(frame_pointer,
470+
FRAME_POINTER_NEXT_OFFSET,
471+
stack_min, stack_max)) {
472+
break;
473+
}
474+
if (!frame_pointer_slot_is_valid(frame_pointer,
475+
FRAME_POINTER_RETURN_OFFSET,
476+
stack_min, stack_max)) {
477+
break;
478+
}
479+
uintptr_t *next_fp = (uintptr_t *)frame_pointer[FRAME_POINTER_NEXT_OFFSET];
480+
uintptr_t return_addr = frame_pointer[FRAME_POINTER_RETURN_OFFSET];
352481

353482
PyObject *addr_obj = PyLong_FromUnsignedLongLong(return_addr);
354483
if (addr_obj == NULL) {
@@ -362,22 +491,10 @@ manual_unwind_from_fp(uintptr_t *frame_pointer)
362491
}
363492
Py_DECREF(addr_obj);
364493

365-
uintptr_t *next_fp = (uintptr_t *)frame_pointer[0];
366-
// Stop if the frame pointer is extremely low.
367-
if ((uintptr_t)next_fp < min_frame_pointer_addr) {
494+
if (!next_frame_pointer_is_valid(frame_pointer, next_fp,
495+
stack_min, stack_max)) {
368496
break;
369497
}
370-
uintptr_t next_addr = (uintptr_t)next_fp;
371-
if (stack_grows_down) {
372-
if (next_addr <= fp_addr) {
373-
break;
374-
}
375-
}
376-
else {
377-
if (next_addr >= fp_addr) {
378-
break;
379-
}
380-
}
381498
frame_pointer = next_fp;
382499
}
383500

configure

Lines changed: 100 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

configure.ac

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2548,6 +2548,16 @@ AS_VAR_IF([ac_cv_gcc_compat], [yes], [
25482548
AX_CHECK_COMPILE_FLAG([-mno-omit-leaf-frame-pointer], [
25492549
frame_pointer_cflags="$frame_pointer_cflags -mno-omit-leaf-frame-pointer"
25502550
], [], [-Werror])
2551+
AS_CASE([$host_cpu], [arm|armv*], [
2552+
AX_CHECK_COMPILE_FLAG([-marm], [
2553+
frame_pointer_cflags="$frame_pointer_cflags -marm"
2554+
], [], [-Werror])
2555+
])
2556+
AS_CASE([$host_cpu], [s390*], [
2557+
AX_CHECK_COMPILE_FLAG([-mbackchain], [
2558+
frame_pointer_cflags="$frame_pointer_cflags -mbackchain"
2559+
], [], [-Werror])
2560+
])
25512561
], [], [-Werror])
25522562
if test -n "$frame_pointer_cflags" && test "x$with_frame_pointers" != xno; then
25532563
BASECFLAGS="$frame_pointer_cflags $BASECFLAGS"

0 commit comments

Comments
 (0)