Skip to content

Commit 79a769a

Browse files
Tombanaclaude
andauthored
[mypyc] Fix reference leak when setting unboxed refcounted attrs (#21657)
I ran into this memory leak which can reproduced with: ```python from dataclasses import dataclass @DataClass class MyClass: v: int c = MyClass(1 << 70) ``` This PR adds a fix and a test that fails without the fix. I am no expert on mypyc internals, so I asked Claude Code to fix this bug. It is a one-line change so I believe it will be easy to review. From quick inspection it seems that the rest of the code has fewer comments than what Claude wrote, so if you prefer, I'll remove the verbose comment that this PR adds. Similarly for the test code, it can be shortened if wanted. <details> <summary>Long explanation from Claude code</summary> ### Description A native attribute setter generated by mypyc over-increfs the stored value when the attribute has a **refcounted unboxed type** — most importantly `int` (`CPyTagged`), and also tuples with refcounted items. For an unboxed type, `generate_setter` emitted: ```c tmp = CPyTagged_FromObject(value); // already creates a NEW (owned) reference CPyTagged_INCREF(tmp); // ...and then takes a second one self->_v = tmp; ``` CPyTagged_FromObject already increfs in the heap-boxed case, so the setter takes two references while the deallocator releases only one — leaking one reference on every set through the setter. The other two branches of generate_setter (object and the emit_cast path) are correct because they produce a borrowed value and rely on the single emit_inc_ref to take ownership; only the unboxed branch was inconsistent. Why this shows up with dataclasses This is reached whenever an attribute is set from interpreted code. The clearest real-world case is the __init__ that the stdlib dataclasses module synthesizes for a mypyc-compiled @DataClass: its self.v = v runs as interpreted code and goes through the generated descriptor setter. So every constructed instance of a compiled dataclass with a heap-boxed int field (value ≥ 2**62) leaked one PyLong — silent, unbounded growth in long-lived programs that build many such objects (e.g. 64-bit ids). It does not depend on slots, frozen, eq, field count, or field position; a hand-written native __init__ is unaffected because it stores via SetAttr rather than the descriptor. Small (inline-tagged) ints, floats, and object-typed fields are also unaffected since they aren't refcounted through this path. Reproducer ```python from dataclasses import dataclass import sys @DataClass class C: v: int big = 1 << 70 # heap-boxed int (>= 2**62), so it is refcounted base = sys.getrefcount(big) xs = [C(big) for _ in range(1000)] del xs print(sys.getrefcount(big) - base) # before: 1000 after: 0 ``` Fix Unbox with borrow=True in the setter's unboxed branch, so all three branches of generate_setter produce a borrowed value and the single emit_inc_ref takes exactly one owned reference. borrow=True is a no-op for non-refcounted unboxed types (float, fixed-width ints, bool) and is propagated correctly through RTuple unboxing, which fixes the analogous leak for tuple-typed attributes too. Tests Added testNativeAttrSetterRefcountLeak to mypyc/test-data/run-classes.test, covering a boxed-int dataclass field, an unboxed Tuple[int, int] field, and setter re-assignment. It fails before the fix (expected 100 live refs, got 200) and passes after. </details> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 629f456 commit 79a769a

2 files changed

Lines changed: 67 additions & 1 deletion

File tree

mypyc/codegen/emitclass.py

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1211,7 +1211,15 @@ def generate_setter(cl: ClassIR, attr: str, rtype: RType, emitter: Emitter) -> N
12111211
emitter.emit_line("if (value != NULL) {")
12121212

12131213
if rtype.is_unboxed:
1214-
emitter.emit_unbox("value", "tmp", rtype, error=ReturnHandler("-1"), declare_dest=True)
1214+
# Borrow the unboxed value: emit_inc_ref below takes the single owned
1215+
# reference, matching the borrowed-then-incref pattern of the other two
1216+
# branches. Without borrow=True, emit_unbox already creates a new
1217+
# reference for refcounted unboxed types (e.g. CPyTagged boxed ints,
1218+
# tuples with refcounted fields), so the emit_inc_ref would double the
1219+
# reference and leak the stored value on every set via this setter.
1220+
emitter.emit_unbox(
1221+
"value", "tmp", rtype, error=ReturnHandler("-1"), declare_dest=True, borrow=True
1222+
)
12151223
elif is_same_type(rtype, object_rprimitive):
12161224
emitter.emit_line("PyObject *tmp = value;")
12171225
else:

mypyc/test-data/run-classes.test

Lines changed: 58 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6006,3 +6006,61 @@ for _ in range(100):
60066006
check(foo)
60076007
after = sys.getrefcount(foo.obj)
60086008
assert after - init == 0, f"Leaked {after - init} refs"
6009+
6010+
[case testNativeAttrSetterRefcountLeak]
6011+
# Setting a native attribute from interpreted code goes through the generated
6012+
# getset descriptor setter. For refcounted unboxed types (heap-boxed ints,
6013+
# tuples with refcounted items) the setter must take exactly one reference to
6014+
# the stored value, not two.
6015+
from dataclasses import dataclass
6016+
from typing import Tuple
6017+
6018+
@dataclass
6019+
class IntField:
6020+
v: int
6021+
6022+
@dataclass
6023+
class TupleField:
6024+
v: Tuple[int, int]
6025+
6026+
[file driver.py]
6027+
import sys
6028+
from native import IntField, TupleField
6029+
6030+
# A heap-boxed int (>= 2**62) is stored as a refcounted PyObject*, unlike small
6031+
# inline-tagged ints, so an over-incref strands a real reference. Compute the
6032+
# values at runtime (not as folded literals): on free-threaded builds code
6033+
# constants are immortal, and getrefcount could not observe a leak on them.
6034+
shift = 70
6035+
BIG = 1 << shift
6036+
6037+
def check_no_leak(make, value) -> None:
6038+
base = sys.getrefcount(value)
6039+
objs = [make(value) for _ in range(100)]
6040+
alive = sys.getrefcount(value)
6041+
# Each live instance must hold exactly one reference to the field value.
6042+
assert alive - base == 100, f"expected 100 live refs, got {alive - base}"
6043+
del objs
6044+
after = sys.getrefcount(value)
6045+
assert after == base, f"leaked {after - base} refs"
6046+
6047+
# The dataclass-generated __init__ stores self.v = v via the descriptor setter.
6048+
check_no_leak(IntField, BIG)
6049+
6050+
# Tuple[int, int] is stored as an unboxed RTuple; its boxed-int elements are
6051+
# refcounted, so the setter must not over-incref them either. Use the same int
6052+
# in both slots so each instance holds exactly two references to it.
6053+
ELEM = 1 << (shift + 1)
6054+
base = sys.getrefcount(ELEM)
6055+
objs = [TupleField((ELEM, ELEM)) for _ in range(100)]
6056+
alive = sys.getrefcount(ELEM)
6057+
assert alive - base == 200, f"expected 200 live refs, got {alive - base}"
6058+
del objs
6059+
assert sys.getrefcount(ELEM) == base, f"tuple field leaked {sys.getrefcount(ELEM) - base} refs"
6060+
6061+
# Re-assigning through the setter must release the previous value too.
6062+
o = IntField(BIG)
6063+
base = sys.getrefcount(BIG)
6064+
o.v = BIG
6065+
o.v = BIG
6066+
assert sys.getrefcount(BIG) == base, "reassignment leaked refs"

0 commit comments

Comments
 (0)