Skip to content

Commit 241709d

Browse files
dpinolDani Pinyol
andauthored
fix crashes using multithreaded julia from python (#758)
* fix: rare crash due to race condition The vectors PYJLVALUES and PYJLFREEVALUES are modified without any protection against GC-triggered re-entrancy. _pyjl_dealloc(o1) or PyJuliaValue_SetValue └→ push!(PYJLFREEVALUES, idx) # begins resize (sets internal flag) └→ vector must grow → allocates → triggers Julia GC └→ GC collects a Py object → runs py_finalizer └→ enqueue(ptr) → PyGILState_Check()==1 (same thread holds GIL) └→ Py_DecRef(ptr) → refcount hits 0 └→ _pyjl_dealloc(o2) └→ push!(PYJLFREEVALUES, idx2) ← BOOM: flag already set ConcurrencyViolationError! 1. push! needs to grow the vector (exceeds capacity), so it allocates 2. The allocation triggers Julia GC, which runs finalizers 3. A finalizer calls Py_DecRef (because enqueue sees the GIL is held on this same thread), dropping refcount to 0, which triggers _pyjl_dealloc re-entrantly on the same vector This is especially likely during shutdown (at_jl_exit → jl_atexit_hook), because jl_gc_run_all_finalizers runs finalizers on the calling thread (the main thread, which holds the GIL). Many Py objects are finalized at once, repeatedly pushing to PYJLFREEVALUES, and each push that triggers a reallocation can cause the re-entrant chain. Disabled Julia's GC during the critical vector modifications, so that push!/pop! cannot trigger allocation-based GC, which prevents the finalizer chain from re-entering. Added a SpinLock as defense-in-depth against true multi-thread races. * fix: no GC disable. avoid resize PYJLFREEVALUES in dealloc --------- Co-authored-by: Dani Pinyol <dani@avatarcognition.com>
1 parent b26f202 commit 241709d

1 file changed

Lines changed: 22 additions & 4 deletions

File tree

src/JlWrap/C.jl

Lines changed: 22 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -23,8 +23,14 @@ const PyJuliaBase_Type = Ref(C.PyNULL)
2323
# we store the actual julia values here
2424
# the `value` field of `PyJuliaValueObject` indexes into here
2525
const PYJLVALUES = []
26-
# unused indices in PYJLVALUES
26+
# unused indices in PYJLVALUES; kept the same length as PYJLVALUES (extra slots
27+
# are pushed alongside PYJLVALUES so the dealloc/finalizer path never has to
28+
# resize this vector). Only the first PYJLFREEVALUECOUNT[] entries are valid
29+
# free indices; the rest are placeholders.
2730
const PYJLFREEVALUES = Int[]
31+
const PYJLFREEVALUECOUNT = Ref(0)
32+
# lock protecting PYJLVALUES, PYJLFREEVALUES and PYJLFREEVALUECOUNT
33+
const PYJL_LOCK = Threads.SpinLock()
2834

2935
function _pyjl_new(t::C.PyPtr, ::C.PyPtr, ::C.PyPtr)
3036
alloc = C.PyType_GetSlot(t, C.Py_tp_alloc)
@@ -39,8 +45,14 @@ end
3945
function _pyjl_dealloc(o::C.PyPtr)
4046
idx = C.@ft UnsafePtr{PyJuliaValueObject}(o).value[]
4147
if idx != 0
48+
# No allocation in this critical section: PYJLFREEVALUES has been pre-sized
49+
# to match PYJLVALUES by PyJuliaValue_SetValue, so recording a free index is
50+
# just a write to an existing slot plus a counter bump. This avoids the GC-triggered
51+
lock(PYJL_LOCK)
4252
PYJLVALUES[idx] = nothing
43-
push!(PYJLFREEVALUES, idx)
53+
PYJLFREEVALUECOUNT[] += 1
54+
PYJLFREEVALUES[PYJLFREEVALUECOUNT[]] = idx
55+
unlock(PYJL_LOCK)
4456
end
4557
(C.@ft UnsafePtr{PyJuliaValueObject}(o).weaklist[!]) == C.PyNULL || C.PyObject_ClearWeakRefs(o)
4658
freeptr = C.PyType_GetSlot(C.Py_Type(o), C.Py_tp_free)
@@ -375,13 +387,19 @@ PyJuliaValue_SetValue(_o, @nospecialize(v)) = Base.GC.@preserve _o begin
375387
o = C.asptr(_o)
376388
idx = C.@ft UnsafePtr{PyJuliaValueObject}(o).value[]
377389
if idx == 0
378-
if isempty(PYJLFREEVALUES)
390+
lock(PYJL_LOCK)
391+
if PYJLFREEVALUECOUNT[] == 0
392+
# Grow both vectors together so length(PYJLFREEVALUES) == length(PYJLVALUES)
393+
# is preserved. The 0 in PYJLFREEVALUES is just a placeholder
379394
push!(PYJLVALUES, v)
395+
push!(PYJLFREEVALUES, 0)
380396
idx = length(PYJLVALUES)
381397
else
382-
idx = pop!(PYJLFREEVALUES)
398+
idx = PYJLFREEVALUES[PYJLFREEVALUECOUNT[]]
399+
PYJLFREEVALUECOUNT[] -= 1
383400
PYJLVALUES[idx] = v
384401
end
402+
unlock(PYJL_LOCK)
385403
C.@ft UnsafePtr{PyJuliaValueObject}(o).value[] = idx
386404
else
387405
PYJLVALUES[idx] = v

0 commit comments

Comments
 (0)