Replies: 1 comment
-
|
Please don't post AI content here with no human input. What do you personally propose to do about these findings? Did you verify them? How do they relate to open PRs around free threading? |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
Free-Threading Analysis Report
Generated with https://github.com/devdanzin/ft-review-toolkit, in conjunction with Claude Code
Extension: cffi /
_cffi_backend(9 C files, ~12,900 lines)Main entry:
src/c/_cffi_backend.c(monolithic, includes all other.cfiles)Thread headers:
misc_thread_common.h,misc_thread_posix.h,misc_win32.hAgents run: shared-state-auditor · ft-history-analyzer · lock-discipline-checker ·
atomic-candidate-finder · unsafe-api-detector · stop-the-world-advisor
Migration Status
Status: CLOSE — cffi completed a substantial free-threading port (PR #178, 2025)
and has declared
Py_MOD_GIL_NOT_USEDfor both module entry points.Six actionable findings remain before the declaration is fully correct.
PyUnstable_Module_SetGIL(m, Py_MOD_GIL_NOT_USED)_cffi_backend.c:7944,cffi1_module.c:135suppressions_free_threading.txtPyMutex unique_cache_lock_cffi_backend.c:398unique_cachecffi_atomic_load/storemisc_thread_posix.h,misc_win32.h__ATOMIC_SEQ_CSTatomics for both POSIX and MSVCcffi_check_flag/cffi_set_flagmisc_thread_common.h:397–401CFFI_LOCK/CFFI_UNLOCKmisc_thread_common.h:410–411Py_BEGIN_CRITICAL_SECTION(&_dummy)#ifdef Py_GIL_DISABLEDpre-init looprealize_c_type.c:77–85all_primitives[]before parallel accesspytest-run-parallel/@thread_unsafemarkerstesting/conftest.pyExecutive Summary
Findings by Priority
RACE Findings (fix immediately) — 6
externpy->reserved1/reserved2: pre-GIL plain read + concurrent write-write racecall_python.c:249, 170–175, 132–135interpdictcheck-then-act in_get_interpstate_dict— first callers race onPyDict_GetItem+SetItemcall_python.c:42–51static PyObject *attr_namelazy-init without atomicscall_python.c:14, 36–40zombie_nextunlocked fast-path read inthread_canary_free_zombiesmisc_thread_common.h:150LIB_GET_OR_CACHE_ADDRcheck-then-build-insert onlib->l_dictwithout a locklib_obj.c:443–452_testfunc6static localint y[]shared across threads_cffi_backend.c:7410Finding 1 —
externpy->reserved1/reserved2pre-GIL + write-write raceFile:
src/c/call_python.ccffi_call_python()is the entry point for everyextern "Python"callback. It readsexternpy->reserved1before acquiring the GIL (line 249):_update_cache_to_call_python(line 170–175) writes both fields:_ffi_def_extern_decorator(line 132–135) also writesreserved1:Under free-threading,
PyGILState_Ensure()provides no mutual exclusion — itonly ensures the calling thread has a thread state. Two threads can both miss the
reserved1 != _current_interp_key()check, both enter_update_cache_to_call_python,and both write
reserved1/reserved2concurrently — producing:reserved1/reserved2— undefined behavior under C11.Py_XDECREFofold1/old2— both threads read the pre-update values,both decref them; whichever thread stores last causes a premature free on the
object freed by the other thread.
reserved1 == NULLtest is aplain C load racing with a write on another thread.
reserved2is read at line 263 aftergil_ensure(), but its write at line 173 hasno ordering fence relative to the check at line 256 — a reader can see
reserved1updated but
reserved2still stale, then callgeneral_invoke_callbackwith astale (or freed) infotuple.
Note:
struct _cffi_externpy_sis ABI-stable (shipped inparse_c_type.handbaked into generated modules). Its field types cannot change. Use the existing
cffi_atomic_load(void**)/cffi_atomic_store(void**)macros.Fix:
Store ordering (
reserved2beforereserved1) is intentional: readers checkreserved1first; once they see a non-NULL matching the interp key, they must seethe updated
reserved2.For the check-then-update sequence at lines 256–263, wrap with
CFFI_LOCK()toprevent the write-write race entirely. The atomic stores in
_update_cache_to_call_pythonare the minimum fix;
CFFI_LOCK()is the complete fix.Finding 2 —
interpdictcheck-then-act in_get_interpstate_dictFile:
src/c/call_python.c:42–51_get_interpstate_dict()is on the hot callback path(
cffi_call_python→_update_cache_to_call_python→_get_interpstate_dict).It performs a check-then-act on the interpreter-state dict:
interpdictisPyInterpreterState_GetDict(interp)— shared across all threads ofthe same interpreter. Under free-threading, two threads can both observe
d == NULL,each allocate a new dict, and the second
PyDict_SetItemsilently replaces the first.Impact: Any
@ffi.def_extern()registrations stored by the first thread's dictare irretrievably lost. Later calls to that extern function return
err = 3("@ffi.def_extern() was not called in the current subinterpreter") and produce
incorrect zero-return behavior logged to stderr.
Note:
ffi_obj.c:1032already usesPyObject_CallMethod(cache, "setdefault", ...)specifically to avoid this race — the same pattern is needed here.
Fix: Use
PyDict_SetDefault()(Python 3.13+ internal API, thread-safe) orsetdefaultidiom:Finding 3 —
static PyObject *attr_namelazy-init without atomicsFile:
src/c/call_python.c:14, 36–40This is called from
cffi_call_python(multi-threaded C callback context).Two threads can both observe
attr_name == NULLand both write it. The plainpointer write/read pair is a data race under the C11 memory model even though
PyUnicode_InternFromStringreturns the same interned object to both threads.Fix (preferred — eager init at module load):
Fix (alternative — atomic lazy-init):
Finding 4 —
zombie_nextunlocked fast-path readFile:
src/c/misc_thread_common.h:150cffi_zombie_head.zombie_nextis written insidethread_canary_make_zombie()and_thread_canary_detach_with_lock(), both of which requireTLS_ZOM_LOCK(). Thefast-path read at line 150 acquires neither
TLS_ZOM_LOCKnor usescffi_atomic_load.Under free-threading, a thread shutting down (without the GIL) writes
zombie_nextwhile another thread reading it at line 150 produces a C11 data race on a pointer
that is
sizeof(void*)wide.Fix:
cffi_atomic_loadis already defined inmisc_thread_posix.h:59andmisc_win32.h:227; this is a one-line fix.Finding 5 —
LIB_GET_OR_CACHE_ADDRcheck-then-build-insertFile:
src/c/lib_obj.c:443–452lib_build_and_cache_attrends withPyDict_SetItem(lib->l_dict, name, x).The check-build-insert triple is not atomic. Two threads calling
lib.some_funcsimultaneously can both miss the cache, both build a
PyCFunctionObject, and bothinsert — the second insert replaces the first, leaking the first thread's object.
The individual
PyDict_*calls are internally thread-safe in CPython 3.13+'sfree-threaded build, but the logical atomicity of the three-step sequence is not
preserved. In cases where
lib_build_and_cache_attrinvolvesrealize_c_type(forOP_GLOBAL_VAR/OP_EXTERN_PYTHON),CFFI_LOCKinsiderealize_c_type_or_funcprovides secondary protection against double-realization; the primary waste is the
leaked object.
Fix: Wrap with
Py_BEGIN_CRITICAL_SECTION(lib):Check that
lib_build_and_cache_attrdoes not call back through the samelibobject (it calls
realize_c_type, which usesCFFI_LOCKon_dummy, not onlib,so re-entrancy through
lib's per-object lock does not occur).Finding 6 —
_testfunc6static localint y[]File:
src/c/_cffi_backend.c:7410This function is compiled into the module for test purposes only. Under concurrent
--parallel-threadstest execution, two threads calling_testfunc6simultaneouslyrace on
y. No production code path calls this function.Fix: Remove
static(or allocateyon the heap and require callers to free it).Since this is a test helper, the minimal fix is removing
static:UNSAFE Findings (fix before declaring free-threading support) — 2
dlerror()Windows shim writes intostatic char buf[32]— shared across threadsmisc_win32.h:205–213PyDict_GetItemborrowed ref oninterpstate_dictused afterPy_DECREFgapcall_python.c:162–169Finding 7 —
dlerror()Windows static bufferFile:
src/c/misc_win32.h:205–213On Windows, cffi provides its own
dlerror()shim that writes into astatic charbuffer and returns a pointer to it. Under free-threading, two threads calling
dlerror()simultaneously race onbuf: one thread's error message is overwrittenby the other before the first thread reads it.
Fix: Use a thread-local buffer:
On MSVC: use
__declspec(thread)instead of__thread. Alternatively, usecffi_tls_s(the existing TLS struct inmisc_thread_common.h) to store this buffer.Finding 8 — Borrowed ref from
PyDict_GetItemon concurrently-written dictFile:
src/c/call_python.c:162–169interpstate_dictis the per-interpreter@ffi.def_extern()registry. Underfree-threading,
_ffi_def_extern_decoratorcan write to it concurrently (it holdsonly the GIL, which is no longer exclusive). Between the
PyDict_GetItem(line 162)and
Py_INCREF(infotuple)(line 168),Py_DECREF(interpstate_key)runs — if thiskey's finalizer triggers Python code that removes an entry from
interpstate_dict,the borrowed
infotuplebecomes dangling.In practice
interpstate_keyis a freshly-allocatedPyLong_FromVoidPtr, so itsdestructor cannot reach
interpstate_dict. The immediate risk is low, but thepattern is a documented anti-pattern under free-threading.
Fix: Replace with the existing
PyDict_GetItemRefshim(already defined at
_cffi_backend.c:150):PROTECT Findings (add synchronization) — 1
_realize_c_struct_or_unionre-acquires CFFI_LOCK while it is held, silently suspending the outer critical sectionrealize_c_type.c:451, 753, 897Finding 9 — Nested CFFI_LOCK suspension via
Py_BEGIN_CRITICAL_SECTIONre-entrancyFile:
src/c/realize_c_type.cCFFI_LOCK()expands toPy_BEGIN_CRITICAL_SECTION(&_dummy). CPython'scritical_section.h(lines 55–58) documents:This means that if a call path already inside
CFFI_LOCK()re-entersCFFI_LOCK()(on the same_dummyobject), the outer critical section issilently released for the duration of the inner one.
The re-entrant path exists in two forms:
Path A —
realize_c_type_or_func(line 752–757) acquires CFFI_LOCK, then insiderealize_c_type_or_func_lock_heldcalls_realize_c_struct_or_union(line 451),which calls
do_realize_lazy_struct(line 576), which acquires CFFI_LOCK again(line 897).
Path B —
b_complete_struct_or_union(line 5537) acquires CFFI_LOCK, callsb_complete_struct_or_union_lock_held, which at line 5210 callsforce_lazy_struct(ftype)→do_realize_lazy_struct(ftype)→ CFFI_LOCK again.During the suspension window, another thread that acquires CFFI_LOCK can enter
do_realize_lazy_struct_lock_heldfor the samectwhile the original threadis still constructing it. The
ct_under_constructionflag (set atomically at line 875)provides a secondary guard: a concurrent thread will see
ct_under_construction == 1and fail the guard at
b_complete_struct_or_union_lock_held:5172. This prevents adouble-completion but creates a confusing "error: internal error" return for the
second thread.
This does not cause a deadlock — CPython's critical sections deliberately avoid
deadlock by suspending rather than blocking. However, the design is fragile and
relies on secondary flag checks to compensate for the gap.
Fix: Replace
do_realize_lazy_struct(which acquires CFFI_LOCK from scratch)with a
_lock_heldvariant in both re-entrant call sites:This eliminates the suspension window entirely. The
_lock_heldsuffix convention isalready established in the codebase.
MIGRATE Findings (structural changes) — 3
m_size = -1precludes per-interpreter module state; incompatible with sub-interpreter safetycffi1_module.c:99,_cffi_backend.c:7923PyTypeObjectobjects instead of heap types; prevents per-interpreter type isolation_cffi_backend.c:290–296ct->ct_stuffuses per-object CS; otherctfields use CFFI_LOCK_cffi_backend.c:2580–2586Finding 10 —
m_size = -1global module stateFiles:
src/c/cffi1_module.c:99,src/c/_cffi_backend.c:7923Both module definitions set
m_size = -1(no per-interpreter module state). Thisis incompatible with multi-phase module initialization (
Py_mod_exec) and meansthat all global statics (
unique_cache,all_primitives,g_ct_voidp, etc.) areshared across sub-interpreters in the same process.
For the current embedding / extension use cases this is acceptable. For long-term
correctness under Isolated Sub-Interpreters (PEP 684),
m_sizeshould be set tosizeof(cffi_module_state_t)with a matchingm_clear/m_traverseto holdper-interpreter references.
Fix (longer-term): Define a
cffi_module_state_tstruct, migrate per-interpreterglobals into it, implement
Py_mod_execslot for initialization, and setm_size = sizeof(cffi_module_state_t).Finding 11 — Static
PyTypeObjectobjectsFile:
src/c/_cffi_backend.c:290–296cffi defines its types as static
PyTypeObjectobjects:Static type objects are shared across sub-interpreters. When/if cffi migrates to
per-interpreter state (Finding 10), these should become heap-allocated types
(
PyType_FromSpec) to ensure each interpreter has its own type object with its owntp_dict.Fix (longer-term): Use
PyType_FromSpecwith aPyType_Specfor each type.This is a significant refactor best done after Finding 10 is resolved.
Finding 12 — Mixed lock domains for
ct->ct_stuffFile:
src/c/_cffi_backend.c:2580–2586ct->ct_stuff(the lazy array-type pointer incdata_slice) is initialized underPy_BEGIN_CRITICAL_SECTION(ct)— the per-object lock on the individualCTypeDescrObject. All otherctmutations useCFFI_LOCK()(the module-levelcritical section on
_dummy). These are two distinct lock domains guarding differentfields of the same
ctstruct.Currently there is no overlap (no field is guarded by both locks), so this is safe.
However, the asymmetry is a maintenance hazard: a future change that reads
ct_stuffinside
CFFI_LOCK()while a writer hasPy_BEGIN_CRITICAL_SECTION(ct)would createan undetected ordering issue.
Fix (longer-term): Document this invariant in a comment, or consolidate all
lazy-init paths on
ctto use the same lock domain.SAFE Patterns (confirmed safe — no action needed)
unique_cachedict_cffi_backend.c:398–400, 4648–4676PyMutex unique_cache_lockguards all reads/writes; init under import lockall_primitives[]arrayrealize_c_type.c:19, 77–85#ifdef Py_GIL_DISABLEDpre-init loop populates all slots at module init; read-only thereaftercffi_zombie_lock,cffi_zombie_headlistmisc_thread_common.h:47–49, 85TLS_ZOM_LOCK()guards all list mutations;cffi_zombie_lockinit is under import lockcffi_tls_index/cffi_tls_keyTLS handlemisc_win32.h:13,misc_thread_posix.h:22static char init_doneinPyInit__cffi_backend_cffi_backend.c:7911static char init_doneininit_ffi_libcffi1_module.c:27PyInit__cffi_backend; same import-lock protectionbuilder->ctx.types[]opcode slotsrealize_c_type.c:443, 453, 594, 738cffi_atomic_store(); read via_CFFI_LOAD_OP()→cffi_atomic_load()ct_lazy_field_list,ct_under_construction,ct_unrealized_struct_or_unionflagsmisc_thread_common.h:397–401cffi_check_flag()/cffi_set_flag()→__ATOMIC_SEQ_CSTatomicsct_flags_mutfield_cffi_backend.c:229–235cdata_slicelazyct->ct_stuff_cffi_backend.c:2580–2586Py_BEGIN_CRITICAL_SECTION(ct)serializes the once-initffi_init_oncecacheffi_obj.c:999–1004Py_BEGIN_CRITICAL_SECTION(self)correctly paired_embedding.hstartup_embedding.h:70–103, 386–464PyWeakref_GetObjectshim_cffi_backend.c:165–180#if PY_VERSION_HEX < 0x030d00a1— unreachable under free-threading (3.13+)PyIOBase_TypeObjfile_emulator.h:4init_file_emulator()under module init; read-only aftermalloc_closure.hstaticsmalloc_closure.h:31, 74, 85, 86MALLOC_CLOSURE_LOCK()→PyMutexguards all reads/writes_cffi_backend.c:4648,realize_c_type.c:753misc_thread_common.h:109, 157, 278PyErr_Occurred()/PyErr_Clear().cfilesPyThreadState— no cross-thread sharingPy_DECREFon locally owned references.cfilesreserved1/reserved2path (Finding 1) is a concernRecommendations
Immediate (RACE items — Findings 1–4)
1. Fix Finding 1 — Atomic access to
externpy->reserved1/reserved2In
src/c/call_python.c, change all four access sites to usecffi_atomic_loadand
cffi_atomic_store. Ensurereserved2is stored beforereserved1. Wrap thecheck-update-use sequence in
cffi_call_python(lines 255–265) withCFFI_LOCK()/CFFI_UNLOCK()to eliminate the write-write race entirely. This isthe highest-severity fix in the codebase.
2. Fix Finding 2 — Use
setdefaultforinterpdictinitializationIn
_get_interpstate_dict()(line 42–51), replace theGetItem+SetItempair witha
setdefault-based approach (PyObject_CallMethod(interpdict, "setdefault", ...))or
PyDict_SetDefault()to make the check-and-insert atomic.3. Fix Finding 3 — Move
attr_nameinit to module startupInitialize
static PyObject *attr_nameduringPyInit__cffi_backend()(orinit_ffi_lib()), beforePy_MOD_GIL_NOT_USEDis declared. Remove the runtimelazy-init check entirely. This is the same pattern already applied to
all_primitivesunder
#ifdef Py_GIL_DISABLED.4. Fix Finding 4 — Atomic fast-path read of
zombie_nextIn
thread_canary_free_zombies()(line 150), replace the plain pointer comparisonwith
cffi_atomic_load((void **)&cffi_zombie_head.zombie_next). One line change.Short-term (RACE + UNSAFE items — Findings 5–8)
5. Fix Finding 5 — Serialize
LIB_GET_OR_CACHE_ADDRWrap the
LIB_GET_OR_CACHE_ADDRmacro body inPy_BEGIN_CRITICAL_SECTION(lib)inlib_obj.c. Ensurelib_build_and_cache_attrdoes not re-enter through thelibobject's per-object lock (it uses CFFI_LOCK on
_dummy, so there is no conflict).6. Fix Finding 6 — Remove
staticfrom_testfunc6Change
static int y[50]toint y[50]in_cffi_backend.c:7410. Stack-allocatethe array. This is a one-word change.
7. Fix Finding 7 — Thread-local
dlerror()buffer on WindowsIn
misc_win32.h, add__declspec(thread)(MSVC) or__thread(GCC/Clang) tobufin thedlerror()shim, or store the buffer incffi_tls_s.8. Fix Finding 8 — Replace borrowed ref with
PyDict_GetItemRefIn
call_python.c:162–169, usePyDict_GetItemRef(interpstate_dict, ...)(the shimis already defined at
_cffi_backend.c:150) to obtain a strong reference before thePy_DECREF(interpstate_key)gap.Medium-term (PROTECT item — Finding 9)
9. Fix Finding 9 — Eliminate nested CFFI_LOCK suspension
Add
do_realize_lazy_struct_lock_held()and call it directly from_realize_c_struct_or_union()(line 451) and from withinb_complete_struct_or_union_lock_held()(line 5210, via an inline call replacingforce_lazy_struct). This eliminates the window where the outer CFFI_LOCK issilently suspended.
Longer-term (MIGRATE items — Findings 10–12)
10–12. Migrate to per-interpreter module state (
m_size > 0), heap type objects(
PyType_FromSpec), and a unified lock domain forctfields. These are significantrefactors best staged after the immediate fixes ship.
Suggested Next Steps
Cross-Agent Deduplication Notes
The following issues were flagged by multiple agents and counted once:
externpy->reserved1/reserved2attr_namelazy-initLIB_GET_OR_CACHE_ADDRTOCTOUGenerated by ft-review-toolkit:explore on 2026-06-13
Python 3.14t (Py_GIL_DISABLED=1 confirmed) · cffi 2.0.1.dev0
Agents: shared-state-auditor · ft-history-analyzer · lock-discipline-checker · atomic-candidate-finder · unsafe-api-detector · stop-the-world-advisor
Beta Was this translation helpful? Give feedback.
All reactions