perf: add free-threaded Python safety with per-object critical sections#153
perf: add free-threaded Python safety with per-object critical sections#153ifduyue wants to merge 1 commit into
Conversation
Merging this PR will not alter performance
Comparing Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces thread-safety support for free-threaded Python (3.13+) by implementing critical section macros and applying them to state-modifying and state-reading operations across all xxHash variants. Additionally, it improves memory safety by nullifying state pointers within deallocation functions. The review feedback correctly identifies a safety issue in PYXXH3_64_copy, where the critical section is closed prematurely, leaving a version-specific workaround that accesses the source object's state unprotected from concurrent access.
| XXH_CRITICAL_SECTION(self); | ||
| p->seed = self->seed; | ||
| XXH3_copyState(p->xxhash_state, self->xxhash_state); | ||
| XXH_END_CRITICAL_SECTION(); |
There was a problem hiding this comment.
The critical section in PYXXH3_64_copy ends too early. It should encompass the workaround for XXH_VERSION_NUMBER < 704 (lines 1646-1652) because that block accesses self->xxhash_state->customSecret. For consistency and safety in free-threaded Python, the lock should be held until all operations involving the source object's state are complete, as correctly implemented in PYXXH3_128_copy (lines 2108-2118).
XXH_CRITICAL_SECTION(self);
p->seed = self->seed;
XXH3_copyState(p->xxhash_state, self->xxhash_state);
#if XXH_VERSION_NUMBER < 704
// v0.7.3 and earlier have a bug where states reset with a seed
// will have a wild pointer to the original state when copied,
// causing a use-after-free if the original is freed.
if (p->xxhash_state->secret == &self->xxhash_state->customSecret[0])
p->xxhash_state->secret = &p->xxhash_state->customSecret[0];
#endif
XXH_END_CRITICAL_SECTION();8dccbe5 to
38e95a5
Compare
|
@gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces per-object critical sections to support free-threaded Python 3.13+ across XXH32, XXH64, and XXH3 hash implementations, ensuring thread safety for state-modifying operations. The reviewer identified a high-severity deadlock risk in the initialization functions where internal updates are performed while holding a lock, as the subsequent buffer release could execute arbitrary Python code. Additional feedback suggests that locking within deallocators may be unnecessary and points out a minor formatting issue with an extra blank line.
| XXH_CRITICAL_SECTION(self); | ||
| self->seed = seed; | ||
| XXH32_reset(self->xxhash_state, seed); | ||
|
|
||
| if (buf.obj) | ||
| PYXXH32_do_update(self, &buf); | ||
| XXH_END_CRITICAL_SECTION(); |
There was a problem hiding this comment.
Holding the critical section while calling PYXXH32_do_update is problematic because PYXXH32_do_update calls PyBuffer_Release after releasing its own lock. However, since PYXXH32_init still holds the lock, PyBuffer_Release is effectively called while the object is locked. Releasing a buffer can trigger arbitrary Python code (e.g., via __del__), which is a deadlock risk when holding a per-object lock. Consider releasing the critical section before calling PYXXH32_do_update, as the latter already implements its own internal locking.
XXH_CRITICAL_SECTION(self);
self->seed = seed;
XXH32_reset(self->xxhash_state, seed);
XXH_END_CRITICAL_SECTION();
if (buf.obj)
PYXXH32_do_update(self, &buf);| XXH_CRITICAL_SECTION(self); | ||
| self->seed = seed; | ||
| XXH64_reset(self->xxhash_state, seed); | ||
|
|
||
| if (buf.obj) | ||
| PYXXH64_do_update(self, &buf); | ||
| XXH_END_CRITICAL_SECTION(); |
There was a problem hiding this comment.
Holding the critical section while calling PYXXH64_do_update is problematic because PYXXH64_do_update calls PyBuffer_Release after releasing its own lock. However, since PYXXH64_init still holds the lock, PyBuffer_Release is effectively called while the object is locked. Releasing a buffer can trigger arbitrary Python code (e.g., via __del__), which is a deadlock risk when holding a per-object lock. Consider releasing the critical section before calling PYXXH64_do_update, as the latter already implements its own internal locking.
XXH_CRITICAL_SECTION(self);
self->seed = seed;
XXH64_reset(self->xxhash_state, seed);
XXH_END_CRITICAL_SECTION();
if (buf.obj)
PYXXH64_do_update(self, &buf);| XXH_CRITICAL_SECTION(self); | ||
| self->seed = seed; | ||
| XXH3_64bits_reset_withSeed(self->xxhash_state, seed); | ||
|
|
||
| if (buf.obj) | ||
| PYXXH3_64_do_update(self, &buf); | ||
| XXH_END_CRITICAL_SECTION(); |
There was a problem hiding this comment.
Holding the critical section while calling PYXXH3_64_do_update is problematic because PYXXH3_64_do_update calls PyBuffer_Release after releasing its own lock. However, since PYXXH3_64_init still holds the lock, PyBuffer_Release is effectively called while the object is locked. Releasing a buffer can trigger arbitrary Python code (e.g., via __del__), which is a deadlock risk when holding a per-object lock. Consider releasing the critical section before calling PYXXH3_64_do_update, as the latter already implements its own internal locking.
XXH_CRITICAL_SECTION(self);
self->seed = seed;
XXH3_64bits_reset_withSeed(self->xxhash_state, seed);
XXH_END_CRITICAL_SECTION();
if (buf.obj)
PYXXH3_64_do_update(self, &buf);| XXH_CRITICAL_SECTION(self); | ||
| self->seed = seed; | ||
| XXH3_128bits_reset_withSeed(self->xxhash_state, seed); | ||
|
|
||
| if (buf.obj) | ||
| PYXXH3_128_do_update(self, &buf); | ||
| XXH_END_CRITICAL_SECTION(); |
There was a problem hiding this comment.
Holding the critical section while calling PYXXH3_128_do_update is problematic because PYXXH3_128_do_update calls PyBuffer_Release after releasing its own lock. However, since PYXXH3_128_init still holds the lock, PyBuffer_Release is effectively called while the object is locked. Releasing a buffer can trigger arbitrary Python code (e.g., via __del__), which is a deadlock risk when holding a per-object lock. Consider releasing the critical section before calling PYXXH3_128_do_update, as the latter already implements its own internal locking.
XXH_CRITICAL_SECTION(self);
self->seed = seed;
XXH3_128bits_reset_withSeed(self->xxhash_state, seed);
XXH_END_CRITICAL_SECTION();
if (buf.obj)
PYXXH3_128_do_update(self, &buf);| XXH_CRITICAL_SECTION(self); | ||
| if (self->xxhash_state) { | ||
| XXH32_freeState(self->xxhash_state); | ||
| self->xxhash_state = NULL; | ||
| } | ||
| XXH_END_CRITICAL_SECTION(); |
There was a problem hiding this comment.
Using a critical section in tp_dealloc is generally unnecessary and potentially risky. If an object's reference count has reached zero, no other thread should be able to access it via Python references, meaning no other thread can be holding the lock. While it provides a safety net against internal C-level races, it adds overhead and complexity to the destruction path. If you decide to keep it, ensure that the object's mutex is not destroyed before the critical section ends.
| return NULL; | ||
| } | ||
|
|
||
|
|
38e95a5 to
24b083e
Compare
24b083e to
9472886
Compare
|
@gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces support for free-threaded Python 3.13+ by implementing per-object critical sections across all hash types (XXH32, XXH64, XXH3_64, and XXH3_128). The feedback highlights a critical issue where Py_BEGIN_ALLOW_THREADS is incorrectly nested inside critical sections, which is forbidden in Python 3.13 and could lead to deadlocks. Furthermore, the reviewer noted that several state-modifying operations in the init and vectorcall functions are currently unprotected, posing a risk of race conditions in a multi-threaded environment.
| XXH_CRITICAL_SECTION(self); | ||
| Py_BEGIN_ALLOW_THREADS | ||
| XXH32_update(self->xxhash_state, buf.buf, buf.len); | ||
| Py_END_ALLOW_THREADS | ||
| XXH_END_CRITICAL_SECTION(); |
There was a problem hiding this comment.
In Python 3.13+, it is explicitly forbidden to call Py_BEGIN_ALLOW_THREADS (or any macro that detaches the thread state) while holding a critical section. This is because the critical section relies on the thread being attached to safely hold the object's internal mutex (ob_mutex). Releasing the GIL/detaching while holding this lock can lead to deadlocks or crashes, particularly during garbage collection.
Since xxhash operations are generally very fast, the recommended fix is to remove the Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS block when using XXH_CRITICAL_SECTION. If releasing the GIL is absolutely necessary for performance on extremely large buffers, you should implement a custom mutex in the object structure instead of using the built-in critical section API.
XXH_CRITICAL_SECTION(self);
XXH32_update(self->xxhash_state, buf.buf, buf.len);
XXH_END_CRITICAL_SECTION();| XXH_CRITICAL_SECTION(self); | ||
| Py_BEGIN_ALLOW_THREADS | ||
| XXH64_update(self->xxhash_state, buf.buf, buf.len); | ||
| Py_END_ALLOW_THREADS | ||
| XXH_END_CRITICAL_SECTION(); |
There was a problem hiding this comment.
This block violates the Python 3.13 C API contract: Py_BEGIN_ALLOW_THREADS must not be nested inside a critical section. This applies to all hash types in this module (XXH64, XXH3_64, XXH3_128). Please remove the thread-releasing macros here.
XXH_CRITICAL_SECTION(self);
XXH64_update(self->xxhash_state, buf.buf, buf.len);
XXH_END_CRITICAL_SECTION();| XXH_CRITICAL_SECTION(self); | ||
| Py_BEGIN_ALLOW_THREADS | ||
| XXH3_64bits_update(self->xxhash_state, buf.buf, buf.len); | ||
| Py_END_ALLOW_THREADS | ||
| XXH_END_CRITICAL_SECTION(); |
There was a problem hiding this comment.
| XXH_CRITICAL_SECTION(self); | ||
| Py_BEGIN_ALLOW_THREADS | ||
| XXH3_128bits_update(self->xxhash_state, buf.buf, buf.len); | ||
| Py_END_ALLOW_THREADS | ||
| XXH_END_CRITICAL_SECTION(); |
There was a problem hiding this comment.
| XXH_CRITICAL_SECTION(self); | ||
| self->seed = seed; | ||
| XXH32_reset(self->xxhash_state, seed); | ||
| XXH_END_CRITICAL_SECTION(); | ||
|
|
||
| if (buf.obj) | ||
| PYXXH32_do_update(self, &buf); | ||
| if (buf.obj) { | ||
| Py_BEGIN_ALLOW_THREADS | ||
| XXH32_update(self->xxhash_state, buf.buf, buf.len); | ||
| Py_END_ALLOW_THREADS | ||
| PyBuffer_Release(&buf); | ||
| } |
There was a problem hiding this comment.
The XXH32_update call in the init method is not protected by the critical section. In a free-threaded environment, __init__ can be called on an existing object that might be concurrently accessed by other threads. To ensure thread safety, the entire state modification (reset and update) should be performed within the critical section.
Note: As mentioned in other comments, the Py_BEGIN_ALLOW_THREADS block must be removed if it is moved inside the critical section.
XXH_CRITICAL_SECTION(self);
self->seed = seed;
XXH32_reset(self->xxhash_state, seed);
if (buf.obj) {
XXH32_update(self->xxhash_state, buf.buf, buf.len);
}
XXH_END_CRITICAL_SECTION();
if (buf.obj) {
PyBuffer_Release(&buf);
}| if (buf.obj) { | ||
| Py_BEGIN_ALLOW_THREADS | ||
| XXH32_update(self->xxhash_state, buf.buf, buf.len); | ||
| Py_END_ALLOW_THREADS | ||
| PyBuffer_Release(&buf); | ||
| } |
There was a problem hiding this comment.
The update operation in vectorcall is unprotected. While this usually occurs on a newly allocated object, it is safer and more consistent with the rest of the PR to protect all state-modifying operations with the critical section to prevent potential races if the object is leaked during construction.
9472886 to
ede0417
Compare
|
@gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces support for free-threaded Python (3.13+) by implementing per-object critical sections to protect the internal hash state. It defines new macros for critical sections and applies them across various hash implementations (XXH32, XXH64, XXH3_64, and XXH3_128) in methods such as deallocation, initialization, and digest generation. Additionally, the update logic has been refactored to inline the hash update calls while releasing the Global Interpreter Lock (GIL). The reviewer identified several critical issues where the hash state is modified or accessed while the GIL is released but without being protected by a critical section. This occurs during both the initialization sequence and the standalone update methods for all hash types, creating potential race conditions that need to be addressed by extending the scope of the critical sections.
| XXH_CRITICAL_SECTION(self); | ||
| self->seed = seed; | ||
| XXH32_reset(self->xxhash_state, seed); | ||
| XXH_END_CRITICAL_SECTION(); | ||
|
|
||
| if (buf.obj) | ||
| PYXXH32_do_update(self, &buf); | ||
| if (buf.obj) { | ||
| Py_BEGIN_ALLOW_THREADS | ||
| XXH32_update(self->xxhash_state, buf.buf, buf.len); | ||
| Py_END_ALLOW_THREADS | ||
| PyBuffer_Release(&buf); | ||
| } |
There was a problem hiding this comment.
The update part of the initialization is performed outside the critical section while the GIL is released. In free-threaded Python, if the object is already reachable, another thread could call digest() or update() concurrently, leading to a race condition on the hash state. The critical section should be extended to cover the entire state modification sequence to ensure atomicity.
XXH_CRITICAL_SECTION(self);
self->seed = seed;
XXH32_reset(self->xxhash_state, seed);
if (buf.obj) {
Py_BEGIN_ALLOW_THREADS
XXH32_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
}
XXH_END_CRITICAL_SECTION();
if (buf.obj) {
PyBuffer_Release(&buf);
}| Py_BEGIN_ALLOW_THREADS | ||
| XXH32_update(self->xxhash_state, buf.buf, buf.len); | ||
| Py_END_ALLOW_THREADS |
There was a problem hiding this comment.
The update operation is performed while the GIL is released but without a critical section. In free-threaded Python, this allows concurrent access to the xxhash_state from multiple threads, which is not thread-safe. You should wrap the GIL-releasing block in XXH_CRITICAL_SECTION to protect the object's state.
XXH_CRITICAL_SECTION(self);
Py_BEGIN_ALLOW_THREADS
XXH32_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
XXH_END_CRITICAL_SECTION();| XXH_CRITICAL_SECTION(self); | ||
| self->seed = seed; | ||
| XXH64_reset(self->xxhash_state, seed); | ||
| XXH_END_CRITICAL_SECTION(); | ||
|
|
||
| if (buf.obj) | ||
| PYXXH64_do_update(self, &buf); | ||
| if (buf.obj) { | ||
| Py_BEGIN_ALLOW_THREADS | ||
| XXH64_update(self->xxhash_state, buf.buf, buf.len); | ||
| Py_END_ALLOW_THREADS | ||
| PyBuffer_Release(&buf); | ||
| } |
There was a problem hiding this comment.
Similar to the XXH32 implementation, the update call here is unprotected by the critical section while the GIL is released. This allows for potential race conditions in a free-threaded environment if the object is accessed by another thread during initialization.
XXH_CRITICAL_SECTION(self);
self->seed = seed;
XXH64_reset(self->xxhash_state, seed);
if (buf.obj) {
Py_BEGIN_ALLOW_THREADS
XXH64_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
}
XXH_END_CRITICAL_SECTION();
if (buf.obj) {
PyBuffer_Release(&buf);
}| Py_BEGIN_ALLOW_THREADS | ||
| XXH64_update(self->xxhash_state, buf.buf, buf.len); | ||
| Py_END_ALLOW_THREADS |
There was a problem hiding this comment.
| XXH_CRITICAL_SECTION(self); | ||
| self->seed = seed; | ||
| XXH3_64bits_reset_withSeed(self->xxhash_state, seed); | ||
| XXH_END_CRITICAL_SECTION(); | ||
|
|
||
| if (buf.obj) | ||
| PYXXH3_64_do_update(self, &buf); | ||
| if (buf.obj) { | ||
| Py_BEGIN_ALLOW_THREADS | ||
| XXH3_64bits_update(self->xxhash_state, buf.buf, buf.len); | ||
| Py_END_ALLOW_THREADS | ||
| PyBuffer_Release(&buf); | ||
| } |
There was a problem hiding this comment.
The initialization sequence for XXH3_64 should be fully encapsulated within the critical section to prevent concurrent access to the state while the GIL is released during the initial update.
XXH_CRITICAL_SECTION(self);
self->seed = seed;
XXH3_64bits_reset_withSeed(self->xxhash_state, seed);
if (buf.obj) {
Py_BEGIN_ALLOW_THREADS
XXH3_64bits_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
}
XXH_END_CRITICAL_SECTION();
if (buf.obj) {
PyBuffer_Release(&buf);
}| Py_BEGIN_ALLOW_THREADS | ||
| XXH3_64bits_update(self->xxhash_state, buf.buf, buf.len); | ||
| Py_END_ALLOW_THREADS |
There was a problem hiding this comment.
| XXH_CRITICAL_SECTION(self); | ||
| self->seed = seed; | ||
| XXH3_128bits_reset_withSeed(self->xxhash_state, seed); | ||
| XXH_END_CRITICAL_SECTION(); | ||
|
|
||
| if (buf.obj) | ||
| PYXXH3_128_do_update(self, &buf); | ||
| if (buf.obj) { | ||
| Py_BEGIN_ALLOW_THREADS | ||
| XXH3_128bits_update(self->xxhash_state, buf.buf, buf.len); | ||
| Py_END_ALLOW_THREADS | ||
| PyBuffer_Release(&buf); | ||
| } |
There was a problem hiding this comment.
The XXH3_128 initialization also requires the critical section to be extended to cover the update call to ensure thread safety in free-threaded builds.
XXH_CRITICAL_SECTION(self);
self->seed = seed;
XXH3_128bits_reset_withSeed(self->xxhash_state, seed);
if (buf.obj) {
Py_BEGIN_ALLOW_THREADS
XXH3_128bits_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
}
XXH_END_CRITICAL_SECTION();
if (buf.obj) {
PyBuffer_Release(&buf);
}| Py_BEGIN_ALLOW_THREADS | ||
| XXH3_128bits_update(self->xxhash_state, buf.buf, buf.len); | ||
| Py_END_ALLOW_THREADS |
…al sections Use Py_BEGIN_CRITICAL_SECTION / Py_END_CRITICAL_SECTION to protect concurrent access to xxhash_state in all four hash types. - Python >= 3.13: uses PyObject's built-in ob_mutex via critical section - Python < 3.13: macros expand to nothing (GIL protects us) - GIL build (3.13+): macros expand to nothing (zero overhead) - Free-threaded build (3.13t+): real per-object locking - Locked operations: do_update, digest, hexdigest, intdigest, copy, reset, dealloc, init
ede0417 to
4eda0df
Compare
|
@gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces support for free-threaded Python (3.13+) by implementing per-object critical sections to protect internal hash states across various xxhash variants. The changes include the addition of synchronization macros and the updating of deallocation, initialization, and digest methods to ensure thread safety. Feedback from the reviewer indicates that the update methods for xxh32, xxh64, xxh3_64, and xxh3_128 are missing these critical section protections, which could lead to race conditions. Additionally, it is recommended to align TypeError messages with hashlib for better compatibility.
| Py_BEGIN_ALLOW_THREADS | ||
| XXH32_update(self->xxhash_state, buf.buf, buf.len); | ||
| Py_END_ALLOW_THREADS |
There was a problem hiding this comment.
The update method for xxh32 is missing the critical section protection. Since PYXXH32_do_update was removed and its logic inlined here, the per-object lock must be explicitly added. Without it, concurrent calls to update on the same object will lead to race conditions and corruption of the internal xxhash_state in free-threaded Python. Additionally, ensure that the TypeError message for unencoded strings is 'Strings must be encoded before hashing' for hashlib compatibility.
XXH_CRITICAL_SECTION(self);
Py_BEGIN_ALLOW_THREADS
XXH32_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
XXH_END_CRITICAL_SECTION();References
- For hashlib compatibility, the TypeError message for unencoded strings should be 'Strings must be encoded before hashing'.
| Py_BEGIN_ALLOW_THREADS | ||
| XXH64_update(self->xxhash_state, buf.buf, buf.len); | ||
| Py_END_ALLOW_THREADS |
There was a problem hiding this comment.
The update method for xxh64 is missing critical section protection. Concurrent updates to the same hash object must be synchronized to ensure the integrity of the xxhash_state when running without the GIL. Additionally, ensure that the TypeError message for unencoded strings is 'Strings must be encoded before hashing' for hashlib compatibility.
XXH_CRITICAL_SECTION(self);
Py_BEGIN_ALLOW_THREADS
XXH64_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
XXH_END_CRITICAL_SECTION();References
- For hashlib compatibility, the TypeError message for unencoded strings should be 'Strings must be encoded before hashing'.
| Py_BEGIN_ALLOW_THREADS | ||
| XXH3_64bits_update(self->xxhash_state, buf.buf, buf.len); | ||
| Py_END_ALLOW_THREADS |
There was a problem hiding this comment.
The update method for xxh3_64 is missing critical section protection. This operation modifies the internal state and must be protected by the object's mutex in free-threaded builds. Additionally, ensure that the TypeError message for unencoded strings is 'Strings must be encoded before hashing' for hashlib compatibility.
XXH_CRITICAL_SECTION(self);
Py_BEGIN_ALLOW_THREADS
XXH3_64bits_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
XXH_END_CRITICAL_SECTION();References
- For hashlib compatibility, the TypeError message for unencoded strings should be 'Strings must be encoded before hashing'.
| Py_BEGIN_ALLOW_THREADS | ||
| XXH3_128bits_update(self->xxhash_state, buf.buf, buf.len); | ||
| Py_END_ALLOW_THREADS |
There was a problem hiding this comment.
The update method for xxh3_128 is missing critical section protection. To support free-threaded Python safely, all state-modifying operations like update must be wrapped in a critical section. Additionally, ensure that the TypeError message for unencoded strings is 'Strings must be encoded before hashing' for hashlib compatibility.
XXH_CRITICAL_SECTION(self);
Py_BEGIN_ALLOW_THREADS
XXH3_128bits_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
XXH_END_CRITICAL_SECTION();References
- For hashlib compatibility, the TypeError message for unencoded strings should be 'Strings must be encoded before hashing'.
Protect concurrent access to
xxhash_statein all four hash types usingPy_BEGIN_CRITICAL_SECTION/Py_END_CRITICAL_SECTION.ob_mutexvia critical sectiondo_update,digest,hexdigest,intdigest,copy,reset,dealloc,init