Skip to content

perf: add free-threaded Python safety with per-object critical sections#153

Open
ifduyue wants to merge 1 commit into
masterfrom
duyue/free-threaded
Open

perf: add free-threaded Python safety with per-object critical sections#153
ifduyue wants to merge 1 commit into
masterfrom
duyue/free-threaded

Conversation

@ifduyue
Copy link
Copy Markdown
Owner

@ifduyue ifduyue commented May 16, 2026

Protect concurrent access to xxhash_state in all four hash types using Py_BEGIN_CRITICAL_SECTION / Py_END_CRITICAL_SECTION.

  • Python >= 3.13: uses PyObject's built-in ob_mutex via critical section
  • Python < 3.13: macros expand to nothing (GIL protects us)
  • GIL build (3.13+): macros expand to nothing (zero overhead)
  • Free-threaded build (3.13t+): real per-object locking
  • Locked operations: do_update, digest, hexdigest, intdigest, copy, reset, dealloc, init

@codspeed-hq
Copy link
Copy Markdown
Contributor

codspeed-hq Bot commented May 16, 2026

Merging this PR will not alter performance

✅ 30 untouched benchmarks
⏩ 90 skipped benchmarks1


Comparing duyue/free-threaded (4eda0df) with master (c0f71ae)

Open in CodSpeed

Footnotes

  1. 90 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces thread-safety support for free-threaded Python (3.13+) by implementing critical section macros and applying them to state-modifying and state-reading operations across all xxHash variants. Additionally, it improves memory safety by nullifying state pointers within deallocation functions. The review feedback correctly identifies a safety issue in PYXXH3_64_copy, where the critical section is closed prematurely, leaving a version-specific workaround that accesses the source object's state unprotected from concurrent access.

Comment thread src/_xxhash.c Outdated
Comment on lines +1642 to +1645
XXH_CRITICAL_SECTION(self);
p->seed = self->seed;
XXH3_copyState(p->xxhash_state, self->xxhash_state);
XXH_END_CRITICAL_SECTION();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The critical section in PYXXH3_64_copy ends too early. It should encompass the workaround for XXH_VERSION_NUMBER < 704 (lines 1646-1652) because that block accesses self->xxhash_state->customSecret. For consistency and safety in free-threaded Python, the lock should be held until all operations involving the source object's state are complete, as correctly implemented in PYXXH3_128_copy (lines 2108-2118).

    XXH_CRITICAL_SECTION(self);
    p->seed = self->seed;
    XXH3_copyState(p->xxhash_state, self->xxhash_state);
#if XXH_VERSION_NUMBER < 704
    // v0.7.3 and earlier have a bug where states reset with a seed
    // will have a wild pointer to the original state when copied,
    // causing a use-after-free if the original is freed.
    if (p->xxhash_state->secret == &self->xxhash_state->customSecret[0])
        p->xxhash_state->secret = &p->xxhash_state->customSecret[0];
#endif
    XXH_END_CRITICAL_SECTION();

@ifduyue ifduyue force-pushed the duyue/free-threaded branch from 8dccbe5 to 38e95a5 Compare May 16, 2026 11:04
@ifduyue
Copy link
Copy Markdown
Owner Author

ifduyue commented May 16, 2026

@gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces per-object critical sections to support free-threaded Python 3.13+ across XXH32, XXH64, and XXH3 hash implementations, ensuring thread safety for state-modifying operations. The reviewer identified a high-severity deadlock risk in the initialization functions where internal updates are performed while holding a lock, as the subsequent buffer release could execute arbitrary Python code. Additional feedback suggests that locking within deallocators may be unnecessary and points out a minor formatting issue with an extra blank line.

Comment thread src/_xxhash.c Outdated
Comment on lines +613 to +619
XXH_CRITICAL_SECTION(self);
self->seed = seed;
XXH32_reset(self->xxhash_state, seed);

if (buf.obj)
PYXXH32_do_update(self, &buf);
XXH_END_CRITICAL_SECTION();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Holding the critical section while calling PYXXH32_do_update is problematic because PYXXH32_do_update calls PyBuffer_Release after releasing its own lock. However, since PYXXH32_init still holds the lock, PyBuffer_Release is effectively called while the object is locked. Releasing a buffer can trigger arbitrary Python code (e.g., via __del__), which is a deadlock risk when holding a per-object lock. Consider releasing the critical section before calling PYXXH32_do_update, as the latter already implements its own internal locking.

    XXH_CRITICAL_SECTION(self);
    self->seed = seed;
    XXH32_reset(self->xxhash_state, seed);
    XXH_END_CRITICAL_SECTION();

    if (buf.obj)
        PYXXH32_do_update(self, &buf);

Comment thread src/_xxhash.c Outdated
Comment on lines +1054 to +1060
XXH_CRITICAL_SECTION(self);
self->seed = seed;
XXH64_reset(self->xxhash_state, seed);

if (buf.obj)
PYXXH64_do_update(self, &buf);
XXH_END_CRITICAL_SECTION();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Holding the critical section while calling PYXXH64_do_update is problematic because PYXXH64_do_update calls PyBuffer_Release after releasing its own lock. However, since PYXXH64_init still holds the lock, PyBuffer_Release is effectively called while the object is locked. Releasing a buffer can trigger arbitrary Python code (e.g., via __del__), which is a deadlock risk when holding a per-object lock. Consider releasing the critical section before calling PYXXH64_do_update, as the latter already implements its own internal locking.

    XXH_CRITICAL_SECTION(self);
    self->seed = seed;
    XXH64_reset(self->xxhash_state, seed);
    XXH_END_CRITICAL_SECTION();

    if (buf.obj)
        PYXXH64_do_update(self, &buf);

Comment thread src/_xxhash.c Outdated
Comment on lines +1494 to +1500
XXH_CRITICAL_SECTION(self);
self->seed = seed;
XXH3_64bits_reset_withSeed(self->xxhash_state, seed);

if (buf.obj)
PYXXH3_64_do_update(self, &buf);
XXH_END_CRITICAL_SECTION();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Holding the critical section while calling PYXXH3_64_do_update is problematic because PYXXH3_64_do_update calls PyBuffer_Release after releasing its own lock. However, since PYXXH3_64_init still holds the lock, PyBuffer_Release is effectively called while the object is locked. Releasing a buffer can trigger arbitrary Python code (e.g., via __del__), which is a deadlock risk when holding a per-object lock. Consider releasing the critical section before calling PYXXH3_64_do_update, as the latter already implements its own internal locking.

    XXH_CRITICAL_SECTION(self);
    self->seed = seed;
    XXH3_64bits_reset_withSeed(self->xxhash_state, seed);
    XXH_END_CRITICAL_SECTION();

    if (buf.obj)
        PYXXH3_64_do_update(self, &buf);

Comment thread src/_xxhash.c Outdated
Comment on lines +1942 to +1948
XXH_CRITICAL_SECTION(self);
self->seed = seed;
XXH3_128bits_reset_withSeed(self->xxhash_state, seed);

if (buf.obj)
PYXXH3_128_do_update(self, &buf);
XXH_END_CRITICAL_SECTION();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Holding the critical section while calling PYXXH3_128_do_update is problematic because PYXXH3_128_do_update calls PyBuffer_Release after releasing its own lock. However, since PYXXH3_128_init still holds the lock, PyBuffer_Release is effectively called while the object is locked. Releasing a buffer can trigger arbitrary Python code (e.g., via __del__), which is a deadlock risk when holding a per-object lock. Consider releasing the critical section before calling PYXXH3_128_do_update, as the latter already implements its own internal locking.

    XXH_CRITICAL_SECTION(self);
    self->seed = seed;
    XXH3_128bits_reset_withSeed(self->xxhash_state, seed);
    XXH_END_CRITICAL_SECTION();

    if (buf.obj)
        PYXXH3_128_do_update(self, &buf);

Comment thread src/_xxhash.c
Comment on lines +464 to +469
XXH_CRITICAL_SECTION(self);
if (self->xxhash_state) {
XXH32_freeState(self->xxhash_state);
self->xxhash_state = NULL;
}
XXH_END_CRITICAL_SECTION();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using a critical section in tp_dealloc is generally unnecessary and potentially risky. If an object's reference count has reached zero, no other thread should be able to access it via Python references, meaning no other thread can be holding the lock. While it provides a safety net against internal C-level races, it adds overhead and complexity to the destruction path. If you decide to keep it, ensure that the object's mutex is not destroyed before the critical section ends.

Comment thread src/_xxhash.c
return NULL;
}


Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is an unnecessary extra blank line introduced here.

@ifduyue ifduyue force-pushed the duyue/free-threaded branch from 38e95a5 to 24b083e Compare May 16, 2026 11:19
@ifduyue ifduyue force-pushed the duyue/free-threaded branch from 24b083e to 9472886 Compare May 16, 2026 13:14
@ifduyue
Copy link
Copy Markdown
Owner Author

ifduyue commented May 16, 2026

@gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for free-threaded Python 3.13+ by implementing per-object critical sections across all hash types (XXH32, XXH64, XXH3_64, and XXH3_128). The feedback highlights a critical issue where Py_BEGIN_ALLOW_THREADS is incorrectly nested inside critical sections, which is forbidden in Python 3.13 and could lead to deadlocks. Furthermore, the reviewer noted that several state-modifying operations in the init and vectorcall functions are currently unprotected, posing a risk of race conditions in a multi-threaded environment.

Comment thread src/_xxhash.c Outdated
Comment on lines +669 to +673
XXH_CRITICAL_SECTION(self);
Py_BEGIN_ALLOW_THREADS
XXH32_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
XXH_END_CRITICAL_SECTION();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

In Python 3.13+, it is explicitly forbidden to call Py_BEGIN_ALLOW_THREADS (or any macro that detaches the thread state) while holding a critical section. This is because the critical section relies on the thread being attached to safely hold the object's internal mutex (ob_mutex). Releasing the GIL/detaching while holding this lock can lead to deadlocks or crashes, particularly during garbage collection.

Since xxhash operations are generally very fast, the recommended fix is to remove the Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS block when using XXH_CRITICAL_SECTION. If releasing the GIL is absolutely necessary for performance on extremely large buffers, you should implement a custom mutex in the object structure instead of using the built-in critical section API.

    XXH_CRITICAL_SECTION(self);
    XXH32_update(self->xxhash_state, buf.buf, buf.len);
    XXH_END_CRITICAL_SECTION();

Comment thread src/_xxhash.c Outdated
Comment on lines +1112 to +1116
XXH_CRITICAL_SECTION(self);
Py_BEGIN_ALLOW_THREADS
XXH64_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
XXH_END_CRITICAL_SECTION();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This block violates the Python 3.13 C API contract: Py_BEGIN_ALLOW_THREADS must not be nested inside a critical section. This applies to all hash types in this module (XXH64, XXH3_64, XXH3_128). Please remove the thread-releasing macros here.

    XXH_CRITICAL_SECTION(self);
    XXH64_update(self->xxhash_state, buf.buf, buf.len);
    XXH_END_CRITICAL_SECTION();

Comment thread src/_xxhash.c Outdated
Comment on lines +1554 to +1558
XXH_CRITICAL_SECTION(self);
Py_BEGIN_ALLOW_THREADS
XXH3_64bits_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
XXH_END_CRITICAL_SECTION();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Critical section violation: Py_BEGIN_ALLOW_THREADS is used inside XXH_CRITICAL_SECTION. This must be removed to ensure compatibility with free-threaded Python 3.13.

    XXH_CRITICAL_SECTION(self);
    XXH3_64bits_update(self->xxhash_state, buf.buf, buf.len);
    XXH_END_CRITICAL_SECTION();

Comment thread src/_xxhash.c Outdated
Comment on lines +2004 to +2008
XXH_CRITICAL_SECTION(self);
Py_BEGIN_ALLOW_THREADS
XXH3_128bits_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
XXH_END_CRITICAL_SECTION();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Critical section violation: Py_BEGIN_ALLOW_THREADS is used inside XXH_CRITICAL_SECTION. This must be removed to ensure compatibility with free-threaded Python 3.13.

    XXH_CRITICAL_SECTION(self);
    XXH3_128bits_update(self->xxhash_state, buf.buf, buf.len);
    XXH_END_CRITICAL_SECTION();

Comment thread src/_xxhash.c
Comment on lines +606 to +616
XXH_CRITICAL_SECTION(self);
self->seed = seed;
XXH32_reset(self->xxhash_state, seed);
XXH_END_CRITICAL_SECTION();

if (buf.obj)
PYXXH32_do_update(self, &buf);
if (buf.obj) {
Py_BEGIN_ALLOW_THREADS
XXH32_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
PyBuffer_Release(&buf);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The XXH32_update call in the init method is not protected by the critical section. In a free-threaded environment, __init__ can be called on an existing object that might be concurrently accessed by other threads. To ensure thread safety, the entire state modification (reset and update) should be performed within the critical section.

Note: As mentioned in other comments, the Py_BEGIN_ALLOW_THREADS block must be removed if it is moved inside the critical section.

    XXH_CRITICAL_SECTION(self);
    self->seed = seed;
    XXH32_reset(self->xxhash_state, seed);
    if (buf.obj) {
        XXH32_update(self->xxhash_state, buf.buf, buf.len);
    }
    XXH_END_CRITICAL_SECTION();

    if (buf.obj) {
        PyBuffer_Release(&buf);
    }

Comment thread src/_xxhash.c
Comment on lines +503 to +508
if (buf.obj) {
Py_BEGIN_ALLOW_THREADS
XXH32_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
PyBuffer_Release(&buf);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The update operation in vectorcall is unprotected. While this usually occurs on a newly allocated object, it is safer and more consistent with the rest of the PR to protect all state-modifying operations with the critical section to prevent potential races if the object is leaked during construction.

@ifduyue
Copy link
Copy Markdown
Owner Author

ifduyue commented May 19, 2026

@gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for free-threaded Python (3.13+) by implementing per-object critical sections to protect the internal hash state. It defines new macros for critical sections and applies them across various hash implementations (XXH32, XXH64, XXH3_64, and XXH3_128) in methods such as deallocation, initialization, and digest generation. Additionally, the update logic has been refactored to inline the hash update calls while releasing the Global Interpreter Lock (GIL). The reviewer identified several critical issues where the hash state is modified or accessed while the GIL is released but without being protected by a critical section. This occurs during both the initialization sequence and the standalone update methods for all hash types, creating potential race conditions that need to be addressed by extending the scope of the critical sections.

Comment thread src/_xxhash.c
Comment on lines +606 to +616
XXH_CRITICAL_SECTION(self);
self->seed = seed;
XXH32_reset(self->xxhash_state, seed);
XXH_END_CRITICAL_SECTION();

if (buf.obj)
PYXXH32_do_update(self, &buf);
if (buf.obj) {
Py_BEGIN_ALLOW_THREADS
XXH32_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
PyBuffer_Release(&buf);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The update part of the initialization is performed outside the critical section while the GIL is released. In free-threaded Python, if the object is already reachable, another thread could call digest() or update() concurrently, leading to a race condition on the hash state. The critical section should be extended to cover the entire state modification sequence to ensure atomicity.

    XXH_CRITICAL_SECTION(self);
    self->seed = seed;
    XXH32_reset(self->xxhash_state, seed);

    if (buf.obj) {
        Py_BEGIN_ALLOW_THREADS
        XXH32_update(self->xxhash_state, buf.buf, buf.len);
        Py_END_ALLOW_THREADS
    }
    XXH_END_CRITICAL_SECTION();

    if (buf.obj) {
        PyBuffer_Release(&buf);
    }

Comment thread src/_xxhash.c
Comment on lines +669 to +671
Py_BEGIN_ALLOW_THREADS
XXH32_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The update operation is performed while the GIL is released but without a critical section. In free-threaded Python, this allows concurrent access to the xxhash_state from multiple threads, which is not thread-safe. You should wrap the GIL-releasing block in XXH_CRITICAL_SECTION to protect the object's state.

    XXH_CRITICAL_SECTION(self);
    Py_BEGIN_ALLOW_THREADS
    XXH32_update(self->xxhash_state, buf.buf, buf.len);
    Py_END_ALLOW_THREADS
    XXH_END_CRITICAL_SECTION();

Comment thread src/_xxhash.c
Comment on lines +1047 to +1057
XXH_CRITICAL_SECTION(self);
self->seed = seed;
XXH64_reset(self->xxhash_state, seed);
XXH_END_CRITICAL_SECTION();

if (buf.obj)
PYXXH64_do_update(self, &buf);
if (buf.obj) {
Py_BEGIN_ALLOW_THREADS
XXH64_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
PyBuffer_Release(&buf);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Similar to the XXH32 implementation, the update call here is unprotected by the critical section while the GIL is released. This allows for potential race conditions in a free-threaded environment if the object is accessed by another thread during initialization.

    XXH_CRITICAL_SECTION(self);
    self->seed = seed;
    XXH64_reset(self->xxhash_state, seed);

    if (buf.obj) {
        Py_BEGIN_ALLOW_THREADS
        XXH64_update(self->xxhash_state, buf.buf, buf.len);
        Py_END_ALLOW_THREADS
    }
    XXH_END_CRITICAL_SECTION();

    if (buf.obj) {
        PyBuffer_Release(&buf);
    }

Comment thread src/_xxhash.c
Comment on lines +1110 to +1112
Py_BEGIN_ALLOW_THREADS
XXH64_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The XXH64_update call is missing a critical section to protect the state when the GIL is released in free-threaded Python builds.

    XXH_CRITICAL_SECTION(self);
    Py_BEGIN_ALLOW_THREADS
    XXH64_update(self->xxhash_state, buf.buf, buf.len);
    Py_END_ALLOW_THREADS
    XXH_END_CRITICAL_SECTION();

Comment thread src/_xxhash.c
Comment on lines +1487 to +1497
XXH_CRITICAL_SECTION(self);
self->seed = seed;
XXH3_64bits_reset_withSeed(self->xxhash_state, seed);
XXH_END_CRITICAL_SECTION();

if (buf.obj)
PYXXH3_64_do_update(self, &buf);
if (buf.obj) {
Py_BEGIN_ALLOW_THREADS
XXH3_64bits_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
PyBuffer_Release(&buf);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The initialization sequence for XXH3_64 should be fully encapsulated within the critical section to prevent concurrent access to the state while the GIL is released during the initial update.

    XXH_CRITICAL_SECTION(self);
    self->seed = seed;
    XXH3_64bits_reset_withSeed(self->xxhash_state, seed);

    if (buf.obj) {
        Py_BEGIN_ALLOW_THREADS
        XXH3_64bits_update(self->xxhash_state, buf.buf, buf.len);
        Py_END_ALLOW_THREADS
    }
    XXH_END_CRITICAL_SECTION();

    if (buf.obj) {
        PyBuffer_Release(&buf);
    }

Comment thread src/_xxhash.c
Comment on lines +1550 to +1552
Py_BEGIN_ALLOW_THREADS
XXH3_64bits_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Missing critical section around the GIL-releasing XXH3_64bits_update call, which is necessary for thread safety in free-threaded Python.

    XXH_CRITICAL_SECTION(self);
    Py_BEGIN_ALLOW_THREADS
    XXH3_64bits_update(self->xxhash_state, buf.buf, buf.len);
    Py_END_ALLOW_THREADS
    XXH_END_CRITICAL_SECTION();

Comment thread src/_xxhash.c
Comment on lines +1935 to +1945
XXH_CRITICAL_SECTION(self);
self->seed = seed;
XXH3_128bits_reset_withSeed(self->xxhash_state, seed);
XXH_END_CRITICAL_SECTION();

if (buf.obj)
PYXXH3_128_do_update(self, &buf);
if (buf.obj) {
Py_BEGIN_ALLOW_THREADS
XXH3_128bits_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
PyBuffer_Release(&buf);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The XXH3_128 initialization also requires the critical section to be extended to cover the update call to ensure thread safety in free-threaded builds.

    XXH_CRITICAL_SECTION(self);
    self->seed = seed;
    XXH3_128bits_reset_withSeed(self->xxhash_state, seed);

    if (buf.obj) {
        Py_BEGIN_ALLOW_THREADS
        XXH3_128bits_update(self->xxhash_state, buf.buf, buf.len);
        Py_END_ALLOW_THREADS
    }
    XXH_END_CRITICAL_SECTION();

    if (buf.obj) {
        PyBuffer_Release(&buf);
    }

Comment thread src/_xxhash.c
Comment on lines +1998 to +2000
Py_BEGIN_ALLOW_THREADS
XXH3_128bits_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Missing critical section around the GIL-releasing XXH3_128bits_update call.

    XXH_CRITICAL_SECTION(self);
    Py_BEGIN_ALLOW_THREADS
    XXH3_128bits_update(self->xxhash_state, buf.buf, buf.len);
    Py_END_ALLOW_THREADS
    XXH_END_CRITICAL_SECTION();

…al sections

Use Py_BEGIN_CRITICAL_SECTION / Py_END_CRITICAL_SECTION to protect
concurrent access to xxhash_state in all four hash types.

- Python >= 3.13: uses PyObject's built-in ob_mutex via critical section
- Python < 3.13: macros expand to nothing (GIL protects us)
- GIL build (3.13+): macros expand to nothing (zero overhead)
- Free-threaded build (3.13t+): real per-object locking
- Locked operations: do_update, digest, hexdigest, intdigest,
  copy, reset, dealloc, init
@ifduyue ifduyue force-pushed the duyue/free-threaded branch from ede0417 to 4eda0df Compare May 19, 2026 03:57
@ifduyue
Copy link
Copy Markdown
Owner Author

ifduyue commented May 19, 2026

@gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for free-threaded Python (3.13+) by implementing per-object critical sections to protect internal hash states across various xxhash variants. The changes include the addition of synchronization macros and the updating of deallocation, initialization, and digest methods to ensure thread safety. Feedback from the reviewer indicates that the update methods for xxh32, xxh64, xxh3_64, and xxh3_128 are missing these critical section protections, which could lead to race conditions. Additionally, it is recommended to align TypeError messages with hashlib for better compatibility.

Comment thread src/_xxhash.c
Comment on lines +677 to +679
Py_BEGIN_ALLOW_THREADS
XXH32_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The update method for xxh32 is missing the critical section protection. Since PYXXH32_do_update was removed and its logic inlined here, the per-object lock must be explicitly added. Without it, concurrent calls to update on the same object will lead to race conditions and corruption of the internal xxhash_state in free-threaded Python. Additionally, ensure that the TypeError message for unencoded strings is 'Strings must be encoded before hashing' for hashlib compatibility.

    XXH_CRITICAL_SECTION(self);
    Py_BEGIN_ALLOW_THREADS
    XXH32_update(self->xxhash_state, buf.buf, buf.len);
    Py_END_ALLOW_THREADS
    XXH_END_CRITICAL_SECTION();
References
  1. For hashlib compatibility, the TypeError message for unencoded strings should be 'Strings must be encoded before hashing'.

Comment thread src/_xxhash.c
Comment on lines +1126 to +1128
Py_BEGIN_ALLOW_THREADS
XXH64_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The update method for xxh64 is missing critical section protection. Concurrent updates to the same hash object must be synchronized to ensure the integrity of the xxhash_state when running without the GIL. Additionally, ensure that the TypeError message for unencoded strings is 'Strings must be encoded before hashing' for hashlib compatibility.

    XXH_CRITICAL_SECTION(self);
    Py_BEGIN_ALLOW_THREADS
    XXH64_update(self->xxhash_state, buf.buf, buf.len);
    Py_END_ALLOW_THREADS
    XXH_END_CRITICAL_SECTION();
References
  1. For hashlib compatibility, the TypeError message for unencoded strings should be 'Strings must be encoded before hashing'.

Comment thread src/_xxhash.c
Comment on lines +1574 to +1576
Py_BEGIN_ALLOW_THREADS
XXH3_64bits_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The update method for xxh3_64 is missing critical section protection. This operation modifies the internal state and must be protected by the object's mutex in free-threaded builds. Additionally, ensure that the TypeError message for unencoded strings is 'Strings must be encoded before hashing' for hashlib compatibility.

    XXH_CRITICAL_SECTION(self);
    Py_BEGIN_ALLOW_THREADS
    XXH3_64bits_update(self->xxhash_state, buf.buf, buf.len);
    Py_END_ALLOW_THREADS
    XXH_END_CRITICAL_SECTION();
References
  1. For hashlib compatibility, the TypeError message for unencoded strings should be 'Strings must be encoded before hashing'.

Comment thread src/_xxhash.c
Comment on lines +2030 to +2032
Py_BEGIN_ALLOW_THREADS
XXH3_128bits_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The update method for xxh3_128 is missing critical section protection. To support free-threaded Python safely, all state-modifying operations like update must be wrapped in a critical section. Additionally, ensure that the TypeError message for unencoded strings is 'Strings must be encoded before hashing' for hashlib compatibility.

    XXH_CRITICAL_SECTION(self);
    Py_BEGIN_ALLOW_THREADS
    XXH3_128bits_update(self->xxhash_state, buf.buf, buf.len);
    Py_END_ALLOW_THREADS
    XXH_END_CRITICAL_SECTION();
References
  1. For hashlib compatibility, the TypeError message for unencoded strings should be 'Strings must be encoded before hashing'.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant