Skip to content

perf: METH_FASTCALL, tp_vectorcall, and related optimizations#147

Merged
ifduyue merged 9 commits intomasterfrom
vectorcall
Apr 27, 2026
Merged

perf: METH_FASTCALL, tp_vectorcall, and related optimizations#147
ifduyue merged 9 commits intomasterfrom
vectorcall

Conversation

@ifduyue
Copy link
Copy Markdown
Owner

@ifduyue ifduyue commented Apr 27, 2026

Summary

METH_FASTCALL for 12 module-level one-shot functions

  • 33-67% faster for small inputs (5B); 2-24% faster at 5MB

tp_vectorcall for 4 type constructors

  • 16-55% faster construction

Other optimizations

  • PyLong_FromUnsigned* replaces Py_BuildValue (avoids format string parse)
  • Py_ALWAYS_INLINE on internal arg-parsing helpers

Features

  • Comprehensive argument passing: positional, keyword, str, buffer types
  • Proper error handling: unknown keywords, duplicate arguments, too many positional args
  • Shared _parse_fastcall_args helper with consistent error reporting

Tests

  • 32 fastcall argument-passing tests covering normal + error cases
  • Updated benchmark suite with small-input micro-benchmarks

Bug fixes

  • Memory leaks in _new and _copy when createState fails
  • Missing PyErr_NoMemory() on allocation failures
  • const qualifier mismatch on PyModuleDef.slots
  • PYXXH3_128_digest docstring (8-byte → 16-byte)
  • Seed overflow handling with PyLong_AsUnsignedLongLongMask
  • GIL release in vectorcall update path
  • NULL checks for Python API returns in 128-bit math
  • Buffer leak on tp_alloc failure in vectorcall functions

Vectorcall bypasses tp_new→tp_init Python-level dispatch,
parsing arguments directly in C. Makes the _parse_fastcall_args
helper accept an input_required parameter so type constructors
can accept optional input while module functions require it.

Type constructor speedup: 41-66% depending on arguments.
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements vectorcall support for the xxh32, xxh64, xxh3_64, and xxh3_128 hash types in the _xxhash C extension, which improves performance for function calls. It also refactors the internal _parse_fastcall_args function to handle optional input arguments. The review feedback suggests releasing the Global Interpreter Lock (GIL) using Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS during the initial hash update within the new vectorcall implementations to ensure consistent multi-threaded performance across the library.

Comment thread src/_xxhash.c Outdated
Comment on lines +524 to +527
if (buf.buf) {
XXH32_update(self->xxhash_state, buf.buf, buf.len);
PyBuffer_Release(&buf);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The XXH32_update call should be wrapped in Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS to release the GIL. This is consistent with the existing PYXXH32_do_update helper and is important for performance when hashing large buffers in multi-threaded environments, which is a primary use case for this library.

    if (buf.buf) {
        Py_BEGIN_ALLOW_THREADS
        XXH32_update(self->xxhash_state, buf.buf, buf.len);
        Py_END_ALLOW_THREADS
        PyBuffer_Release(&buf);
    }

Comment thread src/_xxhash.c Outdated
Comment on lines +885 to +888
if (buf.buf) {
XXH64_update(self->xxhash_state, buf.buf, buf.len);
PyBuffer_Release(&buf);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The XXH64_update call should be wrapped in Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS to release the GIL, maintaining consistency with the PYXXH64_do_update implementation and ensuring multi-threaded performance for large inputs.

    if (buf.buf) {
        Py_BEGIN_ALLOW_THREADS
        XXH64_update(self->xxhash_state, buf.buf, buf.len);
        Py_END_ALLOW_THREADS
        PyBuffer_Release(&buf);
    }

Comment thread src/_xxhash.c Outdated
Comment on lines +1242 to +1245
if (buf.buf) {
XXH3_64bits_update(self->xxhash_state, buf.buf, buf.len);
PyBuffer_Release(&buf);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The XXH3_64bits_update call should be wrapped in Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS to release the GIL, as seen in PYXXH3_64_do_update.

    if (buf.buf) {
        Py_BEGIN_ALLOW_THREADS
        XXH3_64bits_update(self->xxhash_state, buf.buf, buf.len);
        Py_END_ALLOW_THREADS
        PyBuffer_Release(&buf);
    }

Comment thread src/_xxhash.c Outdated
Comment on lines +1608 to +1611
if (buf.buf) {
XXH3_128bits_update(self->xxhash_state, buf.buf, buf.len);
PyBuffer_Release(&buf);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The XXH3_128bits_update call should be wrapped in Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS to release the GIL, consistent with PYXXH3_128_do_update.

    if (buf.buf) {
        Py_BEGIN_ALLOW_THREADS
        XXH3_128bits_update(self->xxhash_state, buf.buf, buf.len);
        Py_END_ALLOW_THREADS
        PyBuffer_Release(&buf);
    }

@codspeed-hq
Copy link
Copy Markdown
Contributor

codspeed-hq Bot commented Apr 27, 2026

Merging this PR will not alter performance

✅ 10 untouched benchmarks
🆕 24 new benchmarks
⏩ 86 skipped benchmarks1

Performance Changes

Benchmark BASE HEAD Efficiency
🆕 test_xxh32_ctor_empty N/A 84.2 µs N/A
🆕 test_xxh32_ctor N/A 85.9 µs N/A
🆕 test_xxh32_ctor_seed N/A 86.8 µs N/A
🆕 test_xxh3_64_intdigest_5b_seed_kw N/A 85.6 µs N/A
🆕 test_xxh32_hexdigest_5b N/A 85.1 µs N/A
🆕 test_xxh64_intdigest_5b N/A 84.8 µs N/A
🆕 test_xxh64_stream_intdigest_2mb N/A 1.8 ms N/A
🆕 test_xxh64_ctor N/A 86.7 µs N/A
🆕 test_xxh64_intdigest_str N/A 86 µs N/A
🆕 test_xxh32_intdigest_5b_seed_kw N/A 85.5 µs N/A
🆕 test_xxh32_intdigest_5b_seed N/A 87.1 µs N/A
🆕 test_xxh3_128_ctor N/A 87.8 µs N/A
🆕 test_xxh32_intdigest_str N/A 85.4 µs N/A
🆕 test_xxh3_128_intdigest_5b N/A 86.9 µs N/A
🆕 test_xxh3_128_intdigest_str N/A 87 µs N/A
🆕 test_xxh3_64_ctor N/A 87.5 µs N/A
🆕 test_xxh3_64_stream_intdigest_2mb N/A 1.6 ms N/A
🆕 test_xxh3_64_intdigest_5b N/A 84.7 µs N/A
🆕 test_xxh3_64_intdigest_str N/A 85.3 µs N/A
🆕 test_xxh64_hexdigest_5b N/A 86.5 µs N/A
... ... ... ... ...

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.


Comparing vectorcall (a974b6b) with master (1585d2b)

Open in CodSpeed

Footnotes

  1. 86 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

ifduyue added 2 commits April 27, 2026 13:28
Wrap XXH*_update calls in Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS
in all four tp_vectorcall functions, consistent with the existing
PYXXH*_do_update helpers. Important for multi-threaded hashing of
large buffers.
Replaces large-buffer-only benchmarks with a mix that captures
the recent optimizations:
- 5-byte inputs (call overhead dominates, 25-60% improvement)
- str input (tests _get_buffer_or_str UTF-8 path)
- type constructors (tests tp_vectorcall, 41-66% improvement)
- keyword seed passing
- Retains 1KB/10KB macro benchmarks for covering hash throughput
@ifduyue
Copy link
Copy Markdown
Owner Author

ifduyue commented Apr 27, 2026

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements vectorcall support for the xxhash hash types (xxh32, xxh64, xxh3_64, and xxh3_128) to optimize performance for small inputs and constructor calls. It also refactors the internal _parse_fastcall_args function to handle optional input arguments and updates the benchmark suite to include micro-benchmarks for these new paths. Review feedback suggests refining error handling in the new vectorcall implementations by returning NULL directly when tp_alloc fails, rather than calling PyErr_NoMemory(), to avoid potentially overwriting existing exceptions.

Comment thread src/_xxhash.c Outdated
if (buf.buf)
PyBuffer_Release(&buf);
Py_XDECREF(buf_owner);
return PyErr_NoMemory();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In PYXXH32_vectorcall, if tp_alloc fails, it typically sets a MemoryError. Calling PyErr_NoMemory() again is redundant but safe. However, if tp_alloc were to fail for a reason other than memory exhaustion (though unlikely for this type), PyErr_NoMemory() would overwrite the actual error. It is generally better to return NULL directly if an error is already set, or only call PyErr_NoMemory() if XXH32_createState() fails (as it doesn't set a Python exception).

    return NULL;

Comment thread src/_xxhash.c Outdated
if (buf.buf)
PyBuffer_Release(&buf);
Py_XDECREF(buf_owner);
return PyErr_NoMemory();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to PYXXH32_vectorcall, if tp_alloc fails, it already sets an exception. Returning NULL directly is sufficient. PyErr_NoMemory() should only be used when a C-level allocation (like XXH64_createState) fails without setting a Python exception.

    return NULL;

Comment thread src/_xxhash.c Outdated
if (buf.buf)
PyBuffer_Release(&buf);
Py_XDECREF(buf_owner);
return PyErr_NoMemory();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consistent with other vectorcall implementations in this file, consider returning NULL directly if tp_alloc failed, as it already sets the appropriate exception.

    return NULL;

Comment thread src/_xxhash.c Outdated
if (buf.buf)
PyBuffer_Release(&buf);
Py_XDECREF(buf_owner);
return PyErr_NoMemory();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consistent with other vectorcall implementations in this file, consider returning NULL directly if tp_alloc failed, as it already sets the appropriate exception.

    return NULL;

tp_alloc already sets MemoryError (or another exception) on failure.
Calling PyErr_NoMemory() in the shared error path would overwrite it.
Now return NULL directly for tp_alloc failure and only call
PyErr_NoMemory() for XXH*_createState failure (which sets no exception).
@ifduyue
Copy link
Copy Markdown
Owner Author

ifduyue commented Apr 27, 2026

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements the tp_vectorcall protocol for the xxh32, xxh64, xxh3_64, and xxh3_128 types to optimize constructor calls and reduce overhead. It also refactors internal argument parsing to support optional input requirements and adds new micro-benchmarks to measure call performance. Feedback highlights critical resource leaks in the new vectorcall functions where buffers and owners are not released if memory allocation fails. Additionally, it is recommended to add type checks to the vectorcall implementations to ensure that Python-level subclass overrides (like __init__) are not bypassed.

Comment thread src/_xxhash.c Outdated
Comment on lines +514 to +515
if (self == NULL)
return NULL;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If tp_alloc fails, the function returns NULL without releasing the buffer or the buffer owner acquired by _parse_fastcall_args. This results in a memory and resource leak.

    if (self == NULL) {
        PyBuffer_Release(&buf);
        Py_XDECREF(buf_owner);
        return NULL;
    }

Comment thread src/_xxhash.c Outdated
Comment on lines +874 to +875
if (self == NULL)
return NULL;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If tp_alloc fails, the function returns NULL without releasing the buffer or the buffer owner acquired by _parse_fastcall_args. This results in a memory and resource leak.

    if (self == NULL) {
        PyBuffer_Release(&buf);
        Py_XDECREF(buf_owner);
        return NULL;
    }

Comment thread src/_xxhash.c Outdated
Comment on lines +1230 to +1231
if (self == NULL)
return NULL;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If tp_alloc fails, the function returns NULL without releasing the buffer or the buffer owner acquired by _parse_fastcall_args. This results in a memory and resource leak.

    if (self == NULL) {
        PyBuffer_Release(&buf);
        Py_XDECREF(buf_owner);
        return NULL;
    }

Comment thread src/_xxhash.c Outdated
Comment on lines +1595 to +1596
if (self == NULL)
return NULL;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If tp_alloc fails, the function returns NULL without releasing the buffer or the buffer owner acquired by _parse_fastcall_args. This results in a memory and resource leak.

    if (self == NULL) {
        PyBuffer_Release(&buf);
        Py_XDECREF(buf_owner);
        return NULL;
    }

Comment thread src/_xxhash.c
Comment on lines +498 to +500
PYXXH32_vectorcall(PyObject *type, PyObject *const *args,
size_t nargsf, PyObject *kwnames)
{
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Implementing tp_vectorcall on a type object effectively replaces the standard tp_new and tp_init sequence for constructors. However, this implementation bypasses Python-level overrides in subclasses. If a user subclasses xxh32 in Python and defines a custom __init__, it will be ignored when the class is called via the vectorcall path (which is the default in modern Python). To maintain compatibility with subclasses, you should check if type is exactly &PYXXH32Type and fall back to the standard call path if it is not.

Comment thread src/_xxhash.c
Comment on lines +858 to +860
PYXXH64_vectorcall(PyObject *type, PyObject *const *args,
size_t nargsf, PyObject *kwnames)
{
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to PYXXH32_vectorcall, this implementation bypasses Python-level __init__ overrides in subclasses. Consider adding a check to ensure type == &PYXXH64Type before proceeding with the fast path.

Comment thread src/_xxhash.c
Comment on lines +1214 to +1216
PYXXH3_64_vectorcall(PyObject *type, PyObject *const *args,
size_t nargsf, PyObject *kwnames)
{
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to PYXXH32_vectorcall, this implementation bypasses Python-level __init__ overrides in subclasses. Consider adding a check to ensure type == &PYXXH3_64Type before proceeding with the fast path.

Comment thread src/_xxhash.c
Comment on lines +1579 to +1581
PYXXH3_128_vectorcall(PyObject *type, PyObject *const *args,
size_t nargsf, PyObject *kwnames)
{
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to PYXXH32_vectorcall, this implementation bypasses Python-level __init__ overrides in subclasses. Consider adding a check to ensure type == &PYXXH3_128Type before proceeding with the fast path.

When _parse_fastcall_args succeeds but tp_alloc subsequently fails,
the acquired buffer was leaked. Now release buf/buf_owner before
returning NULL in all four vectorcall functions.
@ifduyue
Copy link
Copy Markdown
Owner Author

ifduyue commented Apr 27, 2026

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements vectorcall support for the xxh32, xxh64, xxh3_64, and xxh3_128 hash types to optimize performance. It refactors _parse_fastcall_args to support optional input arguments and updates the benchmark suite to focus on micro-benchmarks and constructor performance. The reviewer suggested using existing do_update helper functions within the new vectorcall implementations to reduce code duplication.

Comment thread src/_xxhash.c Outdated
Comment on lines +530 to +535
if (buf.buf) {
Py_BEGIN_ALLOW_THREADS
XXH32_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
PyBuffer_Release(&buf);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This logic is identical to the existing PYXXH32_do_update helper function. To improve maintainability and reduce code duplication, you should use the helper here. This applies to all four vectorcall implementations added in this PR.

    if (buf.buf) {
        PYXXH32_do_update(self, &buf);
    }

Comment thread src/_xxhash.c Outdated
Comment on lines +893 to +898
if (buf.buf) {
Py_BEGIN_ALLOW_THREADS
XXH64_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
PyBuffer_Release(&buf);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider using the PYXXH64_do_update helper function here to reduce code duplication.

    if (buf.buf) {
        PYXXH64_do_update(self, &buf);
    }

Comment thread src/_xxhash.c Outdated
Comment on lines +1252 to +1257
if (buf.buf) {
Py_BEGIN_ALLOW_THREADS
XXH3_64bits_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
PyBuffer_Release(&buf);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider using the PYXXH3_64_do_update helper function here to reduce code duplication.

    if (buf.buf) {
        PYXXH3_64_do_update(self, &buf);
    }

Comment thread src/_xxhash.c Outdated
Comment on lines +1620 to +1625
if (buf.buf) {
Py_BEGIN_ALLOW_THREADS
XXH3_128bits_update(self->xxhash_state, buf.buf, buf.len);
Py_END_ALLOW_THREADS
PyBuffer_Release(&buf);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider using the PYXXH3_128_do_update helper function here to reduce code duplication.

    if (buf.buf) {
        PYXXH3_128_do_update(self, &buf);
    }

Replace duplicated Py_BEGIN_ALLOW_THREADS/update/Py_END_ALLOW_THREADS/
PyBuffer_Release blocks with calls to existing PYXXH*_do_update helpers.
Reduces code duplication across all four vectorcall implementations.
@ifduyue
Copy link
Copy Markdown
Owner Author

ifduyue commented Apr 27, 2026

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements the Python vectorcall protocol for xxhash hash objects (xxh32, xxh64, xxh3_64, and xxh3_128) to optimize performance for small inputs. It also refactors the benchmark suite to include micro-benchmarks for tiny inputs, string encoding paths, and constructors. Feedback was provided regarding the buffer release logic in the new vectorcall implementations, suggesting that checking buf.obj instead of buf.buf is a safer way to ensure PyBuffer_Release is called correctly, especially for empty inputs.

Comment thread src/_xxhash.c Outdated
Comment on lines +530 to +531
if (buf.buf)
PYXXH32_do_update(self, &buf);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The check if (buf.buf) to decide whether to call PYXXH32_do_update (which releases the buffer) might be insufficient. If _parse_fastcall_args successfully acquired a buffer (e.g., for an empty input where buf.len == 0), buf.buf might be non-NULL but buf.len would be 0. More importantly, if PyObject_GetBuffer succeeded, buf.obj will be non-NULL. To avoid potential memory leaks or inconsistent state, it is safer to check if (buf.obj) or simply always call PYXXH32_do_update if the input argument was found, as XXH32_update handles zero-length inputs correctly and PyBuffer_Release must be called if a buffer was acquired.

    if (buf.obj)
        PYXXH32_do_update(self, &buf);

@ifduyue ifduyue changed the title tp_vectorcall perf: METH_FASTCALL, tp_vectorcall, and related optimizations Apr 27, 2026
buf.obj is set by PyObject_GetBuffer when a buffer is successfully
acquired, making it a more semantically clear check than buf.buf.
Initialize buf->obj = NULL in _parse_fastcall_args to match.
@ifduyue
Copy link
Copy Markdown
Owner Author

ifduyue commented Apr 27, 2026

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements vectorcall support for xxhash hash objects in the C extension, optimizing object creation and initial hashing. It also refactors the benchmark suite to include micro-benchmarks for call overhead and string inputs. Feedback suggests adding a type check within the vectorcall implementations to ensure they only execute for the intended types, falling back to the standard calling mechanism otherwise to safely support potential future subclassing.

Comment thread src/_xxhash.c
Comment on lines +508 to +511
if (_parse_fastcall_args(args, nargs, kwnames, "xxhash.xxh32()", 0,
&buf, &buf_owner, &raw_seed) < 0)
return NULL;
seed = (XXH32_hash_t)raw_seed;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The PYXXH32_vectorcall implementation (and others like it) bypasses the standard tp_new and tp_init mechanism. While xxhash.xxh32 is not currently subclassable (it lacks Py_TPFLAGS_BASETYPE), it is best practice to include a check to ensure the type argument is exactly &PYXXH32Type. If it is not, the function should fall back to the default calling mechanism to avoid breaking potential future subclassing or unexpected behavior if the flag is ever added.

    if (type != (PyObject *)&PYXXH32Type) {
        return PyType_Type.tp_call(type, PyTuple_New(0), NULL); // Simplified fallback
    }

    if (_parse_fastcall_args(args, nargs, kwnames, "xxhash.xxh32()", 0,
                             &buf, &buf_owner, &raw_seed) < 0)
        return NULL;

Covers one-shot intdigest/hexdigest and streaming constructors
across all four algorithms at 2MB where hashing dominates.
@ifduyue
Copy link
Copy Markdown
Owner Author

ifduyue commented Apr 27, 2026

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements tp_vectorcall for the xxhash types (XXH32, XXH64, XXH3_64, and XXH3_128) to optimize object creation and performance. It also refactors the internal _parse_fastcall_args function to handle optional input arguments and updates the benchmark suite to measure micro-performance improvements. I have no feedback to provide.

@ifduyue ifduyue merged commit b24c873 into master Apr 27, 2026
43 checks passed
@ifduyue
Copy link
Copy Markdown
Owner Author

ifduyue commented Apr 27, 2026

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements vectorcall support for xxhash hash objects, aiming to reduce overhead during object creation. It also refactors internal argument parsing to support optional inputs and ensures safer buffer handling by initializing buf->obj. The benchmark suite has been updated to better measure call overhead and string processing. Feedback was provided regarding the use of tp_alloc in the vectorcall constructors, suggesting a potential simplification using PyObject_New if the performance gains from direct allocation are not significant.

Comment thread src/_xxhash.c
Comment on lines +513 to +514
PYXXH32Object *self = (PYXXH32Object *)
((PyTypeObject *)type)->tp_alloc((PyTypeObject *)type, 0);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The use of tp_alloc directly in the vectorcall constructor is a valid optimization, but ensure that the type does not have a custom tp_new that performs essential setup not covered here. In this case, PYXXH32_new only performs state creation and reset, which are duplicated in this vectorcall implementation, so it is safe. However, for consistency with the rest of the codebase which uses PyObject_New, you might consider if tp_alloc is strictly necessary for the performance gain or if PyObject_New (which eventually calls tp_alloc) would be cleaner.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant