Skip to content

Fastcall#146

Merged
ifduyue merged 12 commits intomasterfrom
fastcall
Apr 27, 2026
Merged

Fastcall#146
ifduyue merged 12 commits intomasterfrom
fastcall

Conversation

@ifduyue
Copy link
Copy Markdown
Owner

@ifduyue ifduyue commented Apr 27, 2026

No description provided.

ifduyue added 2 commits April 27, 2026 04:14
Switch 12 module-level functions from METH_VARARGS|METH_KEYWORDS to
METH_FASTCALL|METH_KEYWORDS, avoiding tuple/dict allocation per call.

Adds _get_buffer_or_str() helper to support str input (UTF-8 encoded)
in addition to bytes-like objects, matching PyArg_ParseTupleAndKeywords
"s*" behavior.

Uses PyLong_AsUnsignedLongLongMask/PyLong_AsUnsignedLong for seed
overflow handling, matching PyArg_ParseTupleAndKeywords "K"/"I"
format silent-wrapping semantics.
- Accept 'input' as keyword argument when no positional args
- Use PyLong_AsUnsignedLongMask for XXH32 seed (matching 'I' format)
- Use PyLong_AsUnsignedLongLongMask for XXH64/XXH3 seeds (matching 'K')
- Add comprehensive vectorcall argument passing tests
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the module-level hashing functions to use the METH_FASTCALL calling convention for improved performance. It introduces a helper function, _get_buffer_or_str, to handle input processing and adds a comprehensive test suite for the new argument parsing logic. Feedback focuses on several critical issues: a bug in exception handling within the buffer helper, missing error checks for integer conversions of the seed argument, and potential null pointer dereferences in the 128-bit integer digest calculation. Additionally, the manual argument parsing logic lacks standard Python validation for duplicate or unknown keyword arguments.

Comment thread src/_xxhash.c Outdated
Comment on lines +55 to +69
if (PyObject_GetBuffer(obj, buf, PyBUF_SIMPLE) == 0) {
*owner = NULL;
return 0;
}
PyErr_Clear();
if (!PyUnicode_Check(obj))
return -1;
*owner = PyUnicode_AsUTF8String(obj);
if (*owner == NULL)
return -1;
if (PyObject_GetBuffer(*owner, buf, PyBUF_SIMPLE) < 0) {
Py_DECREF(*owner);
return -1;
}
return 0;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current implementation of _get_buffer_or_str has a bug where it clears the Python exception too early. If PyObject_GetBuffer fails and the object is not a string, the function returns -1 with no exception set (because it was cleared at line 59). This will cause the caller to return NULL to the interpreter without an exception, triggering a SystemError. The exception should only be cleared if we are certain we are dealing with a string that needs encoding.

    if (PyObject_GetBuffer(obj, buf, PyBUF_SIMPLE) == 0) {
        *owner = NULL;
        return 0;
    }
    if (PyUnicode_Check(obj)) {
        PyErr_Clear();
        *owner = PyUnicode_AsUTF8String(obj);
        if (*owner == NULL)
            return -1;
        if (PyObject_GetBuffer(*owner, buf, PyBUF_SIMPLE) < 0) {
            Py_DECREF(*owner);
            return -1;
        }
        return 0;
    }
    return -1;

Comment thread src/_xxhash.c Outdated
Comment on lines 107 to 117
if (nargs > 1) {
seed = (XXH32_hash_t)PyLong_AsUnsignedLongMask(args[1]);
}
if (kwnames) {
for (Py_ssize_t i = 0; i < PyTuple_GET_SIZE(kwnames); i++) {
PyObject *key = PyTuple_GET_ITEM(kwnames, i);
if (PyUnicode_CompareWithASCIIString(key, "seed") == 0) {
seed = (XXH32_hash_t)PyLong_AsUnsignedLongMask(args[nargs + i]);
}
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

PyLong_AsUnsignedLongMask (and PyLong_AsUnsignedLongLongMask in other functions) can fail and set a Python exception if the provided argument cannot be converted to an integer. Since -1 is a valid return value for these functions, you must check PyErr_Occurred() after each call and return NULL if an error is detected. Otherwise, the function will proceed with an invalid seed and eventually return a result while an exception is still set, which is incorrect.

    if (nargs > 1) {
        seed = (XXH32_hash_t)PyLong_AsUnsignedLongMask(args[1]);
        if (PyErr_Occurred()) return NULL;
    }
    if (kwnames) {
        for (Py_ssize_t i = 0; i < PyTuple_GET_SIZE(kwnames); i++) {
            PyObject *key = PyTuple_GET_ITEM(kwnames, i);
            if (PyUnicode_CompareWithASCIIString(key, "seed") == 0) {
                seed = (XXH32_hash_t)PyLong_AsUnsignedLongMask(args[nargs + i]);
                if (PyErr_Occurred()) return NULL;
            }
        }
    }

Comment thread src/_xxhash.c Outdated
Comment on lines 662 to 671
PyObject *sixtyfour = PyLong_FromLong(64);
PyObject *low = PyLong_FromUnsignedLongLong(intdigest.low64);
PyObject *high = PyLong_FromUnsignedLongLong(intdigest.high64);
PyObject *result = PyNumber_Lshift(high, sixtyfour);
Py_DECREF(high);
high = result;
result = PyNumber_Add(high, low);
Py_DECREF(high);
Py_DECREF(low);
Py_DECREF(sixtyfour);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This block performs multiple Python object allocations and operations without checking for NULL returns. If any of these fail (e.g., due to memory exhaustion), subsequent calls like PyNumber_Lshift or Py_DECREF will operate on NULL pointers, leading to a crash. Use a more robust error handling pattern with checks and proper cleanup.

    PyObject *sixtyfour = PyLong_FromLong(64);
    PyObject *low = PyLong_FromUnsignedLongLong(intdigest.low64);
    PyObject *high = PyLong_FromUnsignedLongLong(intdigest.high64);
    PyObject *result = NULL;

    if (sixtyfour && low && high) {
        PyObject *shifted = PyNumber_Lshift(high, sixtyfour);
        if (shifted) {
            result = PyNumber_Add(shifted, low);
            Py_DECREF(shifted);
        }
    }
    Py_XDECREF(high);
    Py_XDECREF(low);
    Py_XDECREF(sixtyfour);

Comment thread src/_xxhash.c Outdated
Comment on lines 84 to 117
if (nargs < 1) {
if (!kwnames) {
PyErr_SetString(PyExc_TypeError, "xxh32_digest() missing required argument 'input'");
return NULL;
}
int found = 0;
for (Py_ssize_t i = 0; i < PyTuple_GET_SIZE(kwnames); i++) {
if (PyUnicode_CompareWithASCIIString(PyTuple_GET_ITEM(kwnames, i), "input") == 0) {
if (nargs + i >= 0 && _get_buffer_or_str(args[nargs + i], &buf, &buf_owner) < 0)
return NULL;
found = 1;
break;
}
}
if (!found) {
PyErr_SetString(PyExc_TypeError, "xxh32_digest() missing required argument 'input'");
return NULL;
}
} else {
if (_get_buffer_or_str(args[0], &buf, &buf_owner) < 0)
return NULL;
}

if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s*|I:xxh32_digest", keywords, &buf, &seed)) {
return NULL;
if (nargs > 1) {
seed = (XXH32_hash_t)PyLong_AsUnsignedLongMask(args[1]);
}
if (kwnames) {
for (Py_ssize_t i = 0; i < PyTuple_GET_SIZE(kwnames); i++) {
PyObject *key = PyTuple_GET_ITEM(kwnames, i);
if (PyUnicode_CompareWithASCIIString(key, "seed") == 0) {
seed = (XXH32_hash_t)PyLong_AsUnsignedLongMask(args[nargs + i]);
}
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The manual argument parsing for METH_FASTCALL is missing several validation checks that were previously handled by PyArg_ParseTupleAndKeywords. This is a regression in API robustness:

  1. Duplicate arguments: It doesn't detect if input or seed are provided both positionally and as keyword arguments (e.g., xxh32_digest(data, input=data)).
  2. Unknown keyword arguments: It silently ignores any unexpected keys in kwnames.
  3. Extra positional arguments: It doesn't check if nargs > 2.

While FASTCALL is intended for performance, it should still maintain standard Python argument validation behavior.

@codspeed-hq
Copy link
Copy Markdown
Contributor

codspeed-hq Bot commented Apr 27, 2026

Merging this PR will not alter performance

✅ 96 untouched benchmarks


Comparing fastcall (0a22795) with master (ce1cb9f)

Open in CodSpeed

ifduyue added 2 commits April 27, 2026 11:47
PyErr_Clear() is now inside the PyUnicode_Check() branch, so if
PyObject_GetBuffer fails on a non-string object, the original
TypeError propagates correctly instead of being lost.
Checking str before attempting PyObject_GetBuffer avoids a guaranteed
failure on str objects, saving a TypeError set/clear cycle. For non-str
objects that fail GetBuffer, the original TypeError propagates correctly
since no PyErr_Clear is ever called.
@ifduyue
Copy link
Copy Markdown
Owner Author

ifduyue commented Apr 27, 2026

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request migrates the module-level digest functions to the METH_FASTCALL calling convention to improve performance. However, the manual argument parsing logic introduced is incomplete and deviates from standard Python behavior; it fails to validate unknown keywords, duplicate arguments, or extra positional parameters. Additionally, the feedback identifies missing error handling for Python API calls and inconsistencies in integer overflow behavior compared to the previous implementation. It is recommended to centralize the parsing logic and expand the test suite to cover these edge cases.

Comment thread src/_xxhash.c Outdated
Comment on lines 85 to 118
if (nargs < 1) {
if (!kwnames) {
PyErr_SetString(PyExc_TypeError, "xxh32_digest() missing required argument 'input'");
return NULL;
}
int found = 0;
for (Py_ssize_t i = 0; i < PyTuple_GET_SIZE(kwnames); i++) {
if (PyUnicode_CompareWithASCIIString(PyTuple_GET_ITEM(kwnames, i), "input") == 0) {
if (nargs + i >= 0 && _get_buffer_or_str(args[nargs + i], &buf, &buf_owner) < 0)
return NULL;
found = 1;
break;
}
}
if (!found) {
PyErr_SetString(PyExc_TypeError, "xxh32_digest() missing required argument 'input'");
return NULL;
}
} else {
if (_get_buffer_or_str(args[0], &buf, &buf_owner) < 0)
return NULL;
}

if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s*|I:xxh32_digest", keywords, &buf, &seed)) {
return NULL;
if (nargs > 1) {
seed = (XXH32_hash_t)PyLong_AsUnsignedLongMask(args[1]);
}
if (kwnames) {
for (Py_ssize_t i = 0; i < PyTuple_GET_SIZE(kwnames); i++) {
PyObject *key = PyTuple_GET_ITEM(kwnames, i);
if (PyUnicode_CompareWithASCIIString(key, "seed") == 0) {
seed = (XXH32_hash_t)PyLong_AsUnsignedLongMask(args[nargs + i]);
}
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The manual argument parsing for METH_FASTCALL is incomplete and deviates from standard Python behavior in several ways:

  1. Unknown Keywords: Keyword arguments other than input or seed are silently ignored. They should raise a TypeError.
  2. Duplicate Arguments: If an argument is provided both positionally and via keyword (e.g., xxh32_digest(data, input=data)), it should raise a TypeError.
  3. Positional Argument Limit: Extra positional arguments (e.g., xxh32_digest(a, b, c)) are ignored instead of raising a TypeError.
  4. Overflow Behavior: PyLong_AsUnsignedLongMask silently masks values, whereas the previous I format in PyArg_ParseTupleAndKeywords raised an OverflowError for values exceeding 32 bits. This creates an inconsistency between the module-level functions and the object-based API.
  5. Error Handling: PyLong_AsUnsignedLongMask can fail (e.g., if passed a non-integer). You must check PyErr_Occurred() and return NULL if an exception is set.

Since this logic is duplicated across 12 functions, I recommend implementing a shared internal helper function to handle argument parsing correctly and consistently.

Comment thread src/_xxhash.c Outdated
Comment on lines 663 to 672
PyObject *sixtyfour = PyLong_FromLong(64);
PyObject *low = PyLong_FromUnsignedLongLong(intdigest.low64);
PyObject *high = PyLong_FromUnsignedLongLong(intdigest.high64);
PyObject *result = PyNumber_Lshift(high, sixtyfour);
Py_DECREF(high);
high = result;
result = PyNumber_Add(high, low);
Py_DECREF(high);
Py_DECREF(low);
Py_DECREF(sixtyfour);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This block lacks error handling for Python API calls. PyLong_FromLong, PyLong_FromUnsignedLongLong, PyNumber_Lshift, and PyNumber_Add can all return NULL on failure (e.g., out of memory). Passing NULL to subsequent calls or to Py_DECREF will cause a crash.

I suggest adding checks for NULL and using Py_XDECREF for safe cleanup.

    PyObject *sixtyfour = PyLong_FromLong(64);
    PyObject *low = PyLong_FromUnsignedLongLong(intdigest.low64);
    PyObject *high = PyLong_FromUnsignedLongLong(intdigest.high64);
    PyObject *result = NULL;
    if (sixtyfour && low && high) {
        PyObject *shifted = PyNumber_Lshift(high, sixtyfour);
        if (shifted) {
            result = PyNumber_Add(shifted, low);
            Py_DECREF(shifted);
        }
    }
    Py_XDECREF(high);
    Py_XDECREF(low);
    Py_XDECREF(sixtyfour);

Comment thread tests/test_fastcall.py Outdated
Comment on lines +109 to +113
def test_missing_input(self):
for a in ('xxh32', 'xxh64', 'xxh3_64', 'xxh3_128'):
for fn in self._funcs(a):
with self.assertRaises(TypeError):
fn()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The tests should be expanded to cover edge cases in argument parsing. Specifically, verify that:

  1. Unknown keyword arguments raise TypeError.
  2. Duplicate arguments (positional + keyword) raise TypeError.
  3. Too many positional arguments raise TypeError.
  4. Invalid types for the seed argument raise an appropriate exception.

ifduyue added 5 commits April 27, 2026 12:00
PyLong_AsUnsignedLongMask and PyLong_AsUnsignedLongLongMask return -1
(which is a valid seed value) when the argument cannot be converted to
an integer. Without checking PyErr_Occurred(), the function would
proceed with seed=UINT_MAX/ULLONG_MAX and return a result while an
exception is still set, causing a SystemError.
Replaces 12 duplicated argument-parsing blocks with a single shared
helper that correctly handles:
- Unknown keywords (raises TypeError instead of silently ignoring)
- Duplicate arguments (positional + keyword for same param)
- Too many positional arguments
- Proper error cleanup on failure (PyBuffer_Release + Py_XDECREF)

Uses goto for clean error-path resource release.
PyLong_FromLong, PyLong_FromUnsignedLongLong, PyNumber_Lshift, and
PyNumber_Add can all return NULL on failure. Passing NULL to subsequent
calls or to Py_DECREF would crash.

Use Py_XDECREF for safe cleanup and guard all intermediate results.
Fixes both module-level xxh3_128_intdigest and PYXXH3_128_intdigest.
Cover unknown keywords, duplicate arguments, too many positional args,
and invalid seed types across all four hash algorithms.
- Track seed_found in _parse_fastcall_args to detect duplicate seed
  passed both positionally and via keyword (TypeError)
- Rewrite test_fastcall.py: 32 tests covering normal cases, error
  cases (unknown kw, duplicate args, missing input, invalid types),
  and seed overflow wrapping across all four algorithms
@ifduyue
Copy link
Copy Markdown
Owner Author

ifduyue commented Apr 27, 2026

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request migrates module-level hash functions to the METH_FASTCALL calling convention and introduces a unified argument parsing helper. Feedback notes that using PyLong_AsUnsignedLongLongMask for seeds introduces a breaking change by silently masking overflows instead of raising an OverflowError. Additionally, an optimization was suggested to use a static reference for the constant integer 64 in 128-bit intdigest calculations to improve performance.

Comment thread src/_xxhash.c
input_found = 1;
}
if (nargs >= 2) {
*seed = PyLong_AsUnsignedLongLongMask(args[1]);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The use of PyLong_AsUnsignedLongLongMask changes the behavior for seed values that exceed the range of unsigned long long. Previously, PyArg_ParseTupleAndKeywords with the I or K format would raise an OverflowError. Now, these values will be silently masked. While the new tests in tests/test_fastcall.py suggest this wrapping behavior is intended, it is a breaking change from the previous implementation. If strict range checking is desired, consider using PyLong_AsUnsignedLongLong and handling the potential OverflowError.

Comment thread src/_xxhash.c

Py_XDECREF(buf_owner);

PyObject *sixtyfour = PyLong_FromLong(64);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Creating a new Python long object for the constant 64 on every call to intdigest is slightly inefficient. While PyLong_FromLong is fast for small integers in modern Python versions, for a performance-oriented 'Fastcall' implementation, you might consider using a static reference or a more direct way to construct the 128-bit integer if performance becomes a bottleneck.

    static PyObject *sixtyfour = NULL;
    if (sixtyfour == NULL) {
        sixtyfour = PyLong_FromLong(64);
    }
    PyObject *low = PyLong_FromUnsignedLongLong(intdigest.low64);
    PyObject *high = PyLong_FromUnsignedLongLong(intdigest.high64);

Py_BuildValue('I'/'K', val) parses a format string on every call.
PyLong_FromUnsignedLong/LongLong constructs the integer directly,
saving the format parse overhead. 10-21% faster for intdigest functions.
@ifduyue
Copy link
Copy Markdown
Owner Author

ifduyue commented Apr 27, 2026

/gemini review

Ensures inlining across all optimization levels and compilers,
matching CPython's own convention for performance-critical
local helpers.
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements METH_FASTCALL for module-level hash functions in _xxhash.c to improve performance by bypassing the overhead of PyArg_ParseTupleAndKeywords. It introduces internal helper functions _get_buffer_or_str and _parse_fastcall_args to handle positional and keyword arguments efficiently. The changes also include refactoring integer digest and seed retrieval to use direct PyLong creation functions and adding a comprehensive test suite in tests/test_fastcall.py to verify argument parsing, error handling, and seed wrapping behavior. I have no feedback to provide.

@ifduyue
Copy link
Copy Markdown
Owner Author

ifduyue commented Apr 27, 2026

Call Before (ns) After (ns) Δ
xxh32_intdigest(b'hello') 170 82 -52%
xxh64_intdigest(b'hello') 170 86 -49%
xxh3_64_intdigest(b'hello') 165 82 -50%
xxh3_128_intdigest(b'hello') 267 195 -27%
xxh32_intdigest(b'hello', seed=42) 338 145 -57%
xxh64_intdigest(b'hello', seed=42) 341 151 -56%
xxh32_intdigest('hello') 170 121 -29%
xxh3_64_intdigest(b'x'*4096) 425 338 -20%
xxh32_hexdigest(b'hello') 174 112 -36%
xxh64_hexdigest(b'hello') 161 106 -34%

Key contributors:

  • METH_FASTCALL → eliminates tuple/dict allocation per call (largest win)
  • Py_BuildValue → PyLong_FromUnsigned* → skips format string parse
  • Py_ALWAYS_INLINE on arg-parsing helpers → ensures no call overhead

Py_ALWAYS_INLINE was added in Python 3.12. On older versions
(3.9-3.11) it is not defined, causing a compile error.
@ifduyue ifduyue merged commit b3466b5 into master Apr 27, 2026
43 checks passed
@ifduyue ifduyue deleted the fastcall branch April 27, 2026 05:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant