Commit cbb8652
(improvement) deserializers: use direct PyUnicode_DecodeUTF8/ASCII from C buffer pointer
Replace the two-step to_bytes(buf).decode('utf8') pattern in DesUTF8Type and
DesAsciiType with direct CPython C API calls (PyUnicode_DecodeUTF8 and
PyUnicode_DecodeASCII). This eliminates an intermediate bytes object
allocation per text cell — the old code created a Python bytes object from
the C buffer pointer via to_bytes(buf), then immediately decoded it to str
and discarded the bytes.
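The two patterns can be illustrated from pure Python with `ctypes`, which can call the same CPython C API decoders on a raw buffer address. This is a minimal sketch, not the driver's Cython code. It assumes that `ctypes.create_string_buffer` is a reasonable stand-in for the row buffer, and the helper names `decode_two_step`, `decode_utf8_direct`, and `decode_ascii_direct` are hypothetical. Because `ctypes` call overhead dominates here, this sketch shows equivalence of results, not the speedup.

```python
import ctypes

# Bind the CPython C API decoders. ctypes.pythonapi keeps the GIL held and
# re-raises any Python exception the C function sets (e.g. UnicodeDecodeError).
_decode_utf8 = ctypes.pythonapi.PyUnicode_DecodeUTF8
_decode_utf8.restype = ctypes.py_object
_decode_utf8.argtypes = [ctypes.c_void_p, ctypes.c_ssize_t, ctypes.c_char_p]

_decode_ascii = ctypes.pythonapi.PyUnicode_DecodeASCII
_decode_ascii.restype = ctypes.py_object
_decode_ascii.argtypes = [ctypes.c_void_p, ctypes.c_ssize_t, ctypes.c_char_p]


def decode_two_step(ptr: int, size: int) -> str:
    """Old pattern (hypothetical helper): copy into bytes, then decode."""
    tmp = ctypes.string_at(ptr, size)  # intermediate bytes object per cell
    return tmp.decode("utf8")


def decode_utf8_direct(ptr: int, size: int) -> str:
    """New pattern (hypothetical helper): decode straight from the pointer.

    Passing None for `errors` sends NULL, which selects strict error handling.
    """
    return _decode_utf8(ptr, size, None)


def decode_ascii_direct(ptr: int, size: int) -> str:
    """ASCII variant; raises UnicodeDecodeError on any byte >= 0x80."""
    return _decode_ascii(ptr, size, None)


if __name__ == "__main__":
    raw = "héllo wörld".encode("utf8")
    buf = ctypes.create_string_buffer(raw, len(raw))  # no trailing NUL
    ptr = ctypes.addressof(buf)
    print(decode_two_step(ptr, len(raw)))
    print(decode_utf8_direct(ptr, len(raw)))
```

Both functions return equal strings; the direct call simply skips the temporary `bytes` object that the two-step version allocates and throws away.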
Text (UTF8Type/VarcharType) is the most common CQL column type, so this
optimization applies to the majority of cells in typical workloads.
Benchmark results (Cython row parsing pipeline, median times):
| Scenario | Before (original) | After (direct decode) | Speedup |
|---------------------------------|-------------------:|----------------------:|--------:|
| UTF8 1row x 1col short (11B) | 565 ns | 454 ns | 1.24x |
| UTF8 1row x 10col short | 1,594 ns | 1,023 ns | 1.56x |
| UTF8 100rows x 5col medium | 61,396 ns | 28,766 ns | 2.13x |
| UTF8 1000rows x 5col medium | 547,145 ns | 290,361 ns | 1.88x |
| UTF8 100rows x 5col long (200B) | 57,940 ns | 35,680 ns | 1.62x |
| UTF8 100rows x 5col multibyte | 125,149 ns | 103,370 ns | 1.21x |
| ASCII 100rows x 5col medium | 41,608 ns | 35,817 ns | 1.16x |
| ASCII 1000rows x 5col medium | 416,350 ns | 374,341 ns | 1.11x |
| Mixed 100rows 3text+2int | 44,646 ns | 31,189 ns | 1.43x |
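
The Speedup column is the "before" median divided by the "after" median. As a quick consistency check, the ratios can be recomputed from the table's own numbers; every row rounds to the reported value. The names `ROWS`, `speedup`, and `mismatches` are illustrative only.

```python
# (scenario, before_ns, after_ns, reported_speedup), copied from the table above.
ROWS = [
    ("UTF8 1row x 1col short (11B)", 565, 454, 1.24),
    ("UTF8 1row x 10col short", 1_594, 1_023, 1.56),
    ("UTF8 100rows x 5col medium", 61_396, 28_766, 2.13),
    ("UTF8 1000rows x 5col medium", 547_145, 290_361, 1.88),
    ("UTF8 100rows x 5col long (200B)", 57_940, 35_680, 1.62),
    ("UTF8 100rows x 5col multibyte", 125_149, 103_370, 1.21),
    ("ASCII 100rows x 5col medium", 41_608, 35_817, 1.16),
    ("ASCII 1000rows x 5col medium", 416_350, 374_341, 1.11),
    ("Mixed 100rows 3text+2int", 44_646, 31_189, 1.43),
]


def speedup(before_ns: int, after_ns: int) -> float:
    """Speedup = before / after (median times), rounded to two decimals."""
    return round(before_ns / after_ns, 2)


def mismatches(rows):
    """Return scenarios whose recomputed speedup differs from the reported one."""
    return [name for name, before, after, reported in rows
            if speedup(before, after) != reported]


if __name__ == "__main__":
    for name, before, after, _ in ROWS:
        print(f"{name:34} {speedup(before, after):.2f}x")
    print("mismatches:", mismatches(ROWS))
```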
All existing unit tests pass (62 type tests, 116 total across key suites).

Parent commit: 9c53d78
2 files changed, 408 additions and 3 deletions