Commit cbb8652

(improvement) deserializers: use direct PyUnicode_DecodeUTF8/ASCII from C buffer pointer
Replace the two-step to_bytes(buf).decode('utf8') pattern in DesUTF8Type and DesAsciiType with direct CPython C API calls (PyUnicode_DecodeUTF8 and PyUnicode_DecodeASCII). This eliminates an intermediate bytes object allocation per text cell: the old code created a Python bytes object from the C buffer pointer via to_bytes(buf), then immediately decoded it to str and discarded the bytes.

Text (UTF8Type/VarcharType) is the most common CQL column type, so this optimization applies to the majority of cells in typical workloads.

Benchmark results (Cython row parsing pipeline, median times):

| Scenario                        | Before (original) | After (direct decode) | Speedup |
|---------------------------------|-------------------:|----------------------:|--------:|
| UTF8 1row x 1col short (11B)    |            565 ns |                454 ns |   1.24x |
| UTF8 1row x 10col short         |          1,594 ns |              1,023 ns |   1.56x |
| UTF8 100rows x 5col medium      |         61,396 ns |             28,766 ns |   2.13x |
| UTF8 1000rows x 5col medium     |        547,145 ns |            290,361 ns |   1.88x |
| UTF8 100rows x 5col long(200B)  |         57,940 ns |             35,680 ns |   1.62x |
| UTF8 100rows x 5col multibyte   |        125,149 ns |            103,370 ns |   1.21x |
| ASCII 100rows x 5col medium     |         41,608 ns |             35,817 ns |   1.16x |
| ASCII 1000rows x 5col medium    |        416,350 ns |            374,341 ns |   1.11x |
| Mixed 100rows 3text+2int        |         44,646 ns |             31,189 ns |   1.43x |

All existing unit tests pass (62 type tests, 116 total across key suites).
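The difference between the two decode paths can be sketched in plain Python with ctypes. This is only an illustration, not the Cython code from this commit: the helper names decode_two_step and decode_direct are hypothetical, and ctypes.string_at stands in for the driver's to_bytes(buf) helper. It assumes a CPython interpreter, since it calls PyUnicode_DecodeUTF8 through ctypes.pythonapi.

```python
import ctypes

# C signature:
#   PyObject *PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)
_decode_utf8 = ctypes.pythonapi.PyUnicode_DecodeUTF8
_decode_utf8.argtypes = [ctypes.c_void_p, ctypes.c_ssize_t, ctypes.c_char_p]
_decode_utf8.restype = ctypes.py_object


def decode_two_step(address, size):
    """Old pattern (sketch): copy the C buffer into a bytes object, then decode."""
    raw = ctypes.string_at(address, size)  # intermediate bytes allocation
    return raw.decode("utf8")


def decode_direct(address, size):
    """New pattern (sketch): build the str straight from the C buffer pointer."""
    # errors=NULL selects the default "strict" error handling.
    return _decode_utf8(address, size, None)


# A stand-in for one text cell sitting in a C buffer.
payload = "héllo wörld".encode("utf8")
cell = ctypes.create_string_buffer(payload)
address = ctypes.addressof(cell)

assert decode_two_step(address, len(payload)) == decode_direct(address, len(payload))
```

Both helpers return the same str; the direct version simply never materializes the temporary bytes object. In Cython the call has no ctypes overhead, so this sketch is not suitable for reproducing the benchmark numbers above.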
1 parent 9c53d78 commit cbb8652

2 files changed

Lines changed: 408 additions & 3 deletions
