[mypyc] Add native char type + codepoint fast paths for str ops
Adds a first-class `char` native type to mypyc, modeled on i64: stored
unboxed as an int32 codepoint, with -1 as the empty-string sentinel, and
with bidirectional str<->char promotion. This unblocks several
codepoint-level fast paths in per-char loops.
Core type plumbing:
- `MYPYC_NATIVE_CHAR_NAMES` alongside `MYPYC_NATIVE_INT_NAMES`
- str <-> char bidirectional `_promote` in semanal_classprop
- str covers char in subtypes.covers_at_runtime + overlap in meet
- `char_rprimitive` (int32, is_native_int, error_overlap=False)
- `mypy_extensions.char` stub with `.is*()`, `.upper()`, `.strip()`
Boxing / unboxing:
- `CPyChar_FromObject` (accepts 0/1-char str, -113 on type error)
- `CPyChar_ToStr` (uses interned empty-str singleton for -1)
- `bool(char)` checks `!= -1`, not `!= 0`, so "\0" stays truthy
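The boxing rules above can be modeled in plain Python. This is only a sketch of the described semantics, not the C helpers themselves: the function names are hypothetical, a `TypeError` stands in for the -113 error return, and rejecting strings longer than one character is an assumption.

```python
EMPTY = -1  # empty-string sentinel, as described above


def char_from_object(obj: object) -> int:
    """Hypothetical model of unboxing: 0- or 1-char str -> codepoint."""
    # Assumption: anything that is not a str of length 0 or 1 is an error.
    # The C helper is described as returning -113; here we raise instead.
    if not isinstance(obj, str) or len(obj) > 1:
        raise TypeError("expected a str of length 0 or 1")
    return ord(obj) if obj else EMPTY


def char_to_str(c: int) -> str:
    """Hypothetical model of boxing: codepoint -> str, -1 -> empty str."""
    return "" if c == EMPTY else chr(c)


def char_bool(c: int) -> bool:
    """Truthiness compares against the sentinel, not against zero."""
    return c != EMPTY
```

Comparing against -1 rather than 0 keeps `bool` on a char consistent with `bool` on the equivalent str: a NUL character has codepoint 0 but is a non-empty string.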
Codegen fast paths:
- `char == char` / `char == "x"` / `s[i] == "x"` specializers in
  transform_comparison_expr compile to int compare of the codepoint
- `ord(s[i])` refactored to share the codepoint read path
- `char.isspace/isdigit/isalnum/isalpha/isidentifier/upper` method_ops
  route to codepoint-taking C helpers in str_extra_ops.h
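For illustration, a per-char loop of the shape these fast paths target might look like the following. The function is a made-up example, not taken from the change or from sqlglot, and whether each expression actually lowers to a codepoint operation depends on the specializers listed above.

```python
def scan(s: str) -> tuple[int, int, int]:
    """Count commas and digits in s, and sum its codepoints."""
    commas = digits = checksum = 0
    for i in range(len(s)):
        if s[i] == ",":        # s[i] == "x": candidate for an int compare
            commas += 1
        elif s[i].isdigit():   # per-char classification method
            digits += 1
        checksum += ord(s[i])  # ord(s[i]): codepoint read
    return commas, digits, checksum
```

Under stock compilation each `s[i]` would produce a 1-char str object; the intent described above is that such loops operate on the int32 codepoint instead.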
Two new IR transform passes (run after lower_ir, before dep collection):
- char_str_index_fold: folds `Unbox(CPyStr_GetItem(s, i) -> char)` to a
  direct `CPyStr_GetCharAt` int32 read, avoiding the 1-char PyObject alloc
- str_buffer_hoist: for function-arg strings, hoists PyUnicode_KIND/DATA
  reads out of per-char loops (strings are immutable so it's safe)
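The index fold is a peephole rewrite. A toy version over a made-up two-node IR is sketched below. The `Call` and `Unbox` classes here are hypothetical stand-ins, not mypyc's real IR ops, and passing the same `(s, i)` arguments to `CPyStr_GetCharAt` is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Toy IR node: a call to a named C helper."""
    func: str
    args: tuple


@dataclass(frozen=True)
class Unbox:
    """Toy IR node: unbox the result of src to the native type typ."""
    src: object
    typ: str


def fold_char_index(op: object) -> object:
    """Rewrite Unbox(CPyStr_GetItem(s, i) -> char) to CPyStr_GetCharAt(s, i)."""
    if (
        isinstance(op, Unbox)
        and op.typ == "char"
        and isinstance(op.src, Call)
        and op.src.func == "CPyStr_GetItem"
    ):
        # Assumption: the direct read takes the same (s, i) arguments.
        return Call("CPyStr_GetCharAt", op.src.args)
    return op  # anything else is left untouched
```

The rewrite only fires when the boxed 1-char string is immediately unboxed to `char`, so the intermediate str object is never observable and can be skipped.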
Also adds `str.isalpha()` method_op via `CPyStr_IsAlpha`.
Perf (sqlglot parse benchmarks, char vs stock mypyc):
- tpch: +91.6% (1.27ms -> 0.66ms)
- deep_arithmetic: +80.7%
- many_numbers: +26.5%
- geomean: +17.6% across 16 queries