[mypyc] Add native char type + codepoint fast paths for str ops
Adds a first-class `char` native type to mypyc, modeled on i64: stored
unboxed as int32 codepoint, with -1 as the empty-string sentinel, and
bidirectional str<->char promotion. Unblocks codepoint-level fast paths
in per-char loops.
Core type plumbing:
- MYPYC_NATIVE_CHAR_NAMES alongside MYPYC_NATIVE_INT_NAMES
- str <-> char bidirectional _promote in semanal_classprop
- str covers char in subtypes.covers_at_runtime + overlap in meet
- char_rprimitive (int32, is_native_int, error_overlap=False)
- mypy_extensions.char stub
Boxing / unboxing:
- CPyChar_FromObject (accepts 0/1-char str, -113 on type error)
- CPyChar_ToStr (uses interned empty-str singleton for -1)
- bool(char) checks != -1, not != 0, so "\0" stays truthy
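The boxing rules above can be modeled in plain Python. This is only a sketch of the described semantics, not the C helpers themselves: the function names are hypothetical stand-ins, the real CPyChar_FromObject presumably also sets a Python exception, and treating an over-long str as the same -113 error is an assumption.

```python
# Pure-Python model of the boxing rules; names are hypothetical stand-ins.
EMPTY = -1         # empty-string sentinel (from the commit text)
TYPE_ERROR = -113  # error return value (from the commit text)


def char_from_object(obj: object) -> int:
    """Model of CPyChar_FromObject: accept a 0- or 1-char str."""
    # Assumption: a str longer than one char is reported like a type error.
    if not isinstance(obj, str) or len(obj) > 1:
        return TYPE_ERROR
    return ord(obj) if obj else EMPTY


def char_to_str(cp: int) -> str:
    """Model of CPyChar_ToStr: the -1 sentinel maps back to ""."""
    return "" if cp == EMPTY else chr(cp)


def char_bool(cp: int) -> bool:
    """Truthiness compares against -1, so codepoint 0 ("\\0") is truthy."""
    return cp != EMPTY
```

Comparing against -1 rather than 0 is what keeps bool("\0") and bool(char) in agreement: codepoint 0 is a real one-character string, and only the sentinel is falsy.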
Codegen fast paths:
- try_specialize_codepoint_compare in transform_comparison_expr handles
char/char, char/s[i], char/0-or-1-char-literal, and s[i]/literal
uniformly, compiling to int compare of the codepoint
- ord(s[i]) refactored to share the codepoint read path
- char.isspace/isdigit/isalnum/isalpha/isidentifier/upper method_ops
route to codepoint-taking C helpers in str_extra_ops.h
- CPyChar_IsIdentifier delegates to PyUnicode_IsIdentifier for non-ASCII
(correct XID_Start handling rather than Py_UNICODE_ISALPHA approximation)
- CPyChar_Upper falls back to str.upper() for non-ASCII, returning the
original codepoint when upper() produces multiple chars (e.g. ß -> SS)
since char holds one codepoint
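The non-ASCII fallbacks for upper and isidentifier can be sketched in plain Python as follows. The function names are hypothetical, the ASCII fast paths are inferred from the "for non-ASCII" wording, and passing the -1 sentinel through unchanged is an assumption.

```python
def char_upper(cp: int) -> int:
    """Model of CPyChar_Upper operating on a codepoint."""
    if cp < 0:
        return cp  # assumption: the empty sentinel passes through
    if cp < 128:
        # ASCII fast path: only a-z change.
        return cp - 32 if 97 <= cp <= 122 else cp
    up = chr(cp).upper()
    # A char holds one codepoint, so a multi-char result keeps the original.
    return ord(up) if len(up) == 1 else cp


def char_isidentifier(cp: int) -> bool:
    """Model of CPyChar_IsIdentifier operating on a codepoint."""
    if cp < 0:
        return False  # the empty string is not an identifier
    if cp < 128:
        # ASCII fast path: letters and underscore (a lone digit is not valid).
        return cp == 95 or 65 <= cp <= 90 or 97 <= cp <= 122
    # Non-ASCII: defer to the full identifier rules (XID_Start).
    return chr(cp).isidentifier()
```

With this model, char_upper(ord("ß")) returns ord("ß") because "ß".upper() is the two-character "SS", which matches the pinned behavior described in the tests.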
New IR transform pass (runs after lower_ir, before dep collection):
- char_str_index_fold: folds Unbox(CPyStr_GetItem(s, i) -> char) to a
direct CPyStr_GetCharAt int32 read, avoiding the 1-char PyObject alloc
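The fold can be illustrated on a toy IR. The Call and Unbox classes below are hypothetical simplifications for illustration and are not mypyc's actual IR op classes; only the helper names CPyStr_GetItem and CPyStr_GetCharAt come from this change.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """Toy IR node: a call to a named C helper."""
    func: str
    args: tuple


@dataclass
class Unbox:
    """Toy IR node: unbox the result of `src` to the native type `type`."""
    src: object
    type: str


def fold_char_index(op: object) -> object:
    """Rewrite Unbox(CPyStr_GetItem(s, i) -> char) to CPyStr_GetCharAt(s, i)."""
    if (
        isinstance(op, Unbox)
        and op.type == "char"
        and isinstance(op.src, Call)
        and op.src.func == "CPyStr_GetItem"
    ):
        # Read the codepoint directly; no 1-char str object is allocated.
        return Call("CPyStr_GetCharAt", op.src.args)
    return op  # anything else is left untouched
```

In this sketch, an Unbox to any type other than char, or an Unbox of a different call, is returned unchanged, so the rewrite only fires on the exact pattern named above.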
Also adds str.isalpha() method_op via CPyStr_IsAlpha.
Tests:
- run-char.test covers boxing/unboxing, bool semantics (NUL is truthy,
empty is falsy), equality, classification methods (including non-ASCII
XID_Start for isidentifier), upper (including pinning ß -> ß, i.e.
unchanged, for the multi-char fallback), str promotion, concatenation,
s[i]=="x" specialization, ord, and astral-plane codepoints.
- char stub added to test-data/unit/lib-stub/mypy_extensions.pyi so the
test harness can resolve the type.