Commit 76078c0

[mypyc] Make separate=True compilation work for real-world projects (sqlglot)
This is the minimal set of fixes needed for `separate=True` to build and run correctly against sqlglot, a ~100-module project with cross-group class inheritance, generator helper classes, non-ext subclasses with fast methods, and mutually-dependent compiled modules. Each of the fixes below is a real bug that was never hit by mypy itself (mypy's setup.py uses multi_file on Windows only, never separate=True) or by the toy fixtures in mypyc's TestRunSeparate.

1. Non-extension classes never have vtables -- short-circuit is_method_final to True for them so codegen doesn't try to index into a vtable that compute_vtable skipped.

2. emit_method_call: under separate=True, a method's FuncIR body may live in another group while only its FuncDecl is visible here. Use method_decl(name) instead of get_method(name).decl -- the decl is enough to emit a direct C call. Split native_function_type to accept a decl too.

3. Cross-group native/Python-wrapper calls weren't routing through the exports-table indirection at a dozen sites in emitwrapper / emitfunc / emitclass. Added Emitter.native_function_call(decl) and Emitter.wrapper_function_call(decl) helpers and migrated all offending sites. Also made CPyPy_* wrapper declarations needs_export=True so those symbols reach the exports table.

4. Defer cross-group imports to shim load time. The shared lib's exec_ function used to PyImport_ImportModule sibling groups at PyInit time, which re-enters the enclosing package's __init__.py mid-flight and blows up on partial-init attribute walks. Split exec_ into a self-contained capsule-setup phase (runs in PyInit) and a deferred ensure_deps_<short>() (runs from the shim just before per-module init). Shim uses PyImport_ImportModuleLevel with a non-empty fromlist so the lookup returns the leaf directly via sys.modules, and fetches capsules via PyObject_GetAttrString instead of PyCapsule_Import (which itself performs the same dotted attribute walk).

5. Fix broken fallback in lib-rt CPyImport_ImportFrom: the code tried PyObject_GetItem(module, fullname) where it intended PyImport_GetModule (comment says as much). Modules don't implement __getitem__, so the fallback always raised TypeError. Also Py_XDECREF the potentially-NULL package_path in the error path.

6. Incremental-mode plumbing for separate=True: compile_modules_to_ir now syncs freshly built ClassIR/FuncIR into deser_ctx so later cache-loaded SCCs can resolve cross-SCC references. load_type_map tolerates mypy's synthetic TypeInfo entries (e.g. "<subclass of X and Y>") that have no corresponding mypyc ClassIR.

Also adds three regression tests targeted to fail on TestRunSeparate without the fixes above:

- testSeparateCrossGroupEnumMethod exercises fix #1.
- testSeparateCrossGroupGenerator exercises fix #2.
- testSeparateCrossGroupInheritedInit exercises fix #3.
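
The premise of fix 5 can be checked from Python: module objects do not implement `__getitem__`, so subscripting one raises `TypeError`, while a plain `sys.modules` lookup (the Python-level counterpart of `PyImport_GetModule`) finds the submodule. A minimal sketch; the helper name `lookup_submodule` is hypothetical and not part of mypyc or lib-rt:

```python
import importlib
import sys


def lookup_submodule(fullname: str):
    """Hypothetical helper: return an already-imported module, or None.

    This is a plain sys.modules lookup with no attribute walk, which is
    what PyImport_GetModule does at the C level.
    """
    return sys.modules.get(fullname)


# Import the submodule first so the lookup has something to find.
importlib.import_module("email.mime.text")
pkg = sys.modules["email.mime"]

# The broken fallback amounts to subscripting the module object.
try:
    pkg["email.mime.text"]
    subscript_error = None
except TypeError as exc:
    subscript_error = exc

found = lookup_submodule("email.mime.text")
```

The stdlib package `email.mime` is used here only because it is always available; any package with a submodule behaves the same way.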
1 parent 2b136f0 commit 76078c0

10 files changed

Lines changed: 275 additions & 62 deletions

mypyc/codegen/emit.py

Lines changed: 17 additions & 0 deletions
@@ -369,6 +369,23 @@ def c_error_value(self, rtype: RType) -> str:
     def native_function_name(self, fn: FuncDecl) -> str:
         return f"{NATIVE_PREFIX}{fn.cname(self.names)}"
 
+    def native_function_call(self, fn: FuncDecl) -> str:
+        """Return the C expression for a call to `fn`'s native (CPyDef_) entry.
+
+        For cross-group references under `separate=True`, this prepends the
+        exports-table indirection (e.g. `exports_other.CPyDef_foo`). Same as
+        `native_function_name()` for in-group calls.
+        """
+        return f"{self.get_group_prefix(fn)}{NATIVE_PREFIX}{fn.cname(self.names)}"
+
+    def wrapper_function_call(self, fn: FuncDecl) -> str:
+        """Return the C expression for a call to `fn`'s Python-wrapper (CPyPy_) entry.
+
+        Like `native_function_call`, but for the PyObject-level wrapper that
+        boxes/unboxes arguments. Used from slot generators (tp_init, etc.).
+        """
+        return f"{self.get_group_prefix(fn)}{PREFIX}{fn.cname(self.names)}"
+
     def tuple_c_declaration(self, rtuple: RTuple) -> list[str]:
         result = [
             f"#ifndef MYPYC_DECLARED_{rtuple.struct_name}",
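
To make the difference between the name helper and the two call helpers concrete, here is a rough sketch with toy stand-ins. `ToyDecl` and `ToyEmitter` are hypothetical, and the behaviour of `get_group_prefix` (empty for in-group targets, `exports_<group>.` otherwise) is an assumption based on the docstring example above rather than a copy of mypyc's implementation:

```python
from dataclasses import dataclass

NATIVE_PREFIX = "CPyDef_"  # native entry points
PREFIX = "CPyPy_"  # Python-level wrappers


@dataclass(frozen=True)
class ToyDecl:
    """Hypothetical stand-in for FuncDecl: a C name plus its owning group."""

    cname: str
    group: str


class ToyEmitter:
    """Hypothetical stand-in for Emitter that only knows its own group."""

    def __init__(self, current_group: str) -> None:
        self.current_group = current_group

    def get_group_prefix(self, fn: ToyDecl) -> str:
        # Assumption: in-group targets need no prefix; other groups are
        # reached through a struct named exports_<group>.
        if fn.group == self.current_group:
            return ""
        return f"exports_{fn.group}."

    def native_function_name(self, fn: ToyDecl) -> str:
        return f"{NATIVE_PREFIX}{fn.cname}"

    def native_function_call(self, fn: ToyDecl) -> str:
        return f"{self.get_group_prefix(fn)}{NATIVE_PREFIX}{fn.cname}"

    def wrapper_function_call(self, fn: ToyDecl) -> str:
        return f"{self.get_group_prefix(fn)}{PREFIX}{fn.cname}"


emitter = ToyEmitter(current_group="main")
local = ToyDecl(cname="foo", group="main")
remote = ToyDecl(cname="foo", group="other")

local_call = emitter.native_function_call(local)
remote_call = emitter.native_function_call(remote)
remote_wrapper = emitter.wrapper_function_call(remote)
```

Keeping the prefix logic inside the emitter means a call site cannot forget the indirection, which is the failure mode the commit message describes for the migrated sites.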

mypyc/codegen/emitclass.py

Lines changed: 15 additions & 16 deletions
@@ -705,11 +705,15 @@ def emit_null_check() -> None:
         emitter.emit_line(f"PyObject *self = {setup_name}({type_arg});")
         emit_null_check()
         return
-    prefix = emitter.get_group_prefix(new_fn.decl) + NATIVE_PREFIX if native_prefix else PREFIX
+    call = (
+        emitter.native_function_call(new_fn.decl)
+        if native_prefix
+        else emitter.wrapper_function_call(new_fn.decl)
+    )
     all_args = type_arg
     if new_args != "":
         all_args += ", " + new_args
-    emitter.emit_line(f"PyObject *self = {prefix}{new_fn.cname(emitter.names)}({all_args});")
+    emitter.emit_line(f"PyObject *self = {call}({all_args});")
     emit_null_check()
 
     # skip __init__ if __new__ returns some other type
@@ -743,17 +747,13 @@ def generate_constructor_for_class(
 
     args = ", ".join(["self"] + fn_args)
     if init_fn is not None:
-        prefix = PREFIX if use_wrapper else NATIVE_PREFIX
-        cast = "!= NULL ? 0 : -1" if use_wrapper else ""
-        emitter.emit_line(
-            "char res = {}{}{}({}){};".format(
-                emitter.get_group_prefix(init_fn.decl),
-                prefix,
-                init_fn.cname(emitter.names),
-                args,
-                cast,
-            )
+        call = (
+            emitter.wrapper_function_call(init_fn.decl)
+            if use_wrapper
+            else emitter.native_function_call(init_fn.decl)
         )
+        cast = "!= NULL ? 0 : -1" if use_wrapper else ""
+        emitter.emit_line(f"char res = {call}({args}){cast};")
         emitter.emit_line("if (res == 2) {")
         emitter.emit_line("Py_DECREF(self);")
         emitter.emit_line("return NULL;")
@@ -786,9 +786,8 @@ def generate_init_for_class(cl: ClassIR, init_fn: FuncIR, emitter: Emitter) -> s
     emitter.emit_line("{")
     if cl.allow_interpreted_subclasses or cl.builtin_base or cl.has_method("__new__"):
         emitter.emit_line(
-            "return {}{}(self, args, kwds) != NULL ? 0 : -1;".format(
-                PREFIX, init_fn.cname(emitter.names)
-            )
+            f"return {emitter.wrapper_function_call(init_fn.decl)}"
+            "(self, args, kwds) != NULL ? 0 : -1;"
         )
     else:
         emitter.emit_line("return 0;")
@@ -834,7 +833,7 @@ def generate_new_for_class(
         # can enforce that instances are always properly initialized. This
         # is needed to support always defined attributes.
         emitter.emit_line(
-            f"PyObject *ret = {PREFIX}{init_fn.cname(emitter.names)}(self, args, kwds);"
+            f"PyObject *ret = {emitter.wrapper_function_call(init_fn.decl)}(self, args, kwds);"
         )
         emitter.emit_lines("if (ret == NULL) {", " Py_DECREF(self);", " return NULL;", "}")
         emitter.emit_line("Py_DECREF(ret);")

mypyc/codegen/emitfunc.py

Lines changed: 15 additions & 9 deletions
@@ -88,8 +88,12 @@
 
 
 def native_function_type(fn: FuncIR, emitter: Emitter) -> str:
-    args = ", ".join(emitter.ctype(arg.type) for arg in fn.args) or "void"
-    ret = emitter.ctype(fn.ret_type)
+    return native_function_type_from_decl(fn.decl, emitter)
+
+
+def native_function_type_from_decl(decl: FuncDecl, emitter: Emitter) -> str:
+    args = ", ".join(emitter.ctype(arg.type) for arg in decl.sig.args) or "void"
+    ret = emitter.ctype(decl.sig.ret_type)
     return f"{ret} (*)({args})"
 
 
@@ -579,8 +583,11 @@ def emit_method_call(self, dest: str, op_obj: Value, name: str, op_args: list[Va
         rtype = op_obj.type
         assert isinstance(rtype, RInstance), rtype
         class_ir = rtype.class_ir
-        method = rtype.class_ir.get_method(name)
-        assert method is not None
+        # Use method_decl (not get_method) because under separate compilation the
+        # FuncIR body may live in a different group — only its declaration is
+        # visible here, and a decl is all we need to emit a direct C call
+        # (the symbol resolves through that group's exports table).
+        method_decl = rtype.class_ir.method_decl(name)
 
         # Can we call the method directly, bypassing vtable?
         is_direct = class_ir.is_method_final(name)
@@ -589,16 +596,15 @@ def emit_method_call(self, dest: str, op_obj: Value, name: str, op_args: list[Va
         # turned into the class for class methods
         obj_args = (
             []
-            if method.decl.kind == FUNC_STATICMETHOD
-            else [f"(PyObject *)Py_TYPE({obj})"] if method.decl.kind == FUNC_CLASSMETHOD else [obj]
+            if method_decl.kind == FUNC_STATICMETHOD
+            else [f"(PyObject *)Py_TYPE({obj})"] if method_decl.kind == FUNC_CLASSMETHOD else [obj]
         )
         args = ", ".join(obj_args + [self.reg(arg) for arg in op_args])
-        mtype = native_function_type(method, self.emitter)
+        mtype = native_function_type_from_decl(method_decl, self.emitter)
         version = "_TRAIT" if rtype.class_ir.is_trait else ""
         if is_direct:
             # Directly call method, without going through the vtable.
-            lib = self.emitter.get_group_prefix(method.decl)
-            self.emit_line(f"{dest}{lib}{NATIVE_PREFIX}{method.cname(self.names)}({args});")
+            self.emit_line(f"{dest}{self.emitter.native_function_call(method_decl)}({args});")
         else:
             # Call using vtable.
             method_idx = rtype.method_index(name)
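
The point of `native_function_type_from_decl` is that a C function-pointer type can be derived from a signature alone, with no function body in hand. A simplified sketch of that idea follows; `ToySig` and `function_pointer_type` are hypothetical names, and the signature is assumed to already carry C type strings (in mypyc these come from `emitter.ctype(...)`):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ToySig:
    """Hypothetical signature: C types of the arguments and the return value."""

    arg_ctypes: tuple[str, ...]
    ret_ctype: str


def function_pointer_type(sig: ToySig) -> str:
    """Build a C function-pointer type string from a signature only."""
    # An empty parameter list must be spelled `void` in C.
    args = ", ".join(sig.arg_ctypes) or "void"
    return f"{sig.ret_ctype} (*)({args})"


method_type = function_pointer_type(ToySig(("PyObject *", "CPyTagged"), "char"))
no_arg_type = function_pointer_type(ToySig((), "PyObject *"))
```

Because nothing here touches a function body, the same computation works when the body was compiled in a different group.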

mypyc/codegen/emitmodule.py

Lines changed: 78 additions & 18 deletions
@@ -317,6 +317,14 @@ def compile_modules_to_ir(
         else:
             scc_ir = compile_scc_to_ir(trees, result, mapper, compiler_options, errors)
             modules.update(scc_ir)
+            # A later SCC loaded from cache may reference classes/functions
+            # defined in this freshly-built SCC; populate deser_ctx so the
+            # cached IR deserializer can resolve those cross-SCC references.
+            for module_ir in scc_ir.values():
+                for cl in module_ir.classes:
+                    deser_ctx.classes.setdefault(cl.fullname, cl)
+                for fn in module_ir.functions:
+                    deser_ctx.functions.setdefault(fn.decl.id, fn)
 
     return modules
 
@@ -517,13 +525,16 @@ def generate_function_declaration(fn: FuncIR, emitter: Emitter) -> None:
         f"{native_function_header(fn.decl, emitter)};", needs_export=True
     )
     if fn.name != TOP_LEVEL_NAME and not fn.internal:
+        # needs_export=True so Python-wrapper (CPyPy_) symbols are reachable from
+        # other groups via the export table — needed for cross-group inherited
+        # __init__ / __new__ slot dispatch under `separate=True`.
         if is_fastcall_supported(fn, emitter.capi_version):
             emitter.context.declarations[PREFIX + fn.cname(emitter.names)] = HeaderDeclaration(
-                f"{wrapper_function_header(fn, emitter.names)};"
+                f"{wrapper_function_header(fn, emitter.names)};", needs_export=True
             )
         else:
             emitter.context.declarations[PREFIX + fn.cname(emitter.names)] = HeaderDeclaration(
-                f"{legacy_wrapper_function_header(fn, emitter.names)};"
+                f"{legacy_wrapper_function_header(fn, emitter.names)};", needs_export=True
             )
 
 
@@ -886,6 +897,21 @@ def generate_shared_lib_init(self, emitter: Emitter) -> None:
             "goto fail;",
             "}",
             "",
+            # Expose ensure_deps_<short> as a capsule so the shim can call
+            # it before invoking the per-module init.
+            f"extern int ensure_deps_{short_name}(void);",
+            'capsule = PyCapsule_New((void *)ensure_deps_{sh}, "{lib}.ensure_deps", NULL);'.format(
+                sh=short_name, lib=shared_lib_name(self.group_name)
+            ),
+            "if (!capsule) {",
+            "goto fail;",
+            "}",
+            'res = PyObject_SetAttrString(module, "ensure_deps", capsule);',
+            "Py_DECREF(capsule);",
+            "if (res < 0) {",
+            "goto fail;",
+            "}",
+            "",
         )
 
         for mod in self.modules:
@@ -917,25 +943,58 @@ def generate_shared_lib_init(self, emitter: Emitter) -> None:
                 "",
             )
 
-        for group in sorted(self.context.group_deps):
-            egroup = exported_name(group)
+        # End of exec_<short_name>: only sets up capsules/module attributes.
+        # Cross-group imports (populating `exports_<dep>` tables) are split
+        # out into ensure_deps_<short_name>() below and run later, from the
+        # shim's PyInit. See generate_shared_lib_init for details.
+        emitter.emit_lines("return 0;", "fail:", "return -1;", "}")
+
+        if self.compiler_options.separate:
+            # ensure_deps_<short>(): populates cross-group exports tables. Run
+            # once, lazily, from the shim's PyInit just before invoking the
+            # per-module init capsule. This defers cross-group imports out of
+            # the shared-lib PyInit so they can't transitively trigger a
+            # sibling package's __init__.py while another package __init__.py
+            # is still mid-flight.
             emitter.emit_lines(
-                'tmp = PyImport_ImportModule("{}"); if (!tmp) goto fail; Py_DECREF(tmp);'.format(
-                    shared_lib_name(group)
-                ),
-                'struct export_table_{} *pexports_{} = PyCapsule_Import("{}.exports", 0);'.format(
-                    egroup, egroup, shared_lib_name(group)
-                ),
-                f"if (!pexports_{egroup}) {{",
-                "goto fail;",
-                "}",
-                "memcpy(&exports_{group}, pexports_{group}, sizeof(exports_{group}));".format(
-                    group=egroup
-                ),
                 "",
+                f"int ensure_deps_{short_name}(void)",
+                "{",
+                "static int done = 0;",
+                "if (done) return 0;",
             )
-
-        emitter.emit_lines("return 0;", "fail:", "return -1;", "}")
+            if self.context.group_deps:
+                emitter.emit_line(
+                    'static PyObject *_mypyc_fromlist = NULL; '
+                    'if (!_mypyc_fromlist) { '
+                    '_mypyc_fromlist = Py_BuildValue("(s)", "*"); '
+                    'if (!_mypyc_fromlist) return -1; }'
+                )
+                emitter.emit_line("PyObject *tmp;")
+                emitter.emit_line("PyObject *caps;")
+                for group in sorted(self.context.group_deps):
+                    egroup = exported_name(group)
+                    # ImportModuleLevel with fromlist returns the leaf via
+                    # sys.modules (no dotted getattr walk), and fetching the
+                    # `exports` capsule directly off that module bypasses
+                    # PyCapsule_Import (which would redo the attribute walk).
+                    emitter.emit_lines(
+                        'tmp = PyImport_ImportModuleLevel("{}", NULL, NULL, _mypyc_fromlist, 0);'.format(
+                            shared_lib_name(group)
+                        ),
+                        "if (!tmp) return -1;",
+                        'caps = PyObject_GetAttrString(tmp, "exports");',
+                        "Py_DECREF(tmp);",
+                        "if (!caps) return -1;",
+                        'struct export_table_{g} *pexports_{g} = '
+                        '(struct export_table_{g} *)PyCapsule_GetPointer(caps, "{lib}.exports");'.format(
+                            g=egroup, lib=shared_lib_name(group)
+                        ),
+                        "Py_DECREF(caps);",
+                        f"if (!pexports_{egroup}) return -1;",
+                        "memcpy(&exports_{g}, pexports_{g}, sizeof(exports_{g}));".format(g=egroup),
+                    )
+            emitter.emit_lines("done = 1;", "return 0;", "}")
 
         if self.multi_phase_init:
             emitter.emit_lines(
@@ -980,6 +1039,7 @@ def generate_shared_lib_init(self, emitter: Emitter) -> None:
                 "}",
                 f"if (exec_{short_name}(module) < 0) {{",
                 "Py_DECREF(module);",
+                "module = NULL;",
                 "return NULL;",
                 "}",
                 "return module;",
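
The fromlist behaviour that `ensure_deps_<short>()` relies on can be observed from Python, since the built-in `__import__` follows the same rule as `PyImport_ImportModuleLevel`: with an empty fromlist a dotted import returns the top-level package, and with a non-empty fromlist it returns the leaf module. An illustrative sketch using a stdlib package in place of a mypyc shared library (the choice of `email.mime.text` is arbitrary):

```python
import sys

# Empty fromlist: the top-level package comes back, so a caller would
# have to walk attributes (email -> mime -> text) to reach the leaf.
top = __import__("email.mime.text")

# Non-empty fromlist: the leaf module itself is returned, with no
# attribute walk through the parent packages.
leaf = __import__("email.mime.text", globals(), locals(), ["*"], 0)

# The leaf is the same object that sys.modules holds.
cached = sys.modules["email.mime.text"]
```

The attribute walk matters because a parent package that is still executing its `__init__.py` may not yet have the child bound as an attribute, which is the partial-init failure the commit message describes.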

mypyc/codegen/emitwrapper.py

Lines changed: 16 additions & 13 deletions
@@ -537,7 +537,7 @@ def generate_get_wrapper(cl: ClassIR, fn: FuncIR, emitter: Emitter) -> str:
         )
     )
     emitter.emit_line("instance = instance ? instance : Py_None;")
-    emitter.emit_line(f"return {NATIVE_PREFIX}{fn.cname(emitter.names)}(self, instance, owner);")
+    emitter.emit_line(f"return {emitter.native_function_call(fn.decl)}(self, instance, owner);")
     emitter.emit_line("}")
 
     return name
@@ -600,8 +600,8 @@ def generate_bool_wrapper(cl: ClassIR, fn: FuncIR, emitter: Emitter) -> str:
     name = f"{DUNDER_PREFIX}{fn.name}{cl.name_prefix(emitter.names)}"
     emitter.emit_line(f"static int {name}(PyObject *self) {{")
     emitter.emit_line(
-        "{}val = {}{}(self);".format(
-            emitter.ctype_spaced(fn.ret_type), NATIVE_PREFIX, fn.cname(emitter.names)
+        "{}val = {}(self);".format(
+            emitter.ctype_spaced(fn.ret_type), emitter.native_function_call(fn.decl)
         )
     )
     emitter.emit_error_check("val", fn.ret_type, "return -1;")
@@ -704,8 +704,10 @@ def generate_set_del_item_wrapper_inner(
         generate_arg_check(arg.name, arg.type, emitter, GotoHandler("fail"))
     native_args = ", ".join(f"arg_{arg.name}" for arg in args)
     emitter.emit_line(
-        "{}val = {}{}({});".format(
-            emitter.ctype_spaced(fn.ret_type), NATIVE_PREFIX, fn.cname(emitter.names), native_args
+        "{}val = {}({});".format(
+            emitter.ctype_spaced(fn.ret_type),
+            emitter.native_function_call(fn.decl),
+            native_args,
         )
     )
     emitter.emit_error_check("val", fn.ret_type, "goto fail;")
@@ -722,8 +724,8 @@ def generate_contains_wrapper(cl: ClassIR, fn: FuncIR, emitter: Emitter) -> str:
     emitter.emit_line(f"static int {name}(PyObject *self, PyObject *obj_item) {{")
     generate_arg_check("item", fn.args[1].type, emitter, ReturnHandler("-1"))
     emitter.emit_line(
-        "{}val = {}{}(self, arg_item);".format(
-            emitter.ctype_spaced(fn.ret_type), NATIVE_PREFIX, fn.cname(emitter.names)
+        "{}val = {}(self, arg_item);".format(
+            emitter.ctype_spaced(fn.ret_type), emitter.native_function_call(fn.decl)
         )
     )
     emitter.emit_error_check("val", fn.ret_type, "return -1;")
@@ -857,6 +859,9 @@ def set_target(self, fn: FuncIR) -> None:
         """
         self.target_name = fn.name
         self.target_cname = fn.cname(self.emitter.names)
+        # Cached native-call expression so cross-group targets go through the
+        # exports table; same as `NATIVE_PREFIX + cname` for in-group calls.
+        self.target_native_call = self.emitter.native_function_call(fn.decl)
         self.num_bitmap_args = fn.sig.num_bitmap_args
         if self.num_bitmap_args:
             self.args = fn.args[: -self.num_bitmap_args]
@@ -927,8 +932,8 @@ def emit_call(self, not_implemented_handler: str = "") -> None:
             # TODO: The Py_RETURN macros return the correct PyObject * with reference count
             # handling. Are they relevant?
             emitter.emit_line(
-                "{}retval = {}{}({});".format(
-                    emitter.ctype_spaced(ret_type), NATIVE_PREFIX, self.target_cname, native_args
+                "{}retval = {}({});".format(
+                    emitter.ctype_spaced(ret_type), self.target_native_call, native_args
                 )
             )
             emitter.emit_lines(*self.cleanups)
@@ -941,9 +946,7 @@ def emit_call(self, not_implemented_handler: str = "") -> None:
             if not_implemented_handler and not isinstance(ret_type, RInstance):
                 # The return value type may overlap with NotImplemented.
                 emitter.emit_line(
-                    "PyObject *retbox = {}{}({});".format(
-                        NATIVE_PREFIX, self.target_cname, native_args
-                    )
+                    f"PyObject *retbox = {self.target_native_call}({native_args});"
                 )
                 emitter.emit_lines(
                     "if (retbox == Py_NotImplemented) {",
@@ -952,7 +955,7 @@ def emit_call(self, not_implemented_handler: str = "") -> None:
                     "return retbox;",
                 )
             else:
-                emitter.emit_line(f"return {NATIVE_PREFIX}{self.target_cname}({native_args});")
+                emitter.emit_line(f"return {self.target_native_call}({native_args});")
             # TODO: Tracebacks?
 
     def error(self) -> ErrorHandler:

mypyc/ir/class_ir.py

Lines changed: 6 additions & 0 deletions
@@ -284,6 +284,12 @@ def has_method(self, name: str) -> bool:
         return True
 
     def is_method_final(self, name: str) -> bool:
+        if not self.is_ext_class:
+            # Non-extension classes don't use vtable dispatch; their mypyc-compiled
+            # "fast" methods are always called directly by C name. Treating them as
+            # final here keeps codegen from trying to index into a vtable that was
+            # never computed (non-ext classes skip compute_vtable).
+            return True
         subs = self.subclasses()
         if subs is None:
             return self.is_final_class
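
The effect of the new early return can be sketched with a toy class model. `ToyClass` is hypothetical, and the override check at the end is a deliberately simplified stand-in for the rest of `is_method_final` (it looks only at direct children); the first two branches mirror the lines shown in the hunk above:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyClass:
    """Hypothetical, simplified stand-in for ClassIR."""

    name: str
    is_ext_class: bool = True
    is_final_class: bool = False
    methods: set[str] = field(default_factory=set)
    # None models "the set of subclasses is not known".
    children: list[ToyClass] | None = field(default_factory=list)

    def is_method_final(self, method: str) -> bool:
        if not self.is_ext_class:
            # No vtable exists for non-extension classes, so the caller
            # must take the direct-call path.
            return True
        if self.children is None:
            return self.is_final_class
        # Simplified assumption: final when no direct child overrides it.
        return not any(method in child.methods for child in self.children)


non_ext = ToyClass("Color", is_ext_class=False, children=None)
base = ToyClass("Base", methods={"run"}, children=[ToyClass("Sub", methods={"run"})])
open_ended = ToyClass("Open", methods={"run"}, children=None)
```

Without the first branch, `non_ext` would fall through to the `children is None` case and report the method as non-final, which sends codegen down the vtable path for a class that has no vtable.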

mypyc/irbuild/prepare.py

Lines changed: 7 additions & 1 deletion
@@ -173,7 +173,13 @@ def load_type_map(mapper: Mapper, modules: list[MypyFile], deser_ctx: DeserMaps)
                 and not node.node.is_named_tuple
                 and node.node.typeddict_type is None
             ):
-                ir = deser_ctx.classes[node.node.fullname]
+                # Some TypeInfo entries are mypy-synthetic (e.g. anonymous
+                # intersection classes like "<subclass of X and Y>") and have
+                # no corresponding mypyc ClassIR. Skip those rather than
+                # aborting the whole cache load.
+                ir = deser_ctx.classes.get(node.node.fullname)
+                if ir is None:
+                    continue
                 mapper.type_to_ir[node.node] = ir
                 mapper.symbol_fullnames.add(node.node.fullname)
                 mapper.func_to_decl[node.node] = ir.ctor
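
Both incremental-mode changes come down to two dictionary idioms: `setdefault` to register freshly built IR without clobbering an entry that is already present, and `get` plus `continue` to skip names that have no IR at all. A small sketch with plain dicts standing in for `deser_ctx.classes`; the function names and the string placeholders for ClassIR objects are hypothetical:

```python
from __future__ import annotations


def sync_fresh_classes(known: dict[str, str], fresh: dict[str, str]) -> None:
    """Register freshly built entries, keeping any entry already present."""
    for fullname, ir in fresh.items():
        known.setdefault(fullname, ir)


def resolve_known(known: dict[str, str], fullnames: list[str]) -> dict[str, str]:
    """Map each name to its IR, skipping names that have none."""
    resolved: dict[str, str] = {}
    for fullname in fullnames:
        ir = known.get(fullname)
        if ir is None:
            # e.g. a synthetic "<subclass of X and Y>" entry
            continue
        resolved[fullname] = ir
    return resolved


classes = {"pkg.A": "cached-A"}
sync_fresh_classes(classes, {"pkg.A": "fresh-A", "pkg.B": "fresh-B"})
resolved = resolve_known(classes, ["pkg.B", "pkg.<subclass of X and Y>"])
```

With a plain `known[fullname]` lookup, the synthetic name would raise `KeyError` and abort the whole load instead of being skipped.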
