|
| 1 | +# `separate=True` fixes: notes and status |
| 2 | + |
| 3 | +This branch (`separate_flag`) contains two commits that make |
| 4 | +`mypycify(..., separate=True)` work against real-world projects. Prior to |
| 5 | +these commits, `separate=True` was effectively an experimental flag: the |
| 6 | +only coverage in mypyc's CI is `mypyc/test/test_run.py::TestRunSeparate` |
| 7 | +against two toy fixture files (`run-multimodule.test`, `run-mypy-sim.test`), |
| 8 | +and mypy's own `setup.py` never sets `separate=True` (it sets `multi_file` |
| 9 | +on Windows only). Real projects stressed the flag and turned up seven |
| 10 | +latent bugs. |
| 11 | + |
| 12 | +The two commits are split so it's clear which fixes were required to get |
| 13 | +**sqlglot** building & testing cleanly, vs which additional fixes are |
| 14 | +needed to also get **mypy**'s self-compile working. |
| 15 | + |
| 16 | +``` |
| 17 | +$ git log --oneline separate_flag |
| 18 | +<sha2> [mypyc] Additional separate=True fixes needed for mypy self-compile |
| 19 | +<sha1> [mypyc] Make separate=True compilation work for real-world projects (sqlglot) |
| 20 | +``` |
| 21 | + |
| 22 | +--- |
| 23 | + |
| 24 | +## Commit 1: sqlglot-essential fixes |
| 25 | + |
| 26 | +**Scope:** Everything needed for sqlglot to build + test + run end-to-end |
| 27 | +under `separate=True`. sqlglot is a ~100-module SQL parser/transpiler with |
| 28 | +cross-group class inheritance, mypyc-generated generator helper classes, |
| 29 | +non-extension subclasses with fast methods, and mutual cross-module |
| 30 | +dependencies. |
| 31 | + |
| 32 | +Six bug categories: |
| 33 | + |
| 34 | +### 1. Non-extension classes never have vtables |
| 35 | + |
| 36 | +`is_method_final` in `ClassIR` falls back to `is_final_class` when |
| 37 | +`subclasses()` returns `None` (which it does under `separate=True`: |
| 38 | +children tracking is disabled). That makes `emit_method_call` take the |
| 39 | +vtable path. But non-extension classes skip `compute_vtable()`, so |
| 40 | +`vtable_entry()` asserts `self.vtable is not None` and crashes. |
| 41 | + |
| 42 | +**Fix:** short-circuit `is_method_final` to `True` for non-ext classes — |
| 43 | +their methods are compiled as direct C functions, never dispatched via |
| 44 | +vtable. |
| 45 | + |
| 46 | +Trigger in sqlglot: `DType(AutoName)` (an enum) defines `into_expr`, which |
| 47 | +mypyc wrapped as a `__mypyc_fast_into_expr` method. Any `DType.FOO.into_expr()` |
| 48 | +cross-group crashed. |
| 49 | + |
| 50 | +### 2. Cross-group method call requires only the decl, not the body |
| 51 | + |
| 52 | +`emit_method_call` asserted `method = class_ir.get_method(name)` was |
| 53 | +non-None. Under `separate=True`, a method's `FuncIR` body lives only in |
| 54 | +the defining group; consumers see only the `FuncDecl` via `method_decls`. |
| 55 | + |
| 56 | +**Fix:** use `method_decl(name)` (always available, walks MRO) and drop |
| 57 | +the assert. Split `native_function_type(fn)` into a decl-taking overload. |
| 58 | + |
| 59 | +Trigger in sqlglot: mypyc-generated generator helper classes (e.g. |
| 60 | +`find_all_in_scope_gen`) with `__mypyc_generator_helper__` methods called |
| 61 | +cross-group. |
| 62 | + |
| 63 | +### 3. Cross-group native/wrapper calls bypassed the exports table |
| 64 | + |
| 65 | +A dozen call sites in `emitwrapper.py`, `emitfunc.py`, and `emitclass.py` |
| 66 | +hardcoded `NATIVE_PREFIX + cname()` or `PREFIX + cname()` without going |
| 67 | +through `get_group_prefix()`. In separate mode, those symbols live in |
| 68 | +sibling groups — the C compiler needs `exports_other.CPyDef_foo(...)`, |
| 69 | +not `CPyDef_foo(...)`. |
| 70 | + |
| 71 | +**Fix:** add `Emitter.native_function_call(decl)` and |
| 72 | +`Emitter.wrapper_function_call(decl)` helpers that do the right thing, |
| 73 | +and migrate all offending sites. Also make `CPyPy_*` wrapper declarations |
| 74 | +`needs_export=True` so those symbols actually reach the exports table. |
| 75 | + |
| 76 | +Trigger: any class using an `__init__` inherited across groups (e.g. |
| 77 | +`Tables(AbstractMappingSchema)`, `TSQLParser(Parser)`); richcompare |
| 78 | +wrappers on classes whose `__eq__` lives in a parent group. |
| 79 | + |
| 80 | +### 4. Defer cross-group imports out of `PyInit` |
| 81 | + |
| 82 | +The shared lib's `exec_<group>` function used to `PyImport_ImportModule` |
| 83 | +sibling groups at PyInit time, and `PyCapsule_Import` their export |
| 84 | +tables. But the shim's PyInit gets invoked while the enclosing Python |
| 85 | +package's `__init__.py` is still executing, and Python's top-down getattr |
| 86 | +walk for dotted imports tries `getattr(pkg, submod)` — which fails with |
| 87 | +`AttributeError: partially initialized module …` because Python hasn't |
| 88 | +yet set the submodule attribute on the parent. |
| 89 | + |
| 90 | +**Fix:** split `exec_<group>` into two C functions: |
| 91 | +- `exec_<short>`: capsule setup only, safe to run at PyInit time. |
| 92 | +- `ensure_deps_<short>`: cross-group imports + exports-table memcpy. |
| 93 | + Idempotent, exposed as another capsule on the shared lib. |
| 94 | + |
| 95 | +The shim (`module_shim.tmpl`) calls `ensure_deps` just before invoking the |
| 96 | +per-module init capsule — by that time Python has settled the parent |
| 97 | +package state for this shim's load. Also use |
| 98 | +`PyImport_ImportModuleLevel(name, NULL, NULL, fromlist=("*",), 0)` so the |
| 99 | +shared lib lookup returns the leaf via `sys.modules` (no dotted getattr |
| 100 | +walk), and fetch exports capsules directly via `PyObject_GetAttrString` |
| 101 | +instead of `PyCapsule_Import` (which itself performs the dotted walk). |
| 102 | + |
| 103 | +### 5. Fix broken `CPyImport_ImportFrom` fallback |
| 104 | + |
| 105 | +In `lib-rt/misc_ops.c`, the fallback path (when a name isn't found on the |
| 106 | +imported module) intended to look up the submodule in `sys.modules`. The |
| 107 | +comment even says "simplification of `PyImport_GetModule()`". But the code |
| 108 | +called `PyObject_GetItem(module, fullname)` — modules don't implement |
| 109 | +`__getitem__`, so the fallback always raised `TypeError` and fell into |
| 110 | +the error path. Additionally, the error path called `Py_DECREF` on a |
| 111 | +pointer that could be NULL (for built-in modules with no `__file__`). |
| 112 | + |
| 113 | +**Fix:** use `PyImport_GetModule(fullname)` (what the comment actually |
| 114 | +describes) and `Py_XDECREF` for potentially-NULL pointers. |
| 115 | + |
| 116 | +### 6. Incremental-mode plumbing |
| 117 | + |
| 118 | +Two small fixes for the IR cache path used when `separate=True` enables |
| 119 | +mypy incremental mode: |
| 120 | + |
| 121 | +- `compile_modules_to_ir` populates `deser_ctx.classes/functions` with |
| 122 | + freshly built `ClassIR`/`FuncIR` so later cache-loaded SCCs can resolve |
| 123 | + cross-SCC references (otherwise `ClassIR.deserialize` raises `KeyError` |
| 124 | + on a referenced class from a dirty-and-rebuilt SCC). |
| 125 | +- `load_type_map` tolerates mypy-synthetic `TypeInfo` entries (e.g. |
| 126 | + `<subclass of X and Y>`) that have no corresponding mypyc `ClassIR`. |
| 127 | + |
| 128 | +--- |
| 129 | + |
| 130 | +## Commit 2: mypy-specific additional fixes |
| 131 | + |
| 132 | +**Scope:** Two additional bugs that sqlglot didn't trigger but mypy does. |
| 133 | +These are genuinely generic fixes, not mypy-specific hacks. |
| 134 | + |
| 135 | +### 1. Cross-group struct-field access didn't register a group dep |
| 136 | + |
| 137 | +Emitted code like `((FooObject *)obj)->attr` requires the `FooObject` |
| 138 | +struct declaration to be visible in the current translation unit. That |
| 139 | +struct lives in the defining group's `__native_<group>.h`. Consumer |
| 140 | +TUs only `#include` other groups' headers for groups listed in |
| 141 | +`context.group_deps` — which is populated by `get_module_group_prefix` |
| 142 | +when a cross-group function call or static is emitted. Direct struct- |
| 143 | +field access used `struct_name()` without going through the group |
| 144 | +prefix, so the dep was never registered. |
| 145 | + |
| 146 | +Result for mypy: ~40 clang errors like |
| 147 | +`error: use of undeclared identifier 'mypy___options___OptionsObject'` |
| 148 | +when `checkexpr` reads `options.line_checking_stats` etc. |
| 149 | + |
| 150 | +**Fix:** `Emitter.register_group_dep(cl)` helper, called from |
| 151 | +`get_attr_expr` for both the receiver's class and the declaring class. |
| 152 | + |
| 153 | +### 2. Runtime-file `#include` name collisions |
| 154 | + |
| 155 | +Every generated `__native_<group>.c` starts with |
| 156 | +`#include "int_ops.c"` (and similar for the other runtime C files in |
| 157 | +`mypyc/lib-rt/`). Under `separate=True`, the shim `.c` files generated by |
| 158 | +mypyc share a directory with the `__native_<group>.c` that #includes |
| 159 | +them. Mypy has `mypyc/lower/int_ops.py`, whose shim lands at |
| 160 | +`build/mypyc/lower/int_ops.c` — same basename as the runtime file. Clang |
| 161 | +searches the includer's directory first (`""` form), picks up the shim, |
| 162 | +and every shim file defines `PyInit___init__` → redefinition errors. |
| 163 | + |
| 164 | +**Fix:** switch the runtime includes to `<name>` form (search `-I` paths |
| 165 | +only, skip the includer's directory). The runtime directory is already |
| 166 | +on the `-I` list. |
| 167 | + |
| 168 | +--- |
| 169 | + |
| 170 | +## What still doesn't work: mypy incremental rebuild |
| 171 | + |
| 172 | +**Clean build:** works with `separate=True`. Mypy self-compiles in ~60 s |
| 173 | +with `-j 11` (vs ~1:40 monolithic), produces a functional mypy. |
| 174 | + |
| 175 | +**Incremental edit-rebuild:** broken in one edge case. After a clean |
| 176 | +build, editing any mypy `.py` that references `mypy.nodes.SymbolTable()` |
| 177 | +cross-group re-emits C that calls |
| 178 | +`exports_mypy___nodes.CPyDef_mypy___nodes___SymbolTable()`. But |
| 179 | +`SymbolTable` is a `dict` subclass, and `generate_class_type_decl` skips |
| 180 | +the native `CPyDef_` constructor declaration for classes with |
| 181 | +`builtin_base` set (they're supposed to be instantiated through a |
| 182 | +Python-level call). So `nodes`'s cached exports table has no |
| 183 | +`CPyDef_SymbolTable` entry, and clang errors out. |
| 184 | + |
| 185 | +**What's different about clean vs. incremental:** in the clean build, |
| 186 | +all SCCs are fresh — `checker.py`'s build sees `SymbolTable` with |
| 187 | +`builtin_base="PyDictObject"` populated by `prepare_class_def` and emits |
| 188 | +a Python-level `PyObject_Call`. Under incremental, `nodes`'s SCC is |
| 189 | +cached: `SymbolTable`'s `ClassIR` is deserialized from the `.ir.json` |
| 190 | +cache. The deserialized `ClassIR` has `builtin_base="PyDictObject"` — |
| 191 | +that field does serialize/deserialize correctly (verified by grep) — yet |
| 192 | +codegen still emits the native `CPyDef_` call. Some other flag or |
| 193 | +downstream-computed state must govern the decision; I haven't found it. |
| 194 | + |
| 195 | +### To investigate next |
| 196 | + |
| 197 | +1. Dump the `SymbolTable` `ClassIR` after a fresh clean build and after a |
| 198 | + cache-deserialize and diff them field by field. The divergent field |
| 199 | + that governs codegen's fresh-vs-cached choice is the culprit. |
| 200 | +2. Grep `irbuild/classdef.py` / `irbuild/expression.py` / `irbuild/ |
| 201 | + specialize.py` for the site that decides between `CPyDef_` ctor and |
| 202 | + `PyObject_Call` for class instantiation — what condition does it |
| 203 | + check? Compare that field's value fresh vs deserialized. |
| 204 | +3. If the bug is in the serialize roundtrip, fix either the serializer |
| 205 | + (include the missing field) or the codegen check (use a field that is |
| 206 | + preserved). |
| 207 | +4. Likely also need an analogous check for `BaseException` subclasses. |
| 208 | +5. Once fixed, verify mypy's own test suite still passes in separate |
| 209 | + mode. Then flip `separate=True` permanently in `mypy/setup.py` to |
| 210 | + cut mypy's rebuild cost from ~100 s to ~3 s per edit. |
| 211 | + |
| 212 | +--- |
| 213 | + |
| 214 | +## sqlglot-side changes |
| 215 | + |
| 216 | +`separate=True` can cause sqlglot's own package init to expose a |
| 217 | +circular-import pattern that the old monolithic flow happened to work |
| 218 | +around. Two minimal sqlglot-side tweaks (in the sqlglot repo, not this |
| 219 | +mypy fork) cover it: |
| 220 | + |
| 221 | +1. **`sqlglot/__init__.py`** — the pre-existing bootstrap that pre-loads |
| 222 | + `*__mypyc*.so` under bare names was written for the legacy hash-named |
| 223 | + single-shared-lib build. Under `separate=True`, per-module shared libs |
| 224 | + sit next to their `.py` siblings and are resolved via normal dotted |
| 225 | + imports — skip those in the bootstrap. 8-line change. |
| 226 | + |
| 227 | +2. **`sqlglot/optimizer/__init__.py`** — was eagerly |
| 228 | + `from sqlglot.optimizer.optimizer import RULES, optimize`, which |
| 229 | + cascades to `from sqlglot import Schema, exp`. Under `separate=True`'s |
| 230 | + eager cross-group init, this can fire before `sqlglot`'s own |
| 231 | + `__init__.py` has bound those names, causing a circular-import |
| 232 | + `ImportError`. Convert to PEP 562 lazy `__getattr__` so the imports |
| 233 | + defer until first use. |
| 234 | + |
| 235 | +Both changes are minor and pure-Python. They ship cleanly on the sqlglot |
| 236 | +main branch independent of this fork. |
| 237 | + |
| 238 | +--- |
| 239 | + |
| 240 | +## Benchmarks (macOS arm64, Python 3.14.2, MYPYC_OPT=0) |
| 241 | + |
| 242 | +### sqlglot (`separate=True` vs monolithic) |
| 243 | + |
| 244 | +| Scenario | Monolithic | separate=True | Speedup | |
| 245 | +|---|---|---|---| |
| 246 | +| Clean build | 110 s | 60 s | 1.8× | |
| 247 | +| No-op rebuild | 110 s | 1.4 s | 80× | |
| 248 | +| Edit 1 file rebuild | 110 s | 3.3 s | 33× | |
| 249 | +| Tests passing | 984/988 | 1227/1229 | parity | |
| 250 | + |
| 251 | +(Pre-existing env-only failures in both runs; diff is tests discovered.) |
| 252 | + |
| 253 | +### mypy self-compile |
| 254 | + |
| 255 | +| Scenario | Monolithic | separate=True | Speedup | |
| 256 | +|---|---|---|---| |
| 257 | +| Clean, no `-j` | 1:40 | 2:59 | 0.56× (more overhead per group) | |
| 258 | +| Clean, `-j 11` | ~1:00 (est.) | 1:04 | parity | |
| 259 | +| No-op rebuild | ~0.5 s | ~1:00 | ⚠️ setuptools-copy overhead | |
| 260 | +| Edit 1 file rebuild | 1:40 | ⚠️ broken | see "What still doesn't work" | |
| 261 | + |
| 262 | +The "no-op" 1:00 is entirely setuptools copying 428 `.so` files serially |
| 263 | +(mypyc itself did zero work). There's a `stamp` file Makefile approach |
| 264 | +documented in the sqlglot CHANGELOG that reduces this to ~0 but it's |
| 265 | +pure plumbing and doesn't touch mypyc. |
0 commit comments