Skip to content

Commit 0bcff13

Browse files
committed
Add SEPARATE_FLAG_NOTES.md documenting the two commits on this branch
1 parent cbdb079 commit 0bcff13

1 file changed

Lines changed: 265 additions & 0 deletions

File tree

SEPARATE_FLAG_NOTES.md

Lines changed: 265 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,265 @@
1+
# `separate=True` fixes: notes and status
2+
3+
This branch (`separate_flag`) contains two commits that make
4+
`mypycify(..., separate=True)` work against real-world projects. Prior to
5+
these commits, `separate=True` was effectively an experimental flag: the
6+
only coverage in mypyc's CI is `mypyc/test/test_run.py::TestRunSeparate`
7+
against two toy fixture files (`run-multimodule.test`, `run-mypy-sim.test`),
8+
and mypy's own `setup.py` never sets `separate=True` (it sets `multi_file`
9+
on Windows only). Real projects stressed the flag and turned up seven
10+
latent bugs.
11+
12+
The two commits are split so it's clear which fixes were required to get
13+
**sqlglot** building & testing cleanly, vs which additional fixes are
14+
needed to also get **mypy**'s self-compile working.
15+
16+
```
17+
$ git log --oneline separate_flag
18+
<sha2> [mypyc] Additional separate=True fixes needed for mypy self-compile
19+
<sha1> [mypyc] Make separate=True compilation work for real-world projects (sqlglot)
20+
```
21+
22+
---
23+
24+
## Commit 1: sqlglot-essential fixes
25+
26+
**Scope:** Everything needed for sqlglot to build + test + run end-to-end
27+
under `separate=True`. sqlglot is a ~100-module SQL parser/transpiler with
28+
cross-group class inheritance, mypyc-generated generator helper classes,
29+
non-extension subclasses with fast methods, and mutual cross-module
30+
dependencies.
31+
32+
Six bug categories:
33+
34+
### 1. Non-extension classes never have vtables
35+
36+
`is_method_final` in `ClassIR` falls back to `is_final_class` when
37+
`subclasses()` returns `None` (which it does under `separate=True`:
38+
children tracking is disabled). That makes `emit_method_call` take the
39+
vtable path. But non-extension classes skip `compute_vtable()`, so
40+
`vtable_entry()` asserts `self.vtable is not None` and crashes.
41+
42+
**Fix:** short-circuit `is_method_final` to `True` for non-ext classes —
43+
their methods are compiled as direct C functions, never dispatched via
44+
vtable.
45+
46+
Trigger in sqlglot: `DType(AutoName)` (an enum) defines `into_expr`, which
47+
mypyc wrapped as a `__mypyc_fast_into_expr` method. Any `DType.FOO.into_expr()`
48+
cross-group crashed.
49+
50+
### 2. Cross-group method call requires only the decl, not the body
51+
52+
`emit_method_call` asserted `method = class_ir.get_method(name)` was
53+
non-None. Under `separate=True`, a method's `FuncIR` body lives only in
54+
the defining group; consumers see only the `FuncDecl` via `method_decls`.
55+
56+
**Fix:** use `method_decl(name)` (always available, walks MRO) and drop
57+
the assert. Split `native_function_type(fn)` into a decl-taking overload.
58+
59+
Trigger in sqlglot: mypyc-generated generator helper classes (e.g.
60+
`find_all_in_scope_gen`) with `__mypyc_generator_helper__` methods called
61+
cross-group.
62+
63+
### 3. Cross-group native/wrapper calls bypassed the exports table
64+
65+
A dozen call sites in `emitwrapper.py`, `emitfunc.py`, and `emitclass.py`
66+
hardcoded `NATIVE_PREFIX + cname()` or `PREFIX + cname()` without going
67+
through `get_group_prefix()`. In separate mode, those symbols live in
68+
sibling groups — the C compiler needs `exports_other.CPyDef_foo(...)`,
69+
not `CPyDef_foo(...)`.
70+
71+
**Fix:** add `Emitter.native_function_call(decl)` and
72+
`Emitter.wrapper_function_call(decl)` helpers that do the right thing,
73+
and migrate all offending sites. Also make `CPyPy_*` wrapper declarations
74+
`needs_export=True` so those symbols actually reach the exports table.
75+
76+
Trigger: any class using an `__init__` inherited across groups (e.g.
77+
`Tables(AbstractMappingSchema)`, `TSQLParser(Parser)`); richcompare
78+
wrappers on classes whose `__eq__` lives in a parent group.
79+
80+
### 4. Defer cross-group imports out of `PyInit`
81+
82+
The shared lib's `exec_<group>` function used to `PyImport_ImportModule`
83+
sibling groups at PyInit time, and `PyCapsule_Import` their export
84+
tables. But the shim's PyInit gets invoked while the enclosing Python
85+
package's `__init__.py` is still executing, and Python's top-down getattr
86+
walk for dotted imports tries `getattr(pkg, submod)` — which fails with
87+
`AttributeError: partially initialized module …` because Python hasn't
88+
yet set the submodule attribute on the parent.
89+
90+
**Fix:** split `exec_<group>` into two C functions:
91+
- `exec_<short>`: capsule setup only, safe to run at PyInit time.
92+
- `ensure_deps_<short>`: cross-group imports + exports-table memcpy.
93+
Idempotent, exposed as another capsule on the shared lib.
94+
95+
The shim (`module_shim.tmpl`) calls `ensure_deps` just before invoking the
96+
per-module init capsule — by that time Python has settled the parent
97+
package state for this shim's load. Also use
98+
`PyImport_ImportModuleLevel(name, NULL, NULL, fromlist=("*",), 0)` so the
99+
shared lib lookup returns the leaf via `sys.modules` (no dotted getattr
100+
walk), and fetch exports capsules directly via `PyObject_GetAttrString`
101+
instead of `PyCapsule_Import` (which itself performs the dotted walk).
102+
103+
### 5. Fix broken `CPyImport_ImportFrom` fallback
104+
105+
In `lib-rt/misc_ops.c`, the fallback path (when a name isn't found on the
106+
imported module) intended to look up the submodule in `sys.modules`. The
107+
comment even says "simplification of `PyImport_GetModule()`". But the code
108+
called `PyObject_GetItem(module, fullname)` — modules don't implement
109+
`__getitem__`, so the fallback always raised `TypeError` and fell into
110+
the error path. Additionally, the error path called `Py_DECREF` on a
111+
pointer that could be NULL (for built-in modules with no `__file__`).
112+
113+
**Fix:** use `PyImport_GetModule(fullname)` (what the comment actually
114+
describes) and `Py_XDECREF` for potentially-NULL pointers.
115+
116+
### 6. Incremental-mode plumbing
117+
118+
Two small fixes for the IR cache path used when `separate=True` enables
119+
mypy incremental mode:
120+
121+
- `compile_modules_to_ir` populates `deser_ctx.classes/functions` with
122+
freshly built `ClassIR`/`FuncIR` so later cache-loaded SCCs can resolve
123+
cross-SCC references (otherwise `ClassIR.deserialize` raises `KeyError`
124+
on a referenced class from a dirty-and-rebuilt SCC).
125+
- `load_type_map` tolerates mypy-synthetic `TypeInfo` entries (e.g.
126+
`<subclass of X and Y>`) that have no corresponding mypyc `ClassIR`.
127+
128+
---
129+
130+
## Commit 2: mypy-specific additional fixes
131+
132+
**Scope:** Two additional bugs that sqlglot didn't trigger but mypy does.
133+
These are genuinely generic fixes, not mypy-specific hacks.
134+
135+
### 1. Cross-group struct-field access didn't register a group dep
136+
137+
Emitted code like `((FooObject *)obj)->attr` requires the `FooObject`
138+
struct declaration to be visible in the current translation unit. That
139+
struct lives in the defining group's `__native_<group>.h`. Consumer
140+
TUs only `#include` other groups' headers for groups listed in
141+
`context.group_deps` — which is populated by `get_module_group_prefix`
142+
when a cross-group function call or static is emitted. Direct struct-
143+
field access used `struct_name()` without going through the group
144+
prefix, so the dep was never registered.
145+
146+
Result for mypy: ~40 clang errors like
147+
`error: use of undeclared identifier 'mypy___options___OptionsObject'`
148+
when `checkexpr` reads `options.line_checking_stats` etc.
149+
150+
**Fix:** `Emitter.register_group_dep(cl)` helper, called from
151+
`get_attr_expr` for both the receiver's class and the declaring class.
152+
153+
### 2. Runtime-file `#include` name collisions
154+
155+
Every generated `__native_<group>.c` starts with
156+
`#include "int_ops.c"` (and similar for the other runtime C files in
157+
`mypyc/lib-rt/`). Under `separate=True`, the shim `.c` files generated by
158+
mypyc share a directory with the `__native_<group>.c` that #includes
159+
them. Mypy has `mypyc/lower/int_ops.py`, whose shim lands at
160+
`build/mypyc/lower/int_ops.c` — same basename as the runtime file. Clang
161+
searches the includer's directory first (`""` form), picks up the shim,
162+
and every shim file defines `PyInit___init__` → redefinition errors.
163+
164+
**Fix:** switch the runtime includes to `<name>` form (search `-I` paths
165+
only, skip the includer's directory). The runtime directory is already
166+
on the `-I` list.
167+
168+
---
169+
170+
## What still doesn't work: mypy incremental rebuild
171+
172+
**Clean build:** works with `separate=True`. Mypy self-compiles in ~60 s
173+
with `-j 11` (vs ~1:40 monolithic), produces a functional mypy.
174+
175+
**Incremental edit-rebuild:** broken in one edge case. After a clean
176+
build, editing any mypy `.py` that references `mypy.nodes.SymbolTable()`
177+
cross-group re-emits C that calls
178+
`exports_mypy___nodes.CPyDef_mypy___nodes___SymbolTable()`. But
179+
`SymbolTable` is a `dict` subclass, and `generate_class_type_decl` skips
180+
the native `CPyDef_` constructor declaration for classes with
181+
`builtin_base` set (they're supposed to be instantiated through a
182+
Python-level call). So `nodes`'s cached exports table has no
183+
`CPyDef_SymbolTable` entry, and clang errors out.
184+
185+
**What's different about clean vs. incremental:** in the clean build,
186+
all SCCs are fresh — `checker.py`'s build sees `SymbolTable` with
187+
`builtin_base="PyDictObject"` populated by `prepare_class_def` and emits
188+
a Python-level `PyObject_Call`. Under incremental, `nodes`'s SCC is
189+
cached: `SymbolTable`'s `ClassIR` is deserialized from the `.ir.json`
190+
cache. The deserialized `ClassIR` has `builtin_base="PyDictObject"`
191+
that field does serialize/deserialize correctly (verified by grep) — yet
192+
codegen still emits the native `CPyDef_` call. Some other flag or
193+
downstream-computed state must govern the decision; I haven't found it.
194+
195+
### To investigate next
196+
197+
1. Dump the `SymbolTable` `ClassIR` after a fresh clean build and after a
198+
cache-deserialize and diff them field by field. The divergent field
199+
that governs codegen's fresh-vs-cached choice is the culprit.
200+
2. Grep `irbuild/classdef.py` / `irbuild/expression.py` / `irbuild/
201+
specialize.py` for the site that decides between `CPyDef_` ctor and
202+
`PyObject_Call` for class instantiation — what condition does it
203+
check? Compare that field's value fresh vs deserialized.
204+
3. If the bug is in the serialize roundtrip, fix either the serializer
205+
(include the missing field) or the codegen check (use a field that is
206+
preserved).
207+
4. Likely also need an analogous check for `BaseException` subclasses.
208+
5. Once fixed, verify mypy's own test suite still passes in separate
209+
mode. Then flip `separate=True` permanently in `mypy/setup.py` to
210+
cut mypy's rebuild cost from ~100 s to ~3 s per edit.
211+
212+
---
213+
214+
## sqlglot-side changes
215+
216+
`separate=True` can cause sqlglot's own package init to expose a
217+
circular-import pattern that the old monolithic flow happened to work
218+
around. Two minimal sqlglot-side tweaks (in the sqlglot repo, not this
219+
mypy fork) cover it:
220+
221+
1. **`sqlglot/__init__.py`** — the pre-existing bootstrap that pre-loads
222+
`*__mypyc*.so` under bare names was written for the legacy hash-named
223+
single-shared-lib build. Under `separate=True`, per-module shared libs
224+
sit next to their `.py` siblings and are resolved via normal dotted
225+
imports — skip those in the bootstrap. 8-line change.
226+
227+
2. **`sqlglot/optimizer/__init__.py`** — was eagerly
228+
`from sqlglot.optimizer.optimizer import RULES, optimize`, which
229+
cascades to `from sqlglot import Schema, exp`. Under `separate=True`'s
230+
eager cross-group init, this can fire before `sqlglot`'s own
231+
`__init__.py` has bound those names, causing a circular-import
232+
`ImportError`. Convert to PEP 562 lazy `__getattr__` so the imports
233+
defer until first use.
234+
235+
Both changes are minor and pure-Python. They ship cleanly on the sqlglot
236+
main branch independent of this fork.
237+
238+
---
239+
240+
## Benchmarks (macOS arm64, Python 3.14.2, MYPYC_OPT=0)
241+
242+
### sqlglot (`separate=True` vs monolithic)
243+
244+
| Scenario | Monolithic | separate=True | Speedup |
245+
|---|---|---|---|
246+
| Clean build | 110 s | 60 s | 1.8× |
247+
| No-op rebuild | 110 s | 1.4 s | 80× |
248+
| Edit 1 file rebuild | 110 s | 3.3 s | 33× |
249+
| Tests passing | 984/988 | 1227/1229 | parity |
250+
251+
(Pre-existing env-only failures in both runs; diff is tests discovered.)
252+
253+
### mypy self-compile
254+
255+
| Scenario | Monolithic | separate=True | Speedup |
256+
|---|---|---|---|
257+
| Clean, no `-j` | 1:40 | 2:59 | 0.56× (more overhead per group) |
258+
| Clean, `-j 11` | ~1:00 (est.) | 1:04 | parity |
259+
| No-op rebuild | ~0.5 s | ~1:00 | ⚠️ setuptools-copy overhead |
260+
| Edit 1 file rebuild | 1:40 | ⚠️ broken | see "What still doesn't work" |
261+
262+
The "no-op" 1:00 is entirely setuptools copying 428 `.so` files serially
263+
(mypyc itself did zero work). There's a `stamp` file Makefile approach
264+
documented in the sqlglot CHANGELOG that reduces this to ~0 but it's
265+
pure plumbing and doesn't touch mypyc.

0 commit comments

Comments
 (0)