Commit 02ec968

JukkaL and ilevkivskyi authored
Add work-in-progress implementation of a new Python parser (#20856)
The new "native" parser (`mypy.nativeparse`) will eventually replace the current parser (`mypy.fastparse`). The native parser uses a Rust extension that wraps the Ruff parser to generate a serialized AST, which mypy deserializes directly into a mypy AST. The binary format is the same one we already use for mypy fixed-format incremental caches.

This is still work in progress and some features aren't supported. The most important missing feature is probably function type comments. Also, the Rust extension needs to be manually compiled from https://github.com/mypyc/ast_serialize. Refer to the `ast_serialize` repository for instructions.

There is no CI support for the new parser right now: there are tests, but they are skipped unless the `ast_serialize` extension is installed, and it isn't installed in CI yet.

Once the Rust extension is installed, use `--native-parser` to enable the new parser. The main type checker test suite can be run using the native parser via `TEST_NATIVE_PARSER=1 pytest mypy/test/testcheck.py` (the `TEST_NATIVE_PARSER` environment variable needs to be set). A number of tests are still failing.

Related issue with more context: #19776

Remaining work is tracked here for now: https://github.com/mypyc/ast_serialize/issues

Here are the expected benefits over the old mypy parser, adapted from the docstring of `mypy/nativeparse.py`:

* No intermediate non-mypyc Python-level AST is created, which improves performance
* Parsing doesn't need the GIL => can use multithreading to construct serialized ASTs in parallel
* Produce import dependencies without having to build an AST => helps parallel type checking
* Support all Python syntax even if mypy is running on an older Python version
* Generate an AST even if there are syntax errors
* Potential to support incremental parsing (quickly process modified sections in a file)
* Stripping function bodies in third-party code can happen earlier, for extra performance
* We have the option to easily add support for `# mypy: ignore` comments

Most of the code is straightforward and repetitive deserialization code. I relied heavily on coding agents to implement deserialization and to add tests. The tests are separate from the pre-existing parser tests, but we can unify them later (or delete the old tests once we delete the old parser).

@ilevkivskyi contributed to this PR.

---------

Co-authored-by: Ivan Levkivskyi <levkivskyi@gmail.com>
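The commit message says the native parser tests are skipped unless the `ast_serialize` extension is installed, and that `TEST_NATIVE_PARSER` selects the native parser for the main test suite. A minimal sketch of that kind of gating follows. The helper names, the assumption that the extension is importable under the module name `ast_serialize`, and the check for the value `"1"` are illustrative guesses, not code taken from mypy.

```python
import importlib.util
import os
import unittest


def native_parser_available() -> bool:
    # Assumption: the Rust extension is importable as "ast_serialize".
    # find_spec returns None when no such top-level module can be found.
    return importlib.util.find_spec("ast_serialize") is not None


def native_parser_requested() -> bool:
    # Assumption: the test suite treats TEST_NATIVE_PARSER=1 as "enabled".
    return os.environ.get("TEST_NATIVE_PARSER") == "1"


@unittest.skipUnless(native_parser_available(), "ast_serialize extension is not installed")
class NativeParserSmokeTest(unittest.TestCase):
    # Hypothetical test case; it only runs when the extension is present.
    def test_extension_imports(self) -> None:
        import ast_serialize  # noqa: F401  (only checks that the import works)
```

With this shape, a machine without the extension reports the test as skipped rather than failed, which matches the behaviour described for CI.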
1 parent a62b691 commit 02ec968

16 files changed (+6637 -1 lines changed)

mypy/cache.py

Lines changed: 2 additions & 0 deletions
@@ -266,6 +266,8 @@ def read(cls, data: ReadBuffer, data_file: str) -> CacheMeta | None:
 # Misc classes.
 EXTRA_ATTRS: Final[Tag] = 150
 DT_SPEC: Final[Tag] = 151
+# Four integers representing source file (line, column) range.
+LOCATION: Final[Tag] = 152
 
 END_TAG: Final[Tag] = 255
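
To illustrate what a tagged record of "four integers representing source file (line, column) range" could look like, here is a rough sketch of writing and reading one. Only the tag value 152 comes from the diff. The field order (line, column, end line, end column), the fixed-width little-endian encoding, and both function names are assumptions made for the example, and mypy's actual cache format may encode integers differently.

```python
import struct

LOCATION = 152  # tag value taken from the diff to mypy/cache.py

# Assumption: four signed 32-bit little-endian integers follow the tag byte.
_LOCATION_FMT = "<4i"


def write_location(buf: bytearray, line: int, column: int, end_line: int, end_column: int) -> None:
    """Append a LOCATION tag followed by the four range integers (hypothetical layout)."""
    buf.append(LOCATION)
    buf += struct.pack(_LOCATION_FMT, line, column, end_line, end_column)


def read_location(data: bytes, pos: int = 0) -> tuple[tuple[int, int, int, int], int]:
    """Read a LOCATION record at pos; return the four integers and the next position."""
    if data[pos] != LOCATION:
        raise ValueError(f"expected LOCATION tag at offset {pos}, got {data[pos]}")
    values = struct.unpack_from(_LOCATION_FMT, data, pos + 1)
    return values, pos + 1 + struct.calcsize(_LOCATION_FMT)
```

Leading each record with a tag byte lets a reader check what it is about to decode before consuming the payload.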

mypy/main.py

Lines changed: 2 additions & 0 deletions
@@ -1245,6 +1245,8 @@ def add_invertible_flag(
 # --local-partial-types disallows partial types spanning module top level and a function
 # (implicitly defined in fine-grained incremental mode)
 add_invertible_flag("--local-partial-types", default=False, help=argparse.SUPPRESS)
+# --native-parser enables the native parser (experimental)
+add_invertible_flag("--native-parser", default=False, help=argparse.SUPPRESS)
 # --logical-deps adds some more dependencies that are not semantically needed, but
 # may be helpful to determine relative importance of classes and functions for overall
 # type precision in a code base. It also _removes_ some deps, so this flag should be never
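
For readers unfamiliar with the pattern, the following is a simplified stand-in for what an "invertible" flag hidden from `--help` might do, written with plain `argparse`. It is not mypy's `add_invertible_flag` implementation. The function name `add_invertible_flag_sketch` and the `--no-` prefix for the inverse flag are assumptions for the example.

```python
import argparse


def add_invertible_flag_sketch(
    parser: argparse.ArgumentParser,
    flag: str,
    *,
    default: bool,
    help: str = argparse.SUPPRESS,
) -> None:
    """Register `flag` and an inverse `--no-...` flag sharing one destination."""
    assert flag.startswith("--")
    name = flag[2:]
    dest = name.replace("-", "_")
    # Both options write to the same attribute; help=SUPPRESS hides them from --help.
    parser.add_argument(flag, dest=dest, action="store_true", default=default, help=help)
    parser.add_argument("--no-" + name, dest=dest, action="store_false", default=default, help=help)


parser = argparse.ArgumentParser(prog="example")
add_invertible_flag_sketch(parser, "--native-parser", default=False)
```

Parsing an empty argument list leaves `native_parser` at its default of `False`, passing `--native-parser` sets it to `True`, and the last of two conflicting flags wins.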
