Change String to Text types in AST and source #2786
Open
plajjan wants to merge 6 commits into
5105d49 to 5bd19b0
Parser and source-carrying compiler paths still stored many source-derived payloads as String. Convert those payloads to Text so the parsed AST can retain source slices instead of expanding each identifier, literal, and diagnostic payload into boxed character lists. Update consumers that print, serialize, complete, and report those values. Add the parser heap benchmark used for large generated modules.
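As a rough illustration of the representation change, the sketch below uses a hypothetical `Expr` type and `render` function (neither is taken from this PR). A `String` is a linked list of boxed characters, while a `Text` is an offset and length into a shared array, so an AST that stores `Text` can point back into the source buffer. The real AST is much larger; this only shows a consumer working on `Text` without a round trip through `String`.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T
import Data.Text (Text)

-- Hypothetical AST fragment; the compiler's real node types are not
-- shown in this PR text.
data Expr
  = Var Text          -- identifier payload kept as Text
  | StrLit Text       -- literal payload kept as Text
  | Call Expr [Expr]
  deriving (Eq, Show)

-- A printing consumer that stays in Text end to end.
-- Note: string literals are quoted but not escaped in this sketch.
render :: Expr -> Text
render (Var name)    = name
render (StrLit s)    = T.concat ["\"", s, "\""]
render (Call f args) =
  T.concat [render f, "(", T.intercalate ", " (map render args), ")"]
```

For example, `render (Call (Var "f") [Var "x", StrLit "hi"])` yields the text `f(x, "hi")`.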
added 5 commits
May 21, 2026 12:53
Internal compiler names still carried String payloads after parser payloads moved to Text. Convert those name fields to Text and adjust consumers that compare, print, or thread them through compiler passes. This keeps internal names in the same representation as parsed names, so later passes do not reintroduce boxed character lists after parsing.
Several common compiler records still stored tiny strict fields through separate heap boxes. Unpack source locations and small AST metadata fields that are carried in large numbers. The field types and semantics stay the same; this only changes the heap layout of values that are already strict.
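A minimal sketch of what unpacking a strict field looks like, using hypothetical `SrcLoc` and `Node` records rather than the compiler's actual types. The `UNPACK` pragma asks GHC to store the field's contents inline in the parent constructor instead of behind a pointer to a separate heap object; the field types and accessors are unchanged.

```haskell
-- Hypothetical source location; both fields are strict and stored inline.
data SrcLoc = SrcLoc
  { locLine :: {-# UNPACK #-} !Int
  , locCol  :: {-# UNPACK #-} !Int
  } deriving (Eq, Show)

-- Hypothetical AST metadata. Because SrcLoc has a single constructor,
-- it can itself be unpacked into Node, so a Node carries three raw
-- machine integers rather than pointers to boxed values.
data Node = Node
  { nodeLoc   :: {-# UNPACK #-} !SrcLoc
  , nodeDepth :: {-# UNPACK #-} !Int
  } deriving (Eq, Show)
```

Code that constructs or pattern-matches on these records does not change; `locCol (nodeLoc (Node (SrcLoc 3 7) 0))` still evaluates to `7`.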
The parser treats _ as a keyword for ordinary identifiers, but lambda parameters can use it as a throwaway name. After the Text cleanup, that path rejected call arguments containing lambda c, _, err. Accept _ through parameter parsing only, where it is valid. Keep normal identifier diagnostics unchanged and add a parser regression covering the lambda call-argument case.
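The shape of this fix can be sketched with a small hand-written parser. Everything here is an assumption for illustration: the keyword list, the error strings, and the names `rawIdent`, `identifier`, `paramName`, and `params` are hypothetical, and the real parser is built differently. The point is only that `_` is accepted by the parameter-name path while the ordinary identifier path keeps rejecting it.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Char (isAlpha, isAlphaNum)
import qualified Data.Text as T
import Data.Text (Text)

-- Hypothetical keyword set; only "_" matters for this sketch.
keywords :: [Text]
keywords = ["_", "lambda", "def"]

-- Take the longest identifier-shaped prefix, if the input starts with one.
rawIdent :: Text -> Maybe (Text, Text)
rawIdent input =
  case T.uncons input of
    Just (c, _) | isAlpha c || c == '_' ->
      Just (T.span (\x -> isAlphaNum x || x == '_') input)
    _ -> Nothing

-- Ordinary identifiers: keywords, including "_", are rejected.
identifier :: Text -> Either Text (Text, Text)
identifier input =
  case rawIdent input of
    Just (name, rest)
      | name `elem` keywords -> Left ("unexpected keyword " <> name)
      | otherwise            -> Right (name, rest)
    Nothing -> Left "expected identifier"

-- Parameter names: "_" is additionally accepted as a throwaway name.
paramName :: Text -> Either Text (Text, Text)
paramName input =
  case rawIdent input of
    Just ("_", rest) -> Right ("_", rest)
    _                -> identifier input

-- Parse a comma-separated parameter list such as "c, _, err".
params :: Text -> Either Text [Text]
params input = mapM one (T.splitOn "," input)
  where
    one piece = do
      (name, rest) <- paramName (T.strip piece)
      if T.null rest then Right name else Left ("unexpected input " <> rest)
```

With these definitions, `params "c, _, err"` succeeds with the three names, while `identifier "_"` still fails with the keyword diagnostic.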
Literal and token parsers still converted through String or built Text one character at a time. Capture chunks from the Text input directly for identifiers, numbers, string fragments, format specs, quote runs, and escape digits. This keeps source spans stable and makes the affected numeric diagnostics use more precise decimal and hexadecimal digit labels, while reducing intermediate allocation before AST construction. Update the kinds and types bench readers to keep benchmark sources in Text too.
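The difference between per-character construction and chunk capture can be sketched as follows. The function names, the `NumLit` type, and the diagnostic strings are hypothetical, not the parser's real ones. `identSlow` accumulates a `String` and packs it into a fresh `Text`; `identFast` takes one `T.span`, whose result is a slice of the input. `number` shows how capturing digit runs separately for the decimal and hexadecimal cases makes it natural to label the expected digit kind.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Char (isAlphaNum, isDigit, isHexDigit)
import qualified Data.Text as T
import Data.Text (Text)

isIdentChar :: Char -> Bool
isIdentChar c = isAlphaNum c || c == '_'

-- Per-character version: builds a reversed String, then packs it.
identSlow :: Text -> (Text, Text)
identSlow = go []
  where
    go acc t = case T.uncons t of
      Just (c, rest) | isIdentChar c -> go (c : acc) rest
      _                              -> (T.pack (reverse acc), t)

-- Chunked version: a single span over the input Text.
identFast :: Text -> (Text, Text)
identFast = T.span isIdentChar

-- Hypothetical literal type that keeps the digit text as captured.
data NumLit = Decimal Text | Hex Text
  deriving (Eq, Show)

-- Capture a digit run in one step; the error label depends on the base.
number :: Text -> Either Text (NumLit, Text)
number input =
  case T.stripPrefix "0x" input of
    Just afterPrefix ->
      let (digits, rest) = T.span isHexDigit afterPrefix
      in if T.null digits
           then Left "expected hexadecimal digit"
           else Right (Hex digits, rest)
    Nothing ->
      let (digits, rest) = T.span isDigit input
      in if T.null digits
           then Left "expected decimal digit"
           else Right (Decimal digits, rest)
```

Both identifier functions return the same pair for the same input; only the intermediate allocation differs.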
The top-level chunk scanner still advanced through ordinary source, comments, and string text one character at a time. Batch those runs with Text operations while keeping newline, delimiter, interpolation, and escape handling on the existing state-machine paths. The scanner still preserves previous-character state, line-start tracking, continuations, and chunk boundaries, while avoiding per-character work for large ordinary regions of generated modules.
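A simplified sketch of the batching idea, under stated assumptions: the `Mode` type, the `#` comment marker, the `"` string delimiter, and the function names are all hypothetical, and the previous-character state, line-start tracking, continuations, and interpolation that the real scanner keeps are omitted. Ordinary runs are consumed with one `T.break` per run, and only the characters that can change state go through the one-character step.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T
import Data.Text (Text)

data Mode = Code | Comment | Str
  deriving (Eq, Show)

-- Characters that need state-machine handling in each mode.
special :: Mode -> Char -> Bool
special Code    c = c == '#' || c == '"' || c == '\n'
special Comment c = c == '\n'
special Str     c = c == '"' || c == '\\' || c == '\n'

-- Split the input into runs tagged with the mode they were scanned in.
-- Delimiters, newlines, and the escape backslash are dropped from the output.
scan :: Text -> [(Mode, Text)]
scan = go Code
  where
    -- Batch step: consume everything up to the next special character.
    go mode input
      | T.null input = []
      | otherwise =
          let (run, rest) = T.break (special mode) input
              emit = [(mode, run) | not (T.null run)]
          in emit ++ step mode rest

    -- Handle exactly one special character, then resume batching.
    step mode rest = case T.uncons rest of
      Nothing -> []
      Just (c, rest') -> case (mode, c) of
        (Code, '#')     -> go Comment rest'
        (Code, '"')     -> go Str rest'
        (Comment, '\n') -> go Code rest'
        (Str, '"')      -> go Code rest'
        (Str, '\\')     ->
          -- Keep the escaped character inside the string.
          let (esc, rest'') = T.splitAt 1 rest'
          in [(Str, esc) | not (T.null esc)] ++ go Str rest''
        _               -> go mode rest'  -- newline in Code or Str
```

On input dominated by long ordinary regions, the per-character branch runs once per delimiter, newline, or escape rather than once per character.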