
Change String to Text types in AST and source#2786

Open. plajjan wants to merge 6 commits into main from src-ast-text



plajjan commented May 15, 2026

No description provided.

plajjan force-pushed the src-ast-text branch 7 times, most recently from 5105d49 to 5bd19b0, May 20, 2026 22:37
Parser and source-carrying compiler paths still stored many
source-derived payloads as String. Convert those payloads to Text so
the parsed AST can retain source slices instead of expanding each
identifier, literal, and diagnostic payload into boxed character lists.

Update consumers that print, serialize, complete, and report those
values. Add the parser heap benchmark used for large generated
modules.
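
The representation change described above can be sketched roughly as follows. This is an illustration only, not code from the PR: it assumes the compiler is written in Haskell with `Data.Text` from the `text` package, and `SrcSpan`, `Ident`, and `lexIdent` are hypothetical names. The point is that `T.span` returns slices that share the input buffer, so the identifier payload never becomes a `[Char]`.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Char (isAlpha, isAlphaNum)
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical source span: start and end character offsets.
data SrcSpan = SrcSpan !Int !Int
  deriving (Eq, Show)

-- Hypothetical AST identifier whose payload is Text rather than String.
data Ident = Ident
  { identSpan :: !SrcSpan
  , identText :: !Text
  } deriving (Eq, Show)

isIdentStart, isIdentChar :: Char -> Bool
isIdentStart c = isAlpha c || c == '_'
isIdentChar c = isAlphaNum c || c == '_'

-- Lex one identifier starting at the given offset. The name and the
-- remaining input are both slices of the original Text value.
lexIdent :: Int -> Text -> Maybe (Ident, Text)
lexIdent off input =
  case T.uncons input of
    Just (c, _) | isIdentStart c ->
      let (name, rest) = T.span isIdentChar input
      in Just (Ident (SrcSpan off (off + T.length name)) name, rest)
    _ -> Nothing
```

For example, `lexIdent 4 "foo_bar(x)"` yields an `Ident` whose text is the slice `foo_bar`, with `(x)` left over as the remaining input.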
Kristian Larsson added 5 commits May 21, 2026 12:53
Internal compiler names still carried String payloads after parser
payloads moved to Text. Convert those name fields to Text and adjust
consumers that compare, print, or thread them through compiler passes.

This keeps internal names in the same representation as parsed names,
so later passes do not reintroduce boxed character lists after parsing.
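
As a rough illustration of keeping internal names in the same representation as parsed names, the sketch below uses a hypothetical `Name` type in Haskell. The constructors and the `renderName` and `freshFrom` helpers are assumptions for this example, not the compiler's actual definitions.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical name type: both constructors carry Text payloads.
data Name
  = Parsed !Text          -- name taken from the source
  | Internal !Text !Int   -- compiler-generated: prefix plus a counter
  deriving (Eq, Ord, Show)

-- Render a name as Text. Only the small counter goes through show;
-- the name payload itself is never converted to String.
renderName :: Name -> Text
renderName (Parsed t) = t
renderName (Internal prefix n) = T.concat [prefix, "_", T.pack (show n)]

-- Derive an internal name from an existing one, staying in Text.
freshFrom :: Name -> Int -> Name
freshFrom base n = Internal (renderName base) n
```

Comparison and ordering come from the derived `Eq` and `Ord` instances, which operate on the `Text` payloads directly.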
Several common compiler records still stored tiny strict fields through
separate heap boxes. Unpack source locations and small AST metadata
fields that are carried in large numbers.

The field types and semantics stay the same; this only changes the heap
layout of values that are already strict.
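
A minimal sketch of this kind of change, assuming GHC Haskell and using hypothetical `Loc` and `Node` types (not the compiler's real records). The `UNPACK` pragma applies to strict fields of single-constructor types, and GHC typically performs the unpacking only when compiling with optimisation. Small fields such as `Int` may already be unpacked under optimisation, so the sketch unpacks a nested location record instead.

```haskell
-- Hypothetical source location: a single-constructor type with strict fields.
data Loc = Loc !Int !Int
  deriving (Eq, Show)

-- Without UNPACK, the strict Loc field is a pointer to a separate heap object.
data NodeBoxed = NodeBoxed !Loc !Int
  deriving (Eq, Show)

-- With UNPACK, the contents of Loc are stored inline in the Node constructor.
data Node = Node {-# UNPACK #-} !Loc !Int
  deriving (Eq, Show)

-- Code that pattern matches on the type is written the same way either way.
nodeWidth :: Node -> Int
nodeWidth (Node (Loc start end) _) = end - start
```

The field types and the pattern-matching code are identical in both versions; only the heap layout differs.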
The parser treats _ as a keyword for ordinary identifiers, but lambda
parameters can use it as a throwaway name. After the Text cleanup, that
path rejected call arguments containing lambda c, _, err.

Accept _ through parameter parsing only, where it is valid. Keep normal
identifier diagnostics unchanged and add a parser regression covering
the lambda call-argument case.
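
The shape of this fix can be sketched with plain `Data.Text` functions. Everything here is hypothetical (the keyword list, `identChunk`, `identifier`, `paramName`, and the error wording); the actual parser may be built differently. The sketch only shows `_` being accepted on the parameter path while ordinary identifiers keep rejecting it.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Char (isAlpha, isAlphaNum)
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical keyword set; "_" is reserved for ordinary identifiers.
keywords :: [Text]
keywords = ["_", "lambda", "def", "return"]

-- Take the longest identifier-shaped prefix as a slice of the input.
identChunk :: Text -> Maybe (Text, Text)
identChunk input =
  case T.uncons input of
    Just (c, _) | isAlpha c || c == '_' ->
      Just (T.span (\x -> isAlphaNum x || x == '_') input)
    _ -> Nothing

-- Ordinary identifiers: keywords, including "_", are rejected.
identifier :: Text -> Either Text (Text, Text)
identifier input =
  case identChunk input of
    Just (w, rest)
      | w `elem` keywords -> Left (T.concat ["keyword ", w, " used as identifier"])
      | otherwise -> Right (w, rest)
    Nothing -> Left "expected identifier"

-- Parameter names: accept a bare "_" first, otherwise use the ordinary path.
paramName :: Text -> Either Text (Text, Text)
paramName input =
  case identChunk input of
    Just ("_", rest) -> Right ("_", rest)
    _ -> identifier input
```

With these definitions, `paramName "_, err"` succeeds and leaves `, err` as the remaining input, while `identifier "_, err"` still reports the keyword error.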
Literal and token parsers still converted through String or built Text
one character at a time. Capture chunks from the Text input directly
for identifiers, numbers, string fragments, format specs, quote runs,
and escape digits.

This keeps source spans stable and makes the affected numeric
diagnostics use more precise decimal and hexadecimal digit labels,
while reducing intermediate allocation before AST construction. Update the
kinds and types bench readers to keep benchmark sources in Text too.
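
To make the difference concrete, here is a hedged sketch contrasting the two styles with plain `Data.Text`. The function names, the `0x` prefix handling, and the error strings are assumptions for illustration; the PR's parsers may use a parser-combinator library instead.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Char (isDigit, isHexDigit)
import Data.Text (Text)
import qualified Data.Text as T

-- Per-character style: accumulate a String, then pack it into a new Text.
digitsSlow :: Text -> (Text, Text)
digitsSlow = go []
  where
    go acc t = case T.uncons t of
      Just (c, rest) | isDigit c -> go (c : acc) rest
      _ -> (T.pack (reverse acc), t)

-- Chunk style: one span over the input; the result is a slice of it.
digitsFast :: Text -> (Text, Text)
digitsFast = T.span isDigit

-- Take a non-empty run, reporting a specific label when nothing matches.
takeRun1 :: Text -> (Char -> Bool) -> Text -> Either Text (Text, Text)
takeRun1 label p input =
  case T.span p input of
    (chunk, rest)
      | T.null chunk -> Left (T.append "expected " label)
      | otherwise -> Right (chunk, rest)

-- Hypothetical integer literal: "0x" then hex digits, or decimal digits.
-- For hex literals the returned chunk holds the digits after the prefix.
intLiteral :: Text -> Either Text (Text, Text)
intLiteral input =
  case T.stripPrefix "0x" input of
    Just body -> takeRun1 "hexadecimal digit" isHexDigit body
    Nothing -> takeRun1 "decimal digit" isDigit input
```

Both digit functions return the same pair for the same input; the chunk version avoids the intermediate list, and the labelled variant shows how a decimal or hexadecimal label can be attached to the failure case.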
The top-level chunk scanner still advanced through ordinary source,
comments, and string text one character at a time. Batch those runs with
Text operations while keeping newline, delimiter, interpolation, and
escape handling on the existing state-machine paths.

The scanner still preserves previous-character state, line-start
tracking, continuations, and chunk boundaries, while avoiding
per-character work for large ordinary regions of generated modules.
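
The batching idea can be sketched as below. This is a simplified model, not the PR's scanner: the set of special characters, the `Event` type, and `scan` are assumptions, and the real state machine tracks more state than shown here. Ordinary regions are consumed with a single `T.break`, while special characters still go through one at a time.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- Characters that need state-machine handling in this sketch (an assumption).
isSpecial :: Char -> Bool
isSpecial c = c `elem` ("\n\"'#\\" :: String)

-- Scanner events: a batched ordinary run, or one special character.
data Event = Run !Text | Special !Char
  deriving (Eq, Show)

-- Split the input into events. Each ordinary region is consumed with one
-- T.break; each special character is emitted individually.
scan :: Text -> [Event]
scan input
  | T.null input = []
  | otherwise =
      let (run, rest) = T.break isSpecial input
          runEvents = [Run run | not (T.null run)]
      in case T.uncons rest of
           Nothing -> runEvents
           Just (c, rest') -> runEvents ++ (Special c : scan rest')

-- Previous-character state survives batching: it is the last character
-- of the event. Runs produced by scan are never empty.
lastCharOf :: Event -> Char
lastCharOf (Run t) = T.last t
lastCharOf (Special c) = c
```

In this model the per-character path only runs for newlines, quotes, comment markers, and backslashes, so long stretches of ordinary source cost one `T.break` each.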