Date: 2026-06-11
Status: Accepted
Implemented on 2026-06-12: internal/syntax owns the shared tokenizer;
internal/parser.ParseSyntax consumes the shared token rules behind the
gwdkast.File seam; the page/component/layout entry points lower that AST into
gwdkir; parser diagnostics accumulate across declaration and block boundaries;
and the former internal/parser lexLine path has been removed.
The compiler front-end is line-oriented. internal/parser.ParseSyntax reads
source with a bufio.Scanner, matches patterns against each trimmed line
(internal/parser/patterns.go lexLine), tracks nesting with a separate stateful
brace scanner (internal/parser/braces.go), and returns on the first syntax
error with no recovery. Source positions are 1-based line/column with no byte
offset, so many spans are line-wide approximations (sourceLineSpan). The
formatter (internal/lang/format.go) is an independent, whitespace-only string
manipulation pass that counts braces without skipping strings or comments.
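The sketch below makes the brace miscount concrete. It compares a counter that sees every brace with one that treats string and comment skipping as ordinary scanner state. It is illustrative only. naiveDepth and lexerDepth are hypothetical helpers, not functions in internal/lang or internal/parser. The sketch also assumes .gwdk uses "..." strings with backslash escapes and // line comments.

```go
package main

import "fmt"

// naiveDepth returns the final brace depth, counting every '{' and '}'
// regardless of context (the behavior of a whitespace-only formatter).
func naiveDepth(src string) int {
	depth := 0
	for _, r := range src {
		switch r {
		case '{':
			depth++
		case '}':
			depth--
		}
	}
	return depth
}

// lexerDepth counts braces only in code. It skips "..." strings (with
// backslash escapes) and // line comments by tracking scanner state.
// The string and comment syntax here is assumed for illustration.
func lexerDepth(src string) int {
	const (
		code = iota
		str
		comment
	)
	state, depth := code, 0
	for i := 0; i < len(src); i++ {
		c := src[i]
		switch state {
		case code:
			switch {
			case c == '"':
				state = str
			case c == '/' && i+1 < len(src) && src[i+1] == '/':
				state = comment
				i++ // consume the second '/'
			case c == '{':
				depth++
			case c == '}':
				depth--
			}
		case str:
			if c == '\\' {
				i++ // skip the escaped byte
			} else if c == '"' {
				state = code
			}
		case comment:
			if c == '\n' {
				state = code
			}
		}
	}
	return depth
}

func main() {
	src := "view {\n  text \"a { brace\" // and } here {\n}\n"
	fmt.Println(naiveDepth(src), lexerDepth(src))
}
```

On the sample source, the naive counter ends at depth 1. It counts the { inside the string and both braces inside the comment. The state-aware counter returns to 0.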
This single foundation is the upstream constraint behind most of the deferred parser/formatter/diagnostics work (#250). Error recovery, an AST-backed formatter, exact token spans, and granular per-construct diagnostic codes all depend on having a real token stream and a node-producing parser. Until now, keeping the line-oriented parser has been a decision by omission rather than an explicit one.
Two facts make the direction clear rather than open-ended:
- The documented target pipeline (docs/compiler/pipeline.md) already names a lex/parse full AST -> semantic analysis -> stable internal IR front-end. This ADR makes explicit the parser-internals decision that target already implies.
- A real character-level tokenizer already exists. internal/lang.Lex (internal/lang/lexer.go) scans runes into typed tokens with line/column positions, but only editor and CLI tooling consume it. The compiler parser ignores it and re-lexes per line. The codebase therefore maintains two divergent front-ends for the same language.
Crucially, the typed AST is already a stable seam. internal/parser.ParseSyntax
produces the internal/gwdkast AST, and every downstream pass
(internal/gwdkanalysis lowering to internal/gwdkir.Program, validation, and
generation) consumes that AST. The parser can be replaced behind that seam
without disturbing IR, validation, reports, or codegen.
Commit to a single shared tokenizer and a recursive-descent parser with error
recovery, producing the existing internal/gwdkast AST. Migrate incrementally
behind the AST seam.
Concretely:
- One tokenizer. Promote the internal/lang rune scanner into the shared lexer that both the compiler parser and editor/CLI tooling consume. Retire the per-line lexLine path in internal/parser. There is one lexical definition of .gwdk, not two.
- Recursive-descent parser over tokens. Parse the token stream into gwdkast.File with explicit declaration, block, and view productions instead of line-pattern matching. The brace scanner's string/comment/template state becomes ordinary lexer state rather than a separate counter.
- Custom grammar for .gwdk, the real Go parser for embedded Go. The recursive-descent parser owns only the framework grammar: package, imports, uses, metadata, blocks, view markup, contracts, and endpoints. Some constructs embed Go: go {}/client {} block bodies, and the pkg.Type/pkg.NewFn() references in store/props/state contracts. Wherever a construct embeds Go, the parser delegates to go/parser (go/ast) on the extracted source span rather than re-implementing Go lexing or parsing. The framework tokenizer only locates the boundaries (e.g. the = separating a contract type from its initializer). The Go operands are handed to the Go parser, and the resulting nodes are then constrained to the shapes the language accepts (a single pkg.Name selector, a zero-argument constructor call). This keeps one definition of Go syntax, the Go toolchain's. Generics, multi-segment selectors, and call arguments are therefore parsed by Go's own grammar and then accepted or rejected by the shape check, not by a hand-rolled approximation.
- Error recovery. The parser synchronizes at top-level declaration boundaries and block braces so one syntax error does not hide the rest of the file, and it accumulates diagnostics instead of returning on the first error.
- Exact spans. Tokens carry byte offsets (this ADR depends on #294), so AST nodes and diagnostics get exact token ranges instead of line-wide approximations.
- AST is the frozen seam. internal/gwdkast.File is the contract. The new parser must produce the same AST as the line-oriented parser for the currently supported subset; gwdkanalysis, gwdkir, validation, reports, and codegen do not change as part of this work.
- Formatter follows. Once the parser yields full nodes, the AST-backed formatter deferred in #250 becomes possible and replaces the line-oriented format.go. Until then, the line-oriented formatter keeps its documented limits (see #296).
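Exact spans only require tokens to store byte offsets. Line and column can be derived when a diagnostic is rendered. Below is a minimal sketch that assumes a hypothetical Span type and LineIndex helper, not types from internal/syntax. The standard library's go/token.File keeps a comparable table of line offsets.

```go
package main

import (
	"fmt"
	"sort"
)

// Span is a hypothetical half-open byte range [Start, End) into the source.
type Span struct{ Start, End int }

// LineIndex maps byte offsets to 1-based line/column on demand, so tokens
// and AST nodes only need to store offsets.
type LineIndex struct{ starts []int } // byte offset where each line begins

func NewLineIndex(src string) *LineIndex {
	starts := []int{0}
	for i := 0; i < len(src); i++ {
		if src[i] == '\n' {
			starts = append(starts, i+1)
		}
	}
	return &LineIndex{starts}
}

// Position returns the 1-based line and 1-based byte column of off.
func (x *LineIndex) Position(off int) (line, col int) {
	// Index of the last line start that is <= off.
	i := sort.Search(len(x.starts), func(i int) bool { return x.starts[i] > off }) - 1
	return i + 1, off - x.starts[i] + 1
}

func main() {
	src := "page Home {\n  title \"Hi\"\n}\n"
	idx := NewLineIndex(src)
	title := Span{Start: 14, End: 19} // the `title` token on line 2
	line, col := idx.Position(title.Start)
	fmt.Printf("%q at %d:%d\n", src[title.Start:title.End], line, col)
}
```

Columns here count bytes. An LSP layer would still convert them to the client's negotiated position encoding, which is UTF-16 by default.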
Migration is incremental and non-breaking. The line-oriented parser keeps working
while the new parser is built to produce identical gwdkast.File output for the
supported subset, gated by golden AST-equivalence tests and the language
conformance corpus (#295). Cutover happens per declaration kind once equivalence
holds, then the line-oriented path and lexLine are removed.
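One way to picture the equivalence gate is a harness that runs both front-ends over the same corpus and compares the resulting trees. This is a sketch under stated assumptions. File stands in for gwdkast.File, and lineParse and tokenParse are toy stand-ins for the two parsers, not the real implementations.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
	"strings"
)

// File is a hypothetical stand-in for gwdkast.File.
type File struct {
	Package string
	Imports []string
}

// ParseFunc is the seam both front-ends satisfy in this sketch.
type ParseFunc func(src string) *File

// lineParse is a toy line-oriented parser: one keyword and operand per line.
func lineParse(src string) *File {
	f := &File{}
	for _, line := range strings.Split(src, "\n") {
		fs := strings.Fields(line)
		if len(fs) < 2 {
			continue
		}
		switch fs[0] {
		case "package":
			f.Package = fs[1]
		case "import":
			f.Imports = append(f.Imports, fs[1])
		}
	}
	return f
}

// tokenParse is a toy token-stream parser that ignores line boundaries.
func tokenParse(src string) *File {
	f := &File{}
	toks := strings.Fields(src)
	for i := 0; i+1 < len(toks); i++ {
		switch toks[i] {
		case "package":
			f.Package = toks[i+1]
			i++
		case "import":
			f.Imports = append(f.Imports, toks[i+1])
			i++
		}
	}
	return f
}

// checkEquivalence returns, in sorted order, the corpus entries whose ASTs
// differ between the two parsers.
func checkEquivalence(corpus map[string]string, oldP, newP ParseFunc) []string {
	names := make([]string, 0, len(corpus))
	for name := range corpus {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic report order
	var mismatches []string
	for _, name := range names {
		if !reflect.DeepEqual(oldP(corpus[name]), newP(corpus[name])) {
			mismatches = append(mismatches, name)
		}
	}
	return mismatches
}

func main() {
	corpus := map[string]string{
		"basic":    "package home\nimport ui\nimport db\n",
		"one-line": "package home import ui",
	}
	fmt.Println(checkEquivalence(corpus, lineParse, tokenParse))
}
```

Here the "one-line" entry is reported because the token parser accepts an import that the line parser drops. That is exactly the kind of divergence that should block cutover for a declaration kind. reflect.DeepEqual treats a nil slice and an empty slice as different, so a real harness may need to normalize such fields before comparing.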
- One lexical and grammatical definition of .gwdk shared by the compiler and the language server, instead of a line parser plus a separate tooling lexer.
- Error recovery, exact spans, AST-backed formatting, and granular diagnostic codes become reachable; #250 stops being blocked by the front-end.
- Diagnostics point at tokens rather than whole lines, improving CLI output and LSP precision.
- Braces inside strings, comments, and template literals are handled by lexer state, removing a class of parser and formatter miscounts by construction.
- A recursive-descent parser plus recovery is materially more code than the current line parser, and the migration must preserve AST output exactly to stay non-breaking.
- Equivalence testing across every declaration kind is required before cutover; this is real up-front cost before any user-visible benefit lands.
- Recovery and span precision depend on byte offsets (#294) landing first.
- The public language surface does not change. This is a front-end implementation decision, not a grammar change; the conformance corpus (#295) pins behavior across the migration.
- Downstream passes are untouched because the AST seam is stable.
- Keep the line-oriented parser, document its limits. Lowest cost, but permanently caps span precision, error recovery, and AST-backed formatting, and keeps two divergent front-ends. Rejected: it contradicts the already-documented target pipeline and leaves #250 structurally blocked.
- Adopt a parser generator or third-party combinator library (ANTLR, participle, goyacc). Rejected: adds a dependency and a generated/runtime layer against the project's lean-dependency stance, and a hand-written recursive-descent parser gives better control over recovery and diagnostics for a small surface language.
- Incremental/streaming parser from day one. Useful for an editor, but premature. The AST seam lets an incremental layer be added later without another front-end decision.
- Hand-roll Go lexing/parsing for embedded Go. Re-implementing qualified identifiers, call expressions, and (eventually) type expressions inside the .gwdk tokenizer would duplicate a moving target and drift from go/build semantics. Rejected: go/parser already parses Go exactly, so embedded Go is delegated to it and only the framework-level boundaries are tokenized here.
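The following is a rough sketch of the delegation the decision describes. Each operand is handed to the standard library's go/parser.ParseExpr, and the resulting go/ast node is inspected. The sketch assumes the framework tokenizer has already split a contract at its = into a type operand and an initializer operand. checkType, checkInit, and qualifiedIdent are hypothetical names.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
)

// qualifiedIdent reports whether e has the shape pkg.Name: a selector whose
// receiver is a bare identifier (so a.b.C and pkg.List[int] do not match).
func qualifiedIdent(e ast.Expr) bool {
	sel, ok := e.(*ast.SelectorExpr)
	if !ok {
		return false
	}
	_, ok = sel.X.(*ast.Ident)
	return ok
}

// checkType parses a contract's type operand with go/parser and accepts
// only a single pkg.Name selector.
func checkType(src string) error {
	e, err := parser.ParseExpr(src)
	if err != nil {
		return fmt.Errorf("type %q: %w", src, err)
	}
	if !qualifiedIdent(e) {
		return fmt.Errorf("type %q: want pkg.Name, got %T", src, e)
	}
	return nil
}

// checkInit accepts only a zero-argument call of a pkg.Name function.
func checkInit(src string) error {
	e, err := parser.ParseExpr(src)
	if err != nil {
		return fmt.Errorf("initializer %q: %w", src, err)
	}
	call, ok := e.(*ast.CallExpr)
	if !ok || !qualifiedIdent(call.Fun) {
		return fmt.Errorf("initializer %q: want pkg.NewFn()", src)
	}
	if len(call.Args) != 0 {
		return fmt.Errorf("initializer %q: constructor takes no arguments", src)
	}
	return nil
}

func main() {
	for _, src := range []string{"cart.Store", "cart.List[int]", "a.b.C", "cart.Store{"} {
		fmt.Println(src, "->", checkType(src))
	}
	for _, src := range []string{"cart.NewStore()", "cart.NewStore(1)"} {
		fmt.Println(src, "->", checkInit(src))
	}
}
```

The shape check runs on go/ast nodes, so cart.List[int] is reported as an *ast.IndexExpr, and a.b.C fails because its receiver is itself a selector. Neither case needs any Go-specific logic in the .gwdk tokenizer.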
- #294 (byte offsets in source positions) is the prerequisite; land it first.
- Build the shared tokenizer by promoting internal/lang's scanner; retire internal/parser lexLine.
- Build the recursive-descent parser to gwdkast.File with recovery, gated by golden AST-equivalence tests and the conformance corpus (#295).
- Cut over per declaration kind; remove the line-oriented parser when equivalence holds across the supported subset.
- AST-backed formatter and granular per-construct diagnostic codes (#250) consume the new parser; #296 is the interim formatter guard.
- Link this ADR from the #250 deferral so the line-oriented limitation is a conscious choice with a committed exit.
- Keep docs/compiler/pipeline.md and docs/engineering/architecture.md aligned as the migration proceeds.