Commit 9105455

Document PSLR paper compliance updates
1 parent 70b6d17 commit 9105455

1 file changed

Lines changed: 26 additions & 8 deletions

NEWS.md

@@ -53,6 +53,7 @@ This adds the following PSLR-related grammar directives and integration points:
 - `%symbol-set` declares reusable sets of terminal tokens for PSLR lexical declarations
 - `%lex-tie` expands parser-state acceptable-token sets for tied terminals
 - `%lex-no-tie` records an explicit no-tie decision for terminals with overlapping token patterns
+- `YYLAYOUT*` token patterns are recognized in every parser state and discarded by PSLR-aware lexers
 - `%define pslr.max-states` and `%define pslr.max-state-ratio` are Lrama-specific safety guards for state growth
 - `%define api.pslr.state-member` names the parser-state field to be shared with the lexer when using the generated helper macros

@@ -73,9 +74,16 @@ Typical usage looks like this:
 %lex-prec RANGLE -s RSHIFT
 ```

-In this setup, `%token-pattern` lists the tokens that the PSLR scanner should consider, and `%lex-prec`
-resolves conflicts between overlapping matches. For example, `%lex-prec RANGLE -s RSHIFT` tells Lrama to
-prefer `RANGLE` over `RSHIFT` when the shorter token should win.
+In this setup, `%token-pattern` lists the tokens that the generated pseudo-scanner FSA should consider, and
+`%lex-prec` resolves conflicts between overlapping matches. For example, `%lex-prec RANGLE -s RSHIFT` tells
+Lrama to prefer `RANGLE` over `RSHIFT` when the shorter token should win.
+
+For normal parser-state scanner rows, unresolved pseudo-scanner conflicts are not resolved by token declaration
+order. They are reported as errors so the grammar can add an explicit `%lex-prec`, `%lex-tie`, or `%lex-no-tie`
+declaration. Lrama also emits a fallback scanner row for syntax error handling; only that fallback row uses
+traditional lexical fallback behavior, choosing the longest match and then token declaration order. If no token
+pattern matches at all, the PSLR helper consumes one byte and returns `YYUNDEF` as a character-token fallback, so
+error paths do not loop forever.

 `%lex-prec` uses ASCII spellings for the PSLR lexical precedence operators:

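The fallback-row policy described in this hunk (longest match, then token declaration order, then a one-byte `YYUNDEF` step when nothing matches) can be sketched in C. This is a minimal illustration under stated assumptions, not Lrama's generated code: `sketch_pattern`, `sketch_match`, `sketch_fallback_scan`, and `SKETCH_UNDEF` are hypothetical names, and fixed literal spellings stand in for real token patterns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for YYUNDEF; not a name generated by Lrama. */
enum { SKETCH_UNDEF = -1 };

typedef struct {
  int token;            /* token id, listed in declaration order */
  const char *literal;  /* fixed spelling (assumption: real patterns are regular expressions) */
} sketch_pattern;

typedef struct {
  int token;      /* matched token id, or SKETCH_UNDEF */
  size_t length;  /* number of bytes the caller should consume */
} sketch_match;

/* Fallback-row policy only: the longest match wins and equal-length matches
   go to the earlier declaration. With no match, one byte is consumed and
   SKETCH_UNDEF is returned so the caller always makes progress.
   End of input yields length 0 and is left to the caller. */
static sketch_match
sketch_fallback_scan(const sketch_pattern *patterns, size_t count, const char *input)
{
  sketch_match best = { SKETCH_UNDEF, 0 };
  assert(input != NULL);
  for (size_t i = 0; i < count; i++) {
    size_t len = strlen(patterns[i].literal);
    /* Strictly greater: an earlier declaration keeps an equal-length tie. */
    if (len > best.length && strncmp(input, patterns[i].literal, len) == 0) {
      best.token = patterns[i].token;
      best.length = len;
    }
  }
  if (best.length == 0 && input[0] != '\0')
    best.length = 1;  /* character-token fallback: skip exactly one byte */
  return best;
}
```

With a table that declares `>` before `>>`, the input `>>=` yields the two-byte match, and an unmatched input such as `?` yields `SKETCH_UNDEF` with length 1. Normal parser-state rows would not use this tie-breaking; per the text above, they report unresolved conflicts as errors instead.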
@@ -103,14 +111,24 @@ Here, `IF` can be considered when the parser state accepts `ID`, but `%lex-tie`
 The `%lex-prec ID <~ keywords` declaration resolves the `if` identity conflict in favor of `IF` while keeping
 longer identifiers such as `ifx` as `ID`.

+`%lex-no-tie` suppresses lexical tie candidate warnings; it does not break a final transitive tie closure. Generic
+declarations such as `%lex-no-tie yyall yyall` can suppress broad candidate reports, and a more specific `%lex-tie`
+can still tie the relevant token pair.
+
+Token patterns named `YYLAYOUT` or starting with `YYLAYOUT` are layout tokens. They are included in every
+parser-state scanner row and should be consumed and skipped by the PSLR-aware lexer instead of being returned to
+the parser. The generated helpers include `YYPSLR_TOKEN_IS_LAYOUT(Token)` and the structured
+`YYPSLR_PSEUDO_SCAN_RESULT(...)` API for this purpose.
+
 When the parser and lexer share a context through `%parse-param` / `%lex-param`, the generated header also
 provides helpers such as `YYPSLR_PSEUDO_SCAN(...)`, so the lexer can choose a token based on the current parser
-state.
+state. The paper-compatible scanning path needs the lexer to pass the unconsumed input prefix, not only an
+already-decided token fragment, so legacy external lexer bridges may still be limited by the text they provide.

-The implementation reports unresolved pseudo-scanner conflicts instead of silently resolving them by declaration
-order. PSLR support is still experimental. Scoped lexical declarations, lexical nonterminals, `%lex`,
-`%token-action`, LAC, fallback rows, character-token fallback, and full layout-token semantics are not implemented
-yet. If you find any bugs, please report them.
+PSLR parsers enable a lightweight LAC check in the generated parser so syntax errors caused by LR state merging,
+default reductions, or `%nonassoc` error actions are detected before user semantic actions are run for the bad
+lookahead. PSLR support is still experimental. Scoped lexical declarations, lexical nonterminals, `%lex`, and
+`%token-action` are not implemented yet. If you find any bugs, please report them.

 ## Lrama 0.7.1 (2025-12-24)

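The layout-token rule added in this hunk (names equal to or starting with `YYLAYOUT` are consumed and skipped rather than returned to the parser) can be sketched as a small lexer loop in C. This is only an illustration under stated assumptions: `sketch_token`, `sketch_is_layout`, `sketch_scan`, and `sketch_next_token` are hypothetical names, literal spellings stand in for token patterns, and the real generated helpers (`YYPSLR_TOKEN_IS_LAYOUT`, `YYPSLR_PSEUDO_SCAN_RESULT`) are not called because their signatures are not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
  const char *name;     /* token name as declared in the grammar */
  const char *literal;  /* fixed spelling (assumption: real patterns are regular expressions) */
} sketch_token;

/* Naming rule from the text: `YYLAYOUT` itself or any name starting with it. */
static int sketch_is_layout(const char *name)
{
  return strncmp(name, "YYLAYOUT", 8) == 0;
}

/* Longest literal match at `input`. Returns the table index, or -1 when
   nothing matches; *length is set only on success and is always nonzero. */
static int sketch_scan(const sketch_token *toks, size_t count,
                       const char *input, size_t *length)
{
  int best = -1;
  size_t best_len = 0;
  for (size_t i = 0; i < count; i++) {
    size_t len = strlen(toks[i].literal);
    if (len > best_len && strncmp(input, toks[i].literal, len) == 0) {
      best = (int)i;
      best_len = len;
    }
  }
  if (best >= 0)
    *length = best_len;
  return best;
}

/* Returns the index of the next non-layout token and advances *cursor past
   it and past any layout tokens skipped on the way. Returns -1 when nothing
   matches (including end of input); error recovery is omitted here. */
static int sketch_next_token(const sketch_token *toks, size_t count,
                             const char **cursor)
{
  assert(cursor != NULL && *cursor != NULL);
  for (;;) {
    size_t len = 0;
    int idx = sketch_scan(toks, count, *cursor, &len);
    if (idx < 0)
      return -1;
    *cursor += len;  /* len > 0, so the loop always makes progress */
    if (!sketch_is_layout(toks[idx].name))
      return idx;    /* ordinary token: hand it to the parser */
    /* layout token: already consumed, scan again */
  }
}
```

A table containing `YYLAYOUT_WS` for a space and `ID` for `x` would make `sketch_next_token` skip leading spaces and return the `ID` entry. A PSLR-aware lexer would additionally restrict the candidates to the current parser state's scanner row, which this sketch leaves out.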