Commit 3780dc2

docs: document experimental PSLR support and paper deviations
1 parent 07f6da0 commit 3780dc2

1 file changed

Lines changed: 72 additions & 0 deletions

NEWS.md

@@ -40,6 +40,78 @@ program: args_list(f_opt(number), opt_tail(string), number)

https://github.com/ruby/lrama/pull/779

### [EXPERIMENTAL] Support core PSLR(1) parser generation

Added experimental support for generating a PSLR(1)-style parser based on this dissertation.
https://open.clemson.edu/all_dissertations/519/

This adds the following PSLR-related grammar directives and integration points:

- `%define lr.type pslr` enables PSLR parser generation
- `%token-pattern` declares token candidates and their regular expressions for PSLR-aware lexical disambiguation
- `%lex-prec` declares explicit lexical precedence for overlapping token patterns
- `%symbol-set` declares reusable sets of terminal tokens for PSLR lexical declarations
- `%lex-tie` expands parser-state acceptable-token sets for tied terminals
- `%lex-no-tie` records an explicit no-tie decision for terminals with overlapping token patterns
- `%define pslr.max-states` and `%define pslr.max-state-ratio` are Lrama-specific safety guards for state growth
- `%define api.pslr.state-member` names the parser-state field to be shared with the lexer when using the generated helper macros

Typical usage looks like this:

```yacc
%define api.pure
%define lr.type pslr
%define api.pslr.state-member current_state

%parse-param {struct parse_params *p}
%lex-param {struct parse_params *p}

%token-pattern RSHIFT />>/ "right shift"
%token-pattern RANGLE />/ "right angle"
%token-pattern ID /[a-z]+/

%lex-prec RANGLE -s RSHIFT
```

In this setup, `%token-pattern` lists the tokens that the PSLR scanner should consider, and `%lex-prec`
resolves conflicts between overlapping matches. For example, `%lex-prec RANGLE -s RSHIFT` tells Lrama to
resolve the length conflict between `RANGLE` and `RSHIFT` in favor of the shorter match, so `RANGLE` wins.

`%lex-prec` uses ASCII spellings for the PSLR lexical precedence operators:

| Lrama | Meaning |
|---|---|
| `<~` | identity conflict: right token wins; length conflict: longest match wins |
| `<-` | identity conflict: right token wins |
| `-~` | length conflict: longest match wins |
| `<<` | identity and length conflicts: right token wins |
| `-<` | length conflict: right token wins |
| `<s` | identity conflict: right token wins; length conflict: shortest match wins |
| `-s` | length conflict: shortest match wins |

Lexical ties are separate from precedence. For example:

```yacc
%token-pattern IF /if/
%token-pattern ID /[a-z]+/
%symbol-set keywords IF
%lex-tie ID keywords
%lex-prec ID <~ keywords
```
102+
Here, `IF` can be considered when the parser state accepts `ID`, but `%lex-tie` does not choose a winner.
103+
The `%lex-prec ID <~ keywords` declaration resolves the `if` identity conflict in favor of `IF` while keeping
104+
longer identifiers such as `ifx` as `ID`.
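
The split between ties and precedence can be sketched in Python as follows. This is a rough model rather than Lrama's algorithm: `TOKEN_PATTERNS`, `SYMBOL_SETS`, `LEX_TIES`, `with_ties`, and `candidates` are hypothetical names, a single tie is applied in one pass, and the sketch assumes a tie is symmetric (accepting either side makes the other a candidate), which the text above does not state.

```python
import re

# Hypothetical stand-ins for the declarations in the example above.
TOKEN_PATTERNS = {"IF": r"if", "ID": r"[a-z]+"}
SYMBOL_SETS = {"keywords": {"IF"}}
LEX_TIES = [("ID", "keywords")]  # models `%lex-tie ID keywords`


def expand(name):
    """Expand a symbol-set name to its member tokens; a token is itself."""
    return SYMBOL_SETS.get(name, {name})


def with_ties(acceptable):
    """Add tied terminals to a parser state's acceptable-token set.

    Assumption: ties are symmetric, so accepting either side of a tie
    makes the other side a candidate as well.
    """
    result = set(acceptable)
    for left, right in LEX_TIES:
        group = expand(left) | expand(right)
        if group & result:
            result |= group
    return result


def candidates(acceptable, text):
    """Return {token: match_length} for every candidate that matches."""
    found = {}
    for token in sorted(with_ties(acceptable)):
        match = re.match(TOKEN_PATTERNS[token], text)
        if match:
            found[token] = match.end()
    return found


print(candidates({"ID"}, "if"))   # {'ID': 2, 'IF': 2}
print(candidates({"ID"}, "ifx"))  # {'ID': 3, 'IF': 2}
```

In both calls the candidate set still contains `ID` and `IF`. The tie only widens the set; choosing `IF` for `if` and `ID` for `ifx` is left to the `%lex-prec ID <~ keywords` declaration.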

When the parser and lexer share a context through `%parse-param` / `%lex-param`, the generated header also
provides helpers such as `YYPSLR_PSEUDO_SCAN(...)`, so the lexer can choose a token based on the current parser
state.
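
To make the idea of state-dependent scanning concrete, here is a small Python model. It does not show the signature or behavior of `YYPSLR_PSEUDO_SCAN(...)`, which is not described here. The function `pseudo_scan`, the state names, and the `ACCEPTABLE` table are all invented for illustration, and the conflict handling hard-codes the effect of `%lex-prec RANGLE -s RSHIFT`.

```python
import re

TOKEN_PATTERNS = {"RSHIFT": r">>", "RANGLE": r">", "ID": r"[a-z]+"}

# Hypothetical parser states and the terminals each one accepts.
ACCEPTABLE = {
    "in_expression": {"RSHIFT", "ID"},
    "closing_brackets": {"RANGLE"},
    "either": {"RSHIFT", "RANGLE"},
}


def pseudo_scan(state, text):
    """Return (token, lexeme) for the start of `text` in parser state `state`.

    Only terminals acceptable in `state` are tried. A length conflict
    between RANGLE and RSHIFT is settled the way
    `%lex-prec RANGLE -s RSHIFT` describes: the shortest match wins.
    Any other conflict is reported instead of being resolved silently.
    """
    matches = {}
    for token in ACCEPTABLE[state]:
        match = re.match(TOKEN_PATTERNS[token], text)
        if match:
            matches[token] = match.group()
    if not matches:
        return None
    if len(matches) == 1:
        return next(iter(matches.items()))
    if set(matches) == {"RANGLE", "RSHIFT"}:
        return "RANGLE", matches["RANGLE"]
    raise ValueError(f"unresolved conflict: {sorted(matches)}")


print(pseudo_scan("in_expression", ">> b"))     # ('RSHIFT', '>>')
print(pseudo_scan("closing_brackets", ">> b"))  # ('RANGLE', '>')
print(pseudo_scan("either", ">>"))              # ('RANGLE', '>')
```

The same input `>>` yields different tokens depending on the state: where only one of the two terminals is acceptable, no precedence rule is needed at all.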

The implementation reports unresolved pseudo-scanner conflicts instead of silently resolving them by declaration
order. PSLR support is still experimental. Scoped lexical declarations, lexical nonterminals, `%lex`,
`%token-action`, LAC, fallback rows, character-token fallback, and full layout-token semantics are not implemented
yet. If you find any bugs, please report them.

## Lrama 0.7.1 (2025-12-24)

### Optimize IELR
