@@ -40,6 +40,78 @@ program: args_list(f_opt(number), opt_tail(string), number)
4040
4141https://github.com/ruby/lrama/pull/779
4242
43+ ### [EXPERIMENTAL] Support core PSLR(1) parser generation
44+
45+ Added experimental support for generating a PSLR(1)-style parser based on this dissertation.
46+ https://open.clemson.edu/all_dissertations/519/
47+
48+ This adds the following PSLR-related grammar directives and integration points:
49+
50+ - `%define lr.type pslr` enables PSLR parser generation
51+ - `%token-pattern` declares token candidates and their regular expressions for PSLR-aware lexical disambiguation
52+ - `%lex-prec` declares explicit lexical precedence for overlapping token patterns
53+ - `%symbol-set` declares reusable sets of terminal tokens for PSLR lexical declarations
54+ - `%lex-tie` expands parser-state acceptable-token sets for tied terminals
55+ - `%lex-no-tie` records an explicit no-tie decision for terminals with overlapping token patterns
56+ - `%define pslr.max-states` and `%define pslr.max-state-ratio` are Lrama-specific safety guards for state growth
57+ - `%define api.pslr.state-member` names the parser-state field to be shared with the lexer when using the generated helper macros
58+
59+ Typical usage looks like this:
60+
61+ ``` yacc
62+ %define api.pure
63+ %define lr.type pslr
64+ %define api.pslr.state-member current_state
65+
66+ %parse-param {struct parse_params *p}
67+ %lex-param {struct parse_params *p}
68+
69+ %token-pattern RSHIFT />>/ "right shift"
70+ %token-pattern RANGLE />/ "right angle"
71+ %token-pattern ID /[a-z]+/
72+
73+ %lex-prec RANGLE -s RSHIFT
74+ ```
75+
76+ In this setup, `%token-pattern` lists the tokens that the PSLR scanner should consider, and `%lex-prec`
77+ resolves conflicts between overlapping matches. For example, `%lex-prec RANGLE -s RSHIFT` resolves the length
78+ conflict between `>` and `>>` in favor of the shortest match, so `RANGLE` wins over `RSHIFT` when both are candidates.
79+
80+ `%lex-prec` uses ASCII spellings for the PSLR lexical precedence operators:
81+
82+ | Lrama | Meaning |
83+ | --- | --- |
84+ | `<~` | identity conflict: right token wins; length conflict: longest match wins |
85+ | `<-` | identity conflict: right token wins |
86+ | `-~` | length conflict: longest match wins |
87+ | `<<` | identity and length conflicts: right token wins |
88+ | `-<` | length conflict: right token wins |
89+ | `<s` | identity conflict: right token wins; length conflict: shortest match wins |
90+ | `-s` | length conflict: shortest match wins |
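
The table above can be read as two independent rules per operator: one for identity conflicts (both tokens match the same lexeme) and one for length conflicts (the matches differ in length). The following Python sketch is only an illustrative model of that reading, not Lrama's implementation; the names `LEX_PREC_RULES` and `resolve` are hypothetical, and the sketch assumes a conflict between exactly two candidate tokens.

```python
# Illustrative model of the operator table; not Lrama's implementation.
# Each operator maps to (identity rule, length rule).
# None means the operator does not cover that kind of conflict.
LEX_PREC_RULES = {
    "<~": ("right", "longest"),
    "<-": ("right", None),
    "-~": (None, "longest"),
    "<<": ("right", "right"),
    "-<": (None, "right"),
    "<s": ("right", "shortest"),
    "-s": (None, "shortest"),
}


def resolve(op, left, right):
    """Pick a winner for `%lex-prec LEFT op RIGHT`.

    `left` and `right` are (token_name, match_length) pairs.
    Returns the winning token name, or None when the operator
    does not resolve this kind of conflict.
    """
    identity_rule, length_rule = LEX_PREC_RULES[op]
    left_name, left_len = left
    right_name, right_len = right

    if left_len == right_len:
        # Identity conflict: both tokens match the same lexeme.
        return right_name if identity_rule == "right" else None

    # Length conflict: the two matches have different lengths.
    if length_rule == "right":
        return right_name
    if length_rule == "longest":
        return left_name if left_len > right_len else right_name
    if length_rule == "shortest":
        return left_name if left_len < right_len else right_name
    return None
```

Under this model, `resolve("-s", ("RANGLE", 1), ("RSHIFT", 2))` returns `"RANGLE"`, and a case the operator does not cover returns `None`, which corresponds to a conflict that would have to be reported rather than silently resolved.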
91+
92+ Lexical ties are separate from precedence. For example:
93+
94+ ``` yacc
95+ %token-pattern IF /if/
96+ %token-pattern ID /[a-z]+/
97+ %symbol-set keywords IF
98+ %lex-tie ID keywords
99+ %lex-prec ID <~ keywords
100+ ```
101+
102+ Here, `IF` can be considered when the parser state accepts `ID`, but `%lex-tie` does not choose a winner.
103+ The `%lex-prec ID <~ keywords` declaration resolves the `if` identity conflict in favor of `IF` while keeping
104+ longer identifiers such as `ifx` as `ID`.
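
The interaction between the tie and the precedence declaration can be sketched in Python. This is a hypothetical model of the example above, not code generated by Lrama: the names `acceptable_with_ties` and `scan` are invented for illustration, the tie is assumed to be symmetric, and the `ID <~ keywords` rule is hard-coded rather than parsed from a grammar.

```python
import re

# Hypothetical model of the declarations in the example; not Lrama's code.
TOKEN_PATTERNS = {"IF": re.compile(r"if"), "ID": re.compile(r"[a-z]+")}
SYMBOL_SETS = {"keywords": {"IF"}}
LEX_TIES = [("ID", "keywords")]


def expand(name):
    """Expand a symbol-set name to its tokens; a plain token expands to itself."""
    return SYMBOL_SETS.get(name, {name})


def acceptable_with_ties(acceptable):
    """Add tied terminals to the tokens the parser state accepts.

    Assumption of this sketch: a tie works in both directions.
    """
    expanded = set(acceptable)
    for left, right in LEX_TIES:
        lefts, rights = expand(left), expand(right)
        if expanded & lefts:
            expanded |= rights
        if expanded & rights:
            expanded |= lefts
    return expanded


def scan(text, acceptable):
    """Return (token, lexeme) at the start of `text`, applying `ID <~ keywords`."""
    candidates = {}
    for name in acceptable_with_ties(acceptable):
        pattern = TOKEN_PATTERNS.get(name)
        match = pattern.match(text) if pattern else None
        if match:
            candidates[name] = match.group(0)
    if not candidates:
        return None

    # Length conflict under `<~`: the longest match wins.
    longest = max(len(lexeme) for lexeme in candidates.values())
    winners = [n for n, lexeme in candidates.items() if len(lexeme) == longest]
    if len(winners) > 1:
        # Identity conflict under `<~`: the right operand (keywords) wins.
        winners = [n for n in winners if n in SYMBOL_SETS["keywords"]]
    if len(winners) != 1:
        raise ValueError("unresolved pseudo-scanner conflict")
    return winners[0], candidates[winners[0]]
```

In this model, a state that accepts only `ID` still yields `IF` for the input `if`, because the tie makes `IF` a candidate and the identity rule picks it, while `ifx` stays `ID` through the longest-match rule.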
105+
106+ When the parser and lexer share a context through `%parse-param` / `%lex-param`, the generated header also
107+ provides helpers such as `YYPSLR_PSEUDO_SCAN(...)`, so the lexer can choose a token based on the current parser
108+ state.
109+
110+ The implementation reports unresolved pseudo-scanner conflicts instead of silently resolving them by declaration
111+ order. PSLR support is still experimental. Scoped lexical declarations, lexical nonterminals, `%lex`,
112+ `%token-action`, LAC, fallback rows, character-token fallback, and full layout-token semantics are not implemented
113+ yet. If you find any bugs, please report them.
114+
43115## Lrama 0.7.1 (2025-12-24)
44116
45117### Optimize IELR