Commit e871b6f

Add comprehensive documentation structure with 13 chapters covering concepts, examples, grammar files, and usage
1 parent b5eba1e commit e871b6f

14 files changed

Lines changed: 445 additions & 1 deletion

doc/Index.md

Lines changed: 23 additions & 1 deletion
@@ -3,7 +3,6 @@
[![Gem Version](https://badge.fury.io/rb/lrama.svg)](https://badge.fury.io/rb/lrama)
[![build](https://github.com/ruby/lrama/actions/workflows/test.yaml/badge.svg)](https://github.com/ruby/lrama/actions/workflows/test.yaml)

## Overview

Lrama is an LALR(1) parser generator written in Ruby. The first goal of this project is to provide an error-tolerant parser for CRuby with minimal changes to CRuby's parse.y file.
@@ -47,6 +46,29 @@ Enter the formula:
=> 9
```

## Documentation (Draft)

Chapters are split into individual files under `doc/chapters/` to make the structure easy to extend.

1. [Concepts](chapters/01-concepts.md)
2. [Examples](chapters/02-examples.md)
3. [Grammar Files](chapters/03-grammar-files.md)
4. [Parser Interface](chapters/04-parser-interface.md)
5. [Parser Algorithm](chapters/05-parser-algorithm.md)
6. [Error Recovery](chapters/06-error-recovery.md)
7. [Handling Context Dependencies](chapters/07-context-dependencies.md)
8. [Debugging](chapters/08-debugging.md)
9. [Invoking Lrama](chapters/09-invoking-lrama.md)
10. [Parsers in Other Languages](chapters/10-other-languages.md)
11. [History](chapters/11-history.md)
12. [Version Compatibility](chapters/12-version-compatibility.md)
13. [FAQ](chapters/13-faq.md)

## Development

1. [Compressed State Table](development/compressed_state_table/main.md)
2. [Profiling](development/profiling.md)

## Supported Ruby version

Lrama is executed with BASERUBY when building Ruby from source. Lrama therefore needs to support the BASERUBY version, currently 3.1, and later versions.

doc/chapters/01-concepts.md

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
# Concepts

This section introduces the ideas behind Lrama and how it differs from GNU Bison.
Lrama is a Ruby implementation of an LALR(1) parser generator, built to be a
drop-in replacement for Bison in the CRuby parser toolchain while keeping
compatibility with Bison-style grammars.

## Lrama at a glance

- **LALR(1) parser generator**: Lrama produces C parsers from grammar files.
- **Bison-style grammar files**: Most Bison directives are accepted, but there
  are compatibility constraints (see below).
- **Error-tolerant parsing**: Lrama can generate parsers that attempt recovery
  using a subset of the algorithm described in *Repairing Syntax Errors in LR
  Parsers*.
- **Ruby-focused**: Lrama is written in Ruby and is used in the CRuby build
  process.

## Compatibility assumptions

Lrama is not a full Bison reimplementation. It intentionally assumes the
following Bison configuration when reading a grammar file:

- `b4_locations_if` is always true (location tracking is enabled).
- `b4_pure_if` is always true (pure parser).
- `b4_pull_if` is always false (no pull parser interface).
- `b4_lac_if` is always false (no LAC).

These assumptions simplify the code generation path and reflect how CRuby uses
a Bison-compatible parser.

## Inputs and outputs

A typical Lrama run takes a `.y` grammar file and produces:

- A parser implementation in C (default `y.tab.c`, or the file passed by `-o`).
- A header file (`y.tab.h`) when `-d` or `-H` is provided.
- Optional reports (`--report` / `--report-file`).
- Optional syntax diagram output (`--diagram`).

## Workflow stages

1. Write a grammar file (`.y`) using Bison-compatible syntax.
2. Run Lrama to generate the parser C code.
3. Compile the generated C code with the rest of your project.

For worked examples, see the [Examples](02-examples.md) section.

doc/chapters/02-examples.md

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
# Examples

This chapter mirrors the structure of the Bison manual examples, but focuses on
what is present in the Lrama repository today.

## Calculator example (sample/calc.y)

The [`sample/calc.y`](../../sample/calc.y) grammar is the canonical example
for running Lrama.

```shell
$ lrama -d sample/calc.y -o calc.c
$ gcc -Wall calc.c -o calc
$ ./calc
```

The grammar demonstrates:

- Declaring tokens and precedence.
- Attaching semantic actions in C.
- Generating a header file with `-d`.

## Minimal parser example (sample/parse.y)

[`sample/parse.y`](../../sample/parse.y) is a smaller grammar intended to be
used by the build instructions and smoke tests.

```shell
$ lrama -d sample/parse.y
```

## Additional grammars

The `sample/` directory includes additional grammars that cover different
syntax styles:

- [`sample/json.y`](../../sample/json.y)
- [`sample/sql.y`](../../sample/sql.y)

These are good starting points when verifying compatibility or experimenting
with new directives.

doc/chapters/03-grammar-files.md

Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
# Grammar Files

Lrama reads Bison-style grammar files. Each grammar file has four sections in
order:

1. **Prologue**: C code copied verbatim into the generated parser.
2. **Declarations**: Bison-style directives such as `%token` and `%start`.
3. **Grammar rules**: The productions and semantic actions.
4. **Epilogue**: C code appended to the end of the generated parser.

A minimal grammar looks like this:

```yacc
%token INTEGER
%%
input: INTEGER '\n';
%%
```

## Symbols

- **Terminals** are tokens returned by the lexer.
- **Nonterminals** are syntactic groupings defined by rules.

Lrama accepts the common `%token`, `%type`, `%left`, `%right`, and
`%precedence` declarations in the declarations section.

## Rules and actions

Grammar rules use the standard Bison syntax. Semantic actions are C code blocks
that run when a rule is reduced.

```yacc
expr:
    expr '+' expr { $$ = $1 + $3; }
  | INTEGER { $$ = $1; }
  ;
```
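
To make the `$$` and `$n` notation concrete, the sketch below mimics what a reduction by `expr: expr '+' expr` does with a value stack. It is not the code Lrama generates: `ValueStack`, `vs_push`, and `reduce_add` are hypothetical names used only for illustration, and semantic values are assumed to be plain `int`s.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical value stack; a generated parser keeps a comparable stack
   of semantic values alongside its state stack. */
typedef struct {
    int values[64];
    size_t depth;
} ValueStack;

void vs_push(ValueStack *vs, int value) {
    assert(vs->depth < 64);
    vs->values[vs->depth++] = value;
}

/* Reduce by `expr: expr '+' expr { $$ = $1 + $3; }`.
   The three right-hand-side values sit on top of the stack:
   $1 is the deepest of the three and $3 is on top. */
void reduce_add(ValueStack *vs) {
    assert(vs->depth >= 3);
    int *rhs = &vs->values[vs->depth - 3]; /* rhs[0] is $1, rhs[2] is $3 */
    int result = rhs[0] + rhs[2];          /* $$ = $1 + $3 */
    vs->depth -= 3;                        /* pop the right-hand side */
    vs_push(vs, result);                   /* push $$ in its place */
}
```

The value slot for `'+'` is popped along with the operands even though the action never reads `$2`; every right-hand-side symbol occupies a slot.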

## Parameterized rules

Lrama extends Bison-style rules with parameterization. A nonterminal definition
may accept other symbols as parameters, allowing you to reuse rule templates.
Parameterized rules are defined with `%rule` and invoked like a nonterminal.

```yacc
%rule option(X)
    : /* empty */
    | X
    ;

program:
    option(statement)
  ;
```

When Lrama expands a parameterized rule, it creates a concrete nonterminal
whose name encodes the parameters. The example above expands to a rule named
`option_statement`.

### Parameterized rules in the standard library

Lrama ships a standard library of reusable parameterized rules in
[`lib/lrama/grammar/stdlib.y`](../../lib/lrama/grammar/stdlib.y). Common
patterns include:

- `option(X)`: optional symbol.
- `list(X)`: zero or more repetitions.
- `nonempty_list(X)`: one or more repetitions.
- `separated_list(separator, X)`: separated list with optional empty case.
- `separated_nonempty_list(separator, X)`: separated list with at least one
  element.
- `delimited(opening, X, closing)`: wrap a symbol with delimiters.

You can reference these directly by including the standard library in your
grammar, or by copying them into your own grammar file.

### Semantic values and locations

Parameterized rules support the same semantic action syntax as ordinary rules.
If you add actions to a parameterized rule, the generated nonterminal keeps the
action and location references intact. When you call a parameterized rule, the
resulting nonterminal can be used like any other symbol in subsequent rules.

## Inlining

The `%inline` directive replaces all references to a symbol with its
definition. It is useful for eliminating extra nonterminals, removing
shift/reduce conflicts, or keeping small helper rules from polluting the symbol
list.

```yacc
%rule %inline opt_newline
    : /* empty */
    | '\n'
    ;

lines:
    lines opt_newline line
  | line
  ;
```

An inline rule does not create a standalone nonterminal in the output. Instead,
its productions are substituted wherever the inline symbol is referenced. This
is why `%inline` is often paired with parameterized rules (for example,
`%inline ioption(X)` in the standard library) to build reusable templates
without growing the symbol table.

## Error recovery

Use `error` tokens in rules and enable recovery with `-e` when generating the
parser. For guidance, see the [Error Recovery](06-error-recovery.md) chapter.
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
# Parser Interface

Lrama generates a C parser that follows the same API style as Bison’s default
C interface. The entry point is `yyparse`, which calls `yylex` to obtain tokens
from the lexer and uses `yyerror` for error reporting.

## Required functions

- `int yylex(void)` returns the next token and sets semantic values.
- `int yyparse(void)` drives the parser.
- `void yyerror(const char *message)` reports syntax errors.

These are the simplest forms of the signatures. Because Lrama assumes a pure
parser with location tracking, `yylex` normally receives pointers to the
semantic value and location rather than taking no arguments, and any
`%parse-param` or `%lex-param` arguments in your grammar change the signatures
further.
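
Because Lrama assumes a pure parser with locations, a hand-written lexer usually receives the semantic value and location through pointers. The sketch below shows one possible shape. The `YYSTYPE` and `YYLTYPE` definitions, the `YYEOF` and `INTEGER` token codes, and the extra `cursor` argument are stand-ins chosen for illustration; in a real project the types and token codes come from the generated header, and an extra argument like `cursor` would be declared with something like `%lex-param {const char **cursor}`.

```c
#include <assert.h>
#include <ctype.h>

/* Stand-ins for definitions that normally come from the generated header. */
typedef int YYSTYPE;
typedef struct {
    int first_line, first_column;
    int last_line, last_column; /* last_column is one past the token's end */
} YYLTYPE;
enum { YYEOF = 0, INTEGER = 258 };

/* Pure-style lexer: all state is passed in, nothing is global. */
int yylex(YYSTYPE *lvalp, YYLTYPE *llocp, const char **cursor) {
    assert(lvalp && llocp && cursor && *cursor);
    const char *p = *cursor;

    /* Skip blanks, advancing the column. */
    while (*p == ' ' || *p == '\t') {
        p++;
        llocp->last_column++;
    }

    /* The new token starts where the previous one (plus blanks) ended. */
    llocp->first_line = llocp->last_line;
    llocp->first_column = llocp->last_column;

    if (*p == '\0') {
        *cursor = p;
        return YYEOF;
    }

    if (isdigit((unsigned char)*p)) {
        int value = 0;
        while (isdigit((unsigned char)*p)) {
            value = value * 10 + (*p - '0');
            p++;
            llocp->last_column++;
        }
        *lvalp = value; /* semantic value of the INTEGER token */
        *cursor = p;
        return INTEGER;
    }

    /* Single-character tokens such as '+' or '\n' are returned as themselves. */
    if (*p == '\n') {
        llocp->last_line++;
        llocp->last_column = 1;
    } else {
        llocp->last_column++;
    }
    *cursor = p + 1;
    return (unsigned char)*p;
}
```

Initialize the location to `{1, 1, 1, 1}` before the first call so that lines and columns are 1-based.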

## Location tracking

Location tracking is always enabled in Lrama’s compatibility model. Use `@n`
for the location of a right-hand side symbol and `@$` for the location of the
left-hand side. Define a location type via `%define api.location.type` or by
customizing the generated code.
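
In Bison-compatible parsers, `@$` defaults to a span covering the whole right-hand side: it starts where `@1` starts and ends where the last symbol ends (Bison's `YYLLOC_DEFAULT` macro plays this role). The function below sketches that default for a non-empty rule. `Location` and `merge_locations` are hypothetical names, and the four-field layout is an assumption modeled on the conventional default location type.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int first_line, first_column;
    int last_line, last_column;
} Location;

/* Compute @$ from @1..@n for a rule with n >= 1 right-hand-side symbols.
   rhs[0] corresponds to @1 and rhs[n - 1] to @n. */
Location merge_locations(const Location *rhs, size_t n) {
    assert(rhs != NULL && n >= 1);
    Location result;
    result.first_line = rhs[0].first_line;
    result.first_column = rhs[0].first_column;
    result.last_line = rhs[n - 1].last_line;
    result.last_column = rhs[n - 1].last_column;
    return result;
}
```

Empty rules have no `@1` and need separate handling, which this sketch leaves out.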

## Header generation

Use `-d` or `-H` to emit a header file containing token definitions and shared
structures:

```shell
$ lrama -d sample/parse.y
```

## Pure parser assumptions

Lrama assumes a pure parser (`b4_pure_if` is always true). This means semantic
value and location information are passed explicitly rather than using globals.
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
# Parser Algorithm

Lrama produces LALR(1) parsers. The generated parser uses the standard LR
algorithm with shift/reduce and reduce/reduce conflict resolution.

## Conflicts and precedence

Use `%left`, `%right`, and `%precedence` declarations to resolve
shift/reduce conflicts. Lrama reports conflicts in the `--report` output and
with `-v` (alias for `--report=state`).
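
The way precedence declarations settle a shift/reduce conflict can be summarized in a few lines. The sketch below follows the conventional Yacc/Bison scheme and is not taken from Lrama's source; `Assoc`, `Prec`, `Action`, and `resolve_shift_reduce` are hypothetical names, and level `0` is assumed to mean "no precedence declared".

```c
typedef enum { ASSOC_NONE, ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONASSOC } Assoc;
typedef enum {
    ACTION_SHIFT,
    ACTION_REDUCE,
    ACTION_ERROR,      /* the input is a syntax error at this point */
    ACTION_UNRESOLVED  /* precedence cannot decide; the conflict is reported */
} Action;

typedef struct {
    int level;   /* higher binds tighter; 0 means no precedence declared */
    Assoc assoc; /* ASSOC_NONE models a bare %precedence declaration */
} Prec;

/* Decide between reducing `rule` and shifting `lookahead`. */
Action resolve_shift_reduce(Prec rule, Prec lookahead) {
    if (rule.level == 0 || lookahead.level == 0)
        return ACTION_UNRESOLVED;
    if (rule.level > lookahead.level)
        return ACTION_REDUCE;
    if (rule.level < lookahead.level)
        return ACTION_SHIFT;

    /* Same level: associativity breaks the tie. */
    switch (lookahead.assoc) {
    case ASSOC_LEFT:     return ACTION_REDUCE; /* a - b - c is (a - b) - c */
    case ASSOC_RIGHT:    return ACTION_SHIFT;  /* a = b = c is a = (b = c) */
    case ASSOC_NONASSOC: return ACTION_ERROR;  /* a < b < c is rejected */
    default:             return ACTION_UNRESOLVED;
    }
}
```

For `1 + 2 * 3` with `'*'` declared after `'+'`, the rule `expr '+' expr` has a lower level than the lookahead `'*'`, so the parser shifts and the multiplication is grouped first.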

## Reports and diagnostics

Lrama can emit detailed state and conflict reports during parser generation.
Common report options include:

- `--report=state`: state machine summary (also `-v`).
- `--report=counterexamples`: generate conflict counterexamples.
- `--report=all`: include all reports.

You can write the report to a file with `--report-file`.

```shell
$ lrama -v --report-file=parser.report sample/parse.y
```

## Error tolerant parsing

When `-e` is supplied, Lrama enables its error recovery extensions. This uses a
subset of the algorithm described in *Repairing Syntax Errors in LR Parsers*.
Refer to [Error Recovery](06-error-recovery.md) for guidance on structuring
rules.

doc/chapters/06-error-recovery.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
# Error Recovery

Lrama supports error tolerant parsing inspired by the algorithm described in
*Repairing Syntax Errors in LR Parsers*.

## Enabling recovery

Pass `-e` when generating the parser to enable recovery support.

```shell
$ lrama -e sample/parse.y
```

## Writing recovery rules

Use the special `error` token in grammar rules to specify recovery points. A
common pattern is to skip to a statement terminator or newline.

```yacc
statement:
    expr ';'
  | error ';' { /* discard the rest of the statement */ }
  ;
```
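
A rule such as `error ';'` resynchronizes the parser at the next `;`. As a rough mental model only, the token-discarding part of that recovery can be pictured as the loop below. It works on a plain token array, and `skip_to_terminator` is a hypothetical helper rather than anything found in the generated parser, which also unwinds its state stack before it starts discarding tokens.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index just past the next `terminator` token at or after `pos`,
   or `count` if the input ends first. Everything in between is discarded. */
size_t skip_to_terminator(const int *tokens, size_t count, size_t pos,
                          int terminator) {
    assert(tokens != NULL && pos <= count);
    while (pos < count && tokens[pos] != terminator)
        pos++;
    return pos < count ? pos + 1 : count;
}
```

Parsing would then resume with the token at the returned index, which is the start of the next statement.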

## Handling recovery in actions

Make sure semantic actions can cope with partially parsed input. Keep actions
small and defensively check inputs for null values when necessary.
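
One way to keep actions robust is to route them through small helpers that tolerate missing operands. The sketch below assumes a hypothetical `Node` AST type and `node_binary` constructor, and it assumes that your recovery actions set `$$` to `NULL` so that later actions can detect the gap.

```c
#include <stdlib.h>

/* Hypothetical AST node used by the grammar's semantic actions. */
typedef struct Node {
    int op;                 /* operator character, or 0 for a leaf */
    int value;              /* leaf value */
    struct Node *lhs, *rhs;
} Node;

/* Build a binary node, or propagate failure if either operand is missing.
   Intended use in an action: $$ = node_binary('+', $1, $3);
   On failure the caller still owns any non-NULL operand. */
Node *node_binary(int op, Node *lhs, Node *rhs) {
    if (lhs == NULL || rhs == NULL)
        return NULL;
    Node *node = calloc(1, sizeof *node);
    if (node == NULL)
        return NULL;
    node->op = op;
    node->lhs = lhs;
    node->rhs = rhs;
    return node;
}
```

Because a `NULL` result propagates upward through nested expressions, only the top-level action needs to decide how to report or skip the damaged subtree.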
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
# Handling Context Dependencies

Some grammars are difficult to express with pure context-free rules.
In these cases, the typical approach is to make the lexer or semantic actions
context aware.

## Token-level context

Emit different tokens depending on parser state. For example, you can track
whether you are inside a type declaration and return a distinct token for
identifiers in that context.
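
A minimal sketch of this idea follows. It assumes a hypothetical `LexContext` structure that the parser's actions update, and two hypothetical token codes, `IDENTIFIER` and `TYPE_NAME`, that would normally be declared with `%token` in the grammar.

```c
#include <stdbool.h>

/* Hypothetical token codes; real ones come from the generated header. */
enum { IDENTIFIER = 258, TYPE_NAME = 259 };

/* State shared between parser actions and the lexer. With a pure parser
   it would typically be passed in through %lex-param and %parse-param. */
typedef struct {
    bool in_type_declaration;
} LexContext;

/* Classify an identifier that the lexer has already scanned. */
int classify_identifier(const LexContext *ctx) {
    return ctx->in_type_declaration ? TYPE_NAME : IDENTIFIER;
}
```

A semantic action would set `in_type_declaration` when it enters the relevant construct and clear it on the way out, so the grammar itself only ever sees two unambiguous tokens.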

## Semantic predicates

Lrama does not provide Bison's GLR semantic predicates (`%?{ ... }`). Instead,
use regular semantic actions and explicit tokens to keep state.

## Parameterized rules

Parameterized rules can help express repeated patterns without introducing
ambiguity. Use them to factor context-specific constructs while keeping the
grammar readable. See the [Grammar Files](03-grammar-files.md) chapter.

doc/chapters/08-debugging.md

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
# Debugging

Lrama offers both generation-time and runtime diagnostics.

## Generator traces

Use `--trace` to print internal generation traces. Useful values are:

- `automaton`: print state transitions.
- `rules`: print grammar rules.
- `actions`: print rules with semantic actions.
- `time`: report generation time.
- `all`: enable all traces.

```shell
$ lrama --trace=automaton,rules sample/parse.y
```

## Reports

`--report` produces structured reports about states, conflicts, and unused
rules/terminals. See [Parser Algorithm](05-parser-algorithm.md) for details.

## Syntax diagrams

Use `--diagram` to emit an HTML diagram of the grammar rules.

```shell
$ lrama --diagram=diagram.html sample/calc.y
```

The repository includes a sample output in [`sample/diagram.html`](../../sample/diagram.html).
