Skip to content

Commit 024c03a

Browse files
committed
Revise frontmatter grammar, add ⊥ rule
The prior commit added a grammar for frontmatter, but the grammar notation available at the time that commit was prepared couldn't express all of the invariants the language requires. Opening and closing fences must have the same dash count. Indented fences must be rejected as an error. And once an opening fence is recognized, the parser must commit -- it can't backtrack and reinterpret the dashes as tokens. Since then, we've added named range repeats, hard cut, and negative lookahead to the grammar notation. With these, we can express the invariants directly. In this commit, we rewrite the frontmatter grammar. Named range repeats let the closing fence reference the opening fence's dash count. Hard cut commits the parse after the opening dashes. And `FRONTMATTER_INVALID` uses hard cut followed by the bottom rule (`^ ⊥`) to express that indented fences are a recognized-and-rejected syntactic form. We also add `⊥` as a primitive rule in the Notation chapter, move `HORIZONTAL_WHITESPACE` to Whitespace, and fix some minor editorial matters such as indentation and comment style.
1 parent fafe84b commit 024c03a

3 files changed

Lines changed: 38 additions & 14 deletions

File tree

src/frontmatter.md

Lines changed: 22 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -4,27 +4,35 @@ r[frontmatter]
44
r[frontmatter.syntax]
55
```grammar,lexer
66
@root FRONTMATTER ->
7-
FRONTMATTER_FENCE HORIZONTAL_WHITESPACE* INFOSTRING? HORIZONTAL_WHITESPACE* LF
8-
(FRONTMATTER_LINE LF )*
9-
FRONTMATTER_FENCE[^matched-fence] HORIZONTAL_WHITESPACE* LF
7+
WHITESPACE_ONLY_LINE*
8+
!FRONTMATTER_INVALID
9+
FRONTMATTER_MAIN
1010
11-
FRONTMATTER_FENCE -> `-`{3..255}
11+
WHITESPACE_ONLY_LINE -> (!LF WHITESPACE)* LF
1212
13-
INFOSTRING -> (XID_Start | `_`) ( XID_Continue | `-` | `.` )*
13+
FRONTMATTER_INVALID -> (!LF WHITESPACE)+ `---` ^ ⊥
1414
15-
FRONTMATTER_LINE -> (~INVALID_FRONTMATTER_LINE_START (~INVALID_FRONTMATTER_LINE_CONTINUE)*)?
15+
FRONTMATTER_MAIN ->
16+
`-`{n:3..=255} ^ FRONTMATTER_REST
1617
17-
INVALID_FRONTMATTER_LINE_START -> (FRONTMATTER_FENCE[^escaped-fence] | CR | LF)
18+
FRONTMATTER_REST ->
19+
FRONTMATTER_FENCE_START
20+
FRONTMATTER_LINE*
21+
FRONTMATTER_FENCE_END
1822
19-
INVALID_FRONTMATTER_LINE_CONTINUE -> CR | LF
23+
FRONTMATTER_FENCE_START ->
24+
MAYBE_INFOSTRING_OR_WS LF
2025
21-
HORIZONTAL_WHITESPACE ->
22-
U+0009 // horizontal tab, `'\t'`
23-
| U+0020 // space, `' '`
24-
```
26+
FRONTMATTER_FENCE_END ->
27+
`-`{n} HORIZONTAL_WHITESPACE* ( LF | EOF )
28+
29+
FRONTMATTER_LINE -> !`-`{n} ~[LF CR]* LF
2530
26-
[^matched-fence]: The closing fence must have the same number of `-` as the opening fence
27-
[^escaped-fence]: A `FRONTMATTER_FENCE` at the beginning of a `FRONTMATTER_LINE` is only invalid if it has the same or more `-` as the `FRONTMATTER_FENCE`
31+
MAYBE_INFOSTRING_OR_WS ->
32+
HORIZONTAL_WHITESPACE* INFOSTRING? HORIZONTAL_WHITESPACE*
33+
34+
INFOSTRING -> (XID_Start | `_`) ( XID_Continue | `-` | `.` )*
35+
```
2836

2937
r[frontmatter.intro]
3038
Frontmatter is an optional section for content intended for external tools without requiring these tools to have full knowledge of the Rust grammar.

src/notation.md

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -45,6 +45,18 @@ Mizushima et al. introduced [cut operators][cut operator paper] to parsing expre
4545

4646
The hard cut operator is necessary because some tokens in Rust begin with a prefix that is itself a valid token. For example, `c"` begins a C string literal, but `c` alone is a valid identifier. Without the cut, if `c"\0"` failed to lex as a C string literal (because null bytes are not allowed in C strings), the parser could backtrack and lex it as two tokens: the identifier `c` and the string literal `"\0"`. The [cut after `c"`] prevents this --- once the opening delimiter is recognized, the parser cannot go back. The same reasoning applies to [byte literals], [byte string literals], [raw string literals], and other literals with prefixes that are themselves valid tokens.
4747

48+
r[notation.grammar.bottom]
49+
### The bottom rule
50+
51+
In logic, ⊥ (*bottom*) represents *absurdity* --- a proposition that is always false. In type theory, it is the *empty type* --- a type with no inhabitants. The grammar borrows both senses: the rule ⊥ matches nothing --- not any character, not even the end of input.
52+
53+
```grammar,notation
54+
// The bottom rule does not match anything.
55+
⊥ -> !(CHAR | EOF)
56+
```
57+
58+
Placed after a [hard cut operator], ⊥ makes a rule fail unconditionally once the parser has committed past the cut. This gives the grammar a way to express recognition without acceptance. The parser identifies the input, commits so that no other alternative can be tried, and then rejects it. In the frontmatter grammar, for example, [FRONTMATTER_INVALID] uses `^ ⊥` to recognize an opening fence preceded by whitespace on the same line --- input that is close enough to frontmatter to rule out other interpretations but is not valid.
59+
4860
r[notation.grammar.string-tables]
4961
### String table productions
5062

src/whitespace.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,10 @@ WHITESPACE ->
1616
| U+2028 // Line separator
1717
| U+2029 // Paragraph separator
1818
19+
HORIZONTAL_WHITESPACE ->
20+
U+0009 // Horizontal tab, `'\t'`
21+
| U+0020 // Space, `' '`
22+
1923
TAB -> U+0009 // Horizontal tab, `'\t'`
2024
2125
LF -> U+000A // Line feed, `'\n'`

0 commit comments

Comments
 (0)