# Language Syntax
This chapter defines the basic syntax of the Stan modeling language
using a Backus-Naur form (BNF) grammar plus extra-grammatical
constraints on function typing and operator precedence and
associativity.
## BNF grammars
### Syntactic conventions {-}
In the following BNF grammars, tokens are represented in ALLCAPS.
Grammar non-terminals are surrounded by `<` and `>`.
Square brackets (`[A]`) indicate optionality of `A`.
A postfixed Kleene star (`A*`) indicates zero or more occurrences
of `A`.
Parentheses can be used to group symbols together in productions.
Finally, this grammar uses the concept of "parameterized nonterminals"
as used in the parsing library
[Menhir](http://gallium.inria.fr/~fpottier/menhir/manual.html#sec30).
A rule like `<list(x)> ::= x (COMMA x)*` declares a generic list rule,
which can later be instantiated with a specific symbol, as in
`<list(<expression>)>`.
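One way to read a parameterized nonterminal is as a higher-order parser: a function that takes a parser for `x` and returns a parser for `x (COMMA x)*`. The Python sketch below is illustrative only. The token representation (a list of strings) and the names `parse_list` and `parse_int` are assumptions made for this example, not part of Menhir or the reference parser.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")
Tokens = List[str]
# A parser takes tokens and a position, and returns (value, new position).
Parser = Callable[[Tokens, int], Tuple[T, int]]


def parse_list(parse_x: Parser) -> Parser:
    """Build a parser for `x (COMMA x)*` from a parser for `x`."""
    def parser(tokens: Tokens, pos: int):
        item, pos = parse_x(tokens, pos)
        items = [item]
        # Keep consuming `COMMA x` for as long as a comma follows.
        while pos < len(tokens) and tokens[pos] == ",":
            item, pos = parse_x(tokens, pos + 1)
            items.append(item)
        return items, pos
    return parser


def parse_int(tokens: Tokens, pos: int) -> Tuple[int, int]:
    """Hypothetical element parser: accepts a single integer token."""
    if pos >= len(tokens) or not tokens[pos].isdigit():
        raise SyntaxError(f"expected integer at token {pos}")
    return int(tokens[pos]), pos + 1


# Instantiating the generic rule, analogous to writing <list(<expression>)>.
parse_int_list = parse_list(parse_int)
values, end = parse_int_list(["1", ",", "2", ",", "3"], 0)
```

Here `parse_list(parse_int)` plays the role of `<list(x)>` applied to one particular `x`; passing a different element parser reuses the same list logic.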
The following representation is constructed directly from the OCaml
[reference
parser](https://github.com/stan-dev/stanc3/blob/release/v2.27.0/src/frontend/parser.mly)
using a tool called [Obelisk](https://github.com/Lelio-Brun/Obelisk).
The raw output is available [here](grammar.txt).
<!-- This is the direct output of `obelisk -i parser.mly`
copied and pasted into reasonable section distinctions.
Additionally, items which exist only for error messaging
were removed, like allowing FUNCTIONBLOCK as a decl identifier
-->
### Programs {-}
\fontsize{9pt}{9.2}\selectfont
```
<program> ::= [<function_block>] [<data_block>] [<transformed_data_block>]
              [<parameters_block>] [<transformed_parameters_block>]
              [<model_block>] [<generated_quantities_block>] EOF

<function_block> ::= FUNCTIONBLOCK LBRACE <function_def>* RBRACE

<data_block> ::= DATABLOCK LBRACE <top_var_decl_no_assign>* RBRACE

<transformed_data_block> ::= TRANSFORMEDDATABLOCK LBRACE
                             <top_vardecl_or_statement>* RBRACE

<parameters_block> ::= PARAMETERSBLOCK LBRACE <top_var_decl_no_assign>*
                       RBRACE

<transformed_parameters_block> ::= TRANSFORMEDPARAMETERSBLOCK LBRACE
                                   <top_vardecl_or_statement>* RBRACE

<model_block> ::= MODELBLOCK LBRACE <vardecl_or_statement>* RBRACE

<generated_quantities_block> ::= GENERATEDQUANTITIESBLOCK LBRACE
                                 <top_vardecl_or_statement>* RBRACE
```
\normalsize
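The `<program>` production makes every block optional but fixes their relative order. As a rough illustration, the Python sketch below checks a list of block names against that order. The keyword spellings other than `data` are assumed from common Stan usage rather than taken from the grammar above, and the function name is hypothetical.

```python
# Block keywords in the order required by the <program> production.
# Spellings other than "data" are assumptions, not taken from the grammar.
BLOCK_ORDER = [
    "functions",
    "data",
    "transformed data",
    "parameters",
    "transformed parameters",
    "model",
    "generated quantities",
]


def blocks_in_valid_order(blocks):
    """Return True if `blocks` lists known blocks, each at most once,
    in the order the <program> production requires."""
    positions = []
    for name in blocks:
        if name not in BLOCK_ORDER:
            return False
        positions.append(BLOCK_ORDER.index(name))
    # Strictly increasing positions rule out both reordering and repeats.
    return all(a < b for a, b in zip(positions, positions[1:]))
```

For example, `["data", "parameters", "model"]` passes, while `["model", "data"]` does not, because the grammar offers no production in which the model block precedes the data block.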
### Function declarations and definitions {-}
\fontsize{9pt}{9.2}\selectfont
```
<function_def> ::= <return_type> <decl_identifier> LPAREN [<arg_decl> (COMMA
                   <arg_decl>)*] RPAREN <statement>

<return_type> ::= VOID
                | <unsized_type>

<arg_decl> ::= [DATABLOCK] <unsized_type> <decl_identifier>

<unsized_type> ::= ARRAY <unsized_dims> <basic_type>
                 | <basic_type> [<unsized_dims>]

<basic_type> ::= INT
               | REAL
               | COMPLEX
               | VECTOR
               | ROWVECTOR
               | MATRIX

<unsized_dims> ::= LBRACK COMMA* RBRACK
```
\normalsize
### Variable declarations and compound definitions {-}
\fontsize{9pt}{9.2}\selectfont
```
<identifier> ::= IDENTIFIER
               | TRUNCATE
               | OFFSET
               | MULTIPLIER
               | LOWER
               | UPPER
               | ARRAY

<decl_identifier> ::= <identifier>

<no_assign> ::= UNREACHABLE

<optional_assignment(rhs)> ::= [ASSIGN rhs]

<id_and_optional_assignment(rhs)> ::= <decl_identifier>
                                      <optional_assignment(rhs)>

<decl(type_rule, rhs)> ::= type_rule <decl_identifier> <dims>
                           <optional_assignment(rhs)> SEMICOLON
                         | [<lhs>] type_rule
                           <id_and_optional_assignment(rhs)> (COMMA
                           <id_and_optional_assignment(rhs)>)* SEMICOLON

<var_decl> ::= <decl(<sized_basic_type>, <expression>)>

<top_var_decl> ::= <decl(<top_var_type>, <expression>)>

<top_var_decl_no_assign> ::= <decl(<top_var_type>, <no_assign>)>

<vardecl_or_statement> ::= <statement>
                         | <var_decl>

<top_vardecl_or_statement> ::= <statement>
                             | <top_var_decl>

<sized_basic_type> ::= INT
                     | REAL
                     | VECTOR LBRACK <expression> RBRACK
                     | ROWVECTOR LBRACK <expression> RBRACK
                     | MATRIX LBRACK <expression> COMMA <expression> RBRACK

<top_var_type> ::= INT [LABRACK <range> RABRACK]
                 | REAL <type_constraint>
                 | VECTOR <type_constraint> LBRACK <expression> RBRACK
                 | ROWVECTOR <type_constraint> LBRACK <expression> RBRACK
                 | MATRIX <type_constraint> LBRACK <expression> COMMA
                   <expression> RBRACK
                 | ORDERED LBRACK <expression> RBRACK
                 | POSITIVEORDERED LBRACK <expression> RBRACK
                 | SIMPLEX LBRACK <expression> RBRACK
                 | UNITVECTOR LBRACK <expression> RBRACK
                 | CHOLESKYFACTORCORR LBRACK <expression> RBRACK
                 | CHOLESKYFACTORCOV LBRACK <expression> [COMMA <expression>]
                   RBRACK
                 | CORRMATRIX LBRACK <expression> RBRACK
                 | COVMATRIX LBRACK <expression> RBRACK

<type_constraint> ::= [LABRACK <range> RABRACK]
                    | LABRACK <offset_mult> RABRACK

<range> ::= LOWER ASSIGN <constr_expression> COMMA UPPER ASSIGN
            <constr_expression>
          | UPPER ASSIGN <constr_expression> COMMA LOWER ASSIGN
            <constr_expression>
          | LOWER ASSIGN <constr_expression>
          | UPPER ASSIGN <constr_expression>

<offset_mult> ::= OFFSET ASSIGN <constr_expression> COMMA MULTIPLIER ASSIGN
                  <constr_expression>
                | MULTIPLIER ASSIGN <constr_expression> COMMA OFFSET ASSIGN
                  <constr_expression>
                | OFFSET ASSIGN <constr_expression>
                | MULTIPLIER ASSIGN <constr_expression>

<dims> ::= LBRACK <expression> (COMMA <expression>)* RBRACK
```
\normalsize
### Expressions {-}
\fontsize{9pt}{9.2}\selectfont
```
<expression> ::= <lhs>
               | <non_lhs>

<lhs> ::= <identifier>
        | <lhs> LBRACK <indexes> RBRACK

<non_lhs> ::= <expression> QMARK <expression> COLON <expression>
            | <expression> <infixOp> <expression>
            | <prefixOp> <expression>
            | <expression> <postfixOp>
            | <non_lhs> LBRACK <indexes> RBRACK
            | <common_expression>

<constr_expression> ::= <constr_expression> <arithmeticBinOp>
                        <constr_expression>
                      | <prefixOp> <constr_expression>
                      | <constr_expression> <postfixOp>
                      | <constr_expression> LBRACK <indexes> RBRACK
                      | <common_expression>
                      | <identifier>

<common_expression> ::= INTNUMERAL
                      | REALNUMERAL
                      | IMAGNUMERAL
                      | LBRACE <expression> (COMMA <expression>)* RBRACE
                      | LBRACK [<expression> (COMMA <expression>)*] RBRACK
                      | <identifier> LPAREN [<expression> (COMMA
                        <expression>)*] RPAREN
                      | TARGET LPAREN RPAREN
                      | GETLP LPAREN RPAREN
                      | <identifier> LPAREN <expression> BAR [<expression>
                        (COMMA <expression>)*] RPAREN
                      | LPAREN <expression> RPAREN

<prefixOp> ::= BANG
             | MINUS
             | PLUS

<postfixOp> ::= TRANSPOSE

<infixOp> ::= <arithmeticBinOp>
            | <logicalBinOp>

<arithmeticBinOp> ::= PLUS
                    | MINUS
                    | TIMES
                    | DIVIDE
                    | IDIVIDE
                    | MODULO
                    | LDIVIDE
                    | ELTTIMES
                    | ELTDIVIDE
                    | HAT
                    | ELTPOW

<logicalBinOp> ::= OR
                 | AND
                 | EQUALS
                 | NEQUALS
                 | LABRACK
                 | LEQ
                 | RABRACK
                 | GEQ

<indexes> ::= epsilon
            | COLON
            | <expression>
            | <expression> COLON
            | COLON <expression>
            | <expression> COLON <expression>
            | <indexes> COMMA <indexes>

<printables> ::= <expression>
               | <string_literal>
               | <printables> COMMA <printables>
```
\normalsize
### Statements {-}
\fontsize{9pt}{9.2}\selectfont
```
<statement> ::= <atomic_statement>
              | <nested_statement>

<atomic_statement> ::= <lhs> <assignment_op> <expression> SEMICOLON
                     | <identifier> LPAREN [<expression> (COMMA
                       <expression>)*] RPAREN SEMICOLON
                     | INCREMENTLOGPROB LPAREN <expression> RPAREN SEMICOLON
                     | <expression> TILDE <identifier> LPAREN [<expression>
                       (COMMA <expression>)*] RPAREN [<truncation>] SEMICOLON
                     | TARGET PLUSASSIGN <expression> SEMICOLON
                     | BREAK SEMICOLON
                     | CONTINUE SEMICOLON
                     | PRINT LPAREN <printables> RPAREN SEMICOLON
                     | REJECT LPAREN <printables> RPAREN SEMICOLON
                     | RETURN <expression> SEMICOLON
                     | RETURN SEMICOLON
                     | SEMICOLON

<assignment_op> ::= ASSIGN
                  | ARROWASSIGN
                  | PLUSASSIGN
                  | MINUSASSIGN
                  | TIMESASSIGN
                  | DIVIDEASSIGN
                  | ELTTIMESASSIGN
                  | ELTDIVIDEASSIGN

<string_literal> ::= STRINGLITERAL

<truncation> ::= TRUNCATE LBRACK [<expression>] COMMA [<expression>] RBRACK

<nested_statement> ::= IF LPAREN <expression> RPAREN <statement> ELSE
                       <statement>
                     | IF LPAREN <expression> RPAREN <statement>
                     | WHILE LPAREN <expression> RPAREN <statement>
                     | FOR LPAREN <identifier> IN <expression> COLON
                       <expression> RPAREN <statement>
                     | FOR LPAREN <identifier> IN <expression> RPAREN
                       <statement>
                     | PROFILE LPAREN <string_literal> RPAREN LBRACE
                       <vardecl_or_statement>* RBRACE
                     | LBRACE <vardecl_or_statement>* RBRACE
```
\normalsize
## Tokenizing rules
Many of the tokens used in the BNF grammars follow obviously
from their names: `DATABLOCK` is the literal string 'data',
`COMMA` is a single ',' character, etc. The literal representation
of each operator is additionally provided in the [operator
precedence table](#operator-precedence-table).
A few tokens are not so obvious, and are defined here in
regular expressions:
```
IDENTIFIER = [a-zA-Z] [a-zA-Z0-9_]*
STRINGLITERAL = ".*"
INTNUMERAL = [0-9]+ (_ [0-9]+)*
EXPLITERAL = [eE] [+-]? INTNUMERAL
REALNUMERAL = INTNUMERAL \. INTNUMERAL? EXPLITERAL?
| \. INTNUMERAL EXPLITERAL?
| INTNUMERAL EXPLITERAL
IMAGNUMERAL = (REALNUMERAL | INTNUMERAL) i
```
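To experiment with these definitions, the numeric and identifier tokens can be transcribed into Python's `re` syntax. This is a sketch, not the lexer used by the reference parser: it assumes that the spaces in the notation above are only separators and not part of the tokens, and the helper name `token_kind` is hypothetical.

```python
import re

# Building blocks, mirroring the token definitions above.
INTNUMERAL = r"[0-9]+(?:_[0-9]+)*"
EXPLITERAL = rf"[eE][+-]?{INTNUMERAL}"
REALNUMERAL = (
    rf"(?:{INTNUMERAL}\.(?:{INTNUMERAL})?(?:{EXPLITERAL})?"
    rf"|\.{INTNUMERAL}(?:{EXPLITERAL})?"
    rf"|{INTNUMERAL}{EXPLITERAL})"
)
IMAGNUMERAL = rf"(?:{REALNUMERAL}|{INTNUMERAL})i"
IDENTIFIER = r"[a-zA-Z][a-zA-Z0-9_]*"


def token_kind(text):
    """Classify `text` if it is exactly one of the tokens above, else None."""
    # Each string fully matches at most one of these patterns,
    # so the order of the checks does not affect the result.
    for kind, pattern in [
        ("IMAGNUMERAL", IMAGNUMERAL),
        ("REALNUMERAL", REALNUMERAL),
        ("INTNUMERAL", INTNUMERAL),
        ("IDENTIFIER", IDENTIFIER),
    ]:
        if re.fullmatch(pattern, text):
            return kind
    return None
```

Under these assumptions, `1_000` is an `INTNUMERAL`, `1.` and `.5e-3` are `REALNUMERAL`s, `2.5i` is an `IMAGNUMERAL`, and a lone `.` or a trailing underscore such as `1_` matches none of the patterns.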
## Extra-grammatical constraints
### Type constraints {-}
A well-formed Stan program must satisfy the type constraints imposed
by functions and distributions. For example, the binomial
distribution requires an integer total count parameter and integer
variate and when truncated would require integer truncation points.
If these constraints are violated, the program will be rejected during
compilation with an error message indicating the location of the problem.
### Operator precedence and associativity {-}
In the Stan grammar provided in this chapter, the expression
`1 + 2 * 3` has two parses. As described in the [operator precedence
table](#operator-precedence-table), Stan disambiguates between the meaning
$1 + (2 \times 3)$ and the meaning $(1 + 2) \times 3$ based on operator
precedences and associativities.
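The effect of precedence and associativity on an ambiguous grammar can be sketched with a small precedence-climbing parser. The Python example below is not how the reference parser is implemented. It covers only four operators, and it assumes the usual arithmetic convention that `*` and `/` bind tighter than `+` and `-` and that all four are left-associative; the authoritative values are those in the operator precedence table.

```python
import re

# Assumed binding powers for four arithmetic operators; higher binds tighter.
# All four are treated as left-associative in this sketch.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def tokenize(source):
    """Split `source` into integer literals, operators, and parentheses."""
    return re.findall(r"[0-9]+|[-+*/()]", source)


def parse(tokens, min_prec=1):
    """Precedence-climbing parser returning a fully parenthesized string."""
    left = parse_atom(tokens)
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # Left associativity: the right operand may only contain
        # operators that bind strictly tighter than `op`.
        right = parse(tokens, PRECEDENCE[op] + 1)
        left = f"({left} {op} {right})"
    return left


def parse_atom(tokens):
    """Parse an integer literal or a parenthesized expression."""
    token = tokens.pop(0)
    if token == "(":
        inner = parse(tokens)
        tokens.pop(0)  # discard the closing ")"
        return inner
    return token


def parenthesize(source):
    """Make the grouping chosen by the parser explicit."""
    return parse(tokenize(source))
```

With these assumed precedences, `parenthesize("1 + 2 * 3")` yields `(1 + (2 * 3))`, and `parenthesize("1 - 2 - 3")` yields `((1 - 2) - 3)`, which shows the role of left associativity when precedences tie.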
### Typing of compound declaration and definition {-}
In a compound variable declaration and definition, the type of the
right-hand side expression must be assignable to the variable being
declared. The assignability constraint restricts compound
declarations and definitions to local variables and variables declared
in the transformed data, transformed parameters, and generated
quantities blocks.
### Typing of array expressions {-}
The types of expressions used for elements in array expressions
(`'{' expressions '}'`) must all be of the same type or a mixture
of scalar (`int`, `real` and `complex`) types (in which case the result
is promoted to be of the highest type on the `int -> real -> complex`
hierarchy).
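The promotion rule can be stated compactly as a function over element type names. The Python sketch below is only a model of the rule as described here; representing types as strings and the function name `array_element_type` are assumptions made for illustration.

```python
# Scalar types ordered from lowest to highest on the promotion hierarchy.
SCALAR_RANK = {"int": 0, "real": 1, "complex": 2}


def array_element_type(element_types):
    """Return the common element type of an array expression, or raise
    TypeError if the elements cannot be combined."""
    if not element_types:
        # The grammar requires at least one expression between the braces.
        raise TypeError("array expressions need at least one element")
    if all(t in SCALAR_RANK for t in element_types):
        # Mixed scalars promote to the highest type present.
        return max(element_types, key=SCALAR_RANK.__getitem__)
    if all(t == element_types[0] for t in element_types):
        return element_types[0]
    raise TypeError(f"incompatible element types: {element_types}")
```

So a mixture of `int` and `real` elements gives `real`, adding a `complex` element gives `complex`, and mixing a `vector` with a `real` is rejected.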
### Forms of numbers {-}
Integer literals longer than one digit may not start with 0 and real
literals cannot consist of only a period or only an exponent.
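The leading-zero rule is a constraint on top of the `INTNUMERAL` token rule, which by itself accepts strings such as `007`. A minimal Python sketch of the combined check follows; the function name is hypothetical, and the pattern assumes that spaces in the token notation are not part of the token.

```python
import re

# INTNUMERAL as defined in the tokenizing rules.
INTNUMERAL = re.compile(r"[0-9]+(?:_[0-9]+)*")


def is_valid_int_literal(text):
    """Apply the token rule, then the extra leading-zero constraint."""
    if not INTNUMERAL.fullmatch(text):
        return False
    # "0" alone is fine; a longer literal must not start with "0".
    return len(text) == 1 or not text.startswith("0")
```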
### Conditional arguments {-}
Both the conditional if-then-else statement and the while-loop statement
require the expression denoting the condition to be of a primitive type,
either integer or real.
### For loop containers {-}
The for loop statement requires that we specify, in addition to the
loop identifier, either a range consisting of two expressions, each
denoting an integer and separated by ':', or a single expression denoting
a container. The loop variable will be of type integer in the former case
and of the contained type in the latter case. Furthermore, the loop
variable must not be in scope (i.e., there is no masking of variables).
### Print arguments {-}
The arguments to a print statement cannot be void.
### Only break and continue in loops {-}
The `break` and `continue` statements may only be used
within the body of a for-loop or while-loop.
### PRNG function locations {-}
Functions ending in `_rng` may only be called in the transformed
data and generated quantities block, and within the bodies of
user-defined functions with names ending in `_rng`.
### Probability function naming {-}
A probability function literal must have one of the following
suffixes: `_lpdf`, `_lpmf`, `_lcdf`, or `_lccdf`.
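Because the rule is purely about the end of the name, it reduces to a suffix test. The following Python sketch shows the check; the helper name is an assumption for illustration.

```python
PROBABILITY_SUFFIXES = ("_lpdf", "_lpmf", "_lcdf", "_lccdf")


def is_probability_function_name(name):
    """True if `name` ends with one of the probability function suffixes."""
    # str.endswith accepts a tuple and checks each suffix in turn.
    return name.endswith(PROBABILITY_SUFFIXES)
```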
### Algebraic solver argument types and origins {-}
The `algebra_solver` function may be used without control
parameters; in this case
* its first argument refers to a function with signature
`(vector, vector, array[] real, array[] int) : vector`,
* the remaining four arguments must be assignable to types
`vector`, `vector`, `array[] real`, `array[] int`, respectively and
* the fourth and fifth arguments must be expressions
containing only variables originating from the data or transformed
data blocks.
The `algebra_solver` function may accept three additional arguments,
which, like the second, fourth, and fifth arguments, must be expressions free
of parameter references. The final three arguments must be assignable to types
`real`, `real`, and `int`, respectively.
### Integrate 1D argument types and origins {-}
The `integrate_1d` function requires
* its first argument to refer to a function with signature
`(real, real, array[] real, array[] real, array[] int) : real`,
* the remaining five arguments to be assignable to types
`real`, `real`, `array[] real`, `array[] real`, and `array[] int`,
respectively, and
* the fourth and fifth arguments to be expressions not containing
any variables not originating in the data or transformed data blocks.
`integrate_1d` can accept an extra argument, which, like the
fourth and fifth arguments, must be an expression free of parameter
references. This optional final argument must be assignable to a
`real` type.
### ODE solver argument types and origins {-}
The `integrate_ode`, `integrate_ode_rk45`, and
`integrate_ode_bdf` functions may be used without control
parameters; in this case
* the first argument must refer to a function with signature
`(real, array[] real, array[] real, array[] real, array[] int) : array[] real`,
* the remaining six arguments must be assignable to types
`array[] real`, `real`, `array[] real`, `array[] real`, `array[] real`, and `array[] int`,
respectively, and
* the third, fourth, and sixth arguments must be expressions not
containing any variables not originating in the data or transformed
data blocks.
The `integrate_ode_rk45` and `integrate_ode_bdf`
functions may accept three additional arguments, which like the third,
fourth, and sixth arguments, must be expressions free of parameter
references. The final three arguments must be assignable to types
`real`, `real`, and `int`, respectively.
### Indexes {-}
Standalone expressions used as indexes must denote either an integer
(`int`) or an integer array (`array[] int`). Expressions
participating in range indexes (e.g., `a` and `b` in
`a : b`) must denote integers (`int`).
A second condition is that there not be more indexes provided than
dimensions of the underlying expression (in general) or variable (on
the left side of assignments) being indexed. A vector or row vector
adds 1 to the array dimension and a matrix adds 2. That is, the type
`matrix[ , , ]`, a three-dimensional array of matrices, has five
index positions: three for the array, one for the row of the matrix
and one for the column.
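The counting rule can be checked with a few lines of Python. This is a sketch of the arithmetic only: types are represented as strings, the spelling `row_vector` is assumed from Stan usage, and the function names are hypothetical.

```python
# Index positions contributed by the element type itself.
BASE_INDEX_POSITIONS = {
    "int": 0,
    "real": 0,
    "complex": 0,
    "vector": 1,
    "row_vector": 1,
    "matrix": 2,
}


def max_indexes(basic_type, array_dims):
    """Maximum number of indexes allowed for a value of `basic_type`
    wrapped in `array_dims` array dimensions."""
    return array_dims + BASE_INDEX_POSITIONS[basic_type]


def check_index_count(basic_type, array_dims, n_indexes):
    """Raise TypeError if more indexes are supplied than positions exist."""
    limit = max_indexes(basic_type, array_dims)
    if n_indexes > limit:
        raise TypeError(f"{n_indexes} indexes supplied, at most {limit} allowed")
```

For the three-dimensional array of matrices in the example above, `max_indexes("matrix", 3)` gives 5, matching the three array positions plus one row and one column position.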