Commit 889029f

emrberk and claude committed
add generate:cst script and DEVELOPMENT.md workflow guide
Add yarn generate:cst command to auto-regenerate CST type definitions from the parser grammar using @chevrotain/cst-dts-gen.

Add DEVELOPMENT.md documenting the full development workflow: adding keywords, statement types, modifying autocomplete, and key architectural concepts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 9046e27 commit 889029f

6 files changed: +215 -10 lines changed

DEVELOPMENT.md

Lines changed: 182 additions & 0 deletions
@@ -0,0 +1,182 @@
# Contributing to @questdb/sql-parser

## Setup

```bash
yarn                # Install dependencies
yarn build          # Compile TypeScript (tsup + tsc)
yarn test           # Run all tests (6,100+ tests)
yarn test:watch     # Run tests in watch mode
yarn typecheck      # Type-check without emitting
yarn lint           # Run ESLint
yarn lint:fix       # Auto-fix lint issues
yarn generate:cst   # Regenerate CST type definitions from parser grammar
yarn clean          # Remove dist/ and coverage/
```

## Pipeline Overview

Every SQL string flows through this pipeline:

```
SQL String ──> Lexer (tokens.ts/lexer.ts) ──> Token[]

Token[] ──────> Parser (parser.ts) ───────> CST (Concrete Syntax Tree)

CST ──────────> Visitor (visitor.ts) ──────> AST (typed, clean)

AST ──────────> toSql (toSql.ts) ──────────> SQL String (round-trip)
```

The **CST** is Chevrotain's lossless tree that preserves every token. The **visitor** transforms it into a clean, typed **AST** that is easy to work with. `toSql()` converts any AST node back to valid SQL.

For **autocomplete**, the flow is:

```
SQL + cursor offset ──> content-assist.ts ──> parser.computeContentAssist()

nextTokenTypes + tablesInScope + cteColumns

suggestion-builder.ts ──> Suggestion[] (filtered, prioritized)
```

## How Tokens Work

Grammar arrays (`src/grammar/keywords.ts`, `dataTypes.ts`, `constants.ts`) are the source of truth. `src/parser/tokens.ts` auto-generates Chevrotain tokens from them:

1. Each keyword string is converted to a PascalCase token name (`"select"` → `Select`, `"data_page_size"` → `DataPageSize`)
2. Each token gets a case-insensitive regex pattern with a word boundary (e.g., `/select\b/i`)
3. Non-reserved keywords are assigned to the `IdentifierKeyword` category, which lets the parser accept them as table/column names via a single `CONSUME(IdentifierKeyword)` rule

The `IDENTIFIER_KEYWORD_NAMES` set in `tokens.ts` controls which keywords are non-reserved. Reserved keywords (SELECT, FROM, WHERE, JOIN, etc.) are **not** in this set and cannot be used as unquoted identifiers.

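The three token-generation steps above can be sketched roughly as follows. This is a minimal illustration and not the code in `tokens.ts`: the helper names `toTokenName`, `toKeywordPattern`, and `isNonReserved` are hypothetical, and the one-entry set stands in for the real `IDENTIFIER_KEYWORD_NAMES`.

```typescript
// Hypothetical helper: "data_page_size" -> "DataPageSize".
function toTokenName(keyword: string): string {
  return keyword
    .split("_")
    .filter((part) => part.length > 0)
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join("")
}

// Hypothetical helper: case-insensitive pattern ending in a word boundary,
// so "select" matches "SELECT" but not the prefix of "selection".
function toKeywordPattern(keyword: string): RegExp {
  const escaped = keyword.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
  return new RegExp(`${escaped}\\b`, "i")
}

// Illustrative subset only; the real set lives in src/parser/tokens.ts.
const identifierKeywordNames = new Set<string>(["DataPageSize"])

// A keyword is non-reserved when its token name is listed in the set.
function isNonReserved(keyword: string): boolean {
  return identifierKeywordNames.has(toTokenName(keyword))
}
```

Under these assumptions, `isNonReserved("data_page_size")` is true while `isNonReserved("select")` is false, which mirrors the reserved/non-reserved split described above.
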
## Workflow: Adding a New Keyword

Example: adding a hypothetical `RETENTION` keyword.

**1. Add to grammar** in `src/grammar/keywords.ts`:
```typescript
export const keywords: string[] = [
  // ...existing keywords in alphabetical order...
  "retention",
  // ...
]
```
This auto-generates a `Retention` token in `tokens.ts`.

**2. If non-reserved, mark it** in `src/parser/tokens.ts`:

If the keyword can be used as an identifier (table/column name), add it to `IDENTIFIER_KEYWORD_NAMES`:
```typescript
export const IDENTIFIER_KEYWORD_NAMES = new Set([
  // ...
  "Retention",
])
```

Skip this step if the keyword is reserved (i.e., allowing it as an identifier would introduce structural ambiguity).

**3. Use in parser grammar** in `src/parser/parser.ts`:

Reference the token in a grammar rule:
```typescript
private retentionClause = this.RULE("retentionClause", () => {
  this.CONSUME(Retention)
  this.CONSUME(NumberLiteral)
  this.SUBRULE(this.partitionPeriod) // DAY, MONTH, etc.
})
```

Make sure to import the token from `lexer.ts` at the top of `parser.ts`. The token is available by its PascalCase name.

**4. Regenerate CST types**:
```bash
yarn generate:cst
```
This reads the parser's grammar rules and regenerates `src/parser/cst-types.d.ts`. The new rule's CST children type will appear automatically (e.g., `RetentionClauseCstChildren`).

**5. Add visitor method** in `src/parser/visitor.ts`:

Import the new CST type from `cst-types.d.ts`, then add a visitor method:
```typescript
retentionClause(ctx: RetentionClauseCstChildren): AST.RetentionClause {
  return {
    type: "retentionClause",
    value: parseInt(ctx.NumberLiteral[0].image, 10),
    unit: this.visit(ctx.partitionPeriod[0]),
  }
}
```

**6. Add AST type** in `src/parser/ast.ts`:
```typescript
export interface RetentionClause extends AstNode {
  type: "retentionClause"
  value: number
  unit: string
}
```

**7. Add toSql serialization** in `src/parser/toSql.ts`:
```typescript
function retentionClauseToSql(clause: AST.RetentionClause): string {
  return `RETENTION ${clause.value} ${clause.unit}`
}
```
Wire it into the parent statement's toSql function.

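As a rough sketch of what "wiring it into the parent" could look like, the example below appends the clause only when the AST carries one. The `CreateTableStatement` shape, its `columnsSql` field, and `createTableToSql` are hypothetical stand-ins; the real types in `ast.ts` and the real serializer in `toSql.ts` will differ.

```typescript
// Hypothetical, trimmed-down AST shapes for illustration only.
interface RetentionClause {
  type: "retentionClause"
  value: number
  unit: string
}

interface CreateTableStatement {
  type: "createTable"
  table: string
  columnsSql: string // assumed to be an already-serialized column list
  retention?: RetentionClause // optional: absent when the SQL has no RETENTION
}

function retentionClauseToSql(clause: RetentionClause): string {
  return `RETENTION ${clause.value} ${clause.unit}`
}

// The parent serializer emits the clause only if the visitor produced one.
function createTableToSql(stmt: CreateTableStatement): string {
  const parts = [`CREATE TABLE ${stmt.table} (${stmt.columnsSql})`]
  if (stmt.retention) {
    parts.push(retentionClauseToSql(stmt.retention))
  }
  return parts.join(" ")
}
```

Keeping the clause optional on the parent node means statements without `RETENTION` serialize exactly as before, which helps existing round-trip tests stay green.
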
**8. Add tests** in `tests/parser.test.ts`:
```typescript
it("should parse RETENTION clause", () => {
  const result = parseToAst("CREATE TABLE t (x INT) RETENTION 30 DAY")
  expect(result.errors).toHaveLength(0)
  // assert AST structure...
})

it("should round-trip RETENTION clause", () => {
  const sql = "CREATE TABLE t (x INT) RETENTION 30 DAY"
  const result = parseToAst(sql)
  const roundtrip = toSql(result.ast[0])
  const result2 = parseToAst(roundtrip)
  expect(result2.errors).toHaveLength(0)
})
```

**9. Run tests**:
```bash
yarn test
```

## Workflow: Adding a New Statement Type
151+
152+
Same as adding a keyword, but the scope is larger:
153+
154+
1. **Grammar**: add all tokens to `src/grammar/keywords.ts` (and `src/parser/tokens.ts` if non-reserved)
155+
2. **Parser**: add a new top-level rule in `parser.ts`, register it in the `statement` rule's alternatives
156+
3. **CST types**: `yarn generate:cst`
157+
4. **AST**: add the statement interface to `ast.ts`, add it to the `Statement` union type
158+
5. **Visitor**: add visitor method in `visitor.ts`
159+
6. **toSql**: add serializer in `toSql.ts`, add the case to the `statementToSql` switch
160+
7. **Tests**: parse tests, AST structure assertions, and round-trip tests
161+
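Steps 4 and 6 work together: once the new interface joins the `Statement` union, a `never` check in the switch makes TypeScript flag a missing case at compile time. The sketch below assumes simplified statement shapes (`SelectStatement`, `VacuumTableStatement`) that are illustrative only and not the real definitions in `ast.ts`.

```typescript
// Hypothetical, simplified statement shapes.
interface SelectStatement {
  type: "select"
  table: string
}

interface VacuumTableStatement {
  type: "vacuumTable" // stands in for the newly added statement
  table: string
}

// Step 4: the new interface is added to the union.
type Statement = SelectStatement | VacuumTableStatement

// Step 6: the switch gains a case for the new statement.
function statementToSql(stmt: Statement): string {
  switch (stmt.type) {
    case "select":
      return `SELECT * FROM ${stmt.table}`
    case "vacuumTable":
      return `VACUUM TABLE ${stmt.table}`
    default: {
      // If a union member has no case above, this assignment fails to compile.
      const unreachable: never = stmt
      throw new Error(`Unhandled statement: ${JSON.stringify(unreachable)}`)
    }
  }
}
```

Whether the real `statementToSql` uses such a guard is not stated here; it is one common way to keep the union and the switch in sync.
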
## Workflow: Modifying Autocomplete Behavior

Autocomplete has four layers:

1. **`content-assist.ts`**: determines what the parser expects at the cursor position. Extracts tables in scope (FROM/JOIN clauses), CTE definitions, and qualified references (e.g., `t1.`). You rarely need to modify this unless you're changing how scope is detected.

2. **`token-classification.ts`**: classifies tokens into categories: `SKIP_TOKENS` (never suggested), `EXPRESSION_OPERATORS` (lower priority), `IDENTIFIER_KEYWORD_TOKENS` (trigger schema suggestions). When adding a new token, decide which category it belongs to.

3. **`suggestion-builder.ts`**: converts parser token types + schema into `Suggestion[]`. Controls priority (columns > keywords > functions > tables), handles qualified references, and manages deduplication.

4. **`provider.ts`**: orchestrates the above and adds context detection: after FROM → suggest tables, after SELECT → suggest columns, after `*` → suppress columns (alias position), etc. The `getIdentifierSuggestionScope()` function is the main context switcher.

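To make the priority and deduplication behavior of the third layer concrete, here is a small sketch. The `Suggestion` shape, `KIND_PRIORITY`, and `prioritize` are assumptions made for illustration; the actual `suggestion-builder.ts` may structure this differently.

```typescript
type SuggestionKind = "column" | "keyword" | "function" | "table"

// Hypothetical suggestion shape; the real type likely carries more fields.
interface Suggestion {
  label: string
  kind: SuggestionKind
}

// Lower number sorts first: columns > keywords > functions > tables.
const KIND_PRIORITY: Record<SuggestionKind, number> = {
  column: 0,
  keyword: 1,
  function: 2,
  table: 3,
}

function prioritize(suggestions: Suggestion[]): Suggestion[] {
  // Drop duplicates of the same kind and label (case-insensitive).
  const seen = new Set<string>()
  const unique = suggestions.filter((s) => {
    const key = `${s.kind}:${s.label.toLowerCase()}`
    if (seen.has(key)) return false
    seen.add(key)
    return true
  })
  // Order by kind priority, then alphabetically within a kind.
  return unique.sort(
    (a, b) => KIND_PRIORITY[a.kind] - KIND_PRIORITY[b.kind] || a.label.localeCompare(b.label),
  )
}
```

A table-driven priority map like this keeps the ordering rule in one place, so changing the ranking does not require touching the sort logic.
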
## Key Concepts

**Reserved vs. non-reserved keywords**: QuestDB has ~60 reserved keywords. Everything else (data types, time units, config keys like `maxUncommittedRows`) is non-reserved and can be used as an unquoted identifier. The `IdentifierKeyword` token category in Chevrotain handles this: the parser's `identifier` rule accepts any `IdentifierKeyword` token.

**CST vs. AST**: The CST preserves every token (including keywords, punctuation, whitespace position). The AST is a clean semantic representation. The visitor decides what to keep. For example, the CST has separate `Select`, `Star`, `From` tokens; the AST just has `{ type: "select", columns: [{ type: "star" }], from: [...] }`.

**Round-trip correctness**: `toSql(parseToAst(sql).ast)` must produce SQL that parses to an equivalent AST. This is verified against 1,726 real queries in `docs-roundtrip.test.ts`. When adding new features, always test round-trip.

**Error recovery**: The parser uses Chevrotain's semicolon-based error recovery. When a statement fails to parse, it skips to the next semicolon and continues. The visitor handles incomplete CST nodes with try-catch. This means `parseToAst()` can return both `ast` (partial) and `errors` simultaneously.

README.md

Lines changed: 11 additions & 4 deletions
@@ -210,12 +210,19 @@ import { keywords, functions, dataTypes, operators, constants } from "@questdb/s
 ## Development
 
 ```bash
-yarn              # Install dependencies
-yarn build        # Compile TypeScript
-yarn test         # Run all tests
-yarn test:watch   # Run tests in watch mode
+yarn                # Install dependencies
+yarn build          # Compile TypeScript (tsup + tsc)
+yarn test           # Run all tests (6,100+ tests)
+yarn test:watch     # Run tests in watch mode
+yarn typecheck      # Type-check without emitting
+yarn lint           # Run ESLint
+yarn lint:fix       # Auto-fix lint issues
+yarn generate:cst   # Regenerate CST type definitions from parser grammar
+yarn clean          # Remove dist/ and coverage/
 ```
 
+See [DEVELOPMENT.md](DEVELOPMENT.md) for the full development workflow guide: how to add keywords, statement types, modify autocomplete, and more.
+
 ## License
 
 Apache-2.0

package.json

Lines changed: 2 additions & 0 deletions
@@ -36,6 +36,7 @@
     "test": "vitest run",
     "test:watch": "vitest",
     "test:coverage": "vitest run --coverage",
+    "generate:cst": "node --import jiti/register scripts/generate-cst-types.ts",
     "clean": "rm -rf dist coverage",
     "lint": "eslint src/ tests/",
     "lint:fix": "eslint src/ tests/ --fix",
@@ -70,6 +71,7 @@
     "chevrotain": "^11.1.1"
   },
   "devDependencies": {
+    "@chevrotain/cst-dts-gen": "^11.1.1",
     "@eslint/js": "^10.0.1",
     "@types/node": "^25.2.0",
     "eslint": "^10.0.0",

scripts/generate-cst-types.ts

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
import { generateCstDts } from "@chevrotain/cst-dts-gen"
import { parser } from "../src/parser/parser.ts"
import { writeFileSync } from "node:fs"
import { resolve, dirname } from "node:path"
import { fileURLToPath } from "node:url"

const __dirname = dirname(fileURLToPath(import.meta.url))
const outPath = resolve(__dirname, "../src/parser/cst-types.d.ts")

const dts = generateCstDts(parser.getGAstProductions())
writeFileSync(outPath, dts)

console.log(`Written ${outPath}`)

src/parser/cst-types.d.ts

Lines changed: 5 additions & 5 deletions
@@ -590,9 +590,7 @@ export type CreateTableBodyCstChildren = {
   If?: IToken[];
   Not?: IToken[];
   Exists?: IToken[];
-  stringOrQualifiedName?: StringOrQualifiedNameCstNode[];
-  StringLiteral?: (IToken)[];
-  qualifiedName?: (QualifiedNameCstNode)[];
+  stringOrQualifiedName: StringOrQualifiedNameCstNode[];
   As?: IToken[];
   LParen?: (IToken)[];
   selectStatement?: SelectStatementCstNode[];
@@ -602,6 +600,7 @@ export type CreateTableBodyCstChildren = {
   indexDefinition?: (IndexDefinitionCstNode)[];
   columnDefinition?: (ColumnDefinitionCstNode)[];
   Like?: IToken[];
+  qualifiedName?: QualifiedNameCstNode[];
   Timestamp?: IToken[];
   columnRef?: ColumnRefCstNode[];
   Partition?: IToken[];
@@ -626,6 +625,7 @@ export type CreateTableBodyCstChildren = {
   tableParam?: (TableParamCstNode)[];
   In?: (IToken)[];
   Volume?: IToken[];
+  StringLiteral?: IToken[];
   identifier?: IdentifierCstNode[];
   dedupClause?: DedupClauseCstNode[];
   Owned?: IToken[];
@@ -1893,10 +1893,10 @@ export interface Ipv4ContainmentExpressionCstNode extends CstNode {
 
 export type Ipv4ContainmentExpressionCstChildren = {
   additiveExpression: (AdditiveExpressionCstNode)[];
-  IPv4ContainedBy?: IToken[];
   IPv4ContainedByOrEqual?: IToken[];
-  IPv4Contains?: IToken[];
+  IPv4ContainedBy?: IToken[];
   IPv4ContainsOrEqual?: IToken[];
+  IPv4Contains?: IToken[];
 };
 
 export interface AdditiveExpressionCstNode extends CstNode {

yarn.lock

Lines changed: 2 additions & 1 deletion
@@ -5,7 +5,7 @@ __metadata:
   version: 8
   cacheKey: 10c0
 
-"@chevrotain/cst-dts-gen@npm:11.1.1":
+"@chevrotain/cst-dts-gen@npm:11.1.1, @chevrotain/cst-dts-gen@npm:^11.1.1":
   version: 11.1.1
   resolution: "@chevrotain/cst-dts-gen@npm:11.1.1"
   dependencies:
@@ -419,6 +419,7 @@ __metadata:
   version: 0.0.0-use.local
   resolution: "@questdb/sql-parser@workspace:."
   dependencies:
+    "@chevrotain/cst-dts-gen": "npm:^11.1.1"
     "@eslint/js": "npm:^10.0.1"
     "@types/node": "npm:^25.2.0"
     chevrotain: "npm:^11.1.1"
