DTStack
diff --git a/AGENTS.md b/AGENTS.md
Lines changed: 171 additions & 0 deletions
diff --git a/package.json b/package.json
Lines changed: 1 addition & 1 deletion
diff --git a/src/grammar/flink/FlinkSqlParser.g4 b/src/grammar/flink/FlinkSqlParser.g4
Lines changed: 1 addition & 0 deletions
diff --git a/src/grammar/hive/HiveSqlParser.g4 b/src/grammar/hive/HiveSqlParser.g4
Lines changed: 1 addition & 0 deletions
diff --git a/src/grammar/impala/ImpalaSqlParser.g4 b/src/grammar/impala/ImpalaSqlParser.g4
Lines changed: 1 addition & 0 deletions
diff --git a/src/grammar/mysql/MySqlParser.g4 b/src/grammar/mysql/MySqlParser.g4
Lines changed: 6 additions & 11 deletions
diff --git a/src/grammar/postgresql/PostgreSqlParser.g4 b/src/grammar/postgresql/PostgreSqlParser.g4
Lines changed: 11 additions & 3 deletions
diff --git a/src/grammar/spark/SparkSqlParser.g4 b/src/grammar/spark/SparkSqlParser.g4
Lines changed: 1 addition & 0 deletions
diff --git a/src/lib/SQLParserBase.ts b/src/lib/SQLParserBase.ts
Lines changed: 46 additions & 5 deletions
diff --git a/src/lib/flink/FlinkSqlParser.interp b/src/lib/flink/FlinkSqlParser.interp
Lines changed: 1 addition & 1 deletion
@@ -0,0 +1,171 @@
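+# AGENTS.md
+
+## Project Overview
+
+**dt-sql-parser** is a SQL parser library built on [ANTLR4](https://github.com/antlr/antlr4) for the **big data** domain. It provides the following features:
+
+- SQL syntax validation
+- AST traversal (Visitor / Listener patterns)
+- Code completion (based on [antlr4-c3](https://github.com/mike-lischke/antlr4-c3))
+- Entity extraction (tables, columns, and so on)
+- SQL statement splitting
+
+**Supported SQL dialects**: MySQL, Flink, Spark, Hive, PostgreSQL, Trino, Impala.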
+
+## Tech Stack
+
+| Category          | Details                                        |
+| ----------------- | ---------------------------------------------- |
+| Language          | TypeScript (target ES6, module ESNext)         |
+| Runtime           | Node.js >= 18                                  |
+| Package manager   | pnpm 9.7.0                                     |
+| Build tool        | `tsc` (TypeScript compiler)                    |
+| Test framework    | Jest (with the `@swc/jest` transformer)        |
+| Parser generation | ANTLR4, via `antlr4ng` + `antlr4ng-cli`        |
+| Code completion   | `antlr4-c3`                                    |
+| Code formatting   | Prettier, `antlr-format-cli` (for `.g4` files) |
+| Git hooks         | Husky + lint-staged + commitlint               |
+
+## Repository Structure
+
+```
+dt-sql-parser/
+├── src/
+│   ├── grammar/          # ANTLR4 .g4 grammar files (one subdirectory per dialect)
+│   ├── lib/              # Lexer/Parser/Listener/Visitor generated from the .g4 files
+│   ├── parser/           # SQL Parser class implementations
+│   │   ├── common/       # Base class (BasicSQL), utilities, shared types
+│   │   ├── mysql/        # MySQL-specific parser, entity collector, etc.
+│   │   ├── flink/
+│   │   ├── spark/
+│   │   ├── hive/
+│   │   ├── postgresql/
+│   │   ├── trino/
+│   │   └── impala/
+│   ├── locale/           # Internationalization resources
+│   └── index.ts          # Public API exports
+├── test/                 # Unit tests (structure mirrors src/)
+│   ├── parser/           # Tests organized by dialect
+│   │   ├── mysql/
+│   │   │   ├── syntax/   # Grammar rule tests
+│   │   │   ├── suggestion/ # Code completion tests
+│   │   │   └── contextCollect/ # Entity collection tests
+│   │   └── ...
+│   └── common/           # Shared test utilities
+├── benchmark/            # Performance benchmarks
+├── scripts/              # Build/release utility scripts
+├── gen/                  # Generated artifacts
+├── dist/                 # Compiled output (npm package)
+├── .husky/               # Git hook configuration
+├── package.json
+├── tsconfig.json
+├── jest.config.js
+└── CONTRIBUTING.md       # Contribution guide (steps for adding a dialect)
+```
+
+## Core Development Commands
+
+```bash
+pnpm install              # Install dependencies
+pnpm antlr4               # Generate TS from all .g4 files
+pnpm antlr4 --lang mysql  # Generate for a specific dialect
+pnpm build                # Compile TypeScript (rm -rf dist && tsc)
+pnpm test                 # Run the Jest unit tests
+pnpm benchmark            # Run the performance benchmarks
+pnpm check-types          # Type-check src/ and test/
+pnpm format               # Format all files with Prettier
+pnpm format-g4            # Format the .g4 grammar files
+pnpm prettier-check       # Check formatting without modifying files
+```
+
+## Architecture
+
+### Parser Class Hierarchy
+
+```
+BasicSQL (src/parser/common/)
+  ├── MySQL
+  ├── FlinkSQL
+  ├── SparkSQL
+  ├── HiveSQL
+  ├── PostgreSQL
+  ├── TrinoSQL
+  └── ImpalaSQL
+```
+
+Each dialect's Parser class (for example, `MySQL` in `src/parser/mysql/index.ts`) extends `BasicSQL` and implements the following members:
+
+- `createLexerFromCharStream()`: creates the ANTLR4 Lexer
+- `createParserFromTokenStream()`: creates the ANTLR4 Parser
+- `splitListener` getter: returns the `SQLSplitListener` used for statement splitting
+- `createEntityCollector()`: returns the `SQLEntityCollector` used for context/entity extraction
+- `processCandidates()` / `preferredRules()`: code completion logic (antlr4-c3)
+
+### Per-Dialect Module Structure
+
+Each `src/parser/<dialect>/` directory contains:
+
+| File                                   | Purpose                                                |
+| -------------------------------------- | ------------------------------------------------------ |
+| `index.ts`                             | Main parser class, extends `BasicSQL`                  |
+| `<dialect>EntityCollector.ts`          | Extracts tables, columns, functions, etc. from the AST |
+| `<dialect>SplitListener.ts`            | Splits multi-statement SQL by semicolon/AST            |
+| `<dialect>ErrorListener.ts`            | Custom syntax error handling                           |
+| `<dialect>SemanticContextCollector.ts` | Collects semantic context (e.g. `isStatementBeginning`) |
+
+### Grammar Files → Code Generation Flow
+
+1. `.g4` files live in the `src/grammar/<dialect>/` directory
+2. Running `pnpm antlr4 [--lang <dialect>]` generates:
+   - `src/lib/<dialect>/<Dialect>Lexer.ts`
+   - `src/lib/<dialect>/<Dialect>Parser.ts`
+   - `src/lib/<dialect>/<Dialect>ParserListener.ts`
+   - `src/lib/<dialect>/<Dialect>ParserVisitor.ts`
+3. The Parser classes in `src/parser/<dialect>/` consume these generated files
+
+### Grammar Rule Conventions
+
+- The root rule must be named `program`
+- Parsing multiple SQL statements is supported
+- Keyword rules are prefixed with `KW_` (for example, `KW_SELECT`)
+- Case-insensitive dialects enable the case-insensitive option
+
+## Public API (from `src/index.ts`)
+
+**Classes**: `MySQL`, `FlinkSQL`, `SparkSQL`, `HiveSQL`, `PostgreSQL`, `TrinoSQL`, `ImpalaSQL`
+
+**Listener/Visitor types**: `MySqlParserListener`, `MySqlParserVisitor`, and so on (one pair per dialect)
+
+**Enums**: `EntityContextType`, `StmtContextType`
+
+**Types**: `CaretPosition`, `Suggestions`, `SyntaxSuggestion`, `WordRange`, `TextSlice`, `SyntaxError`, `ParseError`, `ErrorListener`, `StmtContext`, `EntityContext`, `CommonEntityContext`, `ColumnEntityContext`, `FuncEntityContext`
+
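+## Testing Conventions
+
+- The test file structure mirrors `src/` and lives under `test/parser/<dialect>/`
+- Subdirectories: `syntax/` (grammar), `suggestion/` (completion), `contextCollect/` (entity collection)
+- Jest is used with `@swc/jest` for fast compilation
+- Custom matchers are defined in `test/matchers.ts`
+- Run command: `pnpm test`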
+
+## Adding a New SQL Dialect (Step Summary)
+
+1. Add the `.g4` files under `src/grammar/<name>/` (PascalCase naming, root rule = `program`, keywords = `KW_*`)
+2. Run `pnpm antlr4 --lang <name>` → generates `src/lib/<name>/`
+3. Create `src/parser/<name>/index.ts` extending `BasicSQL`
+4. Add tests under `test/parser/<name>/` (lexer, visitor, listener, validate)
+5. Implement `SQLSplitListener` → add the `splitListener` getter
+6. Implement code completion → `processCandidates` + `preferredRules`, and add tests under `suggestion/`
+7. Implement `SQLEntityCollector` + `createEntityCollector()`, and add tests under `contextCollect/`
+8. Export the new class from `src/parser/index.ts` and `src/index.ts`
+
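+## Notes for AI Agents
+
+- After modifying a `.g4` file you **must** run `pnpm antlr4` so that the generated files in `src/lib/` stay in sync
+- **Do not** hand-edit files in `src/lib/`: they are generated automatically from the `.g4` files
+- Grammar files follow ANTLR4 conventions; keyword rules must carry the `KW_` prefix
+- The project uses `antlr4ng` (not the Java antlr4 runtime) as the TypeScript target
+- Code completion depends on `antlr4-c3`; understand that library before changing completion logic
+- The entity collector (`SQLEntityCollector`) is key to rich code completion: you need to understand scope depth and the `isAccessible` logic
+- Position/range convention: line numbers start at 1, column numbers start at 1, indices start at 0
+- Prettier formatting is enforced at commit time via husky + lint-staged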
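The position convention in the agent notes above (1-based line and column numbers, 0-based indices) is easy to get wrong by one. The following is a minimal sketch of the conversion, assuming a caret shaped like `{ lineNumber, column }` and `\n` line endings. `Caret` and `caretToOffset` are hypothetical names introduced for illustration and are not part of dt-sql-parser's API.

```typescript
// Hypothetical helper for illustration only; not part of dt-sql-parser.
interface Caret {
    lineNumber: number; // 1-based
    column: number; // 1-based
}

/** Converts a 1-based line/column caret into a 0-based character offset. */
function caretToOffset(text: string, caret: Caret): number {
    const lines = text.split('\n');
    if (caret.lineNumber < 1 || caret.lineNumber > lines.length) {
        throw new RangeError(`lineNumber out of range: ${caret.lineNumber}`);
    }
    const line = lines[caret.lineNumber - 1];
    // A caret may sit just past the last character of a line, hence length + 1.
    if (caret.column < 1 || caret.column > line.length + 1) {
        throw new RangeError(`column out of range: ${caret.column}`);
    }
    let offset = 0;
    for (let i = 0; i < caret.lineNumber - 1; i++) {
        offset += lines[i].length + 1; // +1 accounts for the '\n' separator
    }
    return offset + (caret.column - 1);
}
```

For instance, in `"SELECT a\nFROM t"` the caret `{ lineNumber: 2, column: 1 }` maps to offset 9, the `F` of `FROM`.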
@@ -1,6 +1,6 @@
 {
   "name": "dt-sql-parser",
-  "version": "4.5.0-beta.0",
+  "version": "4.5.0-beta.1",
   "authors": "DTStack Corporation",
   "description": "SQL Parsers for BigData, built with antlr4",
   "keywords": [
 
@@ -509,6 +509,7 @@ columnProjectItem
     | selectLiteralColumnName (columnAlias | KW_AS? expression)?
     | tableAllColumns columnAlias?
     | selectExpressionColumnName (columnAlias | KW_AS? columnName)?
+    | {this.shouldMatchEmpty()}? emptyColumn
     ;
 
 selectWindowItemColumnName
 
@@ -1536,6 +1536,7 @@ selectItem
             | KW_AS LPAREN alias=id_ (COMMA alias=id_)* RPAREN
         )?
     )
+    | {this.shouldMatchEmpty()}? emptyColumn
     ;
 
 selectLiteralColumnName
 
@@ -823,6 +823,7 @@ selectItem
     : selectLiteralColumnName columnAlias?
     | selectExpressionColumnName columnAlias?
     | tableAllColumns
+    | {this.shouldMatchEmpty()}? emptyColumn
     ;
 
 columnAlias
 
@@ -1205,9 +1205,10 @@ selectElements
     ;
 
 selectElement
-    : tableAllColumns
-    | selectLiteralColumnName (KW_AS? alias=uid)?
-    | selectExpressionColumnName (KW_AS? alias=uid)?
+    : tableAllColumns                                # selectElement_star
+    | selectLiteralColumnName (KW_AS? alias=uid)?    # selectElement_label
+    | selectExpressionColumnName (KW_AS? alias=uid)? # selectElement_expr
+    | uid DOT {this.shouldMatchEmpty()}? emptyColumn # selectElement_dot_empty
     ;
 
 tableAllColumns
@@ -2424,7 +2425,7 @@ emptyColumn
     ;
 
 columnName
-    : uid (dottedIdAllowEmpty dottedIdAllowEmpty?)?
+    : uid (dottedId dottedId?)?
     | .? dottedId dottedId?
     | {this.shouldMatchEmpty()}? emptyColumn
     ;
@@ -2436,7 +2437,7 @@ columnNamePath
 
 columnNamePathAllowEmpty
     : {this.shouldMatchEmpty()}? emptyColumn
-    | uid (dottedIdAllowEmpty dottedIdAllowEmpty?)?
+    | uid (dottedId dottedId?)?
     ;
 
 tableSpaceNameCreate
@@ -2574,12 +2575,6 @@ dottedId
     | '.' uid
     ;
 
-dottedIdAllowEmpty
-    : DOT ID
-    | '.' uid
-    | {this.shouldMatchEmpty()}? DOT emptyColumn
-    ;
-
 decimalLiteral
     : DECIMAL_LITERAL
     | ZERO_DECIMAL
 
@@ -2615,7 +2615,8 @@ when_clause
     ;
 
 indirectionEl
-    : DOT (colLabel | STAR)
+    : DOT indirectionLabel
+    | DOT STAR
     | OPEN_BRACKET (expression | expression? COLON expression?) CLOSE_BRACKET
     ;
 
@@ -2634,6 +2635,8 @@ targetList
 targetEl
     : tableAllColumns                                                                    # target_star
     | (selectLiteralColumnName | selectExpressionColumnName) (KW_AS? alias=identifier |) # target_label
+    | colId DOT {this.entityCollecting}? emptyColumn                                     # target_dot_empty
+    | {this.entityCollecting}? emptyColumn                                               # target_empty
     ;
 
 tableAllColumns
@@ -2722,18 +2725,17 @@ procedureNameCreate
     | colId indirection
     ;
 
+// Empty column rule for entity collection
 emptyColumn
     :
     ;
 
 columnName
     : colId optIndirection
-    | {this.shouldMatchEmpty()}? (colId DOT emptyColumn | emptyColumn)
     ;
 
 columnNamePath
     : colId optIndirection
-    | {this.shouldMatchEmpty()}? (colId DOT emptyColumn | emptyColumn)
     ;
 
 columnNameCreate
@@ -2800,6 +2802,12 @@ colLabel
     | reservedKeyword
     ;
 
+indirectionLabel
+    : identifier
+    | colNameKeyword
+    | typeFuncNameKeyword
+    ;
+
 identifier
     : Identifier (KW_UESCAPE anysconst)?
     | stringConst
 
@@ -850,6 +850,7 @@ namedExpression
     : (tableAllColumns | selectLiteralColumnName | selectExpressionColumnName) (
         KW_AS? (alias=errorCapturingIdentifier | identifierList)
     )?
+    | {this.shouldMatchEmpty()}? emptyColumn
     ;
 
 namedExpressionSeq
 
@@ -10,9 +10,50 @@ export abstract class SQLParserBase<T = antlr.ParserRuleContext> extends antlr.P
 
     public entityCollecting = false;
 
-    public shouldMatchEmpty () {
-        return this.entityCollecting
-            && (this.tokenStream.LT(-1)?.tokenIndex ?? Infinity) <= this.caretTokenIndex
-            && (this.tokenStream.LT(1)?.tokenIndex ?? -Infinity) >= this.caretTokenIndex
+    /**
+     * Semantic predicate to determine whether to match empty column.
+     * 
+     * Key design:
+     * 1. Only match empty column in entityCollecting mode
+     * 2. Check if caret position is at the empty column position
+     * 3. In validate mode (entityCollecting=false), this predicate returns false
+     *    and reports an error to ensure incomplete SQL is caught
+     * 
+     * IMPORTANT: This predicate should be used carefully to avoid affecting
+     * prediction in non-entity-collecting contexts.
+     */
+    public shouldMatchEmpty (ruleName?: string) {
+        // Only match in entityCollecting mode or when caret position is specified (suggestion mode)
+        if (this.entityCollecting || this.caretTokenIndex >= 0) {
+            // If no caret position specified, match all empty columns
+            if (this.caretTokenIndex < 0) {
+                return true;
+            }
+            
+            // Check if caret is at the position where empty column would be
+            const prevTokenIndex = this.tokenStream.LT(-1)?.tokenIndex;
+            const nextTokenIndex = this.tokenStream.LT(1)?.tokenIndex;
+
+            // Match if caret is between previous and next token
+            if (prevTokenIndex !== undefined && nextTokenIndex !== undefined) {
+                return prevTokenIndex <= this.caretTokenIndex && nextTokenIndex >= this.caretTokenIndex;
+            }
+            
+            // If only previous token exists, match if caret is after it
+            if (prevTokenIndex !== undefined) {
+                return prevTokenIndex <= this.caretTokenIndex;
+            }
+            
+            // If only next token exists, match if caret is before it
+            if (nextTokenIndex !== undefined) {
+                return nextTokenIndex >= this.caretTokenIndex;
+            }
+            
+            return false;
+        }
+        
+        // In pure validate mode, don't match empty columns
+        // This allows ANTLR to report errors naturally
+        return false;
     }
-}
+}
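The branching in the new `shouldMatchEmpty` is easier to check when modelled as a pure function. The sketch below restates the same decisions outside the parser, assuming that a negative `caretTokenIndex` means "no caret specified" (inferred from the `>= 0` check in the diff). `EmptyMatchInput` and `shouldMatchEmptyModel` are hypothetical names introduced for illustration; `prevTokenIndex` and `nextTokenIndex` stand in for `tokenStream.LT(-1)?.tokenIndex` and `tokenStream.LT(1)?.tokenIndex`.

```typescript
// Hypothetical standalone model of the predicate; not code from the PR.
interface EmptyMatchInput {
    entityCollecting: boolean;
    caretTokenIndex: number; // assumption: negative means "no caret specified"
    prevTokenIndex?: number; // stands in for tokenStream.LT(-1)?.tokenIndex
    nextTokenIndex?: number; // stands in for tokenStream.LT(1)?.tokenIndex
}

function shouldMatchEmptyModel(input: EmptyMatchInput): boolean {
    const { entityCollecting, caretTokenIndex, prevTokenIndex, nextTokenIndex } = input;

    // Pure validate mode: never match, so the parser reports the syntax error.
    if (!entityCollecting && caretTokenIndex < 0) {
        return false;
    }
    // Entity collecting without a caret: match every empty column.
    if (caretTokenIndex < 0) {
        return true;
    }
    // With no neighbouring token at all there is nothing to anchor the caret to.
    if (prevTokenIndex === undefined && nextTokenIndex === undefined) {
        return false;
    }
    // Otherwise the caret must lie between whichever neighbours exist.
    const afterPrev = prevTokenIndex === undefined || prevTokenIndex <= caretTokenIndex;
    const beforeNext = nextTokenIndex === undefined || nextTokenIndex >= caretTokenIndex;
    return afterPrev && beforeNext;
}
```

One consequence worth noting when reading the diff: with `entityCollecting` false but a non-negative caret index, the predicate can still return true, so only the combination of both conditions being unset behaves as strict validation.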