Commit f2dc64f

merge branch next to main, next version(4.5.0) content (#474)
* feat: support query result and derived table entity collecting (#434)
* feat: support queryResult and derived table entities collecting
* feat: support query result and derived table entity collecting
* test: enhance hive and spark entity collect test case
* fix: remove _ctx and add tokenIndex into position
* fix: rename declareType COMMON to LITERAL
* fix: optimize entity collector and update grammar
* test: add derived table and query result entities test case
* fix: remove isCaretInDerivedTableStmt and set default isAccessible to null
* fix: update _caretStmt docs
* test: add isAccessible test case
* fix: skip _caretStmt ts check
* docs: update README to include additional entity information
* test: fix create view test case
* fix: import from error sql module
* test: update entity collection tests
* fix: remove unused type
* chore: remove duplicate changelog in v4.4.1
* chore(release): 4.5.0-beta.0
* Next merge main (#468)
* fix(flink): #455 fix json functions' params problem in flink
* fix(flink): some grammar rules (#465)
* fix: #464 order by + expression
* fix: #464 EXTRACT function
* test: #464 flink JSON_VALUE RETURNING
* chore(release): 4.4.2

---------

Co-authored-by: zhaoge <>
Co-authored-by: JackWang032 <64318393+JackWang032@users.noreply.github.com>

* fix(parser): #283 collect errors from all erroneous statements in multi-statement input (#470)
* test(parser): #283 add multi-statement error validation tests for all dialects
* fix(parser): #283 collect errors from all erroneous statements in multi-statement input
* feat: add generic SQL language support (#469)
* fix(generic): fix INTERSECT/EXCEPT support, trim keywords to ~90
  - Add INTERSECT and EXCEPT to queryNoWith rule for set operations
  - Remove 173 unused KW_* lexer rules for removed features (views, indexes, grants, transactions, stored procedures, window functions, triggers, etc.)
  - Trim nonReserved list to only keywords actually used in parser rules
  - Remove unused UNICODE_STRING and DIGIT_IDENTIFIER lexer rules
  - Keyword count reduced from 263 to 90 (close to ~100 target)
  - All 197 test suites pass (5627 tests)
* fix(generic): reserve core structural keywords and add DIGIT_IDENTIFIER
  - Remove core structural keywords from nonReserved so they cannot be used as identifiers: SELECT, FROM, WHERE, CREATE, TABLE, INSERT, UPDATE, DELETE, DROP, ALTER, SET, JOIN, GROUP, HAVING, ORDER, ON, UNION, INTERSECT, EXCEPT, INTO, NOT, AND, OR, IN, BETWEEN, LIKE, IS, EXISTS, CASE, WHEN, THEN, ELSE, END, CAST, AS, DISTINCT, PRIMARY, CONSTRAINT, REFERENCES, COLUMN, UNIQUE, CHECK, FOREIGN, RENAME, RECURSIVE, WITH, NULL, ESCAPE, NULLIF
  - Add DIGIT_IDENTIFIER lexer token for identifiers starting with a digit (e.g. 123abc, 1st_column)
  - Include DIGIT_IDENTIFIER in identifier rule alternatives
* fix(generic): add missing Listener/Visitor exports and diagnostics option
  - Add GenericSqlListener and GenericSqlVisitor exports to src/index.ts
  - Add GenericSQLOptions interface with configurable diagnostics flag
  - Override validate() to return empty array when diagnostics disabled
  - Export GenericSQLOptions type from src/index.ts
* fix(generic): add QUERY_RESULT and SELECT column entity collection
  - Add exitQuerySpecification for QUERY_RESULT entity tracking
  - Add exitSelectItem for column entity collection in SELECT clauses
  - Track wildcard columns (ColumnDeclareType.ALL) for * and table.*
  - Track expression columns with alias support (ColumnDeclareType.EXPRESSION)
  - Stage previously untracked files (errorListener, splitListener, semanticContextCollector)
* test: add GenericSQL tests
* test: ensure all dialect tests pass with GenericSQL
* test(generic): add more sql test
  - Add comprehensive syntax tests for all supported statement types
  - Add context collect tests for entity and semantic collectors
  - Add suggestion tests for token, syntax, and multi-statement scenarios
  - Add error strategy, listener, visitor, and validation tests
  - Fix entity collector to distinguish simple columns from expressions
* feat: match empty column when in entityCollecting context (#457) (#472)
* chore(release): 4.3.0
* fix(common): #424 allTokens slice when caretTokenIndex use tokenIndexOffset (#426)
* test: #424 syntax after comments
* fix(common): #424 allTokens slice when caretTokenIndex use tokenIndexOffset
* chore(release): 4.3.1
* fix(postgresql): #432 remove error rule
* test: #432 validate unComplete sql
* fix: #432 remove error rule
* feat: mark as entityCollecting in getAllEntities context to allow empty column
* chore: update jest.config.js to hide console.log
* fix(flink): #442 fix flink's insert values() can't support function problem
* feat: remove noReserved keywords in completions
* test: add filter keywords test case
* test: #438 sync suggestion no duplicate syntaxContextType
* fix: #438 syntaxContextType not duplicate
* chore(release): 4.4.0-beta.0
* chore(release): 4.4.0
* feat: support query result and derived table entity collecting (#434)
* feat: support queryResult and derived table entities collecting
* feat: support query result and derived table entity collecting
* test: enhance hive and spark entity collect test case
* fix: remove _ctx and add tokenIndex into position
* fix: rename declareType COMMON to LITERAL
* fix: optimize entity collector and update grammar
* test: add derived table and query result entities test case
* fix: remove isCaretInDerivedTableStmt and set default isAccessible to null
* fix: update _caretStmt docs
* test: add isAccessible test case
* fix: skip _caretStmt ts check
* docs: update README to include additional entity information
* test: fix create view test case
* fix: import from error sql module
* test: update entity collection tests
* fix: remove unused type
* feat: match empty column when in entityCollecting context
* feat: optimize collecting entity when match empty column in entityCollecting context (#467)

Co-authored-by: Cythia828 <942884029@qq.com>
1 parent d6e1df2 commit f2dc64f

150 files changed

Lines changed: 75670 additions & 46906 deletions

CHANGELOG.md

Lines changed: 13 additions & 3 deletions
@@ -2,6 +2,7 @@
 
 All notable changes to this project will be documented in this file. See [standard-version](https://github.com/conventional-changelog/standard-version) for commit guidelines.
 
+## [4.5.0-beta.0](https://github.com/DTStack/dt-sql-parser/compare/v4.4.1...v4.5.0-beta.0) (2025-12-30)
 ### [4.4.2](https://github.com/DTStack/dt-sql-parser/compare/v4.4.1...v4.4.2) (2026-03-06)
 
 
@@ -21,9 +22,18 @@ All notable changes to this project will be documented in this file. See [standa
 
 ### Bug Fixes
 
-* [#438](https://github.com/DTStack/dt-sql-parser/issues/438) syntaxContextType not duplicate ([4705620](https://github.com/DTStack/dt-sql-parser/commit/47056209150ef6aaade7c612700b7a289c279208))
-* **flink:** [#442](https://github.com/DTStack/dt-sql-parser/issues/442) fix flink's insert values() can't support function problem ([98ab7d4](https://github.com/DTStack/dt-sql-parser/commit/98ab7d459b189b7ef07cc1bf820524594a0cc78c))
-* **postgresql:** [#432](https://github.com/DTStack/dt-sql-parser/issues/432) remove error rule ([3684ae7](https://github.com/DTStack/dt-sql-parser/commit/3684ae71e92cefe4bac18174b757e162d3f89457))
+* **flink:** [#455](https://github.com/DTStack/dt-sql-parser/issues/455) fix json functions' params problem in flink ([12ef949](https://github.com/DTStack/dt-sql-parser/commit/12ef949339ffb0889e428c97c54149cd567b6ae8))
+* **flink:** some grammar rules ([#465](https://github.com/DTStack/dt-sql-parser/issues/465)) ([20e352c](https://github.com/DTStack/dt-sql-parser/commit/20e352cb32142c825ee086e36051ff9797798e09)), closes [#464](https://github.com/DTStack/dt-sql-parser/issues/464) [#464](https://github.com/DTStack/dt-sql-parser/issues/464) [#464](https://github.com/DTStack/dt-sql-parser/issues/464)
+
+
+### Features
+
+* support query result and derived table entity collecting ([#434](https://github.com/DTStack/dt-sql-parser/issues/434)) ([a176253](https://github.com/DTStack/dt-sql-parser/commit/a176253799b514fcd169c82f2706c2fe2810d85c))
+
+### [4.4.1](https://github.com/DTStack/dt-sql-parser/compare/v4.4.0...v4.4.1) (2025-12-22)
+
+### Bug Fixes
+
 * **trino:** add selectItem rule to candidates for column suggestions ([81a361f](https://github.com/DTStack/dt-sql-parser/commit/81a361fb8e45e81c8826cba212f0007443bf12b5))
 
 ## [4.4.0](https://github.com/DTStack/dt-sql-parser/compare/v4.4.0-beta.0...v4.4.0) (2025-11-26)

README-zh_CN.md

Lines changed: 173 additions & 5 deletions
@@ -26,6 +26,7 @@ dt-sql-parser is developed based on [ANTLR4](https://github.com/antlr/antlr4)
 - PostgreSQL
 - Trino
 - Impala
+- GenericSQL
 
 > Tip: All SQL parsers are currently written in `TypeScript`. If needed, you can try compiling the grammar files to other target languages.
 
@@ -51,7 +52,7 @@ yarn add dt-sql-parser
 ## Usage
 Before you start, you need to understand the basic usage. `dt-sql-parser` provides a dedicated SQL class for each SQL dialect:
 ```typescript
-import { MySQL, FlinkSQL, SparkSQL, HiveSQL, PostgreSQL, TrinoSQL, ImpalaSQL } from 'dt-sql-parser';
+import { MySQL, FlinkSQL, SparkSQL, HiveSQL, PostgreSQL, TrinoSQL, ImpalaSQL, GenericSQL } from 'dt-sql-parser';
 ```
 
 Before using features such as syntax validation and auto-completion, you need to instantiate the corresponding SQL class. Taking `MySQL` as an example:
@@ -330,6 +331,8 @@ console.log(sqlSlices)
   {
     entityContextType: 'table',
     text: 'tb',
+    declareType: 0,
+    isAccessible: true,
     position: {
       line: 1,
       startIndex: 14,
@@ -346,16 +349,181 @@ console.log(sqlSlices)
     },
     relatedEntities: null,
     columns: null,
-    isAlias: false,
-    origin: null,
-    alias: null
-  }
+    _alias: null,
+    _comment: null
+  },
+  {
+    entityContextType: 'queryResult',
+    text: '*',
+    declareType: undefined,
+    isAccessible: null,
+    position: {
+      line: 1,
+      startIndex: 7,
+      endIndex: 7,
+      startColumn: 8,
+      endColumn: 9
+    },
+    belongStmt: {
+      stmtContextType: 'selectStmt',
+      position: [Object],
+      rootStmt: [Object],
+      parentStmt: [Object],
+      isContainCaret: true
+    },
+    relatedEntities: [
+      // relate to table entity
+    ],
+    columns: [
+      // relate to `*` column entity
+    ],
+    _alias: null,
+    _comment: null,
+  },
 ]
 */
 ```
 
 The caret position is optional. If it is passed, then for every collected entity that lies in the statement containing that position, the statement object the entity belongs to carries an `isContainCaret` flag. Combined with auto-completion, this helps you quickly pick out the entities you need.
 
+With nested subqueries, `isContainCaret` may not be enough to pick out the entities you need. For example, given the SQL `SELECT id FROM t1 LEFT JOIN (SELECT id, name FROM t2) AS t3 ON t1.id = t3.id`, when the caret is inside the derived table `t3` (the inner query), we expect column completions from table `t2`. However, because `isContainCaret` is `true` for both `t1` and `t2`, the usable table entities cannot be distinguished any further.
+
+Therefore, collected entities whose `entityContextType` is `table` carry an `isAccessible` flag that indicates whether the entity is accessible. Internally, `isAccessible` is determined by scope depth: an entity is considered accessible when the scope depth of its statement equals the scope depth of the statement containing the caret and `isContainCaret` is `true`. (This check is not absolute, but it rules out most irrelevant entities.)
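As a rough illustration of how the `isAccessible` flag described above might be consumed, the following sketch filters table entities for completion. The `EntityLike` interface is a hypothetical, trimmed-down stand-in for the library's entity type, and the sample flag values are written by hand for the `t1`/`t2` example rather than produced by the parser.

```typescript
// Hypothetical, trimmed-down shape of a collected entity. Only the fields
// discussed in the README text are modelled; the real type has more.
interface EntityLike {
    entityContextType: string;
    text: string;
    isAccessible: boolean | null;
    belongStmt: { isContainCaret?: boolean };
}

/** Keep only table entities that are flagged as accessible from the caret. */
function accessibleTables(entities: EntityLike[]): EntityLike[] {
    return entities.filter(
        (e) => e.entityContextType === 'table' && e.isAccessible === true
    );
}

// Hand-written sample for
//   SELECT id FROM t1 LEFT JOIN (SELECT id, name FROM t2) AS t3 ON t1.id = t3.id
// with the caret inside the inner query. Both tables have isContainCaret = true,
// so only isAccessible tells them apart. The values here are assumptions.
const collected: EntityLike[] = [
    { entityContextType: 'table', text: 't1', isAccessible: false, belongStmt: { isContainCaret: true } },
    { entityContextType: 'table', text: 't2', isAccessible: true, belongStmt: { isContainCaret: true } },
];

const accessibleNames = accessibleTables(collected).map((e) => e.text);
console.log(accessibleNames);
```

Filtering on `isAccessible === true` rather than on truthiness keeps entities whose flag is `null` (the documented default for non-table entities) out of the result.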
+
+#### Additional entity information
+
+**Alias information**
+
+When an entity has an alias, the entity object contains an `_alias` field:
+- `_alias`: details of the alias, including its text and position
+
+```typescript
+// Example: SELECT u.name FROM users AS u
+{
+    entityContextType: 'table',
+    text: 'users',
+    _alias: { // alias information of the table
+        text: 'u',
+        startIndex: 29,
+        endIndex: 29,
+        startColumn: 30,
+        endColumn: 31,
+        line: 1
+    }
+}
+
+// Example: SELECT name AS username FROM users
+{
+    entityContextType: 'column',
+    text: 'name',
+    _alias: { // alias information of the column
+        text: 'username',
+        startIndex: 15,
+        endIndex: 22,
+        startColumn: 16,
+        endColumn: 24,
+        line: 1
+    }
+}
+```
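To make the `_alias` position fields more concrete, here is a small sketch that recovers the alias text from the SQL string and picks a display name for an entity. It assumes `startIndex` and `endIndex` are zero-based, inclusive character offsets, which is inferred from the `username` example above and is not stated explicitly in the text. `WordPositionLike`, `sliceByPosition`, and `displayName` are hypothetical names used only for illustration.

```typescript
// Hypothetical shape mirroring the `_alias` object shown in the README example.
interface WordPositionLike {
    text: string;
    line: number;
    startIndex: number;
    endIndex: number;
    startColumn: number;
    endColumn: number;
}

/**
 * Slice the source text covered by a position.
 * Assumption: startIndex/endIndex are zero-based and endIndex is inclusive.
 */
function sliceByPosition(
    sql: string,
    pos: Pick<WordPositionLike, 'startIndex' | 'endIndex'>
): string {
    return sql.slice(pos.startIndex, pos.endIndex + 1);
}

/** Prefer the alias text when an alias exists, otherwise the entity text. */
function displayName(entity: { text: string; _alias: WordPositionLike | null }): string {
    return entity._alias !== null ? entity._alias.text : entity.text;
}

const aliasSql = 'SELECT name AS username FROM users';
const aliasedColumn = {
    entityContextType: 'column',
    text: 'name',
    _alias: { text: 'username', startIndex: 15, endIndex: 22, startColumn: 16, endColumn: 24, line: 1 },
};

const slicedAlias = sliceByPosition(aliasSql, aliasedColumn._alias);
console.log(slicedAlias, displayName(aliasedColumn));
```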
+
+**Declare type (DeclareType)**
+
+The `declareType` field identifies how an entity is declared. Different kinds of entities have different declare types:
+
+**Declare types of table entities (TableDeclareType):**
+- `LITERAL`: a literal table name, e.g. `SELECT * FROM users`
+- `EXPRESSION`: a table defined by an expression, such as a subquery, e.g. `SELECT * FROM (SELECT * FROM users) AS t`
+
+**Declare types of column entities (ColumnDeclareType):**
+- `LITERAL`: a literal column name, e.g. `SELECT id, name FROM users`
+- `ALL`: wildcard syntax, e.g. `SELECT users.* FROM users`
+- `EXPRESSION`: a complex expression, such as a subquery, a CASE expression, or a function call
+
+```typescript
+// Examples of different declareType values
+// 1. Literal column
+{
+    entityContextType: 'column',
+    text: 'name',
+    declareType: ColumnDeclareType.LITERAL,
+}
+
+// 2. Wildcard column
+{
+    entityContextType: 'column',
+    text: 'users.*',
+    declareType: ColumnDeclareType.ALL,
+}
+
+// 3. Expression column
+{
+    entityContextType: 'column',
+    text: 'CASE WHEN age > 18 THEN "adult" ELSE "minor" END',
+    declareType: ColumnDeclareType.EXPRESSION,
+}
+```
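One way a consumer might branch on the three column declare types listed above is sketched here: deriving an output column name for each select item. The enum below is a local stand-in, since the underlying values of the library's `ColumnDeclareType` are not shown in the text. `ColumnLike` and `outputColumnName` are hypothetical, and the naming policy (alias first, `null` when no name can be derived) is only one possible choice.

```typescript
// Local stand-in enum. The real ColumnDeclareType values are not shown in the
// README text, so these string values are assumptions for illustration only.
enum ColumnDeclareType {
    LITERAL = 'LITERAL',
    ALL = 'ALL',
    EXPRESSION = 'EXPRESSION',
}

// Hypothetical, trimmed-down column entity.
interface ColumnLike {
    text: string;
    declareType: ColumnDeclareType;
    _alias: { text: string } | null;
}

/**
 * Derive the name a select item contributes to the query result.
 * Returns null when no single name can be derived from the entity alone:
 * wildcards must be expanded against a table schema, and unaliased
 * expressions have no stable name.
 */
function outputColumnName(col: ColumnLike): string | null {
    if (col.declareType === ColumnDeclareType.ALL) {
        return null;
    }
    if (col._alias !== null) {
        return col._alias.text;
    }
    return col.declareType === ColumnDeclareType.LITERAL ? col.text : null;
}

const selectItems: ColumnLike[] = [
    { text: 'name', declareType: ColumnDeclareType.LITERAL, _alias: null },
    { text: 'users.*', declareType: ColumnDeclareType.ALL, _alias: null },
    { text: 'CASE WHEN age > 18 THEN 1 ELSE 0 END', declareType: ColumnDeclareType.EXPRESSION, _alias: { text: 'is_adult' } },
    { text: 'COUNT(id)', declareType: ColumnDeclareType.EXPRESSION, _alias: null },
];

const outputNames = selectItems.map(outputColumnName);
console.log(outputNames);
```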
466+
467+
468+
**其他元信息字段**
469+
470+
**注释信息(Comment**
471+
- `_comment`:实体的注释信息,主要用于 CREATE 语句中的列注释或表注释
472+
473+
```typescript
474+
// 示例:CREATE TABLE users (id INT COMMENT 'USERID', name VARCHAR(50) COMMENT 'USERNAME')
475+
{
476+
entityContextType: 'column',
477+
text: 'id',
478+
_comment: {
479+
text: "'USERID'",
480+
startIndex: 35,
481+
endIndex: 42,
482+
startColumn: 36,
483+
endColumn: 44,
484+
line: 1
485+
},
486+
_colType: {
487+
text: 'INT',
488+
startIndex: 23,
489+
endIndex: 42,
490+
startColumn: 24,
491+
endColumn: 44,
492+
line: 1
493+
}
494+
}
495+
```
496+
497+
**列类型信息(Column Type**
498+
- `_colType`:列的数据类型信息,仅用于建表语句中的列实体,包含类型名称和位置信息
499+
500+
```typescript
501+
// 示例:CREATE TABLE users (name VARCHAR(50) NOT NULL)
502+
{
503+
entityContextType: 'columnCreate',
504+
text: 'name',
505+
_colType: {
506+
text: 'VARCHAR(50)',
507+
startIndex: 25,
508+
endIndex: 35,
509+
startColumn: 26,
510+
endColumn: 37,
511+
line: 1
512+
}
513+
}
514+
```
515+
516+
**关联信息字段**
517+
- `relatedEntities`:与当前实体相关的其他实体列表,例如查询结果实体关联的表实体
518+
- `columns`:包含的字段列表
519+
520+
一个简单的实体关联实例:
521+
522+
```sql
523+
CREATE TABLE tb1 AS SELECT id FROM tb2;
524+
```
525+
526+
![relation-image](./docs/images/relation.png)
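Since the relation image is not reproduced here, the sketch below shows one plausible way to walk `relatedEntities` for the `CREATE TABLE tb1 AS SELECT id FROM tb2` example and collect the source tables. The exact shape of the relation graph (a created table pointing to a query result, which points to `tb2`) is an assumption, and `RelEntity` and `collectSourceTables` are hypothetical names.

```typescript
// Hypothetical, trimmed-down entity with only the relation field modelled.
interface RelEntity {
    entityContextType: string;
    text: string;
    relatedEntities: RelEntity[] | null;
}

/**
 * Depth-first walk over relatedEntities, collecting the text of every
 * entity whose entityContextType is 'table'. The `seen` set guards
 * against cycles in the relation graph.
 */
function collectSourceTables(entity: RelEntity, seen: Set<RelEntity> = new Set()): string[] {
    if (seen.has(entity)) {
        return [];
    }
    seen.add(entity);
    const names: string[] = [];
    for (const related of entity.relatedEntities || []) {
        if (related.entityContextType === 'table') {
            names.push(related.text);
        }
        names.push(...collectSourceTables(related, seen));
    }
    return names;
}

// Hand-built graph for: CREATE TABLE tb1 AS SELECT id FROM tb2;
// The linkage below is assumed, not taken from parser output.
const tb2: RelEntity = { entityContextType: 'table', text: 'tb2', relatedEntities: null };
const queryResult: RelEntity = { entityContextType: 'queryResult', text: 'id', relatedEntities: [tb2] };
const tb1: RelEntity = { entityContextType: 'tableCreate', text: 'tb1', relatedEntities: [queryResult] };

const sourceTables = collectSourceTables(tb1);
console.log(sourceTables);
```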
 
 ### Get semantic context information
 Call the `getSemanticContextAtCaretPosition` method on the SQL instance, passing in the SQL text and the line and column of the target position, for example:
