
Commit 928de4e

Authored by ajitpratap0, Ajit Pratap Singh, and claude
feat(dialect): add ClickHouse SQL dialect support (#418)
* feat(keywords): add ClickHouse dialect support

  - Add DialectClickHouse constant and register in AllDialects()/DialectKeywords()
  - Add CLICKHOUSE_SPECIFIC keyword set (FINAL, ENGINE, CODEC, TTL, REPLICATED, DISTRIBUTED, MATERIALIZED, ALIAS, FixedString, LowCardinality, Nullable, DateTime64, IPv4, IPv6, PASTE)
  - Wire CLICKHOUSE_SPECIFIC into keywords.New() switch
  - Add PREWHERE/FINAL/SETTINGS/FORMAT to tokenizer's hardcoded keywordTokenTypes map so they are tokenized as TokenTypeKeyword (not IDENTIFIER), enabling correct clause boundary detection during FROM clause parsing
  - Add parsePrewhereClause() to SELECT parsing for ClickHouse dialect
  - Add PrewhereClause field to SelectStatement AST node (with Children(), SQL(), and PutSelectStatement cleanup)
  - Update snowflake_test.go dialect registry test to include ClickHouse
  - Add clickhouse_test.go with dialect registration and PREWHERE parsing tests

  Closes #392

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(dialect): add ClickHouse SQL dialect support

  - Add DialectClickHouse to keywords package with 25+ ClickHouse-specific keywords: PREWHERE, FINAL, ENGINE, GLOBAL, ASOF, TTL, FORMAT, CODEC, SETTINGS, DISTRIBUTED, MERGETREE family, and more
  - Add PreWhere field to SelectStatement AST node
  - Parse PREWHERE clause in ClickHouse dialect mode (pre-filter executed before WHERE for MergeTree optimization)
  - Register ClickHouse in AllDialects() and DialectKeywords()
  - Add keyword and parser integration tests

  Closes #392

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(clickhouse): address review feedback on PR #418

  - Remove SETTINGS/FORMAT from global tokenizer keyword map; only PREWHERE and FINAL belong there (SETTINGS/FORMAT are generic words that would conflict with column names in other dialects)
  - Add Final bool field to TableReference AST node; parse FINAL modifier after table reference in ClickHouse dialect
  - Handle GLOBAL IN / GLOBAL NOT IN in ClickHouse dialect expression parser (GLOBAL is consumed as a modifier, IN parsed normally)
  - Fix TestClickHouseFinal to assert FINAL=true on TableReference
  - Convert TestClickHouseKeywordRecognition to t.Skip (SAMPLE clause not yet implemented)
  - Add [Unreleased] CHANGELOG.md entry for ClickHouse dialect

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(clickhouse): fix stale comment in CLICKHOUSE_SPECIFIC

  Clarify that PREWHERE and FINAL appear in both CLICKHOUSE_SPECIFIC and the tokenizer's hardcoded keywordTokenTypes map (not the "base keyword set"), and that SETTINGS/FORMAT are not there. Removes the inaccurate statement that these keywords are "in the base keyword set".

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(clickhouse): handle GLOBAL JOIN and document FINAL limitation

  - Parse GLOBAL JOIN in ClickHouse dialect by consuming the GLOBAL modifier before join type detection in parseJoinType(); prevents regression where GLOBAL (now TokenTypeKeyword) would cause join parsing to fail
  - Add FINAL limitation comment: parser only supports FINAL on the first (primary) table reference in a FROM clause
  - Add TestClickHouseGlobalJoin and TestClickHouseGlobalIn parser tests

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(clickhouse): add FINAL modifier edge case tests

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(clickhouse): emit FINAL in SQL() round-trip output

  Add FINAL keyword emission to tableRefSQL() in sql.go so that TableReference.Final=true is preserved during AST→SQL round-trip serialization. Add TestClickHouseFinalRoundtrip to verify the SQL() output contains FINAL after parsing SELECT * FROM orders FINAL with DialectClickHouse.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Ajit Pratap Singh <ajitpratapsingh@Ajits-Mac-mini-2655.local>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
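The GLOBAL handling described in the commit message (consume GLOBAL as a modifier, then parse IN, NOT IN, or the join normally) can be illustrated with a toy token cursor. This is a minimal sketch under assumptions: the `parser` type and its `accept`, `acceptGlobal`, `parseInOperator`, and `parseJoinKeyword` methods are hypothetical and are not GoSQLX's parser API, and only bare join types are covered (no `OUTER`, `ANY`, `ASOF`, and so on).

```go
package main

import (
	"fmt"
	"strings"
)

// parser is a hypothetical, minimal token cursor used only for illustration.
type parser struct {
	toks       []string
	pos        int
	clickhouse bool // true when parsing with the ClickHouse dialect
}

// accept consumes the current token if it equals word (case-insensitively).
func (p *parser) accept(word string) bool {
	if p.pos < len(p.toks) && strings.EqualFold(p.toks[p.pos], word) {
		p.pos++
		return true
	}
	return false
}

// acceptGlobal consumes a GLOBAL modifier, but only in ClickHouse mode.
func (p *parser) acceptGlobal() bool {
	return p.clickhouse && p.accept("GLOBAL")
}

// parseInOperator parses [GLOBAL] [NOT] IN and reports which modifiers were present.
func (p *parser) parseInOperator() (global, negated, ok bool) {
	start := p.pos
	global = p.acceptGlobal()
	negated = p.accept("NOT")
	if !p.accept("IN") {
		p.pos = start // backtrack: this is not an IN operator
		return false, false, false
	}
	return global, negated, true
}

// parseJoinKeyword parses [GLOBAL] [INNER|LEFT|RIGHT|FULL|CROSS] JOIN.
func (p *parser) parseJoinKeyword() (global bool, joinType string, ok bool) {
	start := p.pos
	global = p.acceptGlobal()
	joinType = "INNER" // a bare JOIN is an inner join
	for _, jt := range []string{"INNER", "LEFT", "RIGHT", "FULL", "CROSS"} {
		if p.accept(jt) {
			joinType = jt
			break
		}
	}
	if !p.accept("JOIN") {
		p.pos = start // backtrack: no JOIN keyword followed
		return false, "", false
	}
	return global, joinType, true
}

func main() {
	in := &parser{toks: []string{"GLOBAL", "NOT", "IN"}, clickhouse: true}
	fmt.Println(in.parseInOperator()) // true true true

	join := &parser{toks: []string{"GLOBAL", "LEFT", "JOIN"}, clickhouse: true}
	fmt.Println(join.parseJoinKeyword()) // true LEFT true

	other := &parser{toks: []string{"GLOBAL", "IN"}} // not ClickHouse
	fmt.Println(other.parseInOperator()) // false false false
}
```

Backtracking to `start` on failure leaves a stray GLOBAL in a non-ClickHouse dialect for the caller to report, rather than silently dropping it.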
1 parent e12c0b1 commit 928de4e

15 files changed (+470, -0 lines)


CHANGELOG.md

Lines changed: 10 additions & 0 deletions
@@ -5,6 +5,16 @@ All notable changes to GoSQLX will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [Unreleased]
+
+### Added
+- ClickHouse SQL dialect (`DialectClickHouse = "clickhouse"`) with 30+ keywords: PREWHERE, FINAL, ENGINE, GLOBAL, ASOF, TTL, CODEC, FORMAT, SETTINGS, DISTRIBUTED, MergeTree family engines, ClickHouse-specific data types (FixedString, LowCardinality, Nullable, DateTime64, IPv4, IPv6)
+- `PrewhereClause` field on `SelectStatement` AST node for ClickHouse's pre-filter optimization clause
+- `Final` field on `TableReference` for ClickHouse's FINAL table modifier (forces MergeTree part merge before read)
+- PREWHERE clause parsing in ClickHouse dialect mode
+- FINAL modifier parsing in ClickHouse dialect mode
+- GLOBAL IN / GLOBAL NOT IN expression parsing in ClickHouse dialect mode
+
 ## [1.12.1] - 2026-03-15 — Website Performance & Mobile Optimization
 
 ### Improved

pkg/sql/ast/ast.go

Lines changed: 5 additions & 0 deletions
@@ -227,6 +227,7 @@ type TableReference struct {
 	Subquery *SelectStatement // For derived tables: (SELECT ...) AS alias
 	Lateral bool // LATERAL keyword for correlated subqueries (PostgreSQL)
 	TableHints []string // SQL Server table hints: WITH (NOLOCK), WITH (ROWLOCK, UPDLOCK), etc.
+	Final bool // ClickHouse FINAL modifier: forces MergeTree part merge
 }
 
 func (t *TableReference) statementNode() {}
@@ -392,6 +393,7 @@ type SelectStatement struct {
 	From []TableReference
 	TableName string // Added for pool operations
 	Joins []JoinClause
+	PrewhereClause Expression // ClickHouse PREWHERE clause (applied before WHERE, before reading data)
 	Where Expression
 	GroupBy []Expression
 	Having Expression
@@ -492,6 +494,9 @@ func (s SelectStatement) Children() []Node {
 		join := join // G601: Create local copy to avoid memory aliasing
 		children = append(children, &join)
 	}
+	if s.PrewhereClause != nil {
+		children = append(children, s.PrewhereClause)
+	}
 	if s.Where != nil {
 		children = append(children, s.Where)
 	}
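The commit message notes that PREWHERE had to be tokenized as a keyword so the parser can find the end of the FROM clause. The toy parser below shows the failure mode it avoids: a FROM parser that accepts an implicit alias after the table name will swallow PREWHERE as the alias if it arrives as an identifier. The `token`, `tokenize`, `parseFrom`, and `selectParts` names are hypothetical and greatly simplified compared with GoSQLX's tokenizer and parser.

```go
package main

import (
	"fmt"
	"strings"
)

// token is a simplified, hypothetical token: Keyword is true when the
// tokenizer classified the word as a keyword rather than an identifier.
type token struct {
	Text    string
	Keyword bool
}

// selectParts holds the pieces this toy parser extracts from a FROM tail.
type selectParts struct {
	Table, Alias, Prewhere, Where string
}

// tokenize splits on whitespace. FROM and WHERE are always keywords; PREWHERE
// is a keyword only when prewhereIsKeyword is true, modelling the effect of
// adding it to the tokenizer's keyword map.
func tokenize(sql string, prewhereIsKeyword bool) []token {
	var toks []token
	for _, f := range strings.Fields(sql) {
		up := strings.ToUpper(f)
		kw := up == "FROM" || up == "WHERE" || (prewhereIsKeyword && up == "PREWHERE")
		toks = append(toks, token{Text: f, Keyword: kw})
	}
	return toks
}

// parseFrom parses "FROM table [alias] {CLAUSE cond...}". Any non-keyword
// token directly after the table name is treated as an implicit alias.
func parseFrom(toks []token) selectParts {
	var s selectParts
	if len(toks) < 2 || !strings.EqualFold(toks[0].Text, "FROM") {
		return s
	}
	s.Table = toks[1].Text
	i := 2
	if i < len(toks) && !toks[i].Keyword {
		s.Alias = toks[i].Text
		i++
	}
	for i < len(toks) {
		clause := strings.ToUpper(toks[i].Text)
		i++
		var cond []string
		for i < len(toks) && !toks[i].Keyword { // a clause ends at the next keyword
			cond = append(cond, toks[i].Text)
			i++
		}
		switch clause {
		case "PREWHERE":
			s.Prewhere = strings.Join(cond, " ")
		case "WHERE":
			s.Where = strings.Join(cond, " ")
		}
	}
	return s
}

func main() {
	sql := "FROM hits PREWHERE x = 1 WHERE y > 2"
	fmt.Printf("%+v\n", parseFrom(tokenize(sql, true)))  // {Table:hits Alias: Prewhere:x = 1 Where:y > 2}
	fmt.Printf("%+v\n", parseFrom(tokenize(sql, false))) // {Table:hits Alias:PREWHERE Prewhere: Where:y > 2}
}
```

With PREWHERE tokenized as an identifier, it becomes the table alias and the `x = 1` filter is lost without any error being raised.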

pkg/sql/ast/pool.go

Lines changed: 1 addition & 0 deletions
@@ -690,6 +690,7 @@ func PutSelectStatement(stmt *SelectStatement) {
 	stmt.OrderBy = stmt.OrderBy[:0]
 
 	stmt.TableName = ""
+	stmt.PrewhereClause = nil
 	stmt.Where = nil
 	stmt.Limit = nil
 	stmt.Offset = nil
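PutSelectStatement now sets PrewhereClause to nil before the statement goes back to the pool. The sketch below, built on the standard library's `sync.Pool` with a hypothetical cut-down `SelectStatement`, shows why every new field needs a matching reset: the pool may hand the same object to a later caller, so a field left set would carry a stale PREWHERE filter into an unrelated query. `resetSelect`, `getSelect`, `putSelect`, and `ident` are illustrative names, not the ast package's.

```go
package main

import (
	"fmt"
	"sync"
)

// Expression stands in for the AST expression interface (hypothetical).
type Expression interface{ SQL() string }

// ident is a trivial Expression used for the demo.
type ident string

func (i ident) SQL() string { return string(i) }

// SelectStatement is a cut-down stand-in for the real AST node.
type SelectStatement struct {
	TableName      string
	PrewhereClause Expression
	Where          Expression
}

var selectPool = sync.Pool{New: func() any { return &SelectStatement{} }}

// resetSelect clears every field so a recycled statement carries no state from
// its previous use; omitting PrewhereClause here would leak a stale filter.
func resetSelect(s *SelectStatement) {
	s.TableName = ""
	s.PrewhereClause = nil
	s.Where = nil
}

func getSelect() *SelectStatement { return selectPool.Get().(*SelectStatement) }

func putSelect(s *SelectStatement) {
	resetSelect(s)
	selectPool.Put(s)
}

func main() {
	s := getSelect()
	s.TableName = "hits"
	s.PrewhereClause = ident("CounterID = 34")
	putSelect(s)

	// sync.Pool may or may not return the same object; either way it is clean.
	t := getSelect()
	fmt.Println(t.TableName == "", t.PrewhereClause == nil) // true true
}
```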

pkg/sql/ast/sql.go

Lines changed: 8 additions & 0 deletions
@@ -560,6 +560,11 @@ func (s *SelectStatement) SQL() string {
 		sb.WriteString(joinSQL(&j))
 	}
 
+	if s.PrewhereClause != nil {
+		sb.WriteString(" PREWHERE ")
+		sb.WriteString(exprSQL(s.PrewhereClause))
+	}
+
 	if s.Where != nil {
 		sb.WriteString(" WHERE ")
 		sb.WriteString(exprSQL(s.Where))
@@ -1298,6 +1303,9 @@ func tableRefSQL(t *TableReference) string {
 		sb.WriteString(" ")
 		sb.WriteString(t.Alias)
 	}
+	if t.Final {
+		sb.WriteString(" FINAL")
+	}
 	return sb.String()
 }
 
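The SQL() changes place PREWHERE after the joins and before WHERE, and append FINAL after the optional alias in tableRefSQL. A simplified sketch of that emission order follows; the types are hypothetical stand-ins (conditions are plain strings rather than Expression nodes, and joins are omitted), not the ast package's definitions.

```go
package main

import (
	"fmt"
	"strings"
)

// TableReference is a hypothetical, cut-down stand-in for the AST type.
type TableReference struct {
	Name  string
	Alias string
	Final bool
}

// SelectStatement is a hypothetical, cut-down stand-in; conditions are raw strings.
type SelectStatement struct {
	Columns  []string
	From     []TableReference
	Prewhere string
	Where    string
}

// tableRefSQL renders "name [alias] [FINAL]", mirroring the order in sql.go.
func tableRefSQL(t TableReference) string {
	var sb strings.Builder
	sb.WriteString(t.Name)
	if t.Alias != "" {
		sb.WriteString(" ")
		sb.WriteString(t.Alias)
	}
	if t.Final {
		sb.WriteString(" FINAL")
	}
	return sb.String()
}

// SQL emits SELECT ... FROM ... [PREWHERE ...] [WHERE ...]; PREWHERE must precede WHERE.
func (s *SelectStatement) SQL() string {
	var sb strings.Builder
	sb.WriteString("SELECT ")
	sb.WriteString(strings.Join(s.Columns, ", "))
	if len(s.From) > 0 {
		refs := make([]string, len(s.From))
		for i, t := range s.From {
			refs[i] = tableRefSQL(t)
		}
		sb.WriteString(" FROM ")
		sb.WriteString(strings.Join(refs, ", "))
	}
	if s.Prewhere != "" {
		sb.WriteString(" PREWHERE ")
		sb.WriteString(s.Prewhere)
	}
	if s.Where != "" {
		sb.WriteString(" WHERE ")
		sb.WriteString(s.Where)
	}
	return sb.String()
}

func main() {
	stmt := &SelectStatement{
		Columns:  []string{"*"},
		From:     []TableReference{{Name: "orders", Final: true}},
		Prewhere: "status = 'paid'",
		Where:    "amount > 100",
	}
	fmt.Println(stmt.SQL()) // SELECT * FROM orders FINAL PREWHERE status = 'paid' WHERE amount > 100
}
```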

pkg/sql/keywords/clickhouse.go

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+// Copyright 2026 GoSQLX Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package keywords
+
+import "github.com/ajitpratap0/GoSQLX/pkg/models"
+
+// CLICKHOUSE_SPECIFIC contains ClickHouse-specific SQL keywords and extensions.
+// These keywords are recognized when using DialectClickHouse.
+//
+// Note: PREWHERE and FINAL also appear in the tokenizer's hardcoded keywordTokenTypes
+// map (tokenizer.go) to ensure they are emitted as TokenTypeKeyword rather than
+// TokenTypeIdentifier. This is required for correct clause boundary detection during
+// FROM clause parsing. All other keywords here are dialect-scoped only.
+//
+// Examples: PREWHERE, FINAL, ENGINE, MERGETREE, CODEC, TTL, DISTRIBUTED, GLOBAL, ASOF
+var CLICKHOUSE_SPECIFIC = []Keyword{
+	// ClickHouse-specific query clauses
+	{Word: "PREWHERE", Type: models.TokenTypeKeyword, Reserved: true, ReservedForTableAlias: true},
+	{Word: "FINAL", Type: models.TokenTypeKeyword, Reserved: true, ReservedForTableAlias: true},
+	{Word: "SAMPLE", Type: models.TokenTypeKeyword, Reserved: true, ReservedForTableAlias: true},
+	{Word: "GLOBAL", Type: models.TokenTypeKeyword, Reserved: true, ReservedForTableAlias: true},
+	{Word: "ASOF", Type: models.TokenTypeKeyword, Reserved: true, ReservedForTableAlias: true},
+
+	// ClickHouse DDL — table engine and column options
+	{Word: "ENGINE", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+	{Word: "CODEC", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+	{Word: "TTL", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+	{Word: "GRANULARITY", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+	{Word: "SETTINGS", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+	{Word: "FORMAT", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+	{Word: "ALIAS", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+	{Word: "MATERIALIZED", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+	{Word: "TUPLE", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+
+	// MergeTree engine family
+	{Word: "MERGETREE", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+	{Word: "REPLACINGMERGETREE", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+	{Word: "AGGREGATINGMERGETREE", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+	{Word: "COLLAPSINGMERGETREE", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+	{Word: "SUMMINGMERGETREE", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+	{Word: "REPLICATEDMERGETREE", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+	{Word: "REPLICATED", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+
+	// Other table engines
+	{Word: "DISTRIBUTED", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+	{Word: "MEMORY", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+	{Word: "LOG", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+	{Word: "TINYLOG", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+	{Word: "STRIPELOG", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+
+	// ClickHouse data types (as keywords)
+	{Word: "FIXEDSTRING", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+	{Word: "LOWCARDINALITY", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+	{Word: "NULLABLE", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+	{Word: "DATETIME64", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+	{Word: "IPV4", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+	{Word: "IPV6", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+
+	// JOIN modifiers
+	{Word: "PASTE", Type: models.TokenTypeKeyword, Reserved: false, ReservedForTableAlias: false},
+}
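In CLICKHOUSE_SPECIFIC, the query-clause words (PREWHERE, FINAL, SAMPLE, GLOBAL, ASOF) are marked ReservedForTableAlias while the DDL, engine, and type words are not. Assuming the parser consults that flag when deciding whether a bare word after a table name is an implicit alias (the field name suggests this, but the parser code is not part of this diff), the effect can be sketched as follows. `buildIndex`, `canBeTableAlias`, and the four-entry excerpt are hypothetical, and the `Type` field is omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// Keyword mirrors the flag fields used in CLICKHOUSE_SPECIFIC (Type omitted).
type Keyword struct {
	Word                  string
	Reserved              bool
	ReservedForTableAlias bool
}

// clickhouseExcerpt is a small, illustrative excerpt of CLICKHOUSE_SPECIFIC.
var clickhouseExcerpt = []Keyword{
	{Word: "PREWHERE", Reserved: true, ReservedForTableAlias: true},
	{Word: "FINAL", Reserved: true, ReservedForTableAlias: true},
	{Word: "ENGINE", Reserved: false, ReservedForTableAlias: false},
	{Word: "TTL", Reserved: false, ReservedForTableAlias: false},
}

// buildIndex creates a case-insensitive lookup keyed by the upper-cased word.
func buildIndex(kws []Keyword) map[string]Keyword {
	idx := make(map[string]Keyword, len(kws))
	for _, kw := range kws {
		idx[strings.ToUpper(kw.Word)] = kw
	}
	return idx
}

// canBeTableAlias reports whether word may follow a table name as an implicit
// alias: plain identifiers and non-alias-reserved keywords are allowed.
func canBeTableAlias(idx map[string]Keyword, word string) bool {
	kw, ok := idx[strings.ToUpper(word)]
	return !ok || !kw.ReservedForTableAlias
}

func main() {
	idx := buildIndex(clickhouseExcerpt)
	for _, w := range []string{"final", "PREWHERE", "engine", "o"} {
		fmt.Printf("%-8s alias allowed: %v\n", w, canBeTableAlias(idx, w))
	}
}
```

Under that assumption, `FROM orders FINAL` keeps FINAL as a modifier, while an alias named `engine` or `ttl` still parses.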
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+// Copyright 2026 GoSQLX Authors
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package keywords_test
+
+import (
+	"testing"
+
+	"github.com/ajitpratap0/GoSQLX/pkg/sql/keywords"
+)
+
+func TestClickHouseDialectKeywords(t *testing.T) {
+	kws := keywords.DialectKeywords(keywords.DialectClickHouse)
+	if len(kws) == 0 {
+		t.Fatal("expected ClickHouse keywords, got none")
+	}
+	found := map[string]bool{}
+	for _, kw := range kws {
+		found[kw.Word] = true
+	}
+	required := []string{"PREWHERE", "FINAL", "ENGINE", "GLOBAL", "ASOF", "TTL", "FORMAT"}
+	for _, w := range required {
+		if !found[w] {
+			t.Errorf("missing expected ClickHouse keyword: %s", w)
+		}
+	}
+}
+
+func TestClickHouseInAllDialects(t *testing.T) {
+	found := false
+	for _, d := range keywords.AllDialects() {
+		if d == keywords.DialectClickHouse {
+			found = true
+			break
+		}
+	}
+	if !found {
+		t.Error("DialectClickHouse not in AllDialects()")
+	}
+}
+
+func TestIsValidDialectClickHouse(t *testing.T) {
+	if !keywords.IsValidDialect("clickhouse") {
+		t.Error("IsValidDialect should return true for 'clickhouse'")
+	}
+}

pkg/sql/keywords/dialect.go

Lines changed: 9 additions & 0 deletions
@@ -58,6 +58,12 @@ const (
 
 	// DialectRedshift represents Amazon Redshift-specific keywords and extensions
 	DialectRedshift SQLDialect = "redshift"
+
+	// DialectClickHouse represents ClickHouse-specific keywords and extensions.
+	// Includes ClickHouse-specific clauses (PREWHERE, FINAL, SAMPLE), engine
+	// definitions (ENGINE, CODEC, TTL), ClickHouse data types (FixedString,
+	// LowCardinality, Nullable, DateTime64), and replication keywords (ON CLUSTER, GLOBAL).
+	DialectClickHouse SQLDialect = "clickhouse"
 )
 
 // DialectKeywords returns the additional keywords for a specific dialect.
@@ -86,6 +92,8 @@ func DialectKeywords(dialect SQLDialect) []Keyword {
 		return SQLSERVER_SPECIFIC
 	case DialectOracle:
 		return ORACLE_SPECIFIC
+	case DialectClickHouse:
+		return CLICKHOUSE_SPECIFIC
 	default:
 		return nil
 	}
@@ -131,6 +139,7 @@ func AllDialects() []SQLDialect {
 		DialectSnowflake,
 		DialectBigQuery,
 		DialectRedshift,
+		DialectClickHouse,
 	}
 }
 

pkg/sql/keywords/keywords.go

Lines changed: 2 additions & 0 deletions
@@ -271,6 +271,8 @@ func New(dialect SQLDialect, ignoreCase bool) *Keywords {
 		k.addKeywordsWithCategory(SQLITE_SPECIFIC)
 	case DialectSnowflake:
 		k.addKeywordsWithCategory(SNOWFLAKE_SPECIFIC)
+	case DialectClickHouse:
+		k.addKeywordsWithCategory(CLICKHOUSE_SPECIFIC)
 	}
 
 	// Build O(1) lookup cache for compound keyword first-words
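The keywords.New change adds a `case DialectClickHouse` so the ClickHouse words are layered on top of the shared keyword set only for that dialect. The following is a heavily reduced, hypothetical sketch of that shape; `keywordSet`, `newKeywordSet`, and the tiny word lists are illustrative and are not the real types or keyword sets.

```go
package main

import (
	"fmt"
	"strings"
)

// SQLDialect mirrors the string-typed dialect identifiers in dialect.go.
type SQLDialect string

const (
	DialectRedshift   SQLDialect = "redshift"
	DialectClickHouse SQLDialect = "clickhouse"
)

// Illustrative excerpts only, not the real keyword sets.
var baseKeywords = []string{"SELECT", "FROM", "WHERE", "JOIN", "IN"}
var clickhouseSpecific = []string{"ENGINE", "CODEC", "TTL"}

// keywordSet is a hypothetical stand-in for the Keywords type built by New.
type keywordSet struct {
	ignoreCase bool
	words      map[string]bool
}

// norm upper-cases words when the set is case-insensitive.
func (k *keywordSet) norm(w string) string {
	if k.ignoreCase {
		return strings.ToUpper(w)
	}
	return w
}

func (k *keywordSet) add(words []string) {
	for _, w := range words {
		k.words[k.norm(w)] = true
	}
}

func (k *keywordSet) isKeyword(w string) bool { return k.words[k.norm(w)] }

// newKeywordSet always loads the base words, then adds the dialect's extra
// words through a switch, the same shape as the case added to New.
func newKeywordSet(d SQLDialect, ignoreCase bool) *keywordSet {
	k := &keywordSet{ignoreCase: ignoreCase, words: make(map[string]bool)}
	k.add(baseKeywords)
	switch d {
	case DialectClickHouse:
		k.add(clickhouseSpecific)
	}
	return k
}

func main() {
	ch := newKeywordSet(DialectClickHouse, true)
	rs := newKeywordSet(DialectRedshift, true)
	fmt.Println(ch.isKeyword("ttl"), rs.isKeyword("ttl"), rs.isKeyword("select")) // true false true
}
```

Per the commit message, PREWHERE and FINAL are also in the tokenizer's hardcoded keywordTokenTypes map, so in GoSQLX those two are keyword tokens regardless of dialect; the dialect scoping in this sketch corresponds to the remaining ClickHouse words.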

pkg/sql/keywords/snowflake_test.go

Lines changed: 1 addition & 0 deletions
@@ -472,6 +472,7 @@ func TestDialectRegistry(t *testing.T) {
 		DialectSnowflake: false,
 		DialectBigQuery: false,
 		DialectRedshift: false,
+		DialectClickHouse: false,
 	}
 
 	for _, d := range dialects {
