
Commit c50661f

ajitpratap0 (Ajit Pratap Singh) and claude authored
fix(parser): allow TABLE/PARTITION/TABLES as identifiers in ClickHouse (#480) (#481)
* feat(parser): add SQL Server PIVOT/UNPIVOT clause parsing (#456)

  Add support for SQL Server and Oracle PIVOT/UNPIVOT operators in FROM
  clauses. PIVOT transforms rows to columns via an aggregate function, while
  UNPIVOT performs the reverse column-to-row transformation.

  - Add PivotClause and UnpivotClause AST node types
  - Add Pivot/Unpivot fields to TableReference struct
  - Implement parsePivotClause/parseUnpivotClause in new pivot.go
  - Wire parsing into parseFromTableReference and parseJoinedTableRef
  - Add PIVOT/UNPIVOT to tokenizer keyword map for correct token typing
  - Update formatter to render PIVOT/UNPIVOT clauses
  - Enable testdata/mssql/11_pivot.sql and 12_unpivot.sql
  - Add 4 dedicated tests covering subquery+alias, plain table, and AS alias

  Closes #456

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* security: add CVE-2026-32285 to .trivyignore

  CVE-2026-32285 affects github.com/buger/jsonparser v1.1.1, a transitive
  dependency via mark3labs/mcp-go → invopop/jsonschema → wk8/go-ordered-map →
  buger/jsonparser. No fixed version is available upstream. The package is not
  called directly by any GoSQLX code, and the risk is scoped to MCP JSON
  schema generation. Added to .trivyignore until a patched version is
  released.

  Fixes Trivy Repository Scan CI failures in PR #475 and #477.

* fix(parser): scope PIVOT/UNPIVOT to SQL Server/Oracle dialects

  PIVOT and UNPIVOT were registered in the global tokenizer keyword map,
  which made them reserved words across every dialect. Queries like
  `SELECT pivot FROM users` then failed in PostgreSQL/MySQL/SQLite/ClickHouse,
  where these identifiers are perfectly legal.

  Changes:
  - Remove PIVOT/UNPIVOT from tokenizer keywordTokenTypes (they are now
    tokenized as identifiers in all dialects).
  - Gate isPivotKeyword/isUnpivotKeyword on the parser dialect (TSQL/Oracle
    only) and accept identifier-typed tokens by value match.
  - Skip alias consumption in parseFromTableReference / parseJoinedTableRef
    when the upcoming identifier is a contextual PIVOT/UNPIVOT keyword, so
    the pivot-clause parser can claim it.
  - Fix unsafe single-value type assertion in TestTSQL_PivotWithASAlias to
    comply with the project's mandatory two-value form.
  - Add TestPivotIdentifierInNonTSQLDialects regression covering
    pivot/unpivot as identifiers in PostgreSQL, MySQL, and SQLite.

  All parser/tokenizer/formatter tests pass with -race.

* fix(parser): polish PIVOT/UNPIVOT round-trip and validation

  - Formatter emits AS before PIVOT/UNPIVOT aliases for a clean round-trip.
  - Tokenizer records Quote='[' on SQL Server bracket-quoted identifiers; the
    pivot parser uses renderQuotedIdent to preserve [North] etc. in
    PivotClause.InValues and UnpivotClause.InColumns.
  - Reject empty IN lists for both PIVOT and UNPIVOT.
  - Extract parsePivotAlias helper, collapsing four duplicated alias blocks
    in select_subquery.go.
  - Add TestPivotNegativeCases (missing parens, missing FOR/IN, empty IN)
    and TestPivotBracketedInValuesPreserved.

  Full test suite passes with -race.

* fix(parser): escape embedded delimiters and fix empty-IN error order

  - renderQuotedIdent now doubles embedded `]`, `"`, and `` ` `` per dialect
    convention, so identifiers like [foo]bar] round-trip unambiguously.
  - The empty PIVOT/UNPIVOT IN-list check now runs before the closing-`)`
    check, so the user-facing error is "at least one value/column..." instead
    of the misleading ") to close ... IN list".
  - Clarify the renderQuotedIdent comment to reference Token.Quote and
    Word.QuoteStyle as the actual sources.

* refactor(formatter): thread nodeFormatter through tableRefSQL and joinSQL

  Previously these package-level renderers hardcoded keyword literals (PIVOT,
  UNPIVOT, FOR, IN, LATERAL, JOIN, ON), which bypassed the caller's case
  policy (f.kw). Thread *nodeFormatter into both functions and route every
  keyword through f.kw so uppercase/lowercase options apply uniformly across
  FROM, JOIN, MERGE, DELETE USING, and UPDATE FROM paths.

  Addresses claude-review feedback on PR #477.

  All formatter, parser, and tokenizer tests pass with -race.

* fix(parser): allow TABLE/PARTITION/TABLES as identifiers in ClickHouse (#480)

  ClickHouse system tables (system.replicas, system.parts, system.tables)
  expose columns named `table` and `partition`, and queries commonly
  reference `system.tables`. The parser previously rejected these because
  TABLE, PARTITION, and TABLES tokenize as keywords and the
  non-reserved-keyword-as-identifier path was gated to SQL Server only.

  - Extend the gate in parsePrimaryExpression to also enable for ClickHouse.
  - Add TokenTypePartition and the "TABLES" keyword value to
    isNonReservedKeyword so they can serve as identifiers in expression
    position and after qualifiers.

  Closes #480

* fix(parser): allow ROWS as identifier in ClickHouse (#480, #482)

  Extend the non-reserved-keyword allowlist to also accept ROWS as a column
  identifier. ClickHouse exposes a `rows` column in system.tables and other
  system tables; ROWS is otherwise reserved for window-frame syntax
  (ROWS BETWEEN ...) but is unambiguous in expression position.

  Part of #482 (ClickHouse parser gaps from QA sweep).

---------

Co-authored-by: Ajit Pratap Singh <ajitpratapsingh@Ajits-Mac-mini-2655.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
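The delimiter-doubling described in the escape-embedded-delimiters commit can be sketched as follows. This is a minimal stand-alone version with a hypothetical name (`quoteIdent`) and an explicit quote parameter; the actual renderQuotedIdent in GoSQLX takes its quote style from Token.Quote / Word.QuoteStyle.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent is a sketch (not the GoSQLX implementation) of the convention
// the commit describes: the embedded closing delimiter is doubled, so an
// identifier such as foo]bar renders unambiguously as [foo]]bar].
func quoteIdent(name string, quote rune) string {
	switch quote {
	case '[': // SQL Server bracket quoting
		return "[" + strings.ReplaceAll(name, "]", "]]") + "]"
	case '"': // ANSI double-quote quoting
		return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
	case '`': // MySQL/ClickHouse backtick quoting
		return "`" + strings.ReplaceAll(name, "`", "``") + "`"
	}
	return name // no quote style recorded: emit as-is
}

func main() {
	fmt.Println(quoteIdent("foo]bar", '['))  // [foo]]bar]
	fmt.Println(quoteIdent(`say "hi"`, '"')) // "say ""hi"""
}
```

Doubling only the dialect's own closing delimiter is what lets bracketed IN-list values like [North] survive a parse/format round-trip.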
1 parent daea668 commit c50661f

File tree

3 files changed: +124 additions, -4 deletions
Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
// Copyright 2026 GoSQLX Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0

package parser_test

import (
	"testing"

	"github.com/ajitpratap0/GoSQLX/pkg/gosqlx"
	"github.com/ajitpratap0/GoSQLX/pkg/sql/keywords"
)

// TestClickHouseTableAsIdentifier verifies that the ClickHouse dialect accepts
// `table` as a column identifier in SELECT lists, function arguments, and
// GROUP BY clauses. ClickHouse system tables (system.replicas, system.tables,
// system.parts) all expose a `table` column, so this is a common real-world
// pattern. Regression test for issue #480.
func TestClickHouseTableAsIdentifier(t *testing.T) {
	queries := map[string]string{
		"replicas_with_table_column": `SELECT
    database,
    table,
    is_leader,
    is_readonly,
    is_session_expired,
    parts_to_check,
    queue_size,
    inserts_in_queue,
    merges_in_queue,
    absolute_delay,
    last_queue_update,
    zookeeper_path
FROM system.replicas
ORDER BY absolute_delay DESC`,

		"tables_with_bytes_on_disk": `SELECT
    database,
    table,
    engine,
    formatReadableSize(bytes_on_disk) AS size,
    parts,
    active_parts
FROM system.tables
WHERE engine LIKE '%MergeTree%'
  AND is_temporary = 0
ORDER BY bytes_on_disk DESC
LIMIT 10`,

		"tables_with_total_bytes": `SELECT
    database,
    table,
    engine,
    formatReadableSize(total_bytes) AS size,
    parts,
    active_parts
FROM system.tables
WHERE engine LIKE '%MergeTree%'
  AND is_temporary = 0
ORDER BY total_bytes DESC
LIMIT 10`,

		"parts_with_concat_table": `SELECT
    concat(database, '.', table) AS table_name,
    count() AS part_count,
    max(partition) AS latest_partition,
    formatReadableSize(sum(bytes_on_disk)) AS total_size
FROM system.parts
WHERE active = 1
  AND database NOT IN ('system')
GROUP BY database, table
ORDER BY part_count DESC
LIMIT 10`,

		"parts_having_count": `SELECT
    database,
    table,
    count() AS parts,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active = 1
  AND database NOT IN ('system')
GROUP BY database, table
HAVING parts > 300
ORDER BY parts DESC`,

		// system.tables exposes a `rows` column. ROWS is a reserved keyword
		// (used in window frames: ROWS BETWEEN ...), but in ClickHouse it is
		// a valid column name in system tables.
		"tables_with_rows_column": `SELECT
    database,
    table,
    rows,
    total_bytes
FROM system.tables
WHERE database = 'default'
ORDER BY rows DESC`,
	}

	for name, query := range queries {
		query := query
		t.Run(name, func(t *testing.T) {
			parsed, err := gosqlx.ParseWithDialect(query, keywords.DialectClickHouse)
			if err != nil {
				t.Fatalf("ParseWithDialect failed: %v", err)
			}
			if parsed == nil {
				t.Fatal("expected non-nil AST")
			}
		})
	}
}

pkg/sql/parser/expressions_literal.go

Lines changed: 1 addition & 1 deletion
@@ -103,7 +103,7 @@ func (p *Parser) parsePrimaryExpression() (ast.Expression, error) {
 		return funcCall, nil
 	}

-	if p.isType(models.TokenTypeIdentifier) || p.isType(models.TokenTypeDoubleQuotedString) || (p.dialect == string(keywords.DialectSQLServer) && p.isNonReservedKeyword()) {
+	if p.isType(models.TokenTypeIdentifier) || p.isType(models.TokenTypeDoubleQuotedString) || ((p.dialect == string(keywords.DialectSQLServer) || p.dialect == string(keywords.DialectClickHouse)) && p.isNonReservedKeyword()) {
 		// Handle identifiers and function calls
 		// Double-quoted strings are treated as identifiers in SQL (e.g., "column_name")
 		// Non-reserved keywords (TARGET, SOURCE, etc.) can also be used as identifiers

pkg/sql/parser/parser.go

Lines changed: 7 additions & 3 deletions
@@ -958,14 +958,18 @@ func (p *Parser) isNonReservedKeyword() bool {
 	case models.TokenTypeTarget, models.TokenTypeSource, models.TokenTypeMatched:
 		return true
 	case models.TokenTypeTable, models.TokenTypeIndex, models.TokenTypeView,
-		models.TokenTypeKey, models.TokenTypeColumn, models.TokenTypeDatabase:
+		models.TokenTypeKey, models.TokenTypeColumn, models.TokenTypeDatabase,
+		models.TokenTypePartition, models.TokenTypeRows:
 		// DDL keywords that are commonly used as quoted identifiers in MySQL (backtick)
-		// and SQL Server (bracket) dialects.
+		// and SQL Server (bracket) dialects, and as plain column names in ClickHouse
+		// system tables (system.parts.partition, system.replicas.table,
+		// system.tables.rows, etc).
 		return true
 	case models.TokenTypeKeyword:
 		// Token may have generic Type; check value for specific keywords
 		switch strings.ToUpper(p.currentToken.Token.Value) {
-		case "TARGET", "SOURCE", "MATCHED", "VALUE", "NAME", "TYPE", "STATUS":
+		case "TARGET", "SOURCE", "MATCHED", "VALUE", "NAME", "TYPE", "STATUS",
+			"TABLES":
 			return true
 		}
 	}

0 commit comments