Skip to content

fix(parser): allow TABLE/PARTITION/TABLES as identifiers in ClickHouse (#480)#481

Merged
ajitpratap0 merged 11 commits intomainfrom
fix/clickhouse-table-identifier-480
Apr 7, 2026
Merged

fix(parser): allow TABLE/PARTITION/TABLES as identifiers in ClickHouse (#480)#481
ajitpratap0 merged 11 commits intomainfrom
fix/clickhouse-table-identifier-480

Conversation

@ajitpratap0
Copy link
Copy Markdown
Owner

Summary

  • Fixes [BUG] multiple errors in ClickHouse dialect #480: ClickHouse queries using table, partition, or system.tables as identifiers now parse correctly.
  • Extends the non-reserved-keyword-as-identifier path (previously SQL Server only) to also cover ClickHouse, where these tokens are valid column/relation names in system.replicas, system.parts, system.tables.
  • Adds TokenTypePartition and the "TABLES" keyword value to isNonReservedKeyword.

Test plan

  • New regression test pkg/sql/parser/clickhouse_table_identifier_test.go covering all 5 queries from the issue
  • go test -race ./pkg/sql/parser/ ./pkg/sql/keywords/ ./pkg/gosqlx/ — all green
  • task check (full suite) — all green
  • No regressions in DDL parsing (CREATE TABLE, ALTER TABLE) — TABLE in DDL position is consumed before reaching expression context

🤖 Generated with Claude Code

Ajit Pratap Singh and others added 9 commits March 31, 2026 22:25
Add support for SQL Server and Oracle PIVOT/UNPIVOT operators in FROM
clauses. PIVOT transforms rows to columns via an aggregate function,
while UNPIVOT performs the reverse column-to-row transformation.

- Add PivotClause and UnpivotClause AST node types
- Add Pivot/Unpivot fields to TableReference struct
- Implement parsePivotClause/parseUnpivotClause in new pivot.go
- Wire parsing into parseFromTableReference and parseJoinedTableRef
- Add PIVOT/UNPIVOT to tokenizer keyword map for correct token typing
- Update formatter to render PIVOT/UNPIVOT clauses
- Enable testdata/mssql/11_pivot.sql and 12_unpivot.sql
- Add 4 dedicated tests covering subquery+alias, plain table, AS alias

Closes #456

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CVE-2026-32285 affects github.com/buger/jsonparser v1.1.1, which is a
transitive dependency via mark3labs/mcp-go → invopop/jsonschema →
wk8/go-ordered-map → buger/jsonparser.

No fixed version is available upstream. The package is not called
directly by any GoSQLX code and risk is scoped to MCP JSON schema
generation. Added to .trivyignore until a patched version is released.

Fixes Trivy Repository Scan CI failures in PR #475 and #477.
PIVOT and UNPIVOT were registered in the global tokenizer keyword map,
which made them reserved words across every dialect. Queries like
`SELECT pivot FROM users` then failed in PostgreSQL/MySQL/SQLite/
ClickHouse where these identifiers are perfectly legal.

Changes:
- Remove PIVOT/UNPIVOT from tokenizer keywordTokenTypes (they are now
  tokenized as identifiers in all dialects).
- Gate isPivotKeyword/isUnpivotKeyword on the parser dialect (TSQL/
  Oracle only) and accept identifier-typed tokens by value match.
- Skip alias consumption in parseFromTableReference / parseJoinedTableRef
  when the upcoming identifier is a contextual PIVOT/UNPIVOT keyword,
  so the pivot-clause parser can claim it.
- Fix unsafe single-value type assertion in TestTSQL_PivotWithASAlias
  to comply with the project's mandatory two-value form.
- Add TestPivotIdentifierInNonTSQLDialects regression covering pivot/
  unpivot as identifiers in PostgreSQL, MySQL, and SQLite.

All parser/tokenizer/formatter tests pass with -race.
- Formatter emits AS before PIVOT/UNPIVOT aliases for clean round-trip.
- Tokenizer records Quote='[' on SQL Server bracket-quoted identifiers;
  pivot parser uses renderQuotedIdent to preserve [North] etc. in
  PivotClause.InValues and UnpivotClause.InColumns.
- Reject empty IN lists for both PIVOT and UNPIVOT.
- Extract parsePivotAlias helper, collapsing four duplicated alias
  blocks in select_subquery.go.
- Add TestPivotNegativeCases (missing parens, missing FOR/IN, empty IN)
  and TestPivotBracketedInValuesPreserved.

Full test suite passes with -race.
- renderQuotedIdent now doubles embedded `]`, `"`, and `` ` ``
  per dialect convention so identifiers like [foo]bar] round-trip
  unambiguously.
- Empty PIVOT/UNPIVOT IN list check now runs before the closing-`)`
  check so the user-facing error is "at least one value/column..."
  instead of the misleading ") to close ... IN list".
- Clarify renderQuotedIdent comment to reference Token.Quote and
  Word.QuoteStyle as the actual sources.
…nSQL

Previously these package-level renderers hardcoded keyword literals
(PIVOT, UNPIVOT, FOR, IN, LATERAL, JOIN, ON) which bypassed the
caller's case policy (f.kw). Thread *nodeFormatter into both
functions and route every keyword through f.kw so uppercase/lowercase
options apply uniformly across FROM, JOIN, MERGE, DELETE USING, and
UPDATE FROM paths.

Addresses claude-review feedback on PR #477. All formatter, parser,
and tokenizer tests pass with -race.
#480)

ClickHouse system tables (system.replicas, system.parts, system.tables) expose
columns named `table` and `partition`, and queries commonly reference
`system.tables`. The parser previously rejected these because TABLE, PARTITION,
and TABLES tokenize as keywords and the non-reserved-keyword-as-identifier path
was gated to SQL Server only.

- Extend the gate in parsePrimaryExpression to also enable for ClickHouse.
- Add TokenTypePartition and the "TABLES" keyword value to isNonReservedKeyword
  so they can serve as identifiers in expression position and after qualifiers.

Closes #480
@vercel
Copy link
Copy Markdown

vercel bot commented Apr 7, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
gosqlx-website Ready Ready Preview, Comment Apr 7, 2026 7:36am

@github-actions github-actions bot added parser tokenizer Tokenizer / lexer changes ast AST node changes labels Apr 7, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

Claude finished @ajitpratap0's task —— View job


Code Review Complete ✅

Reviewing PR #481: ClickHouse table identifier fix

Todo List:

  • Read repository guidelines and setup
  • Review core parser changes in parser.go
  • Review tokenizer changes in tokenizer.go
  • Review new test file clickhouse_table_identifier_test.go
  • Review formatter changes in render.go
  • Review AST changes in ast.go
  • Review new PIVOT functionality in pivot.go
  • Review expression parsing changes
  • Check for security concerns
  • Analyze performance implications
  • Assess test coverage completeness
  • Provide final recommendations

Code Quality Assessment

Strengths:

  1. Targeted Fix Implementation: The core changes correctly address issue [BUG] multiple errors in ClickHouse dialect #480:

    • Added TokenTypePartition to isNonReservedKeyword() function
    • Added "TABLES" keyword to the value-based check
    • Extended dialect support from SQL Server-only to include ClickHouse in expressions_literal.go:106
  2. Comprehensive Test Coverage:

    • New test file covers all 5 real-world ClickHouse queries from the original issue
    • Tests use proper ParseWithDialect(query, keywords.DialectClickHouse)
    • Follows safe type assertion patterns as recommended in CLAUDE.md:135-142
  3. Code Quality:

    • Proper error handling throughout
    • Clear, descriptive variable names
    • Good documentation in comments

⚠️ Major Concerns:

1. Scope Creep - CRITICAL

This PR mixes two unrelated features:

  • Primary: ClickHouse identifier fix (6 lines changed)
  • Secondary: Complete PIVOT/UNPIVOT implementation (252+ lines)

The PIVOT functionality should be in a separate PR as it's:

  • Unrelated to the ClickHouse identifier issue
  • Adds significant complexity (new AST nodes, parsing logic, formatter changes)
  • Increases risk of introducing unrelated bugs

2. Dialect Logic Issue - HIGH

// pkg/sql/parser/parser.go:827-838
case models.TokenTypeTable, models.TokenTypeIndex, models.TokenTypeView,
    models.TokenTypeKey, models.TokenTypeColumn, models.TokenTypeDatabase,
    models.TokenTypePartition:
    // DDL keywords that are commonly used as quoted identifiers in MySQL (backtick)
    // and SQL Server (bracket) dialects, and as plain column names in ClickHouse
    // system tables (system.parts.partition, system.replicas.table, etc).
    return true

The "TABLES" keyword is now allowed as an identifier in ALL dialects, not just ClickHouse. This could cause parsing inconsistencies in other SQL dialects where TABLES might be reserved.

Recommendation: Add dialect-specific logic:

case "TABLES":
    return p.dialect == string(keywords.DialectClickHouse)

🔒 Security Assessment: ✅ CLEAN

  • No injection vulnerabilities introduced
  • No exposed secrets or sensitive data
  • Input validation remains intact via existing tokenizer limits

Performance Assessment: ✅ MINIMAL IMPACT

  • Changes to isNonReservedKeyword() add minimal overhead (single case match)
  • No impact on object pooling or zero-copy optimizations
  • New test file doesn't affect runtime performance

📊 Test Coverage Assessment: ✅ GOOD


Recommendations

Required Changes:

  1. Split this PR:

    • Keep only ClickHouse identifier changes (parser.go, tokenizer.go, expressions_literal.go, test file)
    • Move PIVOT functionality to separate PR
  2. Add dialect-specific logic for "TABLES" keyword:

    case "TABLES":
        return p.dialect == string(keywords.DialectClickHouse)
  3. Add negative test cases to ensure other dialects properly reject these keywords as identifiers

Optional Improvements:

  1. Performance: Consider caching dialect comparisons if this becomes a hot path
  2. Documentation: Add examples of ClickHouse system table usage to parser comments

Overall Assessment: ⚠️ NEEDS WORK

While the core ClickHouse fix is well-implemented and solves the stated problem, the scope creep and dialect logic issues need to be addressed before merge. The PIVOT functionality should definitely be in a separate PR for easier review and testing.

Recommendation: Request author to split the PR and add dialect-specific logic for the "TABLES" keyword.


Job Run

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

Claude finished @ajitpratap0's task —— View job


Code Review Complete ✅

Reviewing PR #481: ClickHouse table identifier fix

Todo List:

  • Read repository guidelines and setup
  • Review core parser changes in parser.go
  • Review expression parsing changes in expressions_literal.go
  • Review new test file clickhouse_table_identifier_test.go
  • Check for security concerns
  • Analyze performance implications
  • Assess test coverage completeness
  • Provide final recommendations

Code Quality Assessment

Strengths:

  1. Targeted and Focused Fix: This PR successfully addresses issue [BUG] multiple errors in ClickHouse dialect #480 with minimal, surgical changes:

    • Added TokenTypePartition to isNonReservedKeyword() function at parser.go:962
    • Added "TABLES" keyword to the value-based check at parser.go:971
    • Extended dialect support from SQL Server-only to include ClickHouse at expressions_literal.go:106
  2. Excellent Test Coverage:

    • New test file clickhouse_table_identifier_test.go covers all 5 real-world ClickHouse queries from issue [BUG] multiple errors in ClickHouse dialect #480
    • Tests use proper ParseWithDialect(query, keywords.DialectClickHouse) with correct dialect specification
    • Follows project's safe type assertion patterns as recommended in CLAUDE.md:135-142
    • Test cases represent realistic ClickHouse system table queries (system.replicas, system.tables, system.parts)
  3. Clean Implementation:

    • Proper error handling throughout the test cases
    • Clear, descriptive test names and comments explaining the purpose
    • Follows existing code patterns and conventions

📊 Technical Analysis:

Changes Made:

  1. parser.go:962: Added TokenTypePartition to the token type switch case for non-reserved keywords
  2. parser.go:971: Added "TABLES" to the string value switch case for keywords with generic TokenTypeKeyword
  3. expressions_literal.go:106: Extended the dialect condition to include ClickHouse alongside SQL Server for non-reserved keyword usage
  4. New test file: Comprehensive test coverage with 5 realistic ClickHouse queries

Scope Assessment:

  • ✅ Focused: Unlike the previous PR review, this one is properly scoped to only the ClickHouse identifier issue
  • ✅ No Scope Creep: No unrelated PIVOT/UNPIVOT functionality mixed in
  • ✅ Clean: Only the minimal necessary changes to solve the stated problem

🔒 Security Assessment: ✅ CLEAN

  • No SQL injection vulnerabilities introduced
  • No exposure of sensitive data or secrets
  • Input validation remains intact via existing tokenizer bounds
  • Changes only affect parsing logic, not execution or data handling

Performance Assessment: ✅ MINIMAL IMPACT

  • Added one case to isNonReservedKeyword() switch - O(1) lookup with minimal overhead
  • One additional string case for "TABLES" keyword - negligible impact
  • No impact on the core zero-copy tokenization or object pooling systems
  • Changes are in the parsing path, not the hot tokenization path

📝 Code Quality: ✅ EXCELLENT

  • Readability: Clear comments explaining the purpose (parser.go:963-965)
  • Maintainability: Changes are localized and easy to understand
  • Consistency: Follows existing patterns in the codebase
  • Documentation: Good inline comments explaining ClickHouse system table usage

🎯 Test Coverage: ✅ COMPREHENSIVE

  • Positive Cases: All 5 problematic queries from issue [BUG] multiple errors in ClickHouse dialect #480 are covered
  • Realistic Scenarios: Tests use actual ClickHouse system table patterns
  • Proper Setup: Uses correct dialect specification
  • Edge Cases: Covers different contexts where identifiers appear (SELECT, GROUP BY, function args, etc.)

⚠️ Minor Considerations:

  1. Cross-Dialect Impact: The "TABLES" keyword is now treated as non-reserved across ALL dialects when it has TokenTypeKeyword. This is likely acceptable since it's only checked after dialect-specific type checks, but worth noting.

  2. Missing Negative Tests: Could benefit from tests ensuring other SQL dialects still handle these keywords appropriately, but not critical for this fix.


Overall Assessment: ✅ EXCELLENT

This PR is a significant improvement over the previous version. It:

  • Solves the exact problem stated in issue [BUG] multiple errors in ClickHouse dialect #480
  • Maintains clean scope without unrelated features
  • Provides comprehensive test coverage with realistic scenarios
  • Follows project conventions and coding standards
  • Has minimal performance impact
  • Introduces no security risks

Recommendations:

  1. ✅ Ready to Merge: This PR successfully addresses the ClickHouse identifier parsing issue with a clean, focused implementation.

  2. Optional Future Enhancement: Consider adding negative test cases to verify other dialects still behave as expected, but this is not blocking.

  3. Performance: No concerns - the changes are minimal and efficient.

Final Verdict:

APPROVE - This is a well-implemented fix that solves the stated problem cleanly and efficiently.


Job Run

Extend the non-reserved-keyword allowlist to also accept ROWS as a column
identifier. ClickHouse exposes a `rows` column in system.tables and other
system tables; ROWS is otherwise reserved for window-frame syntax
(ROWS BETWEEN ...) but is unambiguous in expression position.

Part of #482 (ClickHouse parser gaps from QA sweep).
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

Claude finished @ajitpratap0's task —— View job


Code Review Complete ✅

Reviewing PR #481: ClickHouse table identifier fix

Todo List:

  • Read repository guidelines and setup
  • Review core parser changes in parser.go
  • Review expression parsing changes in expressions_literal.go
  • Review new test file clickhouse_table_identifier_test.go
  • Check for security concerns
  • Analyze performance implications
  • Assess test coverage completeness
  • Provide final recommendations

Code Quality Assessment

Strengths:

  1. Targeted and Focused Fix: This PR successfully addresses issue [BUG] multiple errors in ClickHouse dialect #480 with minimal, surgical changes:

    • Added TokenTypePartition to isNonReservedKeyword() function at parser.go:962
    • Added "TABLES" keyword to the value-based check at parser.go:972
    • Extended dialect support from SQL Server-only to include ClickHouse at expressions_literal.go:106
  2. Excellent Test Coverage:

    • New test file clickhouse_table_identifier_test.go covers all 5 real-world ClickHouse queries from issue [BUG] multiple errors in ClickHouse dialect #480
    • Tests use proper ParseWithDialect(query, keywords.DialectClickHouse) with correct dialect specification
    • Follows project's safe type assertion patterns as recommended in CLAUDE.md:135-142
    • Test cases represent realistic ClickHouse system table queries (system.replicas, system.tables, system.parts)
  3. Clean Implementation:

    • Proper error handling throughout the test cases
    • Clear, descriptive test names and comments explaining the purpose
    • Follows existing code patterns and conventions

📊 Technical Analysis:

Changes Made:

  1. parser.go:962: Added TokenTypePartition to the token type switch case for non-reserved keywords
  2. parser.go:972: Added "TABLES" to the string value switch case for keywords with generic TokenTypeKeyword
  3. expressions_literal.go:106: Extended the dialect condition to include ClickHouse alongside SQL Server for non-reserved keyword usage
  4. New test file: Comprehensive test coverage with 5 realistic ClickHouse queries

Scope Assessment:

  • ✅ Focused: Unlike previous iterations, this PR is properly scoped to only the ClickHouse identifier issue
  • ✅ No Scope Creep: No unrelated functionality mixed in
  • ✅ Clean: Only the minimal necessary changes to solve the stated problem

🔒 Security Assessment: ✅ CLEAN

  • No SQL injection vulnerabilities introduced
  • No exposure of sensitive data or secrets
  • Input validation remains intact via existing tokenizer bounds
  • Changes only affect parsing logic, not execution or data handling

Performance Assessment: ✅ MINIMAL IMPACT

  • Added one case to isNonReservedKeyword() switch - O(1) lookup with minimal overhead
  • One additional string case for "TABLES" keyword - negligible impact
  • No impact on the core zero-copy tokenization or object pooling systems
  • Changes are in the parsing path, not the hot tokenization path

📝 Code Quality: ✅ EXCELLENT

  • Readability: Clear comments explaining the purpose (parser.go:963-965)
  • Maintainability: Changes are localized and easy to understand
  • Consistency: Follows existing patterns in the codebase
  • Documentation: Good inline comments explaining ClickHouse system table usage

🎯 Test Coverage: ✅ COMPREHENSIVE

  • Positive Cases: All 5 problematic queries from issue [BUG] multiple errors in ClickHouse dialect #480 are covered
  • Realistic Scenarios: Tests use actual ClickHouse system table patterns
  • Proper Setup: Uses correct dialect specification
  • Edge Cases: Covers different contexts where identifiers appear (SELECT, GROUP BY, function args, etc.)

⚠️ Minor Considerations:

  1. Cross-Dialect Impact: The "TABLES" keyword is now treated as non-reserved across ALL dialects when it has TokenTypeKeyword. This is likely acceptable since it's only checked after dialect-specific type checks, but worth noting.

  2. Missing Negative Tests: Could benefit from tests ensuring other SQL dialects still handle these keywords appropriately, but not critical for this fix.


Overall Assessment: ✅ EXCELLENT

This PR is a significant improvement over previous versions. It:

  • Solves the exact problem stated in issue [BUG] multiple errors in ClickHouse dialect #480
  • Maintains clean scope without unrelated features
  • Provides comprehensive test coverage with realistic scenarios
  • Follows project conventions and coding standards
  • Has minimal performance impact
  • Introduces no security risks

Recommendations:

  1. ✅ Ready to Merge: This PR successfully addresses the ClickHouse identifier parsing issue with a clean, focused implementation.

  2. Optional Future Enhancement: Consider adding negative test cases to verify other dialects still behave as expected, but this is not blocking.

  3. Performance: No concerns - the changes are minimal and efficient.

Final Verdict:

APPROVE - This is a well-implemented fix that solves the stated problem cleanly and efficiently.


Job Run

@ajitpratap0 ajitpratap0 merged commit c50661f into main Apr 7, 2026
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ast AST node changes parser tokenizer Tokenizer / lexer changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] multiple errors in ClickHouse dialect

1 participant