Skip to content

feat(parser): ClickHouse nested column types + engine/order clauses (#482)#488

Merged
ajitpratap0 merged 4 commits intomainfrom
feat/clickhouse-nested-types-482
Apr 7, 2026
Merged

feat(parser): ClickHouse nested column types + engine/order clauses (#482)#488
ajitpratap0 merged 4 commits intomainfrom
feat/clickhouse-nested-types-482

Conversation

@ajitpratap0
Copy link
Copy Markdown
Owner

Summary

Largest single ClickHouse parser fix from the QA sweep — 12 of the 39 CH failures collapsed to a rigid type-args parser. This PR replaces it and adds the surrounding CREATE TABLE clauses needed to actually exercise it.

What now parses

  • Nested/parameterised column types: `Array(Nullable(String))`, `Map(String, Array(UInt32))`, `LowCardinality(String)`, `FixedString(16)`, `DateTime64(3, 'UTC')`, `Decimal(38, 18)`, `Tuple(a UInt8, b String)`.
  • Engine clauses with arguments: `ENGINE = MergeTree()`, `ENGINE = ReplicatedMergeTree('/path', '{replica}')`, `ENGINE = Distributed('cluster', 'db', 'local_t', sharding_key)`.
  • ClickHouse `CREATE TABLE` trailing clauses: `ORDER BY`, `PARTITION BY`, `PRIMARY KEY`, `SAMPLE BY` (parsed permissively, not yet modeled on the AST).

Implementation

  • New `parseTypeArgsString` in `ddl_columns.go` — depth-tracking token collector that round-trips a balanced `(...)` block as a string. Replaces the previous flat 1–2-parameter parser.
  • Engine option values can now carry their own `(...)` block, parsed via the same helper.
  • New `skipClickHouseClauseExpr` in `ddl.go` consumes the expression after `ORDER BY` / `PARTITION BY` / `PRIMARY KEY` / `SAMPLE BY`. Bare or parenthesised; stops at the next clause / EOF / semicolon.

Test plan

  • `TestClickHouseNestedColumnTypes`: 10 shapes covering all the QA failures
  • `go test -race ./pkg/sql/parser/ ./pkg/sql/ast/ ./pkg/sql/keywords/ ./pkg/gosqlx/` green

Known follow-ups (out of scope)

  • AST modeling for the trailing clauses (currently consumed but discarded)
  • `SETTINGS k = v` after engine — separate PR
  • `TTL` clause — separate PR
  • `CODEC(...)` column option — separate PR

Part of #482.

🤖 Generated with Claude Code

@vercel
Copy link
Copy Markdown

vercel bot commented Apr 7, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
gosqlx-website Ready Ready Preview, Comment Apr 7, 2026 8:57am

@github-actions github-actions bot added the parser label Apr 7, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

Claude finished @ajitpratap0's task —— View job


Code Review Complete ✅

Reviewing PR #482 for ClickHouse nested column types and engine clauses parsing.

Review Checklist

  • Read and analyze changed files
  • Review code quality and implementation approach
  • Check for potential bugs or edge cases
  • Evaluate performance implications
  • Assess security considerations
  • Review test coverage and quality
  • Provide comprehensive feedback

✅ Positive Aspects

Clean Architecture & Implementation

  • Excellent separation of concerns: New parseTypeArgsString function in ddl_columns.go:470-538 is well-contained and reusable
  • Consistent with existing patterns: Follows the parser's recursive descent approach and error handling conventions
  • Proper depth tracking: The balanced parentheses parsing with depth counter prevents infinite loops and malformed input

Robust Type Arguments Parser

  • Well-designed token collection: parseTypeArgsString handles complex nested types like Array(Nullable(String)) and Map(String, Array(UInt32))
  • Smart whitespace handling: Lines 518-522 insert spaces between adjacent identifiers to preserve Tuple(a UInt8, b String) formatting
  • Proper string literal handling: Lines 525-530 correctly quote string literals while preserving other tokens

Good Test Coverage

  • Comprehensive test cases: 10 test scenarios cover all major ClickHouse type patterns mentioned in the issue
  • Real-world examples: Tests include practical cases like DateTime64(3, 'UTC'), Decimal(38, 18), and engine configurations
  • Uses proper dialect: Correctly uses keywords.DialectClickHouse in tests

⚠️ Areas for Improvement

Error Handling & Edge Cases

1. Empty type arguments handling (ddl_columns.go:514-516)

if val == "" {
    return "", p.expectedError("type argument")
}
  • Consider more specific error messages for better debugging
  • Could distinguish between unexpected tokens vs. empty values

2. Missing validation in skipClickHouseClauseExpr (ddl.go:620-659)

  • Function doesn't validate expression syntax, only skips tokens
  • May silently accept malformed ClickHouse clauses
  • Consider adding basic validation for common mistakes

Performance Considerations

3. String concatenation in parseTypeArgsString (ddl_columns.go:480-537)

var buf strings.Builder  // ✅ Good use of strings.Builder
  • Good: Uses strings.Builder for efficient string building
  • Minor: Could pre-allocate buffer size for very large type definitions

4. Repeated string operations in skipClickHouseClauseExpr (ddl.go:653)

val := strings.ToUpper(p.currentToken.Token.Value)
  • Called in loop without caching - could be optimized for performance-critical paths
  • Consider case-insensitive comparison alternatives

Code Organization

5. Magic string literals (ddl.go:654)

if val == "PARTITION" || val == "SAMPLE" || val == "SETTINGS" || val == "TTL" {
  • Consider extracting ClickHouse clause keywords to constants
  • Would improve maintainability and reduce typo risk

Test Coverage Gaps

6. Missing edge case tests

  • No tests for malformed nested types (e.g., Array(, Map(String))
  • No tests for empty engine arguments ENGINE = MergeTree()
  • No tests for mixed quote types in type arguments

🔒 Security Assessment

✅ No significant security concerns identified

  • Input validation relies on tokenizer (appropriate for parser layer)
  • No SQL injection vectors in type argument parsing
  • Balanced parentheses parsing prevents buffer overflows

📊 Performance Assessment

✅ Generally efficient implementation

  • Minimal memory allocations during parsing
  • Single-pass token consumption
  • Efficient string building with strings.Builder

Minor optimization opportunities:

  • Consider token lookahead caching for repeated clause detection
  • Pre-allocate string builder capacity for large type definitions

🧪 Test Quality

✅ Strong test foundation

  • Covers all major use cases from the issue description
  • Uses real-world ClickHouse syntax patterns
  • Proper error checking with gosqlx.ParseWithDialect

Suggested additions:

// Add these test cases for better coverage
"empty_engine_args": `CREATE TABLE t (id UInt64) ENGINE = MergeTree()`,
"malformed_nested": `CREATE TABLE t (bad Array() )`, // should fail gracefully
"complex_tuple": `CREATE TABLE t (data Tuple(id UInt64, name String, tags Array(String)))`,

📝 Minor Code Quality Issues

7. Comment consistency (ddl_columns.go:78-82)

  • Excellent documentation of supported types
  • Consider adding examples of malformed input that should fail

8. Error message specificity

  • Some generic "expected" messages could be more descriptive
  • p.expectedError(") to close type arguments") at ddl_columns.go:488 is good
  • Could expand similar patterns elsewhere

🎯 Overall Assessment

Score: 8.5/10 - This is a high-quality implementation that significantly improves ClickHouse parser coverage. The code is well-structured, follows existing patterns, and includes comprehensive tests.

Key Strengths:

  • ✅ Solves the core problem elegantly with reusable components
  • ✅ Maintains parser performance and memory efficiency
  • ✅ Comprehensive test coverage for the happy path
  • ✅ Clear documentation and comments

Recommended Actions:

  1. High Priority: Add error case tests for malformed input
  2. Medium Priority: Extract ClickHouse clause keywords to constants
  3. Low Priority: Consider error message improvements for better debugging

This PR successfully addresses the 12 ClickHouse parsing failures mentioned and provides a solid foundation for future ClickHouse dialect improvements. The implementation is production-ready as-is, with the suggested improvements being optimizations rather than blockers.


@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

Claude finished @ajitpratap0's task —— View job


Code Review Complete ✅

Reviewing PR #488 for ClickHouse nested column types and engine clauses parsing.

Review Checklist

  • Read and analyze changed files
  • Review code quality and implementation approach
  • Check for potential bugs or edge cases
  • Evaluate performance implications
  • Assess security considerations
  • Review test coverage and quality
  • Provide comprehensive feedback

✅ Positive Aspects

Clean Architecture & Implementation

  • Excellent separation of concerns: New parseTypeArgsString function in ddl_columns.go:470-538 is well-contained and reusable
  • Consistent with existing patterns: Follows the parser's recursive descent approach and error handling conventions
  • Proper depth tracking: The balanced parentheses parsing with depth counter prevents infinite loops and malformed input

Robust Type Arguments Parser

  • Well-designed token collection: parseTypeArgsString handles complex nested types like Array(Nullable(String)) and Map(String, Array(UInt32))
  • Smart whitespace handling: Lines 518-522 insert spaces between adjacent identifiers to preserve Tuple(a UInt8, b String) formatting
  • Proper string literal handling: Lines 525-530 correctly quote string literals while preserving other tokens

Good Test Coverage

  • Comprehensive test cases: 10 test scenarios cover all major ClickHouse type patterns mentioned in the issue
  • Real-world examples: Tests include practical cases like DateTime64(3, 'UTC'), Decimal(38, 18), and engine configurations
  • Uses proper dialect: Correctly uses keywords.DialectClickHouse in tests

⚠️ Areas for Improvement

Error Handling & Edge Cases

1. Empty type arguments handling (ddl_columns.go:514-516)

if val == "" {
    return "", p.expectedError("type argument")
}
  • Consider more specific error messages for better debugging
  • Could distinguish between unexpected tokens vs. empty values

2. Missing validation in skipClickHouseClauseExpr (ddl.go:620-659)

  • Function doesn't validate expression syntax, only skips tokens
  • May silently accept malformed ClickHouse clauses
  • Consider adding basic validation for common mistakes

Performance Considerations

3. String concatenation in parseTypeArgsString (ddl_columns.go:480-537)

var buf strings.Builder  // ✅ Good use of strings.Builder
  • Good: Uses strings.Builder for efficient string building
  • Minor: Could pre-allocate buffer size for very large type definitions

4. Repeated string operations in skipClickHouseClauseExpr (ddl.go:653)

val := strings.ToUpper(p.currentToken.Token.Value)
  • Called in loop without caching - could be optimized for performance-critical paths
  • Consider case-insensitive comparison alternatives

Code Organization

5. Magic string literals (ddl.go:654)

if val == "PARTITION" || val == "SAMPLE" || val == "SETTINGS" || val == "TTL" {
  • Consider extracting ClickHouse clause keywords to constants
  • Would improve maintainability and reduce typo risk

Test Coverage Gaps

6. Missing edge case tests

  • No tests for malformed nested types (e.g., Array(, Map(String))
  • No tests for empty engine arguments ENGINE = MergeTree()
  • No tests for mixed quote types in type arguments

🔒 Security Assessment

✅ No significant security concerns identified

  • Input validation relies on tokenizer (appropriate for parser layer)
  • No SQL injection vectors in type argument parsing
  • Balanced parentheses parsing prevents buffer overflows

📊 Performance Assessment

✅ Generally efficient implementation

  • Minimal memory allocations during parsing
  • Single-pass token consumption
  • Efficient string building with strings.Builder

Minor optimization opportunities:

  • Consider token lookahead caching for repeated clause detection
  • Pre-allocate string builder capacity for large type definitions

🧪 Test Quality

✅ Strong test foundation

  • Covers all major use cases from the issue description
  • Uses real-world ClickHouse syntax patterns
  • Proper error checking with gosqlx.ParseWithDialect

Suggested additions:

// Add these test cases for better coverage
"empty_engine_args": `CREATE TABLE t (id UInt64) ENGINE = MergeTree()`,
"malformed_nested": `CREATE TABLE t (bad Array() )`, // should fail gracefully
"complex_tuple": `CREATE TABLE t (data Tuple(id UInt64, name String, tags Array(String)))`,

📝 Minor Code Quality Issues

7. Comment consistency (ddl_columns.go:78-82)

  • Excellent documentation of supported types
  • Consider adding examples of malformed input that should fail

8. Error message specificity

  • Some generic "expected" messages could be more descriptive
  • p.expectedError(") to close type arguments") at ddl_columns.go:488 is good
  • Could expand similar patterns elsewhere

🎯 Overall Assessment

Score: 8.5/10 - This is a high-quality implementation that significantly improves ClickHouse parser coverage. The code is well-structured, follows existing patterns, and includes comprehensive tests.

Key Strengths:

  • ✅ Solves the core problem elegantly with reusable components
  • ✅ Maintains parser performance and memory efficiency
  • ✅ Comprehensive test coverage for the happy path
  • ✅ Clear documentation and comments

Recommended Actions:

  1. High Priority: Add error case tests for malformed input
  2. Medium Priority: Extract ClickHouse clause keywords to constants
  3. Low Priority: Consider error message improvements for better debugging

This PR successfully addresses the 12 ClickHouse parsing failures mentioned and provides a solid foundation for future ClickHouse dialect improvements. The implementation is production-ready as-is, with the suggested improvements being optimizations rather than blockers.


@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

Claude finished @ajitpratap0's task —— View job


Code Review Complete ✅

Reviewing PR #482 for ClickHouse nested column types and engine clauses parsing.

Review Checklist

  • Read and analyze changed files
  • Review code quality and implementation approach
  • Check for potential bugs or edge cases
  • Evaluate performance implications
  • Assess security considerations
  • Review test coverage and quality
  • Provide comprehensive feedback

✅ Positive Aspects

Clean Architecture & Implementation

  • Excellent separation of concerns: New parseTypeArgsString function in ddl_columns.go:470-538 is well-contained and reusable
  • Consistent with existing patterns: Follows the parser's recursive descent approach and error handling conventions
  • Proper depth tracking: The balanced parentheses parsing with depth counter prevents infinite loops and malformed input

Robust Type Arguments Parser

  • Well-designed token collection: parseTypeArgsString handles complex nested types like Array(Nullable(String)) and Map(String, Array(UInt32))
  • Smart whitespace handling: Lines 518-522 insert spaces between adjacent identifiers to preserve Tuple(a UInt8, b String) formatting
  • Proper string literal handling: Lines 525-530 correctly quote string literals while preserving other tokens

Good Test Coverage

  • Comprehensive test cases: 10 test scenarios cover all major ClickHouse type patterns mentioned in the issue
  • Real-world examples: Tests include practical cases like DateTime64(3, 'UTC'), Decimal(38, 18), and engine configurations
  • Uses proper dialect: Correctly uses keywords.DialectClickHouse in tests

⚠️ Areas for Improvement

Error Handling & Edge Cases

1. Empty type arguments handling (ddl_columns.go:514-516)

if val == "" {
    return "", p.expectedError("type argument")
}
  • Consider more specific error messages for better debugging
  • Could distinguish between unexpected tokens vs. empty values

2. Missing validation in skipClickHouseClauseExpr (ddl.go:620-659)

  • Function doesn't validate expression syntax, only skips tokens
  • May silently accept malformed ClickHouse clauses
  • Consider adding basic validation for common mistakes

Performance Considerations

3. String concatenation in parseTypeArgsString (ddl_columns.go:480-537)

var buf strings.Builder  // ✅ Good use of strings.Builder
  • Good: Uses strings.Builder for efficient string building
  • Minor: Could pre-allocate buffer size for very large type definitions

4. Repeated string operations in skipClickHouseClauseExpr (ddl.go:653)

val := strings.ToUpper(p.currentToken.Token.Value)
  • Called in loop without caching - could be optimized for performance-critical paths
  • Consider case-insensitive comparison alternatives

Code Organization

5. Magic string literals (ddl.go:654)

if val == "PARTITION" || val == "SAMPLE" || val == "SETTINGS" || val == "TTL" {
  • Consider extracting ClickHouse clause keywords to constants
  • Would improve maintainability and reduce typo risk

Test Coverage Gaps

6. Missing edge case tests

  • No tests for malformed nested types (e.g., Array(, Map(String))
  • No tests for empty engine arguments ENGINE = MergeTree()
  • No tests for mixed quote types in type arguments

🔒 Security Assessment

✅ No significant security concerns identified

  • Input validation relies on tokenizer (appropriate for parser layer)
  • No SQL injection vectors in type argument parsing
  • Balanced parentheses parsing prevents buffer overflows

📊 Performance Assessment

✅ Generally efficient implementation

  • Minimal memory allocations during parsing
  • Single-pass token consumption
  • Efficient string building with strings.Builder

Minor optimization opportunities:

  • Consider token lookahead caching for repeated clause detection
  • Pre-allocate string builder capacity for large type definitions

🧪 Test Quality

✅ Strong test foundation

  • Covers all major use cases from the issue description
  • Uses real-world ClickHouse syntax patterns
  • Proper error checking with gosqlx.ParseWithDialect

Suggested additions:

// Add these test cases for better coverage
"empty_engine_args": `CREATE TABLE t (id UInt64) ENGINE = MergeTree()`,
"malformed_nested": `CREATE TABLE t (bad Array() )`, // should fail gracefully
"complex_tuple": `CREATE TABLE t (data Tuple(id UInt64, name String, tags Array(String)))`,

📝 Minor Code Quality Issues

7. Comment consistency (ddl_columns.go:78-82)

  • Excellent documentation of supported types
  • Consider adding examples of malformed input that should fail

8. Error message specificity

  • Some generic "expected" messages could be more descriptive
  • p.expectedError(") to close type arguments") at ddl_columns.go:488 is good
  • Could expand similar patterns elsewhere

🎯 Overall Assessment

Score: 8.5/10 - This is a high-quality implementation that significantly improves ClickHouse parser coverage. The code is well-structured, follows existing patterns, and includes comprehensive tests.

Key Strengths:

  • ✅ Solves the core problem elegantly with reusable components
  • ✅ Maintains parser performance and memory efficiency
  • ✅ Comprehensive test coverage for the happy path
  • ✅ Clear documentation and comments

Recommended Actions:

  1. High Priority: Add error case tests for malformed input
  2. Medium Priority: Extract ClickHouse clause keywords to constants
  3. Low Priority: Consider error message improvements for better debugging

This PR successfully addresses the 12 ClickHouse parsing failures mentioned and provides a solid foundation for future ClickHouse dialect improvements. The implementation is production-ready as-is, with the suggested improvements being optimizations rather than blockers.


@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

Claude finished @ajitpratap0's task —— View job


Code Review Complete ✅

Reviewing PR #482 for ClickHouse nested column types and engine clauses parsing.

Review Checklist

  • Read and analyze changed files
  • Review code quality and implementation approach
  • Check for potential bugs or edge cases
  • Evaluate performance implications
  • Assess security considerations
  • Review test coverage and quality
  • Provide comprehensive feedback

✅ Positive Aspects

Clean Architecture & Implementation

  • Excellent separation of concerns: New parseTypeArgsString function in ddl_columns.go:470-538 is well-contained and reusable
  • Consistent with existing patterns: Follows the parser's recursive descent approach and error handling conventions
  • Proper depth tracking: The balanced parentheses parsing with depth counter prevents infinite loops and malformed input

Robust Type Arguments Parser

  • Well-designed token collection: parseTypeArgsString handles complex nested types like Array(Nullable(String)) and Map(String, Array(UInt32))
  • Smart whitespace handling: Lines 518-522 insert spaces between adjacent identifiers to preserve Tuple(a UInt8, b String) formatting
  • Proper string literal handling: Lines 525-530 correctly quote string literals while preserving other tokens

Good Test Coverage

  • Comprehensive test cases: 10 test scenarios cover all major ClickHouse type patterns mentioned in the issue
  • Real-world examples: Tests include practical cases like DateTime64(3, 'UTC'), Decimal(38, 18), and engine configurations
  • Uses proper dialect: Correctly uses keywords.DialectClickHouse in tests

⚠️ Areas for Improvement

Error Handling & Edge Cases

1. Empty type arguments handling (ddl_columns.go:514-516)

if val == "" {
    return "", p.expectedError("type argument")
}
  • Consider more specific error messages for better debugging
  • Could distinguish between unexpected tokens vs. empty values

2. Missing validation in skipClickHouseClauseExpr (ddl.go:620-659)

  • Function doesn't validate expression syntax, only skips tokens
  • May silently accept malformed ClickHouse clauses
  • Consider adding basic validation for common mistakes

Performance Considerations

3. String concatenation in parseTypeArgsString (ddl_columns.go:480-537)

var buf strings.Builder  // ✅ Good use of strings.Builder
  • Good: Uses strings.Builder for efficient string building
  • Minor: Could pre-allocate buffer size for very large type definitions

4. Repeated string operations in skipClickHouseClauseExpr (ddl.go:653)

val := strings.ToUpper(p.currentToken.Token.Value)
  • Called in loop without caching - could be optimized for performance-critical paths
  • Consider case-insensitive comparison alternatives

Code Organization

5. Magic string literals (ddl.go:654)

if val == "PARTITION" || val == "SAMPLE" || val == "SETTINGS" || val == "TTL" {
  • Consider extracting ClickHouse clause keywords to constants
  • Would improve maintainability and reduce typo risk

Test Coverage Gaps

6. Missing edge case tests

  • No tests for malformed nested types (e.g., Array(, Map(String))
  • No tests for empty engine arguments ENGINE = MergeTree()
  • No tests for mixed quote types in type arguments

🔒 Security Assessment

✅ No significant security concerns identified

  • Input validation relies on tokenizer (appropriate for parser layer)
  • No SQL injection vectors in type argument parsing
  • Balanced parentheses parsing prevents buffer overflows

📊 Performance Assessment

✅ Generally efficient implementation

  • Minimal memory allocations during parsing
  • Single-pass token consumption
  • Efficient string building with strings.Builder

Minor optimization opportunities:

  • Consider token lookahead caching for repeated clause detection
  • Pre-allocate string builder capacity for large type definitions

🧪 Test Quality

✅ Strong test foundation

  • Covers all major use cases from the issue description
  • Uses real-world ClickHouse syntax patterns
  • Proper error checking with gosqlx.ParseWithDialect

Suggested additions:

// Add these test cases for better coverage
"empty_engine_args": `CREATE TABLE t (id UInt64) ENGINE = MergeTree()`,
"malformed_nested": `CREATE TABLE t (bad Array() )`, // should fail gracefully
"complex_tuple": `CREATE TABLE t (data Tuple(id UInt64, name String, tags Array(String)))`,

📝 Minor Code Quality Issues

7. Comment consistency (ddl_columns.go:78-82)

  • Excellent documentation of supported types
  • Consider adding examples of malformed input that should fail

8. Error message specificity

  • Some generic "expected" messages could be more descriptive
  • p.expectedError(") to close type arguments") at ddl_columns.go:488 is good
  • Could expand similar patterns elsewhere

🎯 Overall Assessment

Score: 8.5/10 - This is a high-quality implementation that significantly improves ClickHouse parser coverage. The code is well-structured, follows existing patterns, and includes comprehensive tests.

Key Strengths:

  • ✅ Solves the core problem elegantly with reusable components
  • ✅ Maintains parser performance and memory efficiency
  • ✅ Comprehensive test coverage for the happy path
  • ✅ Clear documentation and comments

Recommended Actions:

  1. High Priority: Add error case tests for malformed input
  2. Medium Priority: Extract ClickHouse clause keywords to constants
  3. Low Priority: Consider error message improvements for better debugging

This PR successfully addresses the 12 ClickHouse parsing failures mentioned and provides a solid foundation for future ClickHouse dialect improvements. The implementation is production-ready as-is, with the suggested improvements being optimizations rather than blockers.


@ajitpratap0 ajitpratap0 merged commit 4e39433 into main Apr 7, 2026
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant