fix(clickhouse): correctly transpile arrayMap and arrayFilter higher-order functions [CLAUDE] #7794
…order functions [CLAUDE]

ClickHouse's higher-order array functions arrayMap(lambda, arr) and arrayFilter(lambda, arr) were being parsed as Anonymous expressions because the ClickHouse parser lacked entries for their uppercased names (ARRAYMAP, ARRAYFILTER). As a result, transpiling to DuckDB produced invalid ARRAYMAP() and ARRAYFILTER() calls instead of LIST_TRANSFORM() and LIST_FILTER(). Additionally, the ClickHouse generator had no Transform entry, so DuckDB's LIST_TRANSFORM and Spark's TRANSFORM were written to ClickHouse under the wrong name. Note: ClickHouse's argument order (lambda first, array second) is the reverse of the canonical exp.Transform/exp.ArrayFilter convention.
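To make the failure mode concrete, here is a deliberately simplified model in plain Python, not sqlglot's real classes or parser. It assumes a `FUNCTIONS`-style table keyed by uppercased name, where an unregistered name falls back to an `Anonymous` node and a registered ClickHouse builder swaps the lambda-first arguments into the array-first canonical order. `Anonymous`, `Transform`, `ArrayFilter`, `parse_call` and `to_duckdb` are illustrative stand-ins, and arguments are plain SQL strings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


# Hypothetical, simplified stand-ins for sqlglot expression nodes.
@dataclass
class Anonymous:
    name: str          # original function name, kept verbatim
    args: List[str]


@dataclass
class Transform:
    this: str          # the array
    expression: str    # the lambda


@dataclass
class ArrayFilter:
    this: str          # the array
    expression: str    # the lambda


Node = Union[Anonymous, Transform, ArrayFilter]
Builder = Callable[[List[str]], Node]

# ClickHouse passes (lambda, array); the builders swap them into the
# canonical (this=array, expression=lambda) order.
CLICKHOUSE_FUNCTIONS: Dict[str, Builder] = {
    "ARRAYMAP": lambda args: Transform(this=args[1], expression=args[0]),
    "ARRAYFILTER": lambda args: ArrayFilter(this=args[1], expression=args[0]),
}


def parse_call(name: str, args: List[str], functions: Dict[str, Builder]) -> Node:
    """Look the function up by its uppercased name; fall back to Anonymous."""
    builder = functions.get(name.upper())
    if builder is None:
        return Anonymous(name=name, args=args)
    return builder(args)


def to_duckdb(node: Node) -> str:
    """Render a node as DuckDB SQL (array first, lambda second)."""
    if isinstance(node, Transform):
        return f"LIST_TRANSFORM({node.this}, {node.expression})"
    if isinstance(node, ArrayFilter):
        return f"LIST_FILTER({node.this}, {node.expression})"
    # In this model, Anonymous calls are emitted under their uppercased name.
    return f"{node.name.upper()}({', '.join(node.args)})"


args = ["x -> x + 1", "arr"]
before = parse_call("arrayMap", args, {})                   # name not registered
after = parse_call("arrayMap", args, CLICKHOUSE_FUNCTIONS)  # name registered
print(to_duckdb(before))  # ARRAYMAP(x -> x + 1, arr)
print(to_duckdb(after))   # LIST_TRANSFORM(arr, x -> x + 1)
```

Registering the names is what turns the opaque `Anonymous` call into a typed node that each target dialect can render in its own argument order.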
Nice work! I left some comments.
My suggestion is the following:
Remove the current validate_all tests from this PR and use only validate_identity tests to check the round trip (ClickHouse input -> ClickHouse output) of the added functions, for example self.validate_identity("arrayFilter(x -> x > 0, arr)").assert_is(exp.ArrayFilter). This would keep the PR simple and mergeable.
Then, if you want, you can investigate various input/output pairs for the transpilation; as you saw, NULL creates semantic issues for the transpilation. After detecting all of these edge cases, let's make a follow-up PR that contains the robust transpilation.
```python
self.validate_all(
    "SELECT arrayMap(x -> x + 1, arr) FROM t",
    read={
        "duckdb": "SELECT LIST_TRANSFORM(arr, x -> x + 1) FROM t",
        "spark": "SELECT TRANSFORM(arr, x -> x + 1) FROM t",
    },
    write={
        "clickhouse": "SELECT arrayMap(x -> x + 1, arr) FROM t",
        "duckdb": "SELECT LIST_TRANSFORM(arr, x -> x + 1) FROM t",
        "spark": "SELECT TRANSFORM(arr, x -> x + 1) FROM t",
    },
)
```
There are cases where the transpilation results in different semantics.
ClickHouse (input):

```
SELECT arrayMap(NULL, [1, 2]) AS res

Query id: 0db4a81e-03b1-4253-8593-6a73112a447f

   ┌─res──┐
1. │ ᴺᵁᴸᴸ │
   └──────┘

1 row in set. Elapsed: 0.002 sec.
```
Transpiled DuckDB (output):

```
memory D SELECT LIST_TRANSFORM([1,2], NULL) AS res;
Binder Error:
Invalid lambda expression!
```
Transpiled Spark 4 (output):

```
spark-sql (default)> SELECT TRANSFORM(array(1,2), NULL);
[null,null]
Time taken: 0.051 seconds, Fetched 1 row(s)
```
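In the outputs above, the first argument is not a lambda at all, and the three engines disagree: NULL, a binder error, and `[null, null]`. One hypothetical way a follow-up could surface this, rather than silently emitting `LIST_TRANSFORM([1,2], NULL)`, is to check the argument kind before rewriting and report the call as unsupported. The sketch below is plain Python, not sqlglot code. It models arguments as `(kind, sql)` tuples, and `UnsupportedTranspilation` and `clickhouse_array_map_to_duckdb` are illustrative names, not sqlglot APIs.

```python
from typing import Tuple

# Hypothetical argument model: (kind, sql), e.g. ("lambda", "x -> x + 1"),
# ("null", "NULL") or ("array", "[1, 2]").
Arg = Tuple[str, str]


class UnsupportedTranspilation(Exception):
    """Raised when the target dialect would change the query's semantics."""


def clickhouse_array_map_to_duckdb(func: Arg, arr: Arg) -> str:
    """Rewrite ClickHouse arrayMap(func, arr) as DuckDB LIST_TRANSFORM(arr, func)."""
    kind, func_sql = func
    if kind != "lambda":
        # ClickHouse returns NULL for arrayMap(NULL, [1, 2]), DuckDB fails with
        # "Invalid lambda expression!", and Spark returns [null, null]:
        # refuse to emit a query whose meaning would change.
        raise UnsupportedTranspilation(
            f"arrayMap with non-lambda first argument {func_sql!r} is not transpiled to DuckDB"
        )
    # Swap the order: DuckDB takes the array first, the lambda second.
    return f"LIST_TRANSFORM({arr[1]}, {func_sql})"


print(clickhouse_array_map_to_duckdb(("lambda", "x -> x + 1"), ("array", "[1, 2]")))
# LIST_TRANSFORM([1, 2], x -> x + 1)

try:
    clickhouse_array_map_to_duckdb(("null", "NULL"), ("array", "[1, 2]"))
except UnsupportedTranspilation as err:
    print(f"unsupported: {err}")
```

Raising is only one option. Deciding how each dialect should handle these cases is what the reviewer proposes leaving to a follow-up PR.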
```python
self.validate_all(
    "SELECT arrayFilter(x -> x > 0, arr) FROM t",
    read={
        "duckdb": "SELECT LIST_FILTER(arr, x -> x > 0) FROM t",
    },
    write={
        "clickhouse": "SELECT arrayFilter(x -> x > 0, arr) FROM t",
        "duckdb": "SELECT LIST_FILTER(arr, x -> x > 0) FROM t",
    },
)
```
Same issue as https://github.com/tobymao/sqlglot/pull/7794/changes#r3473795554:
ClickHouse (input):

```
SELECT arrayFilter(NULL, [1, 2, 3])

Query id: 2b70a67e-8db5-47e5-b60a-02551e878843

   ┌─arrayFilter(NULL, [1, 2, 3])─┐
1. │ ᴺᵁᴸᴸ │
   └──────────────────────────────┘

1 row in set. Elapsed: 0.002 sec.
```
Transpiled DuckDB (output):

```
memory D SELECT LIST_FILTER([1, 2, 3], NULL);
Binder Error:
Invalid lambda expression!
```
```python
# ClickHouse higher-order array functions: the lambda comes first, the array second.
# This is the opposite of exp.ArrayFilter(this=array, expression=lambda) convention.
```
Let's remove these comments to simplify the code.
ruff-format requires the multi-arg lambda to fit on one line.
Problem
ClickHouse's higher-order array functions `arrayMap(lambda, arr)` and `arrayFilter(lambda, arr)` were silently becoming `Anonymous` expressions during parsing because the ClickHouse parser lacked `ARRAYMAP`/`ARRAYFILTER` entries in its `FUNCTIONS` dict. Consequently, transpiling to DuckDB produced invalid `ARRAYMAP()` and `ARRAYFILTER()` calls instead of `LIST_TRANSFORM()` and `LIST_FILTER()`.

Additionally, the ClickHouse generator had no `exp.Transform` entry, so `LIST_TRANSFORM` (DuckDB) and `TRANSFORM` (Spark/Hive) arriving at the ClickHouse writer produced `TRANSFORM(arr, lambda)` instead of `arrayMap(lambda, arr)`.

Fix
Parser (`sqlglot/parsers/clickhouse.py`): add `"ARRAYMAP"` and `"ARRAYFILTER"` to the ClickHouse `FUNCTIONS` mapping. Both builders swap the argument order, since ClickHouse puts the lambda first, opposite to the canonical `exp.Transform(this=arr, expression=lambda)` convention.

Generator (`sqlglot/generators/clickhouse.py`): add `exp.Transform → arrayMap(lambda, arr)` to the ClickHouse `TRANSFORMS` dict, matching the reversed argument order already used by `exp.ArrayFilter`.

After
All 1127 existing tests continue to pass.
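The generator side of the change amounts to a per-dialect table from expression type to a rendering function. A minimal sketch of that idea, assuming simplified node classes rather than sqlglot's real `exp` classes or its actual `TRANSFORMS` signature: the same canonical `Transform` renders as `arrayMap(lambda, arr)` for ClickHouse and with the array first for DuckDB and Spark. The `TRANSFORMS` dict and `generate` function here are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import Callable, Dict


# Simplified canonical nodes: `this` is the array, `expression` the lambda.
@dataclass
class Transform:
    this: str
    expression: str


@dataclass
class ArrayFilter:
    this: str
    expression: str


Render = Callable[[object], str]

# Hypothetical per-dialect tables keyed by node type.
TRANSFORMS: Dict[str, Dict[type, Render]] = {
    # ClickHouse reverses the order: lambda first, array second.
    "clickhouse": {
        Transform: lambda e: f"arrayMap({e.expression}, {e.this})",
        ArrayFilter: lambda e: f"arrayFilter({e.expression}, {e.this})",
    },
    "duckdb": {
        Transform: lambda e: f"LIST_TRANSFORM({e.this}, {e.expression})",
        ArrayFilter: lambda e: f"LIST_FILTER({e.this}, {e.expression})",
    },
    "spark": {
        Transform: lambda e: f"TRANSFORM({e.this}, {e.expression})",
    },
}


def generate(node: object, dialect: str) -> str:
    """Dispatch on the node's type to the dialect's renderer."""
    render = TRANSFORMS[dialect].get(type(node))
    if render is None:
        raise NotImplementedError(f"{type(node).__name__} is not modelled for {dialect}")
    return render(node)


node = Transform(this="arr", expression="x -> x + 1")
for dialect in ("clickhouse", "duckdb", "spark"):
    print(dialect, generate(node, dialect))
# clickhouse arrayMap(x -> x + 1, arr)
# duckdb LIST_TRANSFORM(arr, x -> x + 1)
# spark TRANSFORM(arr, x -> x + 1)
```

Keeping the argument order inside each dialect's renderer means the canonical node never needs to know which dialect it was parsed from.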
This pull request was prepared with the assistance of AI, under my direction and review.