fix(clickhouse): correctly transpile arrayMap and arrayFilter higher-order functions [CLAUDE]#7794

Open
gaoflow wants to merge 2 commits into tobymao:main from gaoflow:fix/clickhouse-arraymap-arrayfilter-transpile

gaoflow commented Jun 24, 2026

Problem

ClickHouse's higher-order array functions arrayMap(lambda, arr) and
arrayFilter(lambda, arr) were silently becoming Anonymous expressions
during parsing because the ClickHouse parser lacked ARRAYMAP / ARRAYFILTER
entries in its FUNCTIONS dict. Consequently:

# Before
import sqlglot

sqlglot.transpile("SELECT arrayMap(x -> x + 1, arr) FROM t",
                  read="clickhouse", write="duckdb")[0]
# → "SELECT ARRAYMAP(x -> x + 1, arr) FROM t"   ← invalid DuckDB

sqlglot.transpile("SELECT arrayFilter(x -> x > 0, arr) FROM t",
                  read="clickhouse", write="duckdb")[0]
# → "SELECT ARRAYFILTER(x -> x > 0, arr) FROM t"  ← invalid DuckDB
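To illustrate the fallback described above, here is a minimal sketch of how a parser-side FUNCTIONS registry keyed by upper-cased names can silently yield an opaque call node instead of raising an error. The `Anonymous` class, the `FUNCTIONS` dict, and the `build_function`/`render_anonymous` helpers are hypothetical simplifications, not sqlglot's actual API; arguments are plain strings, and `render_anonymous` upper-cases the name only to mirror the output shown above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Anonymous:
    """Hypothetical stand-in for an unrecognized function call."""
    name: str
    args: List[str]


# Hypothetical registry keyed by upper-cased function name. Before the fix,
# it has no "ARRAYMAP" or "ARRAYFILTER" entries.
FUNCTIONS: Dict[str, Callable[[List[str]], object]] = {}


def build_function(name: str, args: List[str]) -> object:
    builder = FUNCTIONS.get(name.upper())
    if builder is None:
        # Unknown names fall back to an opaque node; no error is raised.
        return Anonymous(name=name, args=args)
    return builder(args)


def render_anonymous(node: Anonymous) -> str:
    # Opaque calls keep their arguments untouched, so no dialect-specific
    # renaming or argument reordering can happen.
    return f"{node.name.upper()}({', '.join(node.args)})"


node = build_function("arrayMap", ["x -> x + 1", "arr"])
print(type(node).__name__)     # Anonymous
print(render_anonymous(node))  # ARRAYMAP(x -> x + 1, arr)
```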

Additionally, the ClickHouse generator had no exp.Transform entry, so
LIST_TRANSFORM (DuckDB) and TRANSFORM (Spark/Hive), once parsed into
exp.Transform, were written for ClickHouse as TRANSFORM(arr, lambda) instead of
arrayMap(lambda, arr).
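The generator-side gap can be sketched with a hypothetical TRANSFORMS-style lookup: node types with a registered renderer get dialect-specific output, and anything else falls back to a generic rendering built from the class name in canonical (array, lambda) order. `Transform`, `CLICKHOUSE_TRANSFORMS` and `generate` are illustrative simplifications, not sqlglot's generator.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Transform:
    """Hypothetical canonical node: `this` is the array, `expression` the lambda."""
    this: str
    expression: str


# Hypothetical ClickHouse generator table before the fix: no Transform entry.
CLICKHOUSE_TRANSFORMS: Dict[type, Callable[[Transform], str]] = {}


def generate(node: Transform) -> str:
    render = CLICKHOUSE_TRANSFORMS.get(type(node))
    if render is not None:
        return render(node)
    # Generic fallback: upper-cased class name, canonical (array, lambda) order.
    return f"{type(node).__name__.upper()}({node.this}, {node.expression})"


# A node as it would arrive from DuckDB's LIST_TRANSFORM(arr, x -> x + 1).
node = Transform(this="arr", expression="x -> x + 1")
print(generate(node))  # TRANSFORM(arr, x -> x + 1)
```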

Fix

  • Parser (sqlglot/parsers/clickhouse.py): add "ARRAYMAP" and
    "ARRAYFILTER" builders to the ClickHouse FUNCTIONS mapping, producing
    exp.Transform and exp.ArrayFilter respectively. Both builders swap the
    argument order, since ClickHouse puts the lambda first, the opposite of the
    canonical exp.Transform(this=arr, expression=lambda) convention.

  • Generator (sqlglot/generators/clickhouse.py): add an
    exp.Transform → arrayMap(lambda, arr) entry to the ClickHouse TRANSFORMS
    dict, mirroring the reversed argument order that the existing
    exp.ArrayFilter entry already uses.
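A minimal sketch of the two halves of the fix, under simplified assumptions: string arguments, dataclass stand-ins for exp.Transform and exp.ArrayFilter, and hypothetical `parse_call`/`generate` helpers in place of sqlglot's parser and generator. The builders move ClickHouse's (lambda, array) arguments into the canonical slots, and the render functions swap them back.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Transform:
    this: str        # array
    expression: str  # lambda


@dataclass
class ArrayFilter:
    this: str        # array
    expression: str  # lambda


# Parser side: ClickHouse passes (lambda, array), so each builder swaps the
# arguments into the canonical this=array, expression=lambda slots.
FUNCTIONS: Dict[str, Callable[[List[str]], object]] = {
    "ARRAYMAP": lambda args: Transform(this=args[1], expression=args[0]),
    "ARRAYFILTER": lambda args: ArrayFilter(this=args[1], expression=args[0]),
}

# Generator side: swap back to ClickHouse's lambda-first order.
TRANSFORMS: Dict[type, Callable[[object], str]] = {
    Transform: lambda e: f"arrayMap({e.expression}, {e.this})",
    ArrayFilter: lambda e: f"arrayFilter({e.expression}, {e.this})",
}


def parse_call(name: str, args: List[str]) -> object:
    return FUNCTIONS[name.upper()](args)


def generate(node: object) -> str:
    return TRANSFORMS[type(node)](node)


node = parse_call("arrayMap", ["x -> x + 1", "arr"])
print(node)            # Transform(this='arr', expression='x -> x + 1')
print(generate(node))  # arrayMap(x -> x + 1, arr)
```

Keeping the swap at the dialect boundary, as sketched here, leaves a single canonical node shape for other dialects' generators to consume.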

After

import sqlglot

sqlglot.transpile("SELECT arrayMap(x -> x + 1, arr) FROM t",
                  read="clickhouse", write="duckdb")[0]
# → "SELECT LIST_TRANSFORM(arr, x -> x + 1) FROM t"  ✓

sqlglot.transpile("SELECT arrayFilter(x -> x > 0, arr) FROM t",
                  read="clickhouse", write="duckdb")[0]
# → "SELECT LIST_FILTER(arr, x -> x > 0) FROM t"  ✓

sqlglot.transpile("SELECT LIST_TRANSFORM(arr, x -> x + 1) FROM t",
                  read="duckdb", write="clickhouse")[0]
# → "SELECT arrayMap(x -> x + 1, arr) FROM t"  ✓

sqlglot.transpile("SELECT arrayMap(x -> x + 1, arr) FROM t",
                  read="clickhouse", write="clickhouse")[0]
# → "SELECT arrayMap(x -> x + 1, arr) FROM t"  ✓ (round-trip)

All 1127 existing tests continue to pass.
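Once both dialects parse into the same canonical node, the results above reduce to a per-dialect choice of function name and argument order. The sketch below encodes only the three spellings that appear in this PR (ClickHouse arrayMap, DuckDB LIST_TRANSFORM, Spark TRANSFORM); the `SPELLINGS` table and `render` helper are hypothetical and do not reflect how sqlglot organizes its generators.

```python
from typing import Dict, Tuple

# Hypothetical table: dialect -> (function name, whether the lambda comes first).
SPELLINGS: Dict[str, Tuple[str, bool]] = {
    "clickhouse": ("arrayMap", True),
    "duckdb": ("LIST_TRANSFORM", False),
    "spark": ("TRANSFORM", False),
}


def render(dialect: str, array: str, lam: str) -> str:
    name, lambda_first = SPELLINGS[dialect]
    first, second = (lam, array) if lambda_first else (array, lam)
    return f"{name}({first}, {second})"


for dialect in SPELLINGS:
    print(dialect, render(dialect, "arr", "x -> x + 1"))
# clickhouse arrayMap(x -> x + 1, arr)
# duckdb LIST_TRANSFORM(arr, x -> x + 1)
# spark TRANSFORM(arr, x -> x + 1)
```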


This pull request was prepared with the assistance of AI, under my direction and review.

…order functions [CLAUDE]

ClickHouse's higher-order array functions arrayMap(lambda, arr) and
arrayFilter(lambda, arr) were being parsed as Anonymous expressions because
the ClickHouse parser lacked entries for their uppercased forms (ARRAYMAP,
ARRAYFILTER). As a result, transpiling to DuckDB produced invalid ARRAYMAP()
and ARRAYFILTER() calls instead of LIST_TRANSFORM() and LIST_FILTER().

Additionally, the ClickHouse generator had no Transform entry, so DuckDB's
LIST_TRANSFORM and Spark's TRANSFORM were written to ClickHouse under the wrong
name (TRANSFORM instead of arrayMap).

Note: ClickHouse takes the lambda first and the array second, the reverse of
the canonical exp.Transform/exp.ArrayFilter convention.
geooo109 self-assigned this Jun 25, 2026

geooo109 left a comment

Nice work; I left some comments.

My suggestion is the following:

Remove the current validate_all tests from this PR and use only validate_identity tests to check the round-trip (ClickHouse input -> ClickHouse output) of the added functions, for example self.validate_identity("arrayFilter(x -> x > 0, arr)").assert_is(exp.ArrayFilter). This would keep the PR simple and mergeable.

Then, if you want, you can investigate various inputs and outputs for the transpilation; as you saw, NULL creates semantic issues. After detecting all these edge cases, let's make a follow-up PR that contains the robust transpilation.
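A round-trip identity check of the kind suggested above boils down to: parse the ClickHouse SQL, assert the node type, regenerate ClickHouse SQL, and compare it with the input. The sketch below models that idea for a bare arrayFilter call with a deliberately naive parser; `validate_identity_sketch`, `parse_clickhouse` and `generate_clickhouse` are hypothetical and are not sqlglot's test helpers.

```python
import re
from dataclasses import dataclass


@dataclass
class ArrayFilter:
    this: str        # array argument
    expression: str  # lambda argument


# Naive: matches a single call and splits its arguments at the last ", ",
# which is enough for "arrayFilter(x -> x > 0, arr)" but not for nested calls.
CALL = re.compile(r"^(\w+)\((.*), ([^,]+)\)$")


def parse_clickhouse(sql: str) -> ArrayFilter:
    match = CALL.match(sql)
    if match is None or match.group(1).upper() != "ARRAYFILTER":
        raise ValueError(f"unsupported input: {sql}")
    lam, arr = match.group(2), match.group(3)
    return ArrayFilter(this=arr, expression=lam)


def generate_clickhouse(node: ArrayFilter) -> str:
    return f"arrayFilter({node.expression}, {node.this})"


def validate_identity_sketch(sql: str, expected_type: type) -> ArrayFilter:
    node = parse_clickhouse(sql)
    # Type check first, then require the regenerated SQL to match the input.
    assert isinstance(node, expected_type), type(node).__name__
    assert generate_clickhouse(node) == sql, generate_clickhouse(node)
    return node


node = validate_identity_sketch("arrayFilter(x -> x > 0, arr)", ArrayFilter)
```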

Comment on lines +1772 to +1782
    "SELECT arrayMap(x -> x + 1, arr) FROM t",
    read={
        "duckdb": "SELECT LIST_TRANSFORM(arr, x -> x + 1) FROM t",
        "spark": "SELECT TRANSFORM(arr, x -> x + 1) FROM t",
    },
    write={
        "clickhouse": "SELECT arrayMap(x -> x + 1, arr) FROM t",
        "duckdb": "SELECT LIST_TRANSFORM(arr, x -> x + 1) FROM t",
        "spark": "SELECT TRANSFORM(arr, x -> x + 1) FROM t",
    },
)

geooo109 Jun 25, 2026

There are cases where the transpilation results in different semantics:

clickhouse (input):
SELECT arrayMap(NULL, [1, 2]) AS res

Query id: 0db4a81e-03b1-4253-8593-6a73112a447f

   ┌─res──┐
1. │ ᴺᵁᴸᴸ │
   └──────┘

1 row in set. Elapsed: 0.002 sec. 

transpiled duckdb (output):
memory D SELECT LIST_TRANSFORM([1,2], NULL) AS res;
Binder Error:
Invalid lambda expression!

transpiled spark 4 (output):
spark-sql (default)> SELECT TRANSFORM(array(1,2), NULL);
[null,null]
Time taken: 0.051 seconds, Fetched 1 row(s)
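The divergence above can be summarized as a tiny executable model. It encodes only the three observations in these logs (ClickHouse returns NULL, DuckDB rejects the non-lambda argument at bind time, Spark 4 maps each element to NULL); Python `None` stands in for SQL NULL, and the function names are hypothetical.

```python
from typing import Callable, List, Optional

Func = Optional[Callable[[int], int]]


def clickhouse_array_map(func: Func, arr: List[int]) -> Optional[List[int]]:
    # Observed: arrayMap(NULL, [1, 2]) returns NULL.
    if func is None:
        return None
    return [func(x) for x in arr]


def duckdb_list_transform(arr: List[int], func: Func) -> List[int]:
    # Observed: LIST_TRANSFORM([1,2], NULL) fails with a binder error.
    if func is None:
        raise ValueError("Binder Error: Invalid lambda expression!")
    return [func(x) for x in arr]


def spark_transform(arr: List[int], func: Func) -> List[Optional[int]]:
    # Observed: TRANSFORM(array(1,2), NULL) returns [null, null].
    if func is None:
        return [None for _ in arr]
    return [func(x) for x in arr]


print(clickhouse_array_map(None, [1, 2]))  # None
print(spark_transform([1, 2], None))       # [None, None]
try:
    duckdb_list_transform([1, 2], None)
except ValueError as err:
    print(err)  # Binder Error: Invalid lambda expression!
```

With a real lambda all three models agree, which is why the mismatch only surfaces for non-lambda inputs such as NULL.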

Comment on lines +1783 to +1792
self.validate_all(
    "SELECT arrayFilter(x -> x > 0, arr) FROM t",
    read={
        "duckdb": "SELECT LIST_FILTER(arr, x -> x > 0) FROM t",
    },
    write={
        "clickhouse": "SELECT arrayFilter(x -> x > 0, arr) FROM t",
        "duckdb": "SELECT LIST_FILTER(arr, x -> x > 0) FROM t",
    },
)

Same as https://github.com/tobymao/sqlglot/pull/7794/changes#r3473795554 :

clickhouse (input):
SELECT arrayFilter(NULL, [1, 2, 3])

Query id: 2b70a67e-8db5-47e5-b60a-02551e878843

   ┌─arrayFilter(NULL, [1, 2, 3])─┐
1. │ ᴺᵁᴸᴸ                         │
   └──────────────────────────────┘

1 row in set. Elapsed: 0.002 sec. 

transpiled duckdb (output):
memory D SELECT LIST_FILTER([1, 2, 3], NULL);
Binder Error:
Invalid lambda expression!
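One hypothetical way to keep such inputs out of the lambda-based mapping until a follow-up PR defines their transpilation is to build the canonical node only when the first argument actually is a lambda, and otherwise leave the call opaque. This is a sketch under that assumption, not what this PR implements; `Lambda`, `Null`, `ArrayFilter`, `Anonymous` and `build_array_filter` are simplified stand-ins.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Lambda:
    params: List[str]
    body: str


@dataclass
class Null:
    pass


@dataclass
class ArrayFilter:
    this: object        # array
    expression: Lambda  # predicate


@dataclass
class Anonymous:
    name: str
    args: list


def build_array_filter(args: list) -> Union[ArrayFilter, Anonymous]:
    func, arr = args[0], args[1]
    if isinstance(func, Lambda):
        # Lambda first, array second in ClickHouse: swap into canonical slots.
        return ArrayFilter(this=arr, expression=func)
    # Not a lambda (e.g. NULL): no canonical equivalent is assumed here,
    # so the call is preserved as-is rather than mapped to ArrayFilter.
    return Anonymous(name="arrayFilter", args=args)


ok = build_array_filter([Lambda(["x"], "x > 0"), "arr"])
edge = build_array_filter([Null(), "[1, 2, 3]"])
print(type(ok).__name__, type(edge).__name__)  # ArrayFilter Anonymous
```

This only stops the NULL case from being silently rewritten to LIST_FILTER; deciding what it should become in DuckDB is the edge-case work deferred to a follow-up PR.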

Comment on lines +265 to +266
# ClickHouse higher-order array functions: the lambda comes first, the array second.
# This is the opposite of exp.ArrayFilter(this=array, expression=lambda) convention.

Let's remove the comments here to simplify the code.

ruff-format requires the multi-arg lambda to fit on one line.