Commit 1f2dedd

serprex authored and theory committed
Map array functions to ClickHouse equivalents
Add mappings for most Postgres array functions and operators to push down to ClickHouse equivalents. Mark those without equivalents as unshippable so that Postgres does not try to push them down and trigger errors. Also support pushing down array slices to the ClickHouse `arraySlice()` function.

Refactor `classify_builtin_operator()` to work with operator OIDs rather than strings, since the same operator string can be used for different data types and thus would map to different ClickHouse functions.

Document the function mappings and add a new doc section for pushing down operators that includes both the array operator mappings introduced here and the pushed down regular expression operators added in c047a7b & 3e765a6 and the JSON element extraction operators added in 0b4c03e. While at it, fix some typos in the docs.

Author: Philip Dubé <philip.dube@clickhouse.com>
Reviewed-by: David E. Wheeler <david.wheeler@clickhouse.com>
1 parent 5183850 commit 1f2dedd

7 files changed

Lines changed: 971 additions & 91 deletions

CHANGELOG.md

Lines changed: 12 additions & 0 deletions
@@ -28,6 +28,14 @@ All notable changes to this project will be documented in this file. It uses the
   prepending them to the regular expression (e.g., `(?i)foo`).
 * Added pushdown for `regexp_split_to_array()` to `splitByRegexp()`,
   including flags.
+* Added pushdown mappings for array functions: `array_cat`, `array_append`,
+  `array_remove`, `array_to_string`, `cardinality`, `array_length`,
+  `array_prepend`, `string_to_array`, `trim_array`, `array_fill`,
+  `array_reverse`, `array_shuffle`, `array_sample`, `array_sort`.
+* Added pushdown for array operators: `@>` (`hasAll`), `<@` (`hasAll`),
+  `&&` (`hasAny`).
+* Array slice syntax (`arr[L:U]`, `arr[:U]`, `arr[L:]`) now pushes down
+  as `arraySlice()`.
 
 ### ⬆️ Dependency Updates
 
@@ -39,6 +47,10 @@ All notable changes to this project will be documented in this file. It uses the
   unable to map a ClickHouse type to a Postgres type.
 * Fixed reversal of the arguments passed to the ClickHouse `match()`
   function by the mapping from `regexp_like()`.
+* `array_dims`, `array_ndims`, `array_lower`, `array_upper`, `array_replace`,
+  `array_positions`, `array_fill (3-arg)`, `array_sort (3-arg)`, and
+  `string_to_array(3-arg)` now evaluate locally instead of being pushed to
+  ClickHouse where they would fail.
 
 [v0.1.11]: https://github.com/ClickHouse/pg_clickhouse/compare/v0.1.10...v0.1.11
 
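The slice pushdown listed in this changelog can be pictured as a translation from Postgres's inclusive bounds to the `arraySlice(array, offset[, length])` signature in ClickHouse. The following is a minimal sketch, not the extension's deparsing code: `slice_to_clickhouse()` and `SLICE_BOUND_OMITTED` are hypothetical names, and it assumes arrays whose lower bound is 1, so that Postgres subscripts and ClickHouse offsets coincide.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical marker for an omitted bound, as in arr[:U] or arr[L:]. */
#define SLICE_BOUND_OMITTED (-1)

/*
 * Write a ClickHouse expression for the Postgres slice arr[lower:upper]
 * into buf. Bounds are 1-based and inclusive (assumption: the array's
 * lower bound is 1). Returns what snprintf() returns.
 */
int
slice_to_clickhouse(char *buf, size_t buflen, const char *arr,
                    int lower, int upper)
{
    int offset = (lower == SLICE_BOUND_OMITTED) ? 1 : lower;
    int length;

    assert(buf != NULL && arr != NULL);

    /* arr[L:] has no upper bound, so use the two-argument form. */
    if (upper == SLICE_BOUND_OMITTED)
        return snprintf(buf, buflen, "arraySlice(%s, %d)", arr, offset);

    /* Inclusive bounds: L..U covers U - L + 1 elements. */
    length = upper - offset + 1;
    if (length < 0)
        length = 0;             /* keep an inverted range from going negative */

    return snprintf(buf, buflen, "arraySlice(%s, %d, %d)", arr, offset, length);
}
```

Under these assumptions, `arr[2:4]` becomes `arraySlice(arr, 2, 3)`, `arr[:3]` becomes `arraySlice(arr, 1, 3)`, and `arr[2:]` becomes `arraySlice(arr, 2)`.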

doc/pg_clickhouse.md

Lines changed: 32 additions & 8 deletions
@@ -287,13 +287,13 @@ Use [CREATE FOREIGN TABLE] to create a foreign table that can query data from
 a ClickHouse database:
 
 ```sql
-CREATE FOREIGN TABLE uact (
+CREATE FOREIGN TABLE acts (
     user_id    bigint NOT NULL,
     page_views int,
     duration   smallint,
     sign       smallint
 ) SERVER taxi_srv OPTIONS(
-    table_name 'uact',
+    table_name 'acts',
     engine 'CollapsingMergeTree'
 );
 ```
@@ -325,8 +325,6 @@ option:
 
 Example:
 
-(aggregatefunction 'sum')
-
 ```sql
 CREATE FOREIGN TABLE test (
     column1 bigint OPTIONS(AggregateFunction 'uniq'),
@@ -354,14 +352,14 @@ TABLE].
 Use [DROP FOREIGN TABLE] to remove a foreign table:
 
 ```sql
-DROP FOREIGN TABLE uact;
+DROP FOREIGN TABLE acts;
 ```
 
 This command fails if there are any objects that depend on the foreign table.
 Use the `CASCADE` clause to drop them, too:
 
 ```sql
-DROP FOREIGN TABLE uact CASCADE;
+DROP FOREIGN TABLE acts CASCADE;
 ```
 
 ## DML SQL Reference
@@ -422,7 +420,7 @@ like any other tables:
 try=# SELECT start_at, duration, resource FROM logs WHERE req_id = 4117909262;
           start_at          | duration |    resource
 ----------------------------+----------+----------------
- 2025-12-05 15:07:32.944188 |      175 | /widgets/totam
+ 2025-12-05 15:07:32.944188 |      175 | /widgets/totem
 (1 row)
 ```
 
@@ -922,7 +920,7 @@ SELECT c1, encode(c2::bytea, 'hex'), encode(c3::bytea, 'hex') FROM texts ORDER B
 
 The text columns will be correct:
 
-```pgdsql
+```pgsql
 
  c1 | encode | encode
 ----+----------------------------------------------------------+----------------------------------
@@ -1029,6 +1027,19 @@ maps the following functions:
 * `date_trunc('quarter')`: [toStartOfQuarter](https://clickhouse.com/docs/sql-reference/functions/date-time-functions#toStartOfQuarter)
 * `date_trunc('year')`: [toStartOfYear](https://clickhouse.com/docs/sql-reference/functions/date-time-functions#toStartOfYear)
 * `array_position`: [indexOf](https://clickhouse.com/docs/sql-reference/functions/array-functions#indexOf)
+* `array_cat`: [arrayConcat](https://clickhouse.com/docs/sql-reference/functions/array-functions#arrayConcat)
+* `array_append`: [arrayPushBack](https://clickhouse.com/docs/sql-reference/functions/array-functions#arrayPushBack)
+* `array_prepend`: [arrayPushFront](https://clickhouse.com/docs/sql-reference/functions/array-functions#arrayPushFront)
+* `array_remove`: [arrayRemove](https://clickhouse.com/docs/sql-reference/functions/array-functions#arrayRemove)
+* `array_length` & `cardinality`: [length](https://clickhouse.com/docs/sql-reference/functions/array-functions#length)
+* `array_to_string`: [arrayStringConcat](https://clickhouse.com/docs/sql-reference/functions/array-functions#arrayStringConcat)
+* `string_to_array`: [splitByString](https://clickhouse.com/docs/sql-reference/functions/splitting-merging-functions#splitByString)
+* `trim_array`: [arrayResize](https://clickhouse.com/docs/sql-reference/functions/array-functions#arrayResize)
+* `array_fill`: [arrayWithConstant](https://clickhouse.com/docs/sql-reference/functions/array-functions#arrayWithConstant)
+* `array_reverse`: [arrayReverse](https://clickhouse.com/docs/sql-reference/functions/array-functions#arrayReverse)
+* `array_shuffle`: [arrayShuffle](https://clickhouse.com/docs/sql-reference/functions/array-functions#arrayShuffle)
+* `array_sample`: [arrayRandomSample](https://clickhouse.com/docs/sql-reference/functions/array-functions#arrayRandomSample)
+* `array_sort`: [arraySort](https://clickhouse.com/docs/sql-reference/functions/array-functions#arraySort) / [arrayReverseSort](https://clickhouse.com/docs/sql-reference/functions/array-functions#arrayReverseSort)
 * `btrim`: [trimBoth](https://clickhouse.com/docs/sql-reference/functions/string-functions#trimboth)
 * `strpos`: [position](https://clickhouse.com/docs/sql-reference/functions/string-search-functions#position)
 * `regexp_like`: [match](https://clickhouse.com/docs/sql-reference/functions/string-search-functions#match)
@@ -1056,6 +1067,19 @@ maps the following functions:
 * `CURRENT_ROLE`: Passed as value from PostgreSQL function.
 * `SESSION_USER`: Passed as value from PostgreSQL function.
 
+### Pushdown Operators
+
+* Array slice (`arr[L:U]`): [arraySlice](https://clickhouse.com/docs/sql-reference/functions/array-functions#arraySlice)
+* `@>` (array contains): [hasAll](https://clickhouse.com/docs/sql-reference/functions/array-functions#hasAll)
+* `<@` (array contained by): [hasAll](https://clickhouse.com/docs/sql-reference/functions/array-functions#hasAll)
+* `&&` (array overlap): [hasAny](https://clickhouse.com/docs/sql-reference/functions/array-functions#hasAny)
+* `~` (regexp match): [match](https://clickhouse.com/docs/sql-reference/functions/string-search-functions#match)
+* `!~` (regexp not match): [match](https://clickhouse.com/docs/sql-reference/functions/string-search-functions#match)
+* `~*` (case insensitive regexp match): [match](https://clickhouse.com/docs/sql-reference/functions/string-search-functions#match)
+* `!~*` (case insensitive regexp not match): [match](https://clickhouse.com/docs/sql-reference/functions/string-search-functions#match)
+* `->` (JSON extract element): [sub-column syntax](https://clickhouse.com/docs/sql-reference/data-types/newjson#reading-json-paths-as-sub-columns)
+* `->>` (JSON extract element as text): [toJSONString](https://clickhouse.com/docs/sql-reference/functions/json-functions#toJSONString) + [sub-column syntax](https://clickhouse.com/docs/sql-reference/data-types/newjson#reading-json-paths-as-sub-columns)
+
 ### Custom Functions
 
 These custom functions created by `pg_clickhouse` provide foreign query
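Both `@>` and `<@` map to `hasAll` in the operator list above. Since `hasAll(set, subset)` tests whether the first array contains every element of the second, the two operators can differ only in argument order. A minimal sketch of that idea follows; the `ArrayOp` enum and `array_op_to_clickhouse()` are hypothetical names for illustration, not the extension's `CF_*` values or deparser.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical operator tags for this sketch only. */
typedef enum
{
    OP_CONTAINS,                /* left @> right */
    OP_CONTAINED_BY,            /* left <@ right */
    OP_OVERLAP                  /* left && right */
} ArrayOp;

/*
 * Write the ClickHouse call for "left OP right" into buf.
 * Returns what snprintf() returns, or -1 for an unknown operator.
 */
int
array_op_to_clickhouse(char *buf, size_t buflen, ArrayOp op,
                       const char *left, const char *right)
{
    assert(buf != NULL && left != NULL && right != NULL);

    switch (op)
    {
        case OP_CONTAINS:
            /* left must contain every element of right */
            return snprintf(buf, buflen, "hasAll(%s, %s)", left, right);
        case OP_CONTAINED_BY:
            /* same test with the roles swapped */
            return snprintf(buf, buflen, "hasAll(%s, %s)", right, left);
        case OP_OVERLAP:
            /* overlap is symmetric, so the order is kept */
            return snprintf(buf, buflen, "hasAny(%s, %s)", left, right);
    }
    return -1;
}
```

With this sketch, `a <@ b` yields `hasAll(b, a)`, while `a @> b` yields `hasAll(a, b)`.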

src/custom_types.c

Lines changed: 148 additions & 53 deletions
@@ -4,6 +4,7 @@
 #include "access/htup.h"
 #include "access/htup_details.h"
 #include "catalog/dependency.h"
+#include "catalog/pg_operator_d.h"
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
@@ -50,12 +51,32 @@
 #define F_EXTRACT_TEXT_TIMESTAMP 6202
 #define F_EXTRACT_TEXT_TIMESTAMPTZ 6203
 #define F_EXTRACT_TEXT_DATE 6199
+#define F_TRIM_ARRAY 6172
+#define F_STRING_TO_ARRAY_TEXT_TEXT 394
+#define F_STRING_TO_ARRAY_TEXT_TEXT_TEXT 376
+#define F_ARRAY_TO_STRING_ANYARRAY_TEXT 395
+#define F_ARRAY_TO_STRING_ANYARRAY_TEXT_TEXT 384
+#define F_ARRAY_FILL_ANYELEMENT__INT4 F_ARRAY_FILL
+#define F_ARRAY_FILL_ANYELEMENT__INT4__INT4 F_ARRAY_FILL_WITH_LOWER_BOUNDS
+#define F_CARDINALITY F_ARRAY_CARDINALITY
 #endif
-/* regexp_like was added in Postgres 15; Mock it for earlier versions. */
+/* regexp_like was added in Postgres 15 */
 #if PG_VERSION_NUM < 150000
 #define F_REGEXP_LIKE_TEXT_TEXT 6263
 #define F_REGEXP_LIKE_TEXT_TEXT_TEXT 6264
 #endif
+/* array_shuffle, array_sample added in Postgres 16 */
+#if PG_VERSION_NUM < 160000
+#define F_ARRAY_SHUFFLE 6215
+#define F_ARRAY_SAMPLE 6216
+#endif
+/* array_reverse, array_sort added in Postgres 18 */
+#if PG_VERSION_NUM < 180000
+#define F_ARRAY_REVERSE 6381
+#define F_ARRAY_SORT_ANYARRAY 6388
+#define F_ARRAY_SORT_ANYARRAY_BOOL 6389
+#define F_ARRAY_SORT_ANYARRAY_BOOL_BOOL 6390
+#endif
 
 #define STR_STARTS_WITH(str, sub) strncmp(str, sub, strlen(sub)) == 0
 #define STR_EQUAL(a, b) strcmp(a, b) == 0
@@ -217,6 +238,34 @@ chfdw_check_for_custom_function(Oid funcid)
         case F_CLOCK_TIMESTAMP:
         case F_CURRENT_SCHEMA:
         case F_CURRENT_DATABASE:
+        /* array functions: simple mappings */
+        case F_ARRAY_CAT:
+        case F_ARRAY_APPEND:
+        case F_ARRAY_REMOVE:
+        case F_ARRAY_TO_STRING_ANYARRAY_TEXT:
+        case F_CARDINALITY:
+        case F_ARRAY_REVERSE:
+        case F_ARRAY_SORT_ANYARRAY:
+        case F_ARRAY_SHUFFLE:
+        case F_ARRAY_SAMPLE:
+        /* array functions: arg rewriting */
+        case F_ARRAY_LENGTH:
+        case F_ARRAY_PREPEND:
+        case F_STRING_TO_ARRAY_TEXT_TEXT:
+        case F_TRIM_ARRAY:
+        case F_ARRAY_FILL_ANYELEMENT__INT4:
+        case F_ARRAY_SORT_ANYARRAY_BOOL:
+        /* array functions: unshippable */
+        case F_ARRAY_DIMS:
+        case F_ARRAY_NDIMS:
+        case F_ARRAY_LOWER:
+        case F_ARRAY_UPPER:
+        case F_ARRAY_REPLACE:
+        case F_ARRAY_POSITIONS:
+        case F_ARRAY_TO_STRING_ANYARRAY_TEXT_TEXT:
+        case F_STRING_TO_ARRAY_TEXT_TEXT_TEXT:
+        case F_ARRAY_FILL_ANYELEMENT__INT4__INT4:
+        case F_ARRAY_SORT_ANYARRAY_BOOL_BOOL:
             special_builtin = true;
             break;
         default:
@@ -363,6 +412,69 @@ chfdw_check_for_custom_function(Oid funcid)
                 entry->custom_name[0] = '\1';
                 break;
             }
+        /* array functions: unshippable */
+        case F_ARRAY_DIMS:
+        case F_ARRAY_NDIMS:
+        case F_ARRAY_LOWER:
+        case F_ARRAY_UPPER:
+        case F_ARRAY_REPLACE:
+        case F_ARRAY_POSITIONS:
+        case F_ARRAY_TO_STRING_ANYARRAY_TEXT_TEXT:
+        case F_STRING_TO_ARRAY_TEXT_TEXT_TEXT:
+        case F_ARRAY_FILL_ANYELEMENT__INT4__INT4:
+        case F_ARRAY_SORT_ANYARRAY_BOOL_BOOL:
+            entry->cf_type = CF_UNSHIPPABLE;
+            break;
+        /* array functions: simple mappings */
+        case F_ARRAY_CAT:
+            strcpy(entry->custom_name, "arrayConcat");
+            break;
+        case F_ARRAY_APPEND:
+            strcpy(entry->custom_name, "arrayPushBack");
+            break;
+        case F_ARRAY_REMOVE:
+            strcpy(entry->custom_name, "arrayRemove");
+            break;
+        case F_ARRAY_TO_STRING_ANYARRAY_TEXT:
+            strcpy(entry->custom_name, "arrayStringConcat");
+            break;
+        case F_CARDINALITY:
+        case F_ARRAY_LENGTH:
+            entry->cf_type = CF_ARRAY_LENGTH;
+            strcpy(entry->custom_name, "length");
+            break;
+        case F_ARRAY_REVERSE:
+            strcpy(entry->custom_name, "arrayReverse");
+            break;
+        case F_ARRAY_SORT_ANYARRAY:
+            strcpy(entry->custom_name, "arraySort");
+            break;
+        case F_ARRAY_SHUFFLE:
+            strcpy(entry->custom_name, "arrayShuffle");
+            break;
+        case F_ARRAY_SAMPLE:
+            strcpy(entry->custom_name, "arrayRandomSample");
+            break;
+        case F_ARRAY_PREPEND:
+            entry->cf_type = CF_ARRAY_PREPEND;
+            strcpy(entry->custom_name, "arrayPushFront");
+            break;
+        case F_STRING_TO_ARRAY_TEXT_TEXT:
+            entry->cf_type = CF_STRING_TO_ARRAY;
+            strcpy(entry->custom_name, "splitByString");
+            break;
+        case F_TRIM_ARRAY:
+            entry->cf_type = CF_TRIM_ARRAY;
+            strcpy(entry->custom_name, "arrayResize");
+            break;
+        case F_ARRAY_FILL_ANYELEMENT__INT4:
+            entry->cf_type = CF_ARRAY_FILL;
+            strcpy(entry->custom_name, "arrayWithConstant");
+            break;
+        case F_ARRAY_SORT_ANYARRAY_BOOL:
+            entry->cf_type = CF_ARRAY_SORT_DESC;
+            entry->custom_name[0] = '\1';
+            break;
     }
 
     if (special_builtin)
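The "arg rewriting" cases above get their own `cf_type` because the ClickHouse function takes its arguments in a different shape from the Postgres one. The sketch below illustrates the kind of rewrite involved for three of them. It is an assumption about the intent, not the extension's deparser: `RewriteKind` and `rewrite_array_call()` are hypothetical, and the exact expressions `pg_clickhouse` emits may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tags for this sketch; not the CF_* enum from fdw.h. */
typedef enum
{
    RW_ARRAY_PREPEND,           /* array_prepend(elem, arr) */
    RW_TRIM_ARRAY,              /* trim_array(arr, n) */
    RW_ARRAY_LENGTH             /* array_length(arr, 1) */
} RewriteKind;

/*
 * Write an assumed ClickHouse equivalent of a two-argument Postgres
 * array call into buf. Returns what snprintf() returns, or -1 for an
 * unknown kind.
 */
int
rewrite_array_call(char *buf, size_t buflen, RewriteKind kind,
                   const char *arg1, const char *arg2)
{
    assert(buf != NULL && arg1 != NULL && arg2 != NULL);

    switch (kind)
    {
        case RW_ARRAY_PREPEND:
            /* Postgres takes (elem, arr); arrayPushFront takes (arr, elem). */
            return snprintf(buf, buflen, "arrayPushFront(%s, %s)", arg2, arg1);
        case RW_TRIM_ARRAY:
            /* Dropping the last n elements means resizing to length - n. */
            return snprintf(buf, buflen, "arrayResize(%s, length(%s) - %s)",
                            arg1, arg1, arg2);
        case RW_ARRAY_LENGTH:
            /* The dimension argument has no counterpart in length(). */
            return snprintf(buf, buflen, "length(%s)", arg1);
    }
    return -1;
}
```

For example, `array_prepend(0, arr)` would come out as `arrayPushFront(arr, 0)` under these assumptions.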
@@ -426,39 +538,42 @@ chfdw_check_for_custom_type(Oid typeoid)
     return entry;
 }
 
-/*
- * Operator-name to custom_object_type mapping table, searched linearly by
- * classify_builtin_operator(). Keep in sync with the CF_* enum in fdw.h.
- */
-typedef struct
-{
-    const char *oprname;
-    custom_object_type ctype;
-} BuiltinOperatorMap;
-
-static const BuiltinOperatorMap builtin_operator_map[] = {
-    {"~", CF_REGEX_MATCH},
-    {"!~", CF_REGEX_NO_MATCH},
-    {"~*", CF_REGEX_ICASE_MATCH},
-    {"!~*", CF_REGEX_ICASE_NO_MATCH},
-    {"->", CF_JSONB_FETCHVAL},
-    {"->>", CF_JSONB_FETCHVAL_TEXT},
-};
+/* pg_operator_d.h only has oid_symbol for some operators */
+#define OID_TEXT_REGEX_NE_OP 642
+#define OID_TEXT_IREGEX_NE_OP 1229
+#define OID_JSONB_FETCHVAL_OP 3211
+#define OID_JSONB_FETCHVAL_TEXT_OP 3477
 
 /*
- * Map a builtin operator name to its custom_object_type. Returns CF_USUAL
- * when the operator needs no special handling and should follow the normal
- * builtin shortcut (i.e. be presumed shippable with no rewrite).
+ * Map a builtin operator OID to its custom_object_type. Returns CF_USUAL
+ * when the operator needs no special handling.
  */
 static custom_object_type
-classify_builtin_operator(const char *oprname)
+classify_builtin_operator(Oid opoid)
 {
-    for (int i = 0; i < lengthof(builtin_operator_map); i++)
+    switch (opoid)
     {
-        if (strcmp(oprname, builtin_operator_map[i].oprname) == 0)
-            return builtin_operator_map[i].ctype;
+        case OID_TEXT_REGEXEQ_OP:
+            return CF_REGEX_MATCH;
+        case OID_TEXT_REGEX_NE_OP:
+            return CF_REGEX_NO_MATCH;
+        case OID_TEXT_ICREGEXEQ_OP:
+            return CF_REGEX_ICASE_MATCH;
+        case OID_TEXT_IREGEX_NE_OP:
+            return CF_REGEX_ICASE_NO_MATCH;
+        case OID_JSONB_FETCHVAL_OP:
+            return CF_JSONB_FETCHVAL;
+        case OID_JSONB_FETCHVAL_TEXT_OP:
+            return CF_JSONB_FETCHVAL_TEXT;
+        case OID_ARRAY_CONTAINS_OP:
+            return CF_ARRAY_CONTAINS;
+        case OID_ARRAY_CONTAINED_OP:
+            return CF_ARRAY_CONTAINED_BY;
+        case OID_ARRAY_OVERLAP_OP:
+            return CF_ARRAY_OVERLAP;
+        default:
+            return CF_USUAL;
     }
-    return CF_USUAL;
 }
 
 CustomObjectDef *
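The move from operator names to OIDs matters because Postgres reuses operator names across types: `@>` exists for arrays and for `jsonb`, among others, and only the array form should become `hasAll`. The standalone sketch below contrasts the two approaches. All names and the two OID values are placeholders invented for this illustration; they are not real catalog OIDs and not the extension's types.

```c
#include <assert.h>
#include <string.h>

/* Placeholder OIDs for illustration only; not real pg_operator values. */
typedef unsigned int DemoOid;
enum
{
    DEMO_OID_ARRAY_CONTAINS = 1001,     /* anyarray @> anyarray */
    DEMO_OID_JSONB_CONTAINS = 1002      /* jsonb @> jsonb */
};

typedef enum
{
    DEMO_USUAL,                 /* no special handling */
    DEMO_ARRAY_CONTAINS         /* rewrite as an array containment call */
} DemoKind;

/* Name-based lookup: every "@>" looks the same, whatever its operand types. */
DemoKind
classify_by_name(const char *oprname)
{
    assert(oprname != NULL);
    return strcmp(oprname, "@>") == 0 ? DEMO_ARRAY_CONTAINS : DEMO_USUAL;
}

/* OID-based lookup: each (name, operand types) pair has its own OID. */
DemoKind
classify_by_oid(DemoOid opoid)
{
    switch (opoid)
    {
        case DEMO_OID_ARRAY_CONTAINS:
            return DEMO_ARRAY_CONTAINS;
        default:
            return DEMO_USUAL;
    }
}
```

Here `classify_by_name("@>")` reports array containment even when the operator in question is the `jsonb` one, whereas `classify_by_oid(DEMO_OID_JSONB_CONTAINS)` correctly returns `DEMO_USUAL`. A `switch` on the OID also avoids the syscache lookup that the name-based version needed to obtain `oprname`.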
@@ -474,31 +589,10 @@ chfdw_check_for_custom_operator(Oid opoid, Form_pg_operator form)
 
     if (chfdw_is_builtin(opoid))
     {
-        switch (opoid)
+        ctype = classify_builtin_operator(opoid);
+        if (ctype == CF_USUAL && opoid != F_TIMESTAMPTZ_PL_INTERVAL)
         {
-            /* timestamptz + interval */
-            case F_TIMESTAMPTZ_PL_INTERVAL:
-                break;
-            default:
-
-                /* Look up the operator name so we can classify it. */
-                if (!form)
-                {
-                    tuple = SearchSysCache1(OPEROID, ObjectIdGetDatum(opoid));
-                    if (!HeapTupleIsValid(tuple))
-                        ereport(ERROR,
-                                errcode(ERRCODE_INTERNAL_ERROR),
-                                errmsg("pg_clickhouse: cache lookup failed for operator %u", opoid));
-                    form = (Form_pg_operator) GETSTRUCT(tuple);
-                }
-
-                ctype = classify_builtin_operator(NameStr(form->oprname));
-                if (ctype != CF_USUAL)
-                    break; /* fall through to cache + classify below */
-
-                if (tuple)
-                    ReleaseSysCache(tuple);
-                return NULL;
+            return NULL;
         }
     }
 
@@ -516,10 +610,11 @@ chfdw_check_for_custom_operator(Oid opoid, Form_pg_operator form)
     entry = hash_search(custom_objects_cache, (void *) &opoid, HASH_ENTER, NULL);
     init_custom_entry(entry);
 
+    ctype = classify_builtin_operator(opoid);
     if (opoid == F_TIMESTAMPTZ_PL_INTERVAL)
         entry->cf_type = CF_TIMESTAMPTZ_PL_INTERVAL;
-    else if (form && classify_builtin_operator(NameStr(form->oprname)) != CF_USUAL)
-        entry->cf_type = classify_builtin_operator(NameStr(form->oprname));
+    else if (ctype != CF_USUAL)
+        entry->cf_type = ctype;
     else
     {
         Oid extoid = getExtensionOfObject(OperatorRelationId, opoid);
