Commit 4bd5d31

Support pushdown for re2 extension
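ClickHouse is working on a Postgres extension, pg_re2, that uses RE2 for regular expressions. It addresses the incompatibility between ClickHouse and PostgreSQL regular expressions, where the regex engine that evaluates a pattern depends on whether or not the function is pushed down.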
1 parent 71fb23c commit 4bd5d31

8 files changed

Lines changed: 326 additions & 0 deletions

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -40,6 +40,8 @@ All notable changes to this project will be documented in this file. It uses the
 * Added mapping for `regexp_replace(4-arg)` to pushdown to
 `replaceRegexpAll()` when the `g` flag is set, and to prepend the flags to
 the pushed down expression.
+* Added pushdown for [pg_re2](https://github.com/ClickHouse/pg_re2) functions to their ClickHouse
+equivalents (e.g., `re2match` → `match`, `re2extractall` → `extractAll`).

 ### ⬆️ Dependency Updates

@@ -63,6 +65,8 @@ All notable changes to this project will be documented in this file. It uses the
 * Added tests to ensure that `concat_ws()` successfully pushes down to the
 compatible function of the same name (an alias for [concatWithSeparator]).

+[pg_re2]: https://github.com/ClickHouse/pg_re2
+"pg_re2: ClickHouse-compatible regex functions using RE2"
 [v0.1.11]: https://github.com/ClickHouse/pg_clickhouse/compare/v0.1.10...v0.1.11
 [concatWithSeparator]: https://clickhouse.com/docs/sql-reference/functions/string-functions#concatWithSeparator

META.json

Lines changed: 3 additions & 0 deletions
@@ -17,6 +17,9 @@
 "requires": {
 "PostgreSQL": "13.0.0"
 }
+},
+"recommends": {
+"re2": "0.1.0"
 }
 },
 "provides": {

doc/pg_clickhouse.md

Lines changed: 30 additions & 0 deletions
@@ -1090,6 +1090,32 @@ any of these functions cannot be pushed down they will raise an exception.

 * [dictGet](https://clickhouse.com/docs/sql-reference/functions/ext-dict-functions#dictget-dictgetordefault-dictgetornull)

+### Extension Pushdown
+
+pg_clickhouse recognizes functions from select core and third-party extensions,
+pushing them down to their ClickHouse equivalents.
+
+#### [pg_re2]
+
+All [pg_re2] functions push down 1:1 to ClickHouse:
+
+* `re2match` → [match](https://clickhouse.com/docs/sql-reference/functions/string-search-functions#match)
+* `re2extract` → [extract](https://clickhouse.com/docs/sql-reference/functions/string-search-functions#extract)
+* `re2extractall` → [extractAll](https://clickhouse.com/docs/sql-reference/functions/string-search-functions#extractAll)
+* `re2regexpextract` → [regexpExtract](https://clickhouse.com/docs/sql-reference/functions/string-search-functions#regexpExtract)
+* `re2extractgroups` → [extractGroups](https://clickhouse.com/docs/sql-reference/functions/string-search-functions#extractGroups)
+* `re2replaceregexpone` → [replaceRegexpOne](https://clickhouse.com/docs/sql-reference/functions/string-replace-functions#replaceRegexpOne)
+* `re2replaceregexpall` → [replaceRegexpAll](https://clickhouse.com/docs/sql-reference/functions/string-replace-functions#replaceRegexpAll)
+* `re2countmatches` → [countMatches](https://clickhouse.com/docs/sql-reference/functions/string-search-functions#countMatches)
+* `re2countmatchescaseinsensitive` → [countMatchesCaseInsensitive](https://clickhouse.com/docs/sql-reference/functions/string-search-functions#countMatchesCaseInsensitive)
+* `re2multimatchany` → [multiMatchAny](https://clickhouse.com/docs/sql-reference/functions/string-search-functions#multiMatchAny)
+* `re2multimatchanyindex` → [multiMatchAnyIndex](https://clickhouse.com/docs/sql-reference/functions/string-search-functions#multiMatchAnyIndex)
+* `re2multimatchallindices` → [multiMatchAllIndices](https://clickhouse.com/docs/sql-reference/functions/string-search-functions#multiMatchAllIndices)
+
+#### [intarray]
+
+* `idx` → [indexOf](https://clickhouse.com/docs/sql-reference/functions/array-functions#indexOf)
+
 ### Pushdown Casts

 pg_clickhouse pushes down casts such as `CAST(x AS bigint)` for compatible
@@ -1335,3 +1361,7 @@ Copyright (c) 2025-2026, ClickHouse.
 [Postgres flags]: https://www.postgresql.org/docs/18/functions-matching.html#POSIX-EMBEDDED-OPTIONS-TABLE
 "PostgreSQL Docs: ARE Embedded-Option Letters"
 [RE2 Regular Expressions]: https://github.com/google/re2/wiki/Syntax "RE2 Syntax"
+[pg_re2]: https://github.com/ClickHouse/pg_re2
+"pg_re2: ClickHouse-compatible regex functions using RE2"
+[intarray]: https://www.postgresql.org/docs/current/intarray.html
+"PostgreSQL Docs: intarray"
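Every pair in the pg_re2 list above follows one naming rule: the pg_re2 name is `re2` followed by the ClickHouse name in lowercase. Going the other way loses the camelCase information, which is presumably why the C change carries an explicit lookup table instead of computing the ClickHouse name. The following is a minimal sketch of the forward rule, checked against the documented pairs. `pg_re2_name_for()` and `documented_pairs_follow_rule()` are hypothetical helpers for illustration, not part of pg_clickhouse or pg_re2.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: derive the pg_re2 name for a ClickHouse function by
 * prepending "re2" and lowercasing, e.g. "extractAll" -> "re2extractall".
 * Writes the result into buf and returns it, or returns NULL when buf (of
 * size len) cannot hold "re2" + name + the terminating NUL.
 */
static char *
pg_re2_name_for(const char *ch_name, char *buf, size_t len)
{
	size_t		n;
	size_t		i;

	assert(ch_name != NULL && buf != NULL);
	n = strlen(ch_name);
	if (n + 4 > len)
		return NULL;

	memcpy(buf, "re2", 3);
	for (i = 0; i < n; i++)
		buf[3 + i] = (char) tolower((unsigned char) ch_name[i]);
	buf[3 + n] = '\0';
	return buf;
}

/* The pg_re2 -> ClickHouse pairs, copied from the documented list. */
static const char *const documented_pairs[][2] = {
	{"re2match", "match"},
	{"re2extract", "extract"},
	{"re2extractall", "extractAll"},
	{"re2regexpextract", "regexpExtract"},
	{"re2extractgroups", "extractGroups"},
	{"re2replaceregexpone", "replaceRegexpOne"},
	{"re2replaceregexpall", "replaceRegexpAll"},
	{"re2countmatches", "countMatches"},
	{"re2countmatchescaseinsensitive", "countMatchesCaseInsensitive"},
	{"re2multimatchany", "multiMatchAny"},
	{"re2multimatchanyindex", "multiMatchAnyIndex"},
	{"re2multimatchallindices", "multiMatchAllIndices"},
};

/* Returns true if every documented pair satisfies the naming rule. */
static bool
documented_pairs_follow_rule(void)
{
	char		buf[64];
	size_t		i;

	for (i = 0; i < sizeof(documented_pairs) / sizeof(documented_pairs[0]); i++)
	{
		if (pg_re2_name_for(documented_pairs[i][1], buf, sizeof(buf)) == NULL ||
			strcmp(buf, documented_pairs[i][0]) != 0)
			return false;
	}
	return true;
}
```

For example, `pg_re2_name_for("multiMatchAny", buf, sizeof(buf))` fills `buf` with `re2multimatchany`.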

src/custom_types.c

Lines changed: 40 additions & 0 deletions
@@ -163,6 +163,40 @@ chfdw_check_for_ordered_aggregate(Aggref * agg)
 	return STR_EQUAL(extname, "pg_clickhouse");
 }

+/*
+ * Map sans-prefix pg_re2 function names to ClickHouse
+ * case-sensitive names. Must be kept in lexicographic order.
+ */
+static char *re2_func_map[][2] = {
+	{"countmatches", "countMatches"},
+	{"countmatchescaseinsensitive", "countMatchesCaseInsensitive"},
+	{"extractall", "extractAll"},
+	{"extractgroups", "extractGroups"},
+	{"multimatchallindices", "multiMatchAllIndices"},
+	{"multimatchany", "multiMatchAny"},
+	{"multimatchanyindex", "multiMatchAnyIndex"},
+	{"regexpextract", "regexpExtract"},
+	{"replaceregexpall", "replaceRegexpAll"},
+	{"replaceregexpone", "replaceRegexpOne"},
+	{NULL, NULL},
+};
+
+inline static char *
+re2_func_name(char *proname)
+{
+	Assert(strncmp(proname, "re2", 3) == 0);
+	char *stripped = proname + 3;
+	size_t i = 0;
+
+	while (re2_func_map[i][0] != NULL)
+	{
+		if (STR_EQUAL(re2_func_map[i][0], stripped))
+			return re2_func_map[i][1];
+		i++;
+	}
+	return stripped;
+}
+
 /*
  * Map pg_clickhouse pushdown function names to ClickHouse case-sensitive
  * names. Must be kept in lexicographic order.
@@ -520,6 +554,12 @@ chfdw_check_for_custom_function(Oid funcid)
 			strcpy(entry->custom_name, "indexOf");
 		}
 	}
+	else if (STR_EQUAL(extname, "re2"))
+	{
+		/* pg_re2: 1:1 pushdown to ClickHouse RE2 functions. */
+		entry->cf_type = CF_CH_FUNCTION;
+		strlcpy(entry->custom_name, re2_func_name(proname), NAMEDATALEN);
+	}
 	else if (STR_EQUAL(extname, "pg_clickhouse"))
 	{
 		/* pg_clickhouse custom functions. */
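The comment on `re2_func_map` says the table must be kept in lexicographic order, but `re2_func_name()` scans it linearly, so the order does not currently affect lookups. A minimal sketch, assuming one wanted the order to matter: drop the `NULL` sentinel, binary-search with the standard C `bsearch()`, and keep a sortedness check for a test. `func_map`, `func_map_is_sorted()`, and `re2_func_name_bsearch()` are hypothetical names; the standard `assert()` stands in for PostgreSQL's `Assert()`, and names missing from the table fall back to the stripped name, as in the commit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical copy of re2_func_map, without the NULL sentinel. */
static const char *const func_map[][2] = {
	{"countmatches", "countMatches"},
	{"countmatchescaseinsensitive", "countMatchesCaseInsensitive"},
	{"extractall", "extractAll"},
	{"extractgroups", "extractGroups"},
	{"multimatchallindices", "multiMatchAllIndices"},
	{"multimatchany", "multiMatchAny"},
	{"multimatchanyindex", "multiMatchAnyIndex"},
	{"regexpextract", "regexpExtract"},
	{"replaceregexpall", "replaceRegexpAll"},
	{"replaceregexpone", "replaceRegexpOne"},
};

#define FUNC_MAP_LEN (sizeof(func_map) / sizeof(func_map[0]))

/* bsearch() comparator: key is the stripped name, elem points at a row. */
static int
func_map_cmp(const void *key, const void *elem)
{
	const char *name = key;
	const char *const *row = elem;

	return strcmp(name, row[0]);
}

/* Returns 1 if the keys are in strictly ascending strcmp() order, else 0. */
static int
func_map_is_sorted(void)
{
	size_t		i;

	for (i = 1; i < FUNC_MAP_LEN; i++)
	{
		if (strcmp(func_map[i - 1][0], func_map[i][0]) >= 0)
			return 0;
	}
	return 1;
}

/* Strip the "re2" prefix, then binary-search the sorted table. */
static const char *
re2_func_name_bsearch(const char *proname)
{
	const char *stripped;
	const char *const *row;

	assert(strncmp(proname, "re2", 3) == 0);
	stripped = proname + 3;
	row = bsearch(stripped, func_map, FUNC_MAP_LEN, sizeof(func_map[0]),
				  func_map_cmp);
	/* Names such as "match" and "extract" need no mapping. */
	return row != NULL ? row[1] : stripped;
}
```

With ten entries the linear scan in the commit is entirely adequate; a binary search would only start to pay off, if at all, as the table grows, and the sortedness check is what would keep the ordering comment honest.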

test/expected/re2_functions.out

Lines changed: 151 additions & 0 deletions
@@ -0,0 +1,151 @@
+SELECT EXISTS(SELECT 1 FROM pg_available_extensions WHERE name = 're2') AS have_re2 \gset
+\if :have_re2
+CREATE SERVER re2_svr FOREIGN DATA WRAPPER clickhouse_fdw OPTIONS(dbname 're2_test');
+CREATE USER MAPPING FOR CURRENT_USER SERVER re2_svr;
+SELECT clickhouse_raw_query('DROP DATABASE IF EXISTS re2_test');
+clickhouse_raw_query
+----------------------
+
+(1 row)
+
+SELECT clickhouse_raw_query('CREATE DATABASE re2_test');
+clickhouse_raw_query
+----------------------
+
+(1 row)
+
+SELECT clickhouse_raw_query($$
+CREATE TABLE re2_test.t1 (
+id Int32,
+val String
+) ENGINE = MergeTree ORDER BY id
+$$);
+clickhouse_raw_query
+----------------------
+
+(1 row)
+
+SELECT clickhouse_raw_query($$
+INSERT INTO re2_test.t1 VALUES
+(1, 'POSIX uses BRE and ERE'),
+(2, 're2 uses finite automata'),
+(3, 'PCRE supports backtracking')
+$$);
+clickhouse_raw_query
+----------------------
+
+(1 row)
+
+CREATE SCHEMA re2_test;
+IMPORT FOREIGN SCHEMA re2_test FROM SERVER re2_svr INTO re2_test;
+SET search_path = re2_test, public;
+CREATE EXTENSION re2;
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2match(val, 're2');
+QUERY PLAN
+-------------------------------------------------------------------------
+Foreign Scan on re2_test.t1
+Output: id, val
+Remote SQL: SELECT id, val FROM re2_test.t1 WHERE (match(val, 're2'))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2extract(val, '(re2)') = 're2';
+QUERY PLAN
+---------------------------------------------------------------------------------------
+Foreign Scan on re2_test.t1
+Output: id, val
+Remote SQL: SELECT id, val FROM re2_test.t1 WHERE ((extract(val, '(re2)') = 're2'))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2extractall(val, '[A-Z]+') = ARRAY['POSIX','BRE','ERE'];
+QUERY PLAN
+-----------------------------------------------------------------------------------------------------------
+Foreign Scan on re2_test.t1
+Output: id, val
+Remote SQL: SELECT id, val FROM re2_test.t1 WHERE ((extractAll(val, '[A-Z]+') = ['POSIX','BRE','ERE']))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2regexpextract(val, '(re2)', 1) = 're2';
+QUERY PLAN
+------------------------------------------------------------------------------------------------
+Foreign Scan on re2_test.t1
+Output: id, val
+Remote SQL: SELECT id, val FROM re2_test.t1 WHERE ((regexpExtract(val, '(re2)', 1) = 're2'))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2extractgroups(val, '(POSIX) uses (BRE)') = ARRAY['POSIX','BRE'];
+QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------
+Foreign Scan on re2_test.t1
+Output: id, val
+Remote SQL: SELECT id, val FROM re2_test.t1 WHERE ((extractGroups(val, '(POSIX) uses (BRE)') = ['POSIX','BRE']))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2replaceregexpone(val, 'POSIX', 're2') = 're2 uses BRE and ERE';
+QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------
+Foreign Scan on re2_test.t1
+Output: id, val
+Remote SQL: SELECT id, val FROM re2_test.t1 WHERE ((replaceRegexpOne(val, 'POSIX', 're2') = 're2 uses BRE and ERE'))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2replaceregexpall(val, ' ', '-') = 're2-uses-finite-automata';
+QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------
+Foreign Scan on re2_test.t1
+Output: id, val
+Remote SQL: SELECT id, val FROM re2_test.t1 WHERE ((replaceRegexpAll(val, ' ', '-') = 're2-uses-finite-automata'))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2countmatches(val, 'e') > 0;
+QUERY PLAN
+------------------------------------------------------------------------------------
+Foreign Scan on re2_test.t1
+Output: id, val
+Remote SQL: SELECT id, val FROM re2_test.t1 WHERE ((countMatches(val, 'e') > 0))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2countmatchescaseinsensitive(val, 'E') > 0;
+QUERY PLAN
+---------------------------------------------------------------------------------------------------
+Foreign Scan on re2_test.t1
+Output: id, val
+Remote SQL: SELECT id, val FROM re2_test.t1 WHERE ((countMatchesCaseInsensitive(val, 'E') > 0))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2multimatchany(val, ARRAY['POSIX','PCRE']);
+QUERY PLAN
+--------------------------------------------------------------------------------------------
+Foreign Scan on re2_test.t1
+Output: id, val
+Remote SQL: SELECT id, val FROM re2_test.t1 WHERE (multiMatchAny(val, ['POSIX','PCRE']))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2multimatchanyindex(val, ARRAY['POSIX','PCRE']) > 0;
+QUERY PLAN
+-------------------------------------------------------------------------------------------------------
+Foreign Scan on re2_test.t1
+Output: id, val
+Remote SQL: SELECT id, val FROM re2_test.t1 WHERE ((multiMatchAnyIndex(val, ['POSIX','PCRE']) > 0))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2multimatchallindices(val, ARRAY['POSIX','PCRE']) = ARRAY[1];
+QUERY PLAN
+-----------------------------------------------------------------------------------------------------------
+Foreign Scan on re2_test.t1
+Output: id, val
+Remote SQL: SELECT id, val FROM re2_test.t1 WHERE ((multiMatchAllIndices(val, ['POSIX','PCRE']) = [1]))
+(3 rows)
+
+DROP EXTENSION re2;
+DROP USER MAPPING FOR CURRENT_USER SERVER re2_svr;
+SELECT clickhouse_raw_query('DROP DATABASE re2_test');
+clickhouse_raw_query
+----------------------
+
+(1 row)
+
+DROP SERVER re2_svr CASCADE;
+NOTICE: drop cascades to foreign table t1
+\else
+\echo 'SKIP: re2 extension not available'
+\endif

test/expected/re2_functions_1.out

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+SELECT EXISTS(SELECT 1 FROM pg_available_extensions WHERE name = 're2') AS have_re2 \gset
+\if :have_re2
+CREATE SERVER re2_svr FOREIGN DATA WRAPPER clickhouse_fdw OPTIONS(dbname 're2_test');
+CREATE USER MAPPING FOR CURRENT_USER SERVER re2_svr;
+SELECT clickhouse_raw_query('DROP DATABASE IF EXISTS re2_test');
+SELECT clickhouse_raw_query('CREATE DATABASE re2_test');
+SELECT clickhouse_raw_query($$
+CREATE TABLE re2_test.t1 (
+id Int32,
+val String
+) ENGINE = MergeTree ORDER BY id
+$$);
+SELECT clickhouse_raw_query($$
+INSERT INTO re2_test.t1 VALUES
+(1, 'POSIX uses BRE and ERE'),
+(2, 're2 uses finite automata'),
+(3, 'PCRE supports backtracking')
+$$);
+CREATE SCHEMA re2_test;
+IMPORT FOREIGN SCHEMA re2_test FROM SERVER re2_svr INTO re2_test;
+SET search_path = re2_test, public;
+CREATE EXTENSION re2;
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2match(val, 're2');
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2extract(val, '(re2)') = 're2';
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2extractall(val, '[A-Z]+') = ARRAY['POSIX','BRE','ERE'];
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2regexpextract(val, '(re2)', 1) = 're2';
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2extractgroups(val, '(POSIX) uses (BRE)') = ARRAY['POSIX','BRE'];
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2replaceregexpone(val, 'POSIX', 're2') = 're2 uses BRE and ERE';
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2replaceregexpall(val, ' ', '-') = 're2-uses-finite-automata';
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2countmatches(val, 'e') > 0;
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2countmatchescaseinsensitive(val, 'E') > 0;
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2multimatchany(val, ARRAY['POSIX','PCRE']);
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2multimatchanyindex(val, ARRAY['POSIX','PCRE']) > 0;
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2multimatchallindices(val, ARRAY['POSIX','PCRE']) = ARRAY[1];
+DROP EXTENSION re2;
+DROP USER MAPPING FOR CURRENT_USER SERVER re2_svr;
+SELECT clickhouse_raw_query('DROP DATABASE re2_test');
+DROP SERVER re2_svr CASCADE;
+\else
+\echo 'SKIP: re2 extension not available'
+SKIP: re2 extension not available
+\endif

test/expected/result_map.txt

Lines changed: 8 additions & 0 deletions
@@ -225,3 +225,11 @@ window_functions.sql
 24.3 | window_functions_2.out
 23.8 | window_functions_5.out
 23.3 | window_functions_3.out
+
+re2_functions.sql
+-----------------
+
+Postgres | pg_re2 | File
+----------|-----------|----------------------
+13+ | installed | re2_functions.out
+13+ | absent | re2_functions_1.out

test/sql/re2_functions.sql

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+SELECT EXISTS(SELECT 1 FROM pg_available_extensions WHERE name = 're2') AS have_re2 \gset
+\if :have_re2
+
+CREATE SERVER re2_svr FOREIGN DATA WRAPPER clickhouse_fdw OPTIONS(dbname 're2_test');
+CREATE USER MAPPING FOR CURRENT_USER SERVER re2_svr;
+
+SELECT clickhouse_raw_query('DROP DATABASE IF EXISTS re2_test');
+SELECT clickhouse_raw_query('CREATE DATABASE re2_test');
+SELECT clickhouse_raw_query($$
+CREATE TABLE re2_test.t1 (
+id Int32,
+val String
+) ENGINE = MergeTree ORDER BY id
+$$);
+SELECT clickhouse_raw_query($$
+INSERT INTO re2_test.t1 VALUES
+(1, 'POSIX uses BRE and ERE'),
+(2, 're2 uses finite automata'),
+(3, 'PCRE supports backtracking')
+$$);
+
+CREATE SCHEMA re2_test;
+IMPORT FOREIGN SCHEMA re2_test FROM SERVER re2_svr INTO re2_test;
+SET search_path = re2_test, public;
+
+CREATE EXTENSION re2;
+
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2match(val, 're2');
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2extract(val, '(re2)') = 're2';
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2extractall(val, '[A-Z]+') = ARRAY['POSIX','BRE','ERE'];
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2regexpextract(val, '(re2)', 1) = 're2';
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2extractgroups(val, '(POSIX) uses (BRE)') = ARRAY['POSIX','BRE'];
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2replaceregexpone(val, 'POSIX', 're2') = 're2 uses BRE and ERE';
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2replaceregexpall(val, ' ', '-') = 're2-uses-finite-automata';
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2countmatches(val, 'e') > 0;
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2countmatchescaseinsensitive(val, 'E') > 0;
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2multimatchany(val, ARRAY['POSIX','PCRE']);
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2multimatchanyindex(val, ARRAY['POSIX','PCRE']) > 0;
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t1 WHERE re2multimatchallindices(val, ARRAY['POSIX','PCRE']) = ARRAY[1];
+
+DROP EXTENSION re2;
+DROP USER MAPPING FOR CURRENT_USER SERVER re2_svr;
+SELECT clickhouse_raw_query('DROP DATABASE re2_test');
+DROP SERVER re2_svr CASCADE;
+
+\else
+\echo 'SKIP: re2 extension not available'
+\endif
