@@ -362,6 +372,7 @@ DROP FUNCTION IF EXISTS py_is_prime(INT);
 |`symbol`| Yes | - | Python function entry name.<br>• **Inline Mode**: Write function name directly, such as `"evaluate"`<br>• **Module Mode**: Format is `[package_name.]module_name.func_name`, see module mode description |
 |`file`| No | - | Python `.zip` package path, only required for module mode. Supports three protocols:<br>• `file://` - Local filesystem path<br>• `http://` - HTTP remote download<br>• `https://` - HTTPS remote download |
 |`runtime_version`| Yes | - | Python runtime version, such as `"3.10.12"`, requires complete version number |
+|`volatility`| No |`volatile`| Volatility of the Python UDF. Valid values are `immutable`, `stable`, and `volatile`.<br>`immutable`: identical inputs always produce identical outputs across statements, and the implementation does not depend on current time, random numbers, or external mutable state.<br>`stable`: identical inputs produce the same result within a single statement, but the result may change between statements, such as `now()` and `current_timestamp()`.<br>`volatile`: the function result may change for each call, such as `uuid()` and `random()`.<br>Correctly marking this property allows the optimizer to handle rewrite and other optimization scenarios more safely; incorrect marking may cause wrong query results. |
 |`always_nullable`| No |`true`| Whether to always return nullable results |

 #### Runtime Version Description
@@ -427,6 +438,7 @@ PROPERTIES (
 "type"="PYTHON_UDF",
 "symbol"="evaluate",
 "runtime_version"="3.10.12",
+"volatility"="immutable",
 "always_nullable"="true"
 )
 AS $$
@@ -473,6 +485,7 @@ PROPERTIES (
 "type"="PYTHON_UDF",
 "symbol"="add",
 "runtime_version"="3.10.12",
+"volatility"="immutable",
 "always_nullable"="true"
 )
 AS $$
@@ -495,6 +508,7 @@ PROPERTIES (
 "type"="PYTHON_UDF",
 "symbol"="to_upper",
 "runtime_version"="3.10.12",
+"volatility"="immutable",
 "always_nullable"="true"
 )
 AS $$
@@ -517,6 +531,7 @@ PROPERTIES (
 "type"="PYTHON_UDF",
 "symbol"="sqrt",
 "runtime_version"="3.10.12",
+"volatility"="immutable",
 "always_nullable"="true"
 )
 AS $$
@@ -586,6 +601,7 @@ PROPERTIES (
 "type"="PYTHON_UDF",
 "symbol"="evaluate",
 "runtime_version"="3.10.12",
+"volatility"="immutable",
 "always_nullable"="true"
 )
 AS $$
@@ -609,6 +625,7 @@ PROPERTIES (
 "type"="PYTHON_UDF",
 "symbol"="evaluate",
 "runtime_version"="3.10.12",
+"volatility"="immutable",
 "always_nullable"="true"
 )
 AS $$
@@ -633,6 +650,7 @@ PROPERTIES (
 "type"="PYTHON_UDF",
 "symbol"="evaluate",
 "runtime_version"="3.10.12",
+"volatility"="immutable",
 "always_nullable"="true"
 )
 AS $$
@@ -656,6 +674,7 @@ PROPERTIES (
 "type"="PYTHON_UDF",
 "symbol"="evaluate",
 "runtime_version"="3.10.12",
+"volatility"="immutable",
 "always_nullable"="true"
 )
 AS $$
@@ -680,6 +699,7 @@ PROPERTIES (
 "type"="PYTHON_UDF",
 "symbol"="evaluate",
 "runtime_version"="3.10.12",
+"volatility"="immutable",
 "always_nullable"="true"
 )
 AS $$
@@ -705,7 +725,8 @@ RETURNS STRING
 PROPERTIES (
 "type"="PYTHON_UDF",
 "symbol"="evaluate",
-"runtime_version"="3.10.12"
+"runtime_version"="3.10.12",
+"volatility"="immutable"
 )
 AS $$
 def evaluate(email):
@@ -731,7 +752,8 @@ RETURNS INT
 PROPERTIES (
 "type"="PYTHON_UDF",
 "symbol"="evaluate",
-"runtime_version"="3.10.12"
+"runtime_version"="3.10.12",
+"volatility"="immutable"
 )
 AS $$
 def evaluate(s1, s2):
@@ -768,7 +790,8 @@ RETURNS INT
 PROPERTIES (
 "type"="PYTHON_UDF",
 "symbol"="evaluate",
-"runtime_version"="3.10.12"
+"runtime_version"="3.10.12",
+"volatility"="immutable"
 )
 AS $$
 from datetime import datetime
@@ -797,7 +820,8 @@ RETURNS BOOLEAN
 PROPERTIES (
 "type"="PYTHON_UDF",
 "symbol"="evaluate",
-"runtime_version"="3.10.12"
+"runtime_version"="3.10.12",
+"volatility"="immutable"
 )
 AS $$
 def evaluate(id_card):
@@ -1411,6 +1435,8 @@ DROP FUNCTION IF EXISTS py_variance(DOUBLE);
 |`runtime_version`| Yes | - | Python runtime version, such as `"3.10.12"`|
 |`always_nullable`| No |`true`| Whether to always return nullable results |

+`volatility` is only supported for scalar Python UDF and is not supported for Python UDAF.
+
 #### runtime_version Description

 - Must fill in **complete version number** of Python version, format is `x.x.x` or `x.x.xx`
@@ -2407,6 +2433,8 @@ CREATE TABLES FUNCTION py_split(STRING, STRING) ...;
 |`runtime_version`| Yes | - | Python runtime version, such as `"3.10.12"`|
 |`always_nullable`| No |`true`| Whether to always return nullable results |

+`volatility` is only supported for scalar Python UDF and is not supported for Python UDTF.
+
 #### runtime_version Description

 - Must fill in **complete version number** of Python version, format is `x.x.x` or `x.x.xx`
@@ -2962,6 +2990,7 @@ PROPERTIES (
 "type"="PYTHON_UDF",
 "symbol"="evaluate",
 "runtime_version"="3.12.11", -- Must specify complete version number, matching Python 3.12.11
+"volatility"="immutable",
 "always_nullable"="true"
 )
 AS $$
@@ -3064,6 +3093,7 @@ PROPERTIES (
 "type"="PYTHON_UDF",
 "symbol"="evaluate",
 "runtime_version"="3.9.18", -- Must specify complete version number, matching Python 3.9.18
+"volatility"="immutable",
 "always_nullable"="true"
 )
 AS $$
@@ -3078,6 +3108,7 @@ PROPERTIES (
 "type"="PYTHON_UDF",
 "symbol"="evaluate",
 "runtime_version"="3.12.11", -- Must specify complete version number, matching Python 3.12.11
docs/sql-manual/sql-statements/function/CREATE-FUNCTION.md (94 additions & 4 deletions)
@@ -75,6 +75,7 @@ CREATE [ GLOBAL ]
 > - `symbol`: Indicates the class name containing the UDF class. This parameter is mandatory.
 > - `type`: Indicates the UDF call type. The default is Native. Use JAVA_UDF when using a Java UDF.
 > - `always_nullable`: Indicates whether the UDF result may contain NULL values. This is an optional parameter with a default value of true.
+> - `volatility`: Indicates the volatility of a scalar Java UDF or scalar Python UDF. This is an optional parameter with a default value of `volatile`. Valid values are `immutable`, `stable`, and `volatile`. `immutable` means identical inputs always produce identical outputs across statements, and the implementation does not depend on current time, random numbers, or external mutable state. `stable` means identical inputs produce the same result within a single statement, but the result may change between statements; examples include `now()` and `current_timestamp()`. `volatile` means the function result may change for each call; examples include `uuid()` and `random()`. Correct marking allows the optimizer to handle query rewrites more safely; incorrect marking may lead to wrong query results. This property is not supported for UDAF, UDTF, RPC, or alias functions.

 ## Access Control Requirements
@@ -91,11 +92,12 @@ To execute this command, the user must have `ADMIN_PRIV` privileges.
+2. Create a custom UDAF function. The `volatility` property is not supported for UDAF.
@@ -108,7 +110,7 @@ To execute this command, the user must have `ADMIN_PRIV` privileges.
 );
 ```

-3. Create a custom UDTF function.
+3. Create a custom UDTF function. The `volatility` property is not supported for UDTF.
@@ -135,4 +137,92 @@ To execute this command, the user must have `ADMIN_PRIV` privileges.
 ```sql
 CREATE GLOBAL ALIAS FUNCTION id_masking(INT) WITH PARAMETER(id) AS CONCAT(LEFT(id, 3), '****', RIGHT(id, 4));
-```
+```
+
+6. Create a volatile Python UDF. Functions such as `uuid.uuid4()` that depend on randomness should keep the default `volatility = volatile` and must not be incorrectly marked as `immutable`.
+
+```sql
+CREATE TABLE cte_uuid_seed (id INT) ENGINE=OLAP DUPLICATE KEY(id)
+DISTRIBUTED BY HASH(id) BUCKETS 1 PROPERTIES ("replication_num"="1");
+INSERT INTO cte_uuid_seed VALUES (1),(2),(3);
+
+DROP FUNCTION IF EXISTS py_uuid_token(INT);
+CREATE FUNCTION py_uuid_token(INT)
+RETURNS STRING
+PROPERTIES (
+"type"="PYTHON_UDF",
+"symbol"="py_uuid_token_impl",
+"always_nullable"="false",
+"runtime_version"="3.12.11",
+"volatility"="volatile"
+)
+AS $$
+import uuid
+def py_uuid_token_impl(x):
+    return f"{x}-{uuid.uuid4()}"
+$$;
+
+SET enable_cte_materialize = true;
+SET inline_cte_referenced_threshold = 10;
+
+WITH cte AS (SELECT id, py_uuid_token(id) AS token FROM cte_uuid_seed)
+SELECT id, COUNT(DISTINCT token) AS distinct_tokens
+FROM (SELECT id, token FROM cte UNION ALL SELECT id, token FROM cte) u
+GROUP BY id ORDER BY id;
+```
+
+Correct result:
+
+```text
++------+-----------------+
+| id   | distinct_tokens |
++------+-----------------+
+|    1 |               1 |
+|    2 |               1 |
+|    3 |               1 |
++------+-----------------+
+```
+
+For this function, the following definition is incorrect:
+
+```sql
+DROP FUNCTION IF EXISTS py_uuid_token(INT);
+CREATE FUNCTION py_uuid_token(INT)
+RETURNS STRING
+PROPERTIES (
+"type"="PYTHON_UDF",
+"symbol"="py_uuid_token_impl",
+"always_nullable"="false",
+"runtime_version"="3.12.11",
+"volatility"="immutable"
+)
+AS $$
+import uuid
+def py_uuid_token_impl(x):
+    return f"{x}-{uuid.uuid4()}"
+$$;
+```
+
+Run the same query again:
+
+```sql
+WITH cte AS (SELECT id, py_uuid_token(id) AS token FROM cte_uuid_seed)
+SELECT id, COUNT(DISTINCT token) AS distinct_tokens
+FROM (SELECT id, token FROM cte UNION ALL SELECT id, token FROM cte) u
+GROUP BY id ORDER BY id;
+```
+
+Incorrect result:
+
+```text
++------+-----------------+
+| id   | distinct_tokens |
++------+-----------------+
+|    1 |               2 |
+|    2 |               2 |
+|    3 |               2 |
++------+-----------------+
+```
+
+Why this is wrong:
+Because `py_uuid_token` is volatile, each call to `uuid.uuid4()` generates a new value. If the function is incorrectly marked as `volatility = immutable`, the optimizer may treat repeated references as safe to rewrite and may choose a plan that evaluates the UDF separately on both sides of `UNION ALL`. As a result, the same `id` can produce two different `token` values, and `COUNT(DISTINCT token)` changes from `1` to `2`.
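
The difference between the two results can be imitated outside the database with a small Python sketch. It is only a model of the two evaluation strategies described in the explanation, not the optimizer's actual implementation: `materialized_plan`, `inlined_plan`, and `distinct_tokens` are hypothetical helper names introduced here, and only `py_uuid_token_impl` comes from the example in the diff.

```python
import uuid


def py_uuid_token_impl(x):
    # Same body as the UDF in the documentation example.
    return f"{x}-{uuid.uuid4()}"


def materialized_plan(ids):
    # Hypothetical model: the CTE is evaluated once and its rows are
    # reused by both branches of UNION ALL.
    cte = [(i, py_uuid_token_impl(i)) for i in ids]
    return cte + cte


def inlined_plan(ids):
    # Hypothetical model: the CTE body is re-evaluated for each reference,
    # which is only safe if the function really is immutable.
    left = [(i, py_uuid_token_impl(i)) for i in ids]
    right = [(i, py_uuid_token_impl(i)) for i in ids]
    return left + right


def distinct_tokens(rows):
    # Equivalent of COUNT(DISTINCT token) ... GROUP BY id ORDER BY id.
    groups = {}
    for i, token in rows:
        groups.setdefault(i, set()).add(token)
    return {i: len(tokens) for i, tokens in sorted(groups.items())}


if __name__ == "__main__":
    ids = [1, 2, 3]
    print(distinct_tokens(materialized_plan(ids)))  # one token per id
    print(distinct_tokens(inlined_plan(ids)))       # two tokens per id
```

Under this model, a single evaluation yields one distinct token per `id`, while evaluating the function once per reference yields two, which matches the "Correct result" and "Incorrect result" tables in the diff.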