Commit f0c9f30

[Enhancement](udf) Support volatility property for scalar UDF
1 parent 96f1410 commit f0c9f30

6 files changed

Lines changed: 458 additions & 28 deletions

docs/query-data/udf/python-user-defined-function.md

Lines changed: 37 additions & 6 deletions
@@ -40,6 +40,7 @@ PROPERTIES (
 "type" = "PYTHON_UDF",
 "symbol" = "entry_function_name",
 "runtime_version" = "python_version",
+"volatility" = "immutable|stable|volatile",
 "always_nullable" = "true|false"
 )
 AS $$
@@ -58,7 +59,8 @@ RETURNS INT
 PROPERTIES (
 "type" = "PYTHON_UDF",
 "symbol" = "evaluate",
-"runtime_version" = "3.10.12"
+"runtime_version" = "3.10.12",
+"volatility" = "immutable"
 )
 AS $$
 def evaluate(a, b):
@@ -77,7 +79,8 @@ RETURNS STRING
 PROPERTIES (
 "type" = "PYTHON_UDF",
 "symbol" = "evaluate",
-"runtime_version" = "3.10.12"
+"runtime_version" = "3.10.12",
+"volatility" = "immutable"
 )
 AS $$
 def evaluate(s1, s2):
@@ -247,6 +250,7 @@ PROPERTIES (
 "file" = "file:///path/to/python_udf_scalar_ops.zip",
 "symbol" = "python_udf_scalar_ops.add_three_numbers",
 "runtime_version" = "3.10.12",
+"volatility" = "immutable",
 "always_nullable" = "true"
 );

@@ -257,6 +261,7 @@ PROPERTIES (
 "file" = "file:///path/to/python_udf_scalar_ops.zip",
 "symbol" = "python_udf_scalar_ops.reverse_string",
 "runtime_version" = "3.10.12",
+"volatility" = "immutable",
 "always_nullable" = "true"
 );

@@ -267,6 +272,7 @@ PROPERTIES (
 "file" = "file:///path/to/python_udf_scalar_ops.zip",
 "symbol" = "python_udf_scalar_ops.is_prime",
 "runtime_version" = "3.10.12",
+"volatility" = "immutable",
 "always_nullable" = "true"
 );
 ```
@@ -284,6 +290,7 @@ PROPERTIES (
 "file" = "https://your-storage.com/udf/python_udf_scalar_ops.zip",
 "symbol" = "python_udf_scalar_ops.add_three_numbers",
 "runtime_version" = "3.10.12",
+"volatility" = "immutable",
 "always_nullable" = "true"
 );

@@ -294,6 +301,7 @@ PROPERTIES (
 "file" = "https://your-storage.com/udf/python_udf_scalar_ops.zip",
 "symbol" = "python_udf_scalar_ops.reverse_string",
 "runtime_version" = "3.10.12",
+"volatility" = "immutable",
 "always_nullable" = "true"
 );

@@ -304,6 +312,7 @@ PROPERTIES (
 "file" = "https://your-storage.com/udf/python_udf_scalar_ops.zip",
 "symbol" = "python_udf_scalar_ops.is_prime",
 "runtime_version" = "3.10.12",
+"volatility" = "immutable",
 "always_nullable" = "true"
 );
 ```
@@ -320,6 +329,7 @@ PROPERTIES (
 "file" = "file:///path/to/my_udf.zip",
 "symbol" = "my_udf.math_ops.multiply_by_two",
 "runtime_version" = "3.10.12",
+"volatility" = "immutable",
 "always_nullable" = "true"
 );
 ```
@@ -362,6 +372,7 @@ DROP FUNCTION IF EXISTS py_is_prime(INT);
 | `symbol` | Yes | - | Python function entry name.<br>• **Inline Mode**: Write function name directly, such as `"evaluate"`<br>• **Module Mode**: Format is `[package_name.]module_name.func_name`, see module mode description |
 | `file` | No | - | Python `.zip` package path, only required for module mode. Supports three protocols:<br>• `file://` - Local filesystem path<br>• `http://` - HTTP remote download<br>• `https://` - HTTPS remote download |
 | `runtime_version` | Yes | - | Python runtime version, such as `"3.10.12"`, requires complete version number |
+| `volatility` | No | `volatile` | Volatility of the Python UDF. Valid values are `immutable`, `stable`, and `volatile`.<br>`immutable`: identical inputs always produce identical outputs across statements, and the implementation does not depend on current time, random numbers, or external mutable state.<br>`stable`: identical inputs produce the same result within a single statement, but the result may change between statements, such as `now()` and `current_timestamp()`.<br>`volatile`: the function result may change for each call, such as `uuid()` and `random()`.<br>Correctly marking this property allows the optimizer to handle rewrite and other optimization scenarios more safely; incorrect marking may cause wrong query results. |
 | `always_nullable` | No | `true` | Whether to always return nullable results |
 
 #### Runtime Version Description
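
Note on the `volatility` row added in the hunk above: the three categories can be pictured with plain Python. This is a minimal sketch rather than Doris code. The names `add`, `Statement`, and `token` are hypothetical, and modeling a SQL statement as an object that captures its start time once is an assumption made only for illustration.

```python
import uuid
from datetime import datetime, timezone


def add(a, b):
    # immutable: the result depends only on the arguments.
    return a + b


class Statement:
    """Hypothetical stand-in for one SQL statement (illustration only)."""

    def __init__(self):
        # Captured once, when the statement starts.
        self.started_at = datetime.now(timezone.utc)

    def statement_time(self):
        # stable: every call inside this statement sees the same value,
        # but a later Statement may see a different one.
        return self.started_at


def token(prefix):
    # volatile: every call may return a different value.
    return f"{prefix}-{uuid.uuid4()}"
```

Under these assumptions, only `add` would be safe to mark `immutable`; `token` behaves like the `uuid()` example in the table and would keep the default `volatile`.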
@@ -427,6 +438,7 @@ PROPERTIES (
 "type" = "PYTHON_UDF",
 "symbol" = "evaluate",
 "runtime_version" = "3.10.12",
+"volatility" = "immutable",
 "always_nullable" = "true"
 )
 AS $$
@@ -473,6 +485,7 @@ PROPERTIES (
 "type" = "PYTHON_UDF",
 "symbol" = "add",
 "runtime_version" = "3.10.12",
+"volatility" = "immutable",
 "always_nullable" = "true"
 )
 AS $$
@@ -495,6 +508,7 @@ PROPERTIES (
 "type" = "PYTHON_UDF",
 "symbol" = "to_upper",
 "runtime_version" = "3.10.12",
+"volatility" = "immutable",
 "always_nullable" = "true"
 )
 AS $$
@@ -517,6 +531,7 @@ PROPERTIES (
 "type" = "PYTHON_UDF",
 "symbol" = "sqrt",
 "runtime_version" = "3.10.12",
+"volatility" = "immutable",
 "always_nullable" = "true"
 )
 AS $$
@@ -586,6 +601,7 @@ PROPERTIES (
 "type" = "PYTHON_UDF",
 "symbol" = "evaluate",
 "runtime_version" = "3.10.12",
+"volatility" = "immutable",
 "always_nullable" = "true"
 )
 AS $$
@@ -609,6 +625,7 @@ PROPERTIES (
 "type" = "PYTHON_UDF",
 "symbol" = "evaluate",
 "runtime_version" = "3.10.12",
+"volatility" = "immutable",
 "always_nullable" = "true"
 )
 AS $$
@@ -633,6 +650,7 @@ PROPERTIES (
 "type" = "PYTHON_UDF",
 "symbol" = "evaluate",
 "runtime_version" = "3.10.12",
+"volatility" = "immutable",
 "always_nullable" = "true"
 )
 AS $$
@@ -656,6 +674,7 @@ PROPERTIES (
 "type" = "PYTHON_UDF",
 "symbol" = "evaluate",
 "runtime_version" = "3.10.12",
+"volatility" = "immutable",
 "always_nullable" = "true"
 )
 AS $$
@@ -680,6 +699,7 @@ PROPERTIES (
 "type" = "PYTHON_UDF",
 "symbol" = "evaluate",
 "runtime_version" = "3.10.12",
+"volatility" = "immutable",
 "always_nullable" = "true"
 )
 AS $$
@@ -705,7 +725,8 @@ RETURNS STRING
 PROPERTIES (
 "type" = "PYTHON_UDF",
 "symbol" = "evaluate",
-"runtime_version" = "3.10.12"
+"runtime_version" = "3.10.12",
+"volatility" = "immutable"
 )
 AS $$
 def evaluate(email):
@@ -731,7 +752,8 @@ RETURNS INT
 PROPERTIES (
 "type" = "PYTHON_UDF",
 "symbol" = "evaluate",
-"runtime_version" = "3.10.12"
+"runtime_version" = "3.10.12",
+"volatility" = "immutable"
 )
 AS $$
 def evaluate(s1, s2):
@@ -768,7 +790,8 @@ RETURNS INT
 PROPERTIES (
 "type" = "PYTHON_UDF",
 "symbol" = "evaluate",
-"runtime_version" = "3.10.12"
+"runtime_version" = "3.10.12",
+"volatility" = "immutable"
 )
 AS $$
 from datetime import datetime
@@ -797,7 +820,8 @@ RETURNS BOOLEAN
 PROPERTIES (
 "type" = "PYTHON_UDF",
 "symbol" = "evaluate",
-"runtime_version" = "3.10.12"
+"runtime_version" = "3.10.12",
+"volatility" = "immutable"
 )
 AS $$
 def evaluate(id_card):
@@ -1411,6 +1435,8 @@ DROP FUNCTION IF EXISTS py_variance(DOUBLE);
 | `runtime_version` | Yes | - | Python runtime version, such as `"3.10.12"` |
 | `always_nullable` | No | `true` | Whether to always return nullable results |
 
+`volatility` is only supported for scalar Python UDF and is not supported for Python UDAF.
+
 #### runtime_version Description
 
 - Must fill in **complete version number** of Python version, format is `x.x.x` or `x.x.xx`
@@ -2407,6 +2433,8 @@ CREATE TABLES FUNCTION py_split(STRING, STRING) ...;
 | `runtime_version` | Yes | - | Python runtime version, such as `"3.10.12"` |
 | `always_nullable` | No | `true` | Whether to always return nullable results |
 
+`volatility` is only supported for scalar Python UDF and is not supported for Python UDTF.
+
 #### runtime_version Description
 
 - Must fill in **complete version number** of Python version, format is `x.x.x` or `x.x.xx`
@@ -2962,6 +2990,7 @@ PROPERTIES (
 "type" = "PYTHON_UDF",
 "symbol" = "evaluate",
 "runtime_version" = "3.12.11", -- Must specify complete version number, matching Python 3.12.11
+"volatility" = "immutable",
 "always_nullable" = "true"
 )
 AS $$
@@ -3064,6 +3093,7 @@ PROPERTIES (
 "type" = "PYTHON_UDF",
 "symbol" = "evaluate",
 "runtime_version" = "3.9.18", -- Must specify complete version number, matching Python 3.9.18
+"volatility" = "immutable",
 "always_nullable" = "true"
 )
 AS $$
@@ -3078,6 +3108,7 @@ PROPERTIES (
 "type" = "PYTHON_UDF",
 "symbol" = "evaluate",
 "runtime_version" = "3.12.11", -- Must specify complete version number, matching Python 3.12.11
+"volatility" = "immutable",
 "always_nullable" = "true"
 )
 AS $$

docs/sql-manual/sql-statements/function/CREATE-FUNCTION.md

Lines changed: 94 additions & 4 deletions
@@ -75,6 +75,7 @@ CREATE [ GLOBAL ]
 > - `symbol`: Indicates the class name containing the UDF class. This parameter is mandatory.
 > - `type`: Indicates the UDF call type. The default is Native. Use JAVA_UDF when using a Java UDF.
 > - `always_nullable`: Indicates whether the UDF result may contain NULL values. This is an optional parameter with a default value of true.
+> - `volatility`: Indicates the volatility of a scalar Java UDF or scalar Python UDF. This is an optional parameter with a default value of `volatile`. Valid values are `immutable`, `stable`, and `volatile`. `immutable` means identical inputs always produce identical outputs across statements, and the implementation does not depend on current time, random numbers, or external mutable state. `stable` means identical inputs produce the same result within a single statement, but the result may change between statements; examples include `now()` and `current_timestamp()`. `volatile` means the function result may change for each call; examples include `uuid()` and `random()`. Correct marking allows the optimizer to handle query rewrites more safely; incorrect marking may lead to wrong query results. This property is not supported for UDAF, UDTF, RPC, or alias functions.
 
 ## Access Control Requirements
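
Note on the `volatility` bullet added in the hunk above: the stated rules (optional, default `volatile`, three valid values, scalar Java and Python UDFs only) can be summarized as a small check. This is a hypothetical sketch, not the Doris implementation. The function name `resolve_volatility`, the `is_scalar` flag, raising an error for unsupported function kinds, and exact lowercase matching are all assumptions made for illustration.

```python
VALID_VOLATILITY = ("immutable", "stable", "volatile")
SCALAR_UDF_TYPES = ("JAVA_UDF", "PYTHON_UDF")


def resolve_volatility(udf_type, is_scalar, properties):
    """Return the effective volatility, or None where the property does not apply.

    Hypothetical helper: error handling and case sensitivity are assumptions.
    """
    supported = is_scalar and udf_type in SCALAR_UDF_TYPES
    if "volatility" not in properties:
        # The documented default applies only where the property is supported.
        return "volatile" if supported else None
    if not supported:
        raise ValueError("volatility is only supported for scalar Java/Python UDFs")
    value = properties["volatility"]
    if value not in VALID_VOLATILITY:
        raise ValueError(f"invalid volatility: {value!r}")
    return value
```

For example, `resolve_volatility("PYTHON_UDF", True, {})` falls back to `"volatile"`, while passing `"volatility"` for a UDAF raises an error in this sketch.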

@@ -91,11 +92,12 @@ To execute this command, the user must have `ADMIN_PRIV` privileges.
 "file"="file:///path/to/java-udf-demo-jar-with-dependencies.jar",
 "symbol"="org.apache.doris.udf.AddOne",
 "always_nullable"="true",
-"type"="JAVA_UDF"
+"type"="JAVA_UDF",
+"volatility"="immutable"
 );
 ```
 
-2. Create a custom UDAF function.
+2. Create a custom UDAF function. The `volatility` property is not supported for UDAF.
 
 

@@ -108,7 +110,7 @@ To execute this command, the user must have `ADMIN_PRIV` privileges.
 );
 ```
 
-3. Create a custom UDTF function.
+3. Create a custom UDTF function. The `volatility` property is not supported for UDTF.
 
 

@@ -135,4 +137,92 @@ To execute this command, the user must have `ADMIN_PRIV` privileges.
 
 ```sql
 CREATE GLOBAL ALIAS FUNCTION id_masking(INT) WITH PARAMETER(id) AS CONCAT(LEFT(id, 3), '****', RIGHT(id, 4));
-```
+```
+
+6. Create a volatile Python UDF. Functions such as `uuid.uuid4()` that depend on randomness should keep the default `volatility = volatile` and must not be incorrectly marked as `immutable`.
+
+```sql
+CREATE TABLE cte_uuid_seed (id INT) ENGINE=OLAP DUPLICATE KEY(id)
+DISTRIBUTED BY HASH(id) BUCKETS 1 PROPERTIES ("replication_num" = "1");
+INSERT INTO cte_uuid_seed VALUES (1),(2),(3);
+
+DROP FUNCTION IF EXISTS py_uuid_token(INT);
+CREATE FUNCTION py_uuid_token(INT)
+RETURNS STRING
+PROPERTIES (
+"type" = "PYTHON_UDF",
+"symbol" = "py_uuid_token_impl",
+"always_nullable" = "false",
+"runtime_version" = "3.12.11",
+"volatility" = "volatile"
+)
+AS $$
+import uuid
+def py_uuid_token_impl(x):
+    return f"{x}-{uuid.uuid4()}"
+$$;
+
+SET enable_cte_materialize = true;
+SET inline_cte_referenced_threshold = 10;
+
+WITH cte AS (SELECT id, py_uuid_token(id) AS token FROM cte_uuid_seed)
+SELECT id, COUNT(DISTINCT token) AS distinct_tokens
+FROM (SELECT id, token FROM cte UNION ALL SELECT id, token FROM cte) u
+GROUP BY id ORDER BY id;
+```
+
+Correct result:
+
+```text
++------+-----------------+
+| id   | distinct_tokens |
++------+-----------------+
+|    1 |               1 |
+|    2 |               1 |
+|    3 |               1 |
++------+-----------------+
+```
+
+For this function, the following definition is incorrect:
+
+```sql
+DROP FUNCTION IF EXISTS py_uuid_token(INT);
+CREATE FUNCTION py_uuid_token(INT)
+RETURNS STRING
+PROPERTIES (
+"type" = "PYTHON_UDF",
+"symbol" = "py_uuid_token_impl",
+"always_nullable" = "false",
+"runtime_version" = "3.12.11",
+"volatility" = "immutable"
+)
+AS $$
+import uuid
+def py_uuid_token_impl(x):
+    return f"{x}-{uuid.uuid4()}"
+$$;
+```
+
+Run the same query again:
+
+```sql
+WITH cte AS (SELECT id, py_uuid_token(id) AS token FROM cte_uuid_seed)
+SELECT id, COUNT(DISTINCT token) AS distinct_tokens
+FROM (SELECT id, token FROM cte UNION ALL SELECT id, token FROM cte) u
+GROUP BY id ORDER BY id;
+```
+
+Incorrect result:
+
+```text
++------+-----------------+
+| id   | distinct_tokens |
++------+-----------------+
+|    1 |               2 |
+|    2 |               2 |
+|    3 |               2 |
++------+-----------------+
+```
+
+Why this is wrong:
+Because `py_uuid_token` is volatile, each call to `uuid.uuid4()` generates a new value. If the function is incorrectly marked as `volatility = immutable`, the optimizer may treat repeated references as safe to rewrite and may choose a plan that evaluates the UDF separately on both sides of `UNION ALL`. As a result, the same `id` can produce two different `token` values, and `COUNT(DISTINCT token)` changes from `1` to `2`.
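
Note on the explanation above: the difference between the two results can be reproduced outside Doris with a small simulation. This is a sketch under stated assumptions, not the optimizer's actual logic. The helper `distinct_tokens` and its `materialize` flag are hypothetical; `materialize=True` models a plan that evaluates the CTE once and reuses it on both sides of `UNION ALL`, and `materialize=False` models a plan that inlines the CTE and evaluates the UDF once per reference.

```python
import uuid
from collections import defaultdict


def py_uuid_token(x):
    # Same body as py_uuid_token_impl in the example: a new UUID on every call.
    return f"{x}-{uuid.uuid4()}"


def distinct_tokens(ids, materialize):
    """Count distinct tokens per id across both branches of the UNION ALL."""
    if materialize:
        # CTE evaluated once; both branches read the same rows.
        cte = [(i, py_uuid_token(i)) for i in ids]
        left, right = cte, cte
    else:
        # CTE inlined; each branch calls the UDF again.
        left = [(i, py_uuid_token(i)) for i in ids]
        right = [(i, py_uuid_token(i)) for i in ids]
    seen = defaultdict(set)
    for i, tok in left + right:
        seen[i].add(tok)
    return {i: len(toks) for i, toks in sorted(seen.items())}


print(distinct_tokens([1, 2, 3], materialize=True))   # {1: 1, 2: 1, 3: 1}
print(distinct_tokens([1, 2, 3], materialize=False))  # {1: 2, 2: 2, 3: 2}
```

The second call mirrors the incorrect result: evaluating a random function once per reference gives each `id` two tokens, which is only harmless when the function really is immutable.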
