43 changes: 37 additions & 6 deletions docs/query-data/udf/python-user-defined-function.md
@@ -40,6 +40,7 @@ PROPERTIES (
"type" = "PYTHON_UDF",
"symbol" = "entry_function_name",
"runtime_version" = "python_version",
"volatility" = "immutable|stable|volatile",
"always_nullable" = "true|false"
)
AS $$
@@ -58,7 +59,8 @@ RETURNS INT
PROPERTIES (
"type" = "PYTHON_UDF",
"symbol" = "evaluate",
"runtime_version" = "3.10.12"
"runtime_version" = "3.10.12",
"volatility" = "immutable"
)
AS $$
def evaluate(a, b):
@@ -77,7 +79,8 @@ RETURNS STRING
PROPERTIES (
"type" = "PYTHON_UDF",
"symbol" = "evaluate",
"runtime_version" = "3.10.12"
"runtime_version" = "3.10.12",
"volatility" = "immutable"
)
AS $$
def evaluate(s1, s2):
@@ -247,6 +250,7 @@ PROPERTIES (
"file" = "file:///path/to/python_udf_scalar_ops.zip",
"symbol" = "python_udf_scalar_ops.add_three_numbers",
"runtime_version" = "3.10.12",
"volatility" = "immutable",
"always_nullable" = "true"
);

@@ -257,6 +261,7 @@ PROPERTIES (
"file" = "file:///path/to/python_udf_scalar_ops.zip",
"symbol" = "python_udf_scalar_ops.reverse_string",
"runtime_version" = "3.10.12",
"volatility" = "immutable",
"always_nullable" = "true"
);

@@ -267,6 +272,7 @@ PROPERTIES (
"file" = "file:///path/to/python_udf_scalar_ops.zip",
"symbol" = "python_udf_scalar_ops.is_prime",
"runtime_version" = "3.10.12",
"volatility" = "immutable",
"always_nullable" = "true"
);
```
@@ -284,6 +290,7 @@ PROPERTIES (
"file" = "https://your-storage.com/udf/python_udf_scalar_ops.zip",
"symbol" = "python_udf_scalar_ops.add_three_numbers",
"runtime_version" = "3.10.12",
"volatility" = "immutable",
"always_nullable" = "true"
);

@@ -294,6 +301,7 @@ PROPERTIES (
"file" = "https://your-storage.com/udf/python_udf_scalar_ops.zip",
"symbol" = "python_udf_scalar_ops.reverse_string",
"runtime_version" = "3.10.12",
"volatility" = "immutable",
"always_nullable" = "true"
);

@@ -304,6 +312,7 @@ PROPERTIES (
"file" = "https://your-storage.com/udf/python_udf_scalar_ops.zip",
"symbol" = "python_udf_scalar_ops.is_prime",
"runtime_version" = "3.10.12",
"volatility" = "immutable",
"always_nullable" = "true"
);
```
@@ -320,6 +329,7 @@ PROPERTIES (
"file" = "file:///path/to/my_udf.zip",
"symbol" = "my_udf.math_ops.multiply_by_two",
"runtime_version" = "3.10.12",
"volatility" = "immutable",
"always_nullable" = "true"
);
```
@@ -362,6 +372,7 @@ DROP FUNCTION IF EXISTS py_is_prime(INT);
| `symbol` | Yes | - | Python function entry name.<br>• **Inline Mode**: Write function name directly, such as `"evaluate"`<br>• **Module Mode**: Format is `[package_name.]module_name.func_name`, see module mode description |
| `file` | No | - | Python `.zip` package path, only required for module mode. Supports three protocols:<br>• `file://` - Local filesystem path<br>• `http://` - HTTP remote download<br>• `https://` - HTTPS remote download |
| `runtime_version` | Yes | - | Python runtime version, such as `"3.10.12"`, requires complete version number |
| `volatility` | No | `volatile` | Volatility of the Python UDF. Valid values are `immutable`, `stable`, and `volatile`.<br>`immutable`: identical inputs always produce identical outputs across statements, and the implementation does not depend on current time, random numbers, or external mutable state.<br>`stable`: identical inputs produce the same result within a single statement, but the result may change between statements, such as `now()` and `current_timestamp()`.<br>`volatile`: the function result may change for each call, such as `uuid()` and `random()`.<br>Correctly marking this property allows the optimizer to handle rewrite and other optimization scenarios more safely; incorrect marking may cause wrong query results. |
| `always_nullable` | No | `true` | Whether to always return nullable results |

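As a rough illustration of the `volatility` classes described in the table, the following Python sketch contrasts a function body that would qualify as `immutable` with one that should stay `volatile`. The function names `add_tax` and `tag_with_uuid` are hypothetical and exist only for this example. A body that reads the clock or a random source changes on every call, so it does not meet the `stable` definition either.

```python
import uuid


def add_tax(price):
    # Immutable-style body: the result depends only on the argument.
    # No clock, no randomness, no external mutable state.
    return round(price * 1.08, 2)


def tag_with_uuid(x):
    # Volatile-style body: uuid.uuid4() yields a new random value
    # on every call, so identical inputs give different outputs.
    return f"{x}-{uuid.uuid4()}"


# Calling each function twice with the same input shows the difference.
print(add_tax(100) == add_tax(100))
print(tag_with_uuid(1) == tag_with_uuid(1))
```

Only a body like `add_tax` is a candidate for `"volatility" = "immutable"`; a body like `tag_with_uuid` should keep the default.
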
#### Runtime Version Description
@@ -427,6 +438,7 @@ PROPERTIES (
"type" = "PYTHON_UDF",
"symbol" = "evaluate",
"runtime_version" = "3.10.12",
"volatility" = "immutable",
"always_nullable" = "true"
)
AS $$
@@ -473,6 +485,7 @@ PROPERTIES (
"type" = "PYTHON_UDF",
"symbol" = "add",
"runtime_version" = "3.10.12",
"volatility" = "immutable",
"always_nullable" = "true"
)
AS $$
@@ -495,6 +508,7 @@ PROPERTIES (
"type" = "PYTHON_UDF",
"symbol" = "to_upper",
"runtime_version" = "3.10.12",
"volatility" = "immutable",
"always_nullable" = "true"
)
AS $$
@@ -517,6 +531,7 @@ PROPERTIES (
"type" = "PYTHON_UDF",
"symbol" = "sqrt",
"runtime_version" = "3.10.12",
"volatility" = "immutable",
"always_nullable" = "true"
)
AS $$
@@ -586,6 +601,7 @@ PROPERTIES (
"type" = "PYTHON_UDF",
"symbol" = "evaluate",
"runtime_version" = "3.10.12",
"volatility" = "immutable",
"always_nullable" = "true"
)
AS $$
@@ -609,6 +625,7 @@ PROPERTIES (
"type" = "PYTHON_UDF",
"symbol" = "evaluate",
"runtime_version" = "3.10.12",
"volatility" = "immutable",
"always_nullable" = "true"
)
AS $$
@@ -633,6 +650,7 @@ PROPERTIES (
"type" = "PYTHON_UDF",
"symbol" = "evaluate",
"runtime_version" = "3.10.12",
"volatility" = "immutable",
"always_nullable" = "true"
)
AS $$
@@ -656,6 +674,7 @@ PROPERTIES (
"type" = "PYTHON_UDF",
"symbol" = "evaluate",
"runtime_version" = "3.10.12",
"volatility" = "immutable",
"always_nullable" = "true"
)
AS $$
@@ -680,6 +699,7 @@ PROPERTIES (
"type" = "PYTHON_UDF",
"symbol" = "evaluate",
"runtime_version" = "3.10.12",
"volatility" = "immutable",
"always_nullable" = "true"
)
AS $$
@@ -705,7 +725,8 @@ RETURNS STRING
PROPERTIES (
"type" = "PYTHON_UDF",
"symbol" = "evaluate",
"runtime_version" = "3.10.12"
"runtime_version" = "3.10.12",
"volatility" = "immutable"
)
AS $$
def evaluate(email):
@@ -731,7 +752,8 @@ RETURNS INT
PROPERTIES (
"type" = "PYTHON_UDF",
"symbol" = "evaluate",
"runtime_version" = "3.10.12"
"runtime_version" = "3.10.12",
"volatility" = "immutable"
)
AS $$
def evaluate(s1, s2):
@@ -768,7 +790,8 @@ RETURNS INT
PROPERTIES (
"type" = "PYTHON_UDF",
"symbol" = "evaluate",
"runtime_version" = "3.10.12"
"runtime_version" = "3.10.12",
"volatility" = "immutable"
)
AS $$
from datetime import datetime
Expand Down Expand Up @@ -797,7 +820,8 @@ RETURNS BOOLEAN
PROPERTIES (
"type" = "PYTHON_UDF",
"symbol" = "evaluate",
"runtime_version" = "3.10.12"
"runtime_version" = "3.10.12",
"volatility" = "immutable"
)
AS $$
def evaluate(id_card):
@@ -1411,6 +1435,8 @@ DROP FUNCTION IF EXISTS py_variance(DOUBLE);
| `runtime_version` | Yes | - | Python runtime version, such as `"3.10.12"` |
| `always_nullable` | No | `true` | Whether to always return nullable results |

`volatility` is supported only for scalar Python UDFs; it is not supported for Python UDAFs.

#### runtime_version Description

- You must specify the **complete version number** of the Python runtime, in the format `x.x.x` or `x.x.xx`
@@ -2407,6 +2433,8 @@ CREATE TABLES FUNCTION py_split(STRING, STRING) ...;
| `runtime_version` | Yes | - | Python runtime version, such as `"3.10.12"` |
| `always_nullable` | No | `true` | Whether to always return nullable results |

`volatility` is supported only for scalar Python UDFs; it is not supported for Python UDTFs.

#### runtime_version Description

- You must specify the **complete version number** of the Python runtime, in the format `x.x.x` or `x.x.xx`
@@ -2962,6 +2990,7 @@ PROPERTIES (
"type" = "PYTHON_UDF",
"symbol" = "evaluate",
"runtime_version" = "3.12.11", -- Must specify complete version number, matching Python 3.12.11
"volatility" = "immutable",
"always_nullable" = "true"
)
AS $$
@@ -3064,6 +3093,7 @@ PROPERTIES (
"type" = "PYTHON_UDF",
"symbol" = "evaluate",
"runtime_version" = "3.9.18", -- Must specify complete version number, matching Python 3.9.18
"volatility" = "immutable",
"always_nullable" = "true"
)
AS $$
@@ -3078,6 +3108,7 @@ PROPERTIES (
"type" = "PYTHON_UDF",
"symbol" = "evaluate",
"runtime_version" = "3.12.11", -- Must specify complete version number, matching Python 3.12.11
"volatility" = "immutable",
"always_nullable" = "true"
)
AS $$
98 changes: 94 additions & 4 deletions docs/sql-manual/sql-statements/function/CREATE-FUNCTION.md
@@ -75,6 +75,7 @@ CREATE [ GLOBAL ]
> - `symbol`: Indicates the class name containing the UDF class. This parameter is mandatory.
> - `type`: Indicates the UDF call type. The default is Native. Use JAVA_UDF when using a Java UDF.
> - `always_nullable`: Indicates whether the UDF result may contain NULL values. This is an optional parameter with a default value of true.
> - `volatility`: Indicates the volatility of a scalar Java UDF or scalar Python UDF. This is an optional parameter with a default value of `volatile`. Valid values are `immutable`, `stable`, and `volatile`. `immutable` means identical inputs always produce identical outputs across statements, and the implementation does not depend on current time, random numbers, or external mutable state. `stable` means identical inputs produce the same result within a single statement, but the result may change between statements; examples include `now()` and `current_timestamp()`. `volatile` means the function result may change for each call; examples include `uuid()` and `random()`. Correct marking allows the optimizer to handle query rewrites more safely; incorrect marking may lead to wrong query results. This property is not supported for UDAF, UDTF, RPC, or alias functions.

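To see why an incorrect `immutable` marking can change results, consider a loose Python analogy. It assumes an engine that is allowed to reuse one computed result for repeated calls to a function it believes is immutable; `functools.lru_cache` stands in for that reuse here and is not a description of how the Doris optimizer is implemented. The name `token` is hypothetical.

```python
import functools
import uuid


def token(x):
    # Volatile behavior: a fresh UUID is generated on every call.
    return f"{x}-{uuid.uuid4()}"


# Reusing results is only safe for functions that really are immutable.
# Wrapping a volatile function in a cache silently changes its behavior.
cached_token = functools.lru_cache(maxsize=None)(token)

fresh = {token(1), token(1)}                 # each call is evaluated
reused = {cached_token(1), cached_token(1)}  # second call hits the cache

print(len(fresh), len(reused))
```

The two sets differ in size even though the inputs are identical, which is the kind of discrepancy a wrong `volatility` value can introduce.
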
## Access Control Requirements

@@ -91,11 +92,12 @@ To execute this command, the user must have `ADMIN_PRIV` privileges.
"file"="file:///path/to/java-udf-demo-jar-with-dependencies.jar",
"symbol"="org.apache.doris.udf.AddOne",
"always_nullable"="true",
"type"="JAVA_UDF"
"type"="JAVA_UDF",
"volatility"="immutable"
);
```

2. Create a custom UDAF function.
2. Create a custom UDAF function. The `volatility` property is not supported for UDAF.



@@ -108,7 +110,7 @@ To execute this command, the user must have `ADMIN_PRIV` privileges.
);
```

3. Create a custom UDTF function.
3. Create a custom UDTF function. The `volatility` property is not supported for UDTF.



@@ -135,4 +137,92 @@ To execute this command, the user must have `ADMIN_PRIV` privileges.

```sql
CREATE GLOBAL ALIAS FUNCTION id_masking(INT) WITH PARAMETER(id) AS CONCAT(LEFT(id, 3), '****', RIGHT(id, 4));
```
```

6. Create a volatile Python UDF. Functions such as `uuid.uuid4()` that depend on randomness should keep the default `volatility = volatile` and must not be incorrectly marked as `immutable`.

```sql
CREATE TABLE cte_uuid_seed (id INT) ENGINE=OLAP DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1 PROPERTIES ("replication_num" = "1");
INSERT INTO cte_uuid_seed VALUES (1),(2),(3);

DROP FUNCTION IF EXISTS py_uuid_token(INT);
CREATE FUNCTION py_uuid_token(INT)
RETURNS STRING
PROPERTIES (
"type" = "PYTHON_UDF",
"symbol" = "py_uuid_token_impl",
"always_nullable" = "false",
"runtime_version" = "3.12.11",
"volatility" = "volatile"
)
AS $$
import uuid
def py_uuid_token_impl(x):
return f"{x}-{uuid.uuid4()}"
$$;

SET enable_cte_materialize = true;
SET inline_cte_referenced_threshold = 10;

WITH cte AS (SELECT id, py_uuid_token(id) AS token FROM cte_uuid_seed)
SELECT id, COUNT(DISTINCT token) AS distinct_tokens
FROM (SELECT id, token FROM cte UNION ALL SELECT id, token FROM cte) u
GROUP BY id ORDER BY id;
```

Correct result:

```text
+------+-----------------+
| id | distinct_tokens |
+------+-----------------+
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
+------+-----------------+
```

For this function, the following definition is incorrect:

```sql
DROP FUNCTION IF EXISTS py_uuid_token(INT);
CREATE FUNCTION py_uuid_token(INT)
RETURNS STRING
PROPERTIES (
"type" = "PYTHON_UDF",
"symbol" = "py_uuid_token_impl",
"always_nullable" = "false",
"runtime_version" = "3.12.11",
"volatility" = "immutable"
)
AS $$
import uuid
def py_uuid_token_impl(x):
return f"{x}-{uuid.uuid4()}"
$$;
```

Run the same query again:

```sql
WITH cte AS (SELECT id, py_uuid_token(id) AS token FROM cte_uuid_seed)
SELECT id, COUNT(DISTINCT token) AS distinct_tokens
FROM (SELECT id, token FROM cte UNION ALL SELECT id, token FROM cte) u
GROUP BY id ORDER BY id;
```

Incorrect result:

```text
+------+-----------------+
| id | distinct_tokens |
+------+-----------------+
| 1 | 2 |
| 2 | 2 |
| 3 | 2 |
+------+-----------------+
```

Why this is wrong:
The body of `py_uuid_token` is volatile: each call to `uuid.uuid4()` generates a new value. If the function is incorrectly declared with `"volatility" = "immutable"`, the optimizer may treat repeated references to the CTE as safe to rewrite and may choose a plan that evaluates the UDF separately on each side of `UNION ALL`. The same `id` then produces two different `token` values, and `COUNT(DISTINCT token)` changes from `1` to `2`.
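
The difference between the two plans can be imitated in plain Python. This is a minimal sketch, not Doris code: `evaluate_cte` and `distinct_tokens` are hypothetical helpers that stand in for the CTE body and for the `COUNT(DISTINCT token) ... GROUP BY id` step, and the list `rows` mirrors the three ids inserted into `cte_uuid_seed`.

```python
import uuid

rows = [1, 2, 3]  # mirrors the ids in cte_uuid_seed


def py_uuid_token_impl(x):
    # Same body as the UDF in the example above.
    return f"{x}-{uuid.uuid4()}"


def evaluate_cte():
    # One evaluation of: SELECT id, py_uuid_token(id) AS token FROM cte_uuid_seed
    return [(i, py_uuid_token_impl(i)) for i in rows]


def distinct_tokens(left, right):
    # COUNT(DISTINCT token) per id over the UNION ALL of both sides.
    seen = {}
    for row_id, tok in left + right:
        seen.setdefault(row_id, set()).add(tok)
    return {row_id: len(toks) for row_id, toks in sorted(seen.items())}


# Materialized plan: the CTE is evaluated once and both sides read it.
shared = evaluate_cte()
materialized = distinct_tokens(shared, shared)

# Inlined plan: each reference to the CTE re-evaluates the UDF.
inlined = distinct_tokens(evaluate_cte(), evaluate_cte())

print(materialized)
print(inlined)
```

Evaluating the CTE once gives one token per `id`, while evaluating it once per reference gives two, matching the correct and incorrect result tables above.
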