doris-website/docs/sql-manual/sql-statements/function/CREATE-FUNCTION.md at f0c9f30540f5c5a6e111dfd94aec548bd74a9a4e · apache/doris-website

title	CREATE FUNCTION
language	en
description	This statement is used to create a custom function.

Description

This statement is used to create a custom function.

Syntax

CREATE [ GLOBAL ] 
    [{AGGREGATE | TABLES | ALIAS }] FUNCTION <function_name>
    (<arg_type> [, ...])
    [ RETURNS <ret_type> ]
    [ INTERMEDIATE <inter_type> ]
    [ WITH PARAMETER(<param> [,...]) AS <origin_function> ]
    [ PROPERTIES ("<key>" = "<value>" [, ...]) ]

Required Parameters

1. <function_name>

If function_name includes a database name, such as db1.my_func, the custom function will be created in the corresponding database. Otherwise, the function will be created in the database of the current session. The name and parameters of the new function must not be identical to an existing function in the current namespace; otherwise, the creation will fail.

2. <arg_type>

The input parameter type of the function. For variable-length parameters, use , ... to indicate them. If it is a variable-length type, the type of the variable-length parameters must be consistent with the type of the last non-variable-length parameter.

3. <ret_type>

The return parameter type of the function. This is a required parameter for creating a new function. If creating an alias for an existing function, this parameter is not necessary.

Optional Parameters

1. GLOBAL

If specified, the created function is effective globally.

2. AGGREGATE

If specified, the created function is an aggregate function.

3. TABLES

If specified, the created function is a table function.

4. ALIAS

If specified, the created function is an alias function.

If none of the above parameters representing the function type is selected, it indicates that the created function is a scalar function.

5. <inter_type>

Used to indicate the data type during the intermediate stage of an aggregate function.

6. <param>

Used to indicate the parameters of an alias function, with at least one parameter required.

7. <origin_function>

Used to indicate the original function corresponding to the alias function.

8. <properties>

file: Indicates the JAR package containing the user-defined function (UDF). In a multi-machine environment, it can also be downloaded via HTTP. This parameter is mandatory.

symbol: Indicates the class name containing the UDF class. This parameter is mandatory.

type: Indicates the UDF call type. The default is Native. Use JAVA_UDF when using a Java UDF.

always_nullable: Indicates whether the UDF result may contain NULL values. This is an optional parameter with a default value of true.

volatility: Indicates the volatility of a scalar Java UDF or scalar Python UDF. This is an optional parameter with a default value of volatile. Valid values are immutable, stable, and volatile. immutable means identical inputs always produce identical outputs across statements, and the implementation does not depend on current time, random numbers, or external mutable state. stable means identical inputs produce the same result within a single statement, but the result may change between statements; examples include now() and current_timestamp(). volatile means the function result may change for each call; examples include uuid() and random(). Correct marking allows the optimizer to handle query rewrites more safely; incorrect marking may lead to wrong query results. This property is not supported for UDAF, UDTF, RPC, or alias functions.

Access Control Requirements

To execute this command, the user must have ADMIN_PRIV privileges.

Example

Create a custom UDF function. For more details, refer to JAVA-UDF.

CREATE FUNCTION java_udf_add_one(int) RETURNS int PROPERTIES (
    "file"="file:///path/to/java-udf-demo-jar-with-dependencies.jar",
    "symbol"="org.apache.doris.udf.AddOne",
    "always_nullable"="true",
    "type"="JAVA_UDF",
    "volatility"="immutable"
);

Create a custom UDAF function. The volatility property is not supported for UDAF.

CREATE AGGREGATE FUNCTION simple_sum(INT) RETURNS INT PROPERTIES (
    "file"="file:///pathTo/java-udaf.jar",
    "symbol"="org.apache.doris.udf.demo.SimpleDemo",
    "always_nullable"="true",
    "type"="JAVA_UDF"
);

Create a custom UDTF function. The volatility property is not supported for UDTF.

CREATE TABLES FUNCTION java_udtf(string, string) RETURNS array<string> PROPERTIES (
    "file"="file:///pathTo/java-udaf.jar",
    "symbol"="org.apache.doris.udf.demo.UDTFStringTest",
    "always_nullable"="true",
    "type"="JAVA_UDF"
);

Create a custom alias function. For more information, refer to Alias Function.

CREATE ALIAS FUNCTION id_masking(INT) WITH PARAMETER(id) AS CONCAT(LEFT(id, 3), '****', RIGHT(id, 4));

Create a global custom alias function.

CREATE GLOBAL ALIAS FUNCTION id_masking(INT) WITH PARAMETER(id) AS CONCAT(LEFT(id, 3), '****', RIGHT(id, 4));

Create a volatile Python UDF. Functions such as uuid.uuid4() that depend on randomness should keep the default volatility = volatile and must not be incorrectly marked as immutable.

CREATE TABLE cte_uuid_seed (id INT) ENGINE=OLAP DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1 PROPERTIES ("replication_num" = "1");
INSERT INTO cte_uuid_seed VALUES (1),(2),(3);

DROP FUNCTION IF EXISTS py_uuid_token(INT);
CREATE FUNCTION py_uuid_token(INT)
RETURNS STRING
PROPERTIES (
    "type" = "PYTHON_UDF",
    "symbol" = "py_uuid_token_impl",
    "always_nullable" = "false",
    "runtime_version" = "3.12.11",
    "volatility" = "volatile"
)
AS $$
import uuid
def py_uuid_token_impl(x):
    return f"{x}-{uuid.uuid4()}"
$$;

SET enable_cte_materialize = true;
SET inline_cte_referenced_threshold = 10;

WITH cte AS (SELECT id, py_uuid_token(id) AS token FROM cte_uuid_seed)
SELECT id, COUNT(DISTINCT token) AS distinct_tokens
FROM (SELECT id, token FROM cte UNION ALL SELECT id, token FROM cte) u
GROUP BY id ORDER BY id;

Correct result:

+------+-----------------+
| id   | distinct_tokens |
+------+-----------------+
|    1 |               1 |
|    2 |               1 |
|    3 |               1 |
+------+-----------------+

For this function, the following definition is incorrect:

DROP FUNCTION IF EXISTS py_uuid_token(INT);
CREATE FUNCTION py_uuid_token(INT)
RETURNS STRING
PROPERTIES (
    "type" = "PYTHON_UDF",
    "symbol" = "py_uuid_token_impl",
    "always_nullable" = "false",
    "runtime_version" = "3.12.11",
    "volatility" = "immutable"
)
AS $$
import uuid
def py_uuid_token_impl(x):
    return f"{x}-{uuid.uuid4()}"
$$;

Run the same query again:

WITH cte AS (SELECT id, py_uuid_token(id) AS token FROM cte_uuid_seed)
SELECT id, COUNT(DISTINCT token) AS distinct_tokens
FROM (SELECT id, token FROM cte UNION ALL SELECT id, token FROM cte) u
GROUP BY id ORDER BY id;

Incorrect result:

+------+-----------------+
| id   | distinct_tokens |
+------+-----------------+
|    1 |               2 |
|    2 |               2 |
|    3 |               2 |
+------+-----------------+

Why this is wrong: Because py_uuid_token is volatile, each call to uuid.uuid4() generates a new value. If the function is incorrectly marked as volatility = immutable, the optimizer may treat repeated references as safe to rewrite and may choose a plan that evaluates the UDF separately on both sides of UNION ALL. As a result, the same id can produce two different token values, and COUNT(DISTINCT token) changes from 1 to 2.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Description

Syntax

Required Parameters

Optional Parameters

Access Control Requirements

Example

FilesExpand file tree

CREATE-FUNCTION.md

Latest commit

History

CREATE-FUNCTION.md

File metadata and controls

Description

Syntax

Required Parameters

Optional Parameters

Access Control Requirements

Example