| title | CREATE FUNCTION |
|---|---|
| language | en |
| description | This statement is used to create a custom function. |
This statement is used to create a custom function.
CREATE [ GLOBAL ]
[{AGGREGATE | TABLES | ALIAS }] FUNCTION <function_name>
(<arg_type> [, ...])
[ RETURNS <ret_type> ]
[ INTERMEDIATE <inter_type> ]
[ WITH PARAMETER(<param> [,...]) AS <origin_function> ]
[ PROPERTIES ("<key>" = "<value>" [, ...]) ]1. <function_name>
If
function_nameincludes a database name, such asdb1.my_func, the custom function will be created in the corresponding database. Otherwise, the function will be created in the database of the current session. The name and parameters of the new function must not be identical to an existing function in the current namespace; otherwise, the creation will fail.
2. <arg_type>
The input parameter type of the function. For variable-length parameters, use
, ...to indicate them. If it is a variable-length type, the type of the variable-length parameters must be consistent with the type of the last non-variable-length parameter.
3. <ret_type>
The return parameter type of the function. This is a required parameter for creating a new function. If creating an alias for an existing function, this parameter is not necessary.
1. GLOBAL
If specified, the created function is effective globally.
2. AGGREGATE
If specified, the created function is an aggregate function.
3. TABLES
If specified, the created function is a table function.
4. ALIAS
If specified, the created function is an alias function.
If none of the above parameters representing the function type is selected, it indicates that the created function is a scalar function.
5. <inter_type>
Used to indicate the data type during the intermediate stage of an aggregate function.
6. <param>
Used to indicate the parameters of an alias function, with at least one parameter required.
7. <origin_function>
Used to indicate the original function corresponding to the alias function.
8. <properties>
file: Indicates the JAR package containing the user-defined function (UDF). In a multi-machine environment, it can also be downloaded via HTTP. This parameter is mandatory.symbol: Indicates the class name containing the UDF class. This parameter is mandatory.type: Indicates the UDF call type. The default is Native. Use JAVA_UDF when using a Java UDF.always_nullable: Indicates whether the UDF result may contain NULL values. This is an optional parameter with a default value of true.volatility: Indicates the volatility of a scalar Java UDF or scalar Python UDF. This is an optional parameter with a default value ofvolatile. Valid values areimmutable,stable, andvolatile.immutablemeans identical inputs always produce identical outputs across statements, and the implementation does not depend on current time, random numbers, or external mutable state.stablemeans identical inputs produce the same result within a single statement, but the result may change between statements; examples includenow()andcurrent_timestamp().volatilemeans the function result may change for each call; examples includeuuid()andrandom(). Correct marking allows the optimizer to handle query rewrites more safely; incorrect marking may lead to wrong query results. This property is not supported for UDAF, UDTF, RPC, or alias functions.
To execute this command, the user must have ADMIN_PRIV privileges.
-
Create a custom UDF function. For more details, refer to JAVA-UDF.
CREATE FUNCTION java_udf_add_one(int) RETURNS int PROPERTIES ( "file"="file:///path/to/java-udf-demo-jar-with-dependencies.jar", "symbol"="org.apache.doris.udf.AddOne", "always_nullable"="true", "type"="JAVA_UDF", "volatility"="immutable" );
-
Create a custom UDAF function. The
volatilityproperty is not supported for UDAF.CREATE AGGREGATE FUNCTION simple_sum(INT) RETURNS INT PROPERTIES ( "file"="file:///pathTo/java-udaf.jar", "symbol"="org.apache.doris.udf.demo.SimpleDemo", "always_nullable"="true", "type"="JAVA_UDF" );
-
Create a custom UDTF function. The
volatilityproperty is not supported for UDTF.CREATE TABLES FUNCTION java_udtf(string, string) RETURNS array<string> PROPERTIES ( "file"="file:///pathTo/java-udaf.jar", "symbol"="org.apache.doris.udf.demo.UDTFStringTest", "always_nullable"="true", "type"="JAVA_UDF" );
-
Create a custom alias function. For more information, refer to Alias Function.
CREATE ALIAS FUNCTION id_masking(INT) WITH PARAMETER(id) AS CONCAT(LEFT(id, 3), '****', RIGHT(id, 4));
-
Create a global custom alias function.
CREATE GLOBAL ALIAS FUNCTION id_masking(INT) WITH PARAMETER(id) AS CONCAT(LEFT(id, 3), '****', RIGHT(id, 4));
-
Create a volatile Python UDF. Functions such as
uuid.uuid4()that depend on randomness should keep the defaultvolatility = volatileand must not be incorrectly marked asimmutable.CREATE TABLE cte_uuid_seed (id INT) ENGINE=OLAP DUPLICATE KEY(id) DISTRIBUTED BY HASH(id) BUCKETS 1 PROPERTIES ("replication_num" = "1"); INSERT INTO cte_uuid_seed VALUES (1),(2),(3); DROP FUNCTION IF EXISTS py_uuid_token(INT); CREATE FUNCTION py_uuid_token(INT) RETURNS STRING PROPERTIES ( "type" = "PYTHON_UDF", "symbol" = "py_uuid_token_impl", "always_nullable" = "false", "runtime_version" = "3.12.11", "volatility" = "volatile" ) AS $$ import uuid def py_uuid_token_impl(x): return f"{x}-{uuid.uuid4()}" $$; SET enable_cte_materialize = true; SET inline_cte_referenced_threshold = 10; WITH cte AS (SELECT id, py_uuid_token(id) AS token FROM cte_uuid_seed) SELECT id, COUNT(DISTINCT token) AS distinct_tokens FROM (SELECT id, token FROM cte UNION ALL SELECT id, token FROM cte) u GROUP BY id ORDER BY id;
Correct result:
+------+-----------------+ | id | distinct_tokens | +------+-----------------+ | 1 | 1 | | 2 | 1 | | 3 | 1 | +------+-----------------+For this function, the following definition is incorrect:
DROP FUNCTION IF EXISTS py_uuid_token(INT); CREATE FUNCTION py_uuid_token(INT) RETURNS STRING PROPERTIES ( "type" = "PYTHON_UDF", "symbol" = "py_uuid_token_impl", "always_nullable" = "false", "runtime_version" = "3.12.11", "volatility" = "immutable" ) AS $$ import uuid def py_uuid_token_impl(x): return f"{x}-{uuid.uuid4()}" $$;
Run the same query again:
WITH cte AS (SELECT id, py_uuid_token(id) AS token FROM cte_uuid_seed) SELECT id, COUNT(DISTINCT token) AS distinct_tokens FROM (SELECT id, token FROM cte UNION ALL SELECT id, token FROM cte) u GROUP BY id ORDER BY id;
Incorrect result:
+------+-----------------+ | id | distinct_tokens | +------+-----------------+ | 1 | 2 | | 2 | 2 | | 3 | 2 | +------+-----------------+Why this is wrong: Because
py_uuid_tokenis volatile, each call touuid.uuid4()generates a new value. If the function is incorrectly marked asvolatility = immutable, the optimizer may treat repeated references as safe to rewrite and may choose a plan that evaluates the UDF separately on both sides ofUNION ALL. As a result, the sameidcan produce two differenttokenvalues, andCOUNT(DISTINCT token)changes from1to2.