diff --git a/docs/chdb/api/python.md b/docs/chdb/api/python.md index 78e61ad21a6..b67e9663be4 100644 --- a/docs/chdb/api/python.md +++ b/docs/chdb/api/python.md @@ -32,7 +32,7 @@ chdb.query(sql, output_format='CSV', path='', udf_path='') | `sql` | str | *required* | SQL query string to execute | | `output_format` | str | `"CSV"` | Output format for results. Supported formats:
• `"CSV"` - Comma-separated values
• `"JSON"` - JSON format
• `"Arrow"` - Apache Arrow format
• `"Parquet"` - Parquet format
• `"DataFrame"` - Pandas DataFrame
• `"ArrowTable"` - PyArrow Table
• `"Debug"` - Enable verbose logging | | `path` | str | `""` | Database file path. Defaults to in-memory database.
Can be a file path or `":memory:"` for in-memory database | -| `udf_path` | str | `""` | Path to User-Defined Functions directory | +| `udf_path` | str | `""` | Path to legacy subprocess-based UDF directory. Not needed for native Python UDFs ([`@func`](#func-decorator) / [`create_function`](#create-function)) | **Returns** @@ -74,11 +74,6 @@ Returns the query result in the specified format: >>> result = chdb.query("CREATE TABLE test (id INT) ENGINE = Memory", path="mydb.chdb") ``` -```pycon ->>> # Query with UDF ->>> result = chdb.query("SELECT my_udf('test')", udf_path="/path/to/udfs") -``` - --- ### `chdb.sql` {#chdb_sql} @@ -102,7 +97,7 @@ chdb.sql(sql, output_format='CSV', path='', udf_path='') | `sql` | str | *required* | SQL query string to execute | | `output_format` | str | `"CSV"` | Output format for results. Supported formats:
• `"CSV"` - Comma-separated values
• `"JSON"` - JSON format
• `"Arrow"` - Apache Arrow format
• `"Parquet"` - Parquet format
• `"DataFrame"` - Pandas DataFrame
• `"ArrowTable"` - PyArrow Table
• `"Debug"` - Enable verbose logging | | `path` | str | `""` | Database file path. Defaults to in-memory database.
Can be a file path or `":memory:"` for in-memory database | -| `udf_path` | str | `""` | Path to User-Defined Functions directory | +| `udf_path` | str | `""` | Path to legacy subprocess-based UDF directory. Not needed for native Python UDFs ([`@func`](#func-decorator) / [`create_function`](#create-function)) | **Returns** @@ -144,11 +139,6 @@ Returns the query result in the specified format: >>> result = chdb.query("CREATE TABLE test (id INT) ENGINE = Memory", path="mydb.chdb") ``` -```pycon ->>> # Query with UDF ->>> result = chdb.query("SELECT my_udf('test')", udf_path="/path/to/udfs") -``` - --- ### `chdb.to_arrowTable` {#chdb-state-sqlitelike-to_arrowtable} @@ -478,7 +468,7 @@ query(sql, fmt='CSV', udf_path='') |------------|-------|------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | `sql` | str | *required* | SQL query string to execute | | `fmt` | str | `"CSV"` | Output format for results. Available formats:
• `"CSV"` - Comma-separated values
• `"JSON"` - JSON format
• `"TabSeparated"` - Tab-separated values
• `"Pretty"` - Pretty-printed table format
• `"JSONCompact"` - Compact JSON format
• `"Arrow"` - Apache Arrow format
• `"Parquet"` - Parquet format | -| `udf_path` | str | `""` | Path to user-defined functions. If not specified, uses the UDF path from session initialization | +| `udf_path` | str | `""` | Path to legacy subprocess-based UDF directory. Not needed for native Python UDFs ([`@func`](#func-decorator) / [`create_function`](#create-function)). If not specified, uses the UDF path from session initialization | **Returns** @@ -636,7 +626,7 @@ sql(sql, fmt='CSV', udf_path='') |------------|-------|------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | `sql` | str | *required* | SQL query string to execute | | `fmt` | str | `"CSV"` | Output format for results. Available formats:
• `"CSV"` - Comma-separated values
• `"JSON"` - JSON format
• `"TabSeparated"` - Tab-separated values
• `"Pretty"` - Pretty-printed table format
• `"JSONCompact"` - Compact JSON format
• `"Arrow"` - Apache Arrow format
• `"Parquet"` - Parquet format | -| `udf_path` | str | `""` | Path to user-defined functions. If not specified, uses the UDF path from session initialization | +| `udf_path` | str | `""` | Path to legacy subprocess-based UDF directory. Not needed for native Python UDFs ([`@func`](#func-decorator) / [`create_function`](#create-function)). If not specified, uses the UDF path from session initialization | **Returns** @@ -3262,15 +3252,263 @@ with dbapi.connect("test.chdb") as conn: - Parameter binding syntax follows format style: `%s` ::: -## User-Defined Functions (UDF) {#user-defined-functions} +## Python UDF (User-Defined Functions) {#user-defined-functions} + +chDB supports native Python UDFs that run in-process with full type safety, automatic type inference, and configurable NULL/exception handling. Python functions registered as UDFs can be called directly from SQL queries. + +### `chdb.create_function` {#create-function} + +Register a Python function as a chDB SQL function. + +**Syntax** + +```python +chdb.create_function(name, func, arg_types=None, return_type=None, *, on_null=None, on_error=None) +``` + +**Parameters** + +| Parameter | Type | Default | Description | +|---------------|-----------------|---------------|-----------------------------------------------------------------------------| +| `name` | str | *(required)* | Name of the SQL function to register | +| `func` | callable | *(required)* | Python function to register | +| `arg_types` | list or None | `None` | List of argument types. If `None`, inferred from type annotations | +| `return_type` | type or None | `None` | Return type. If `None`, inferred from the function’s return annotation | +| `on_null` | str or NullHandling | `None` (skip) | How to handle NULL inputs: `"skip"` or `"pass"`. Keyword-only | +| `on_error` | str or ExceptionHandling | `None` (propagate) | How to handle exceptions: `"propagate"` or `"ignore"`. 
Keyword-only | + +Each type parameter (`arg_types` elements and `return_type`) accepts: + +- A `ChdbType` constant: `INT64`, `STRING`, `FLOAT64`, etc. +- A ClickHouse type string: `"Int64"`, `"String"`, `"DateTime64(6)"`, `"DateTime('UTC')"`, etc. + +**Example** + +```python +from chdb import create_function, drop_function, query +from chdb.sqltypes import INT64, STRING + +create_function("strlen", len, arg_types=[STRING], return_type=INT64) +print(query("SELECT strlen('hello')")) # 5 + +drop_function("strlen") +``` + +--- + +### `chdb.drop_function` {#drop-function} + +Remove a previously registered Python UDF. + +**Syntax** + +```python +chdb.drop_function(name) +``` + +**Parameters** + +| Parameter | Type | Description | +|-----------|------|--------------------------------------| +| `name` | str | Name of the SQL function to remove | + +--- + +### `@func` Decorator {#func-decorator} + +Decorator to register a Python function as a chDB SQL function. The function remains callable as normal Python and is simultaneously available in SQL queries by its `__name__`. + +**Syntax** + +```python +from chdb import func + +@func(arg_types=None, return_type=None, *, on_null=None, on_error=None) +def my_function(...): + ... +``` + +**Parameters** + +Same as [`create_function`](#create-function) (excluding `name` and `func`, which are derived from the decorated function). + +**Examples** + +```python +from chdb import func, query +from chdb.sqltypes import INT64, STRING + +# Explicit types +@func([INT64, INT64], INT64) +def add(a, b): + return a + b + +# Types inferred from annotations +@func() +def multiply(a: int, b: int) -> int: + return a * b + +# Mixed: explicit return_type, inferred arg_types +@func(return_type=STRING) +def greet(name: str): + return f"Hello, {name}!" + +print(query("SELECT add(12, 22)")) # 34 +print(query("SELECT multiply(3, 7)")) # 21 +print(query("SELECT greet('world')")) # Hello, world!
+``` + +--- + +### Type System {#udf-type-system} + +#### Available Types {#udf-available-types} + +Import types from `chdb.sqltypes`: + +```python +from chdb.sqltypes import ( + BOOL, + INT8, INT16, INT32, INT64, INT128, INT256, + UINT8, UINT16, UINT32, UINT64, UINT128, UINT256, + FLOAT32, FLOAT64, + STRING, + DATE, DATE32, DATETIME, DATETIME64, +) +``` + +#### Automatic Type Mapping {#udf-automatic-type-mapping} + +When types are inferred from Python annotations, the following mapping is used: + +| Python Type | ClickHouse Type | +|----------------------|------------------| +| `bool` | `Bool` | +| `int` | `Int64` | +| `float` | `Float64` | +| `str` | `String` | +| `bytes` | `String` | +| `bytearray` | `String` | +| `datetime.date` | `Date` | +| `datetime.datetime` | `DateTime64(6)` | + +#### Type Specification Methods {#udf-type-specification-methods} + +Types can be specified in four ways: + +```python +from chdb import create_function, func +from chdb.sqltypes import INT64 + +# 1. ChdbType constants +create_function("f1", lambda x: x, arg_types=[INT64], return_type=INT64) + +# 2. ClickHouse type strings +create_function("f2", lambda x: x, arg_types=["Int64"], return_type="Int64") + +# 3. Parameterized type strings +create_function("f3", lambda x: x, arg_types=["DateTime('UTC')"], return_type="DateTime('UTC')") + +# 4. Python types (inferred from annotations) +@func() +def f4(x: int) -> int: + return x +``` + +--- + +### NULL Handling {#udf-null-handling} + +Control how NULL values are handled with the `on_null` parameter.
+ +| Value | Enum | Behavior | +|----------|-------------------------|-------------------------------------------------------| +| `"skip"` | `NullHandling.SKIP` | Return NULL without calling the function (default) | +| `"pass"` | `NullHandling.PASS` | Convert NULL to `None` and call the function | + +```python +from chdb import func, query, NullHandling + +# Default: NULL in → NULL out, function not called +@func(return_type="Int64") +def add_one(x: int) -> int: + return x + 1 + +print(query("SELECT add_one(NULL)")) # NULL -User-defined functions module for chDB. +# Pass NULL as None +@func(return_type="Int64", on_null="pass") +def null_safe(x): + return 0 if x is None else x + 1 -This module provides functionality for creating and managing user-defined functions (UDFs) -in chDB. It allows you to extend chDB’s capabilities by writing custom Python functions -that can be called from SQL queries. +print(query("SELECT null_safe(NULL)")) # 0 +``` + +--- + +### Exception Handling {#udf-exception-handling} + +Control how exceptions are handled with the `on_error` parameter. 
+ +| Value | Enum | Behavior | +|---------------|--------------------------------|---------------------------------------------| +| `"propagate"` | `ExceptionHandling.PROPAGATE` | Raise the exception as a SQL error (default) | +| `"ignore"` | `ExceptionHandling.IGNORE` | Catch the exception and return NULL | + +```python +from chdb import func, query + +# Default: exception propagates +@func(arg_types=["Int64", "Int64"], return_type="Int64") +def divide(a, b): + return a // b + +print(query("SELECT divide(1, 0)")) # Error: division by zero + +# Ignore: exception → NULL +@func(arg_types=["Int64", "Int64"], return_type="Int64", on_error="ignore") +def safe_divide(a, b): + return a // b + +print(query("SELECT safe_divide(1, 0)")) # NULL +print(query("SELECT safe_divide(10, 2)")) # 5 +``` + +--- + +### DateTime and Timezone Support {#udf-datetime} + +UDFs fully support `Date`, `Date32`, `DateTime`, and `DateTime64` types with timezone awareness. + +```python +from chdb import func, query +from datetime import datetime, timedelta, date + +@func(arg_types=["DateTime('UTC')"], return_type="DateTime('UTC')") +def add_one_hour(dt): + return dt + timedelta(hours=1) + +@func() +def get_year(d: date) -> int: + return d.year + +print(query("SELECT add_one_hour(toDateTime('2024-01-01 12:00:00', 'UTC'))")) # 2024-01-01 13:00:00 +print(query("SELECT get_year(toDate('2024-06-15'))")) # 2024 +``` + +- Input ClickHouse `DateTime`/`DateTime64` values are converted to Python `datetime` objects with timezone info +- Output Python `datetime` objects preserve timezone info when returned to ClickHouse +- The `DATETIME64` type from `chdb.sqltypes` defaults to scale 6 (microseconds), equivalent to `DateTime64(6)` + +--- + +### Legacy API {#legacy-udf} + +:::warning Deprecated +The `@chdb_udf` decorator is the legacy subprocess-based UDF mechanism.
It is still available, but the native Python UDF API ([`@func`](#func-decorator) / [`create_function`](#create-function)) is recommended because it is simpler to use. See the [Python UDF Guide](/chdb/guides/python-udf) for the recommended approach. +::: -### `chdb.udf.chdb_udf` {#chdb-udf} +#### `chdb.udf.chdb_udf` {#chdb-udf} Decorator for chDB Python UDF(User Defined Function). @@ -3310,7 +3548,7 @@ def func_use_json(arg): --- -### `chdb.udf.generate_udf` {#generate-udf} +#### `chdb.udf.generate_udf` {#generate-udf} Generate UDF configuration and executable script files. diff --git a/docs/chdb/guides/python-udf.md b/docs/chdb/guides/python-udf.md new file mode 100644 index 00000000000..159ccdf1c89 --- /dev/null +++ b/docs/chdb/guides/python-udf.md @@ -0,0 +1,337 @@ +--- +title: Python User-Defined Functions (UDF) +sidebar_label: Python UDF +slug: /chdb/guides/python-udf +description: Create native Python UDFs in chDB with full type safety, NULL handling, and exception control. +keywords: [chdb, udf, python, user-defined function] +--- + +# Python User-Defined Functions (UDF) + +chDB allows you to register Python functions as SQL-callable UDFs. These run natively in-process — no subprocess spawning, no serialization overhead. Functions are type-safe, support automatic type inference from Python annotations, and offer configurable NULL and exception handling. + +## Quick Start {#quick-start} + +```python +from chdb import query, func +from chdb.sqltypes import INT64 + +@func([INT64, INT64], INT64) +def add(a, b): + return a + b + +result = query("SELECT add(2, 3)") +print(result) # 5 +``` + +## Registration Methods {#registration-methods} + +### `@func` Decorator {#func-decorator} + +The decorator is the simplest way to register a UDF. The function's `__name__` becomes the SQL function name.
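To make the registration model concrete, here is a pure-Python sketch of what the decorator is described as doing: record the function under its `__name__`, infer any omitted types from annotations using the mapping in the Automatic Type Inference table, enforce the rule that explicit `arg_types` must cover every parameter, and hand the original function back unchanged. `sketch_func`, `REGISTRY`, and `PY_TO_CH` are hypothetical names; this is not chDB's implementation and it never calls chDB.

```python
import datetime
import inspect
import typing

# Hypothetical registry standing in for chDB's SQL function table.
REGISTRY = {}

# Mapping copied from the "Automatic Type Inference" table in this guide.
PY_TO_CH = {
    bool: "Bool",
    int: "Int64",
    float: "Float64",
    str: "String",
    bytes: "String",
    bytearray: "String",
    datetime.date: "Date",
    datetime.datetime: "DateTime64(6)",
}


def sketch_func(arg_types=None, return_type=None):
    """Hypothetical stand-in for @func: records a SQL signature, returns fn unchanged."""

    def decorator(fn):
        hints = typing.get_type_hints(fn)
        params = list(inspect.signature(fn).parameters)
        if arg_types is None:
            # Infer every argument type from its annotation.
            args = [PY_TO_CH[hints[p]] for p in params]
        elif len(arg_types) == len(params):
            args = list(arg_types)
        else:
            # Partial explicit + partial inferred is not supported.
            raise TypeError(f"{fn.__name__}: arg_types must cover all {len(params)} parameters")
        ret = return_type if return_type is not None else PY_TO_CH[hints["return"]]
        REGISTRY[fn.__name__] = (fn, args, ret)
        return fn  # still an ordinary Python callable

    return decorator


@sketch_func()
def multiply(a: int, b: int) -> int:
    return a * b


@sketch_func(return_type="String")
def greet(name: str):
    return f"Hello, {name}!"
```

Because the decorator returns `fn` itself, `multiply(3, 7)` keeps working as a plain Python call while the registry holds the inferred signature `(["Int64", "Int64"], "Int64")`.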
+ +```python +from chdb import func +from chdb.sqltypes import INT64, STRING + +# Explicit types +@func([INT64, INT64], INT64) +def add(a, b): + return a + b + +# Types inferred from annotations +@func() +def multiply(a: int, b: int) -> int: + return a * b + +# Partial: explicit return_type, arg_types inferred +@func(return_type=STRING) +def greet(name: str): + return f"Hello, {name}!" +``` + +The decorated function remains callable as normal Python: + +```python +add(2, 3) # 5 (Python call) +query("SELECT add(2, 3)") # 5 (SQL call) +``` + +### `create_function` {#create-function} + +Register any callable (lambda, function, method) with an explicit name: + +```python +from chdb import create_function, query +from chdb.sqltypes import INT64, STRING + +create_function("strlen", len, arg_types=[STRING], return_type=INT64) +query("SELECT strlen('hello')") # 5 + +create_function("double", lambda x: x * 2, arg_types=[INT64], return_type=INT64) +query("SELECT double(21)") # 42 +``` + +### `drop_function` {#drop-function} + +Remove a registered UDF: + +```python +from chdb import drop_function + +drop_function("strlen") +# query("SELECT strlen('hello')") # Error: function not found +``` + +## Type System {#type-system} + +### Available Types {#available-types} + +All types are importable from `chdb.sqltypes`: + +```python +from chdb.sqltypes import ( + # Boolean + BOOL, + # Signed integers + INT8, INT16, INT32, INT64, INT128, INT256, + # Unsigned integers + UINT8, UINT16, UINT32, UINT64, UINT128, UINT256, + # Floating point + FLOAT32, FLOAT64, + # String + STRING, + # Date and time + DATE, DATE32, DATETIME, DATETIME64, +) +``` + +### Specifying Types {#specifying-types} + +Types can be provided in four ways: + +| Method | Example | Description | +|--------|---------|-------------| +| `ChdbType` constant | `INT64`, `STRING` | Imported from `chdb.sqltypes` | +| ClickHouse type string | `"Int64"`, `"String"` | Standard ClickHouse type names | +| Parameterized string | 
`"DateTime('UTC')"`, `"DateTime64(6)"` | For types with parameters | +| Python annotation | `int`, `str`, `float` | Used via type hints on function signature | + +```python +from chdb import create_function, func +from chdb.sqltypes import INT64 + +# All equivalent: +create_function("f1", lambda x: x * 2, arg_types=[INT64], return_type=INT64) +create_function("f2", lambda x: x * 2, arg_types=["Int64"], return_type="Int64") + +@func() +def f3(x: int) -> int: + return x * 2 +``` + +### Automatic Type Inference {#automatic-type-inference} + +When `arg_types` or `return_type` is omitted, chDB infers types from Python type annotations: + +| Python Type | ClickHouse Type | +|-------------|-----------------| +| `bool` | `Bool` | +| `int` | `Int64` | +| `float` | `Float64` | +| `str` | `String` | +| `bytes` | `String` | +| `bytearray` | `String` | +| `datetime.date` | `Date` | +| `datetime.datetime` | `DateTime64(6)` | + +```python +@func() +def process(name: str, age: int) -> str: + return f"{name} is {age} years old" + +# Equivalent to: +# @func([STRING, INT64], STRING) +``` + +:::note +If `arg_types` is provided explicitly, it must cover **all** parameters — partial explicit + partial inferred is not supported. This applies to both `create_function` and the `@func` decorator: either specify types for all parameters, or omit them entirely and let chDB infer from annotations. +::: + +## NULL Handling {#null-handling} + +The `on_null` parameter controls behavior when any input argument is NULL. + +| Value | Behavior | +|-------|----------| +| `"skip"` (default) | Return NULL immediately without calling the function | +| `"pass"` | Convert NULL to Python `None` and call the function normally | + +You can also use the enum: `chdb.NullHandling.SKIP` / `chdb.NullHandling.PASS`. 
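The two `on_null` policies can be modelled for a single row in plain Python, with `None` standing in for SQL NULL. The sketch below is a hypothetical illustration of the behavior in the table above (`apply_on_null` is not part of chDB); it records every real call so you can see that `"skip"` never invokes the function.

```python
def apply_on_null(fn, args, on_null="skip"):
    """Hypothetical model of on_null for one row; Python None stands in for SQL NULL."""
    if on_null not in ("skip", "pass"):
        raise ValueError(f"unknown on_null value: {on_null!r}")
    if on_null == "skip" and any(a is None for a in args):
        return None  # NULL out; fn is never called
    return fn(*args)  # "pass": NULLs reach fn as None


calls = []


def increment(x):
    calls.append(x)  # record every real call
    return x + 1


def null_to_zero(x):
    return 0 if x is None else x + 1
```

With `"skip"`, `apply_on_null(increment, [None])` returns `None` and leaves `calls` empty; with `"pass"`, `null_to_zero` receives `None` and returns `0`.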
+ +### Example: Default (skip) {#null-skip} + +```python +@func(return_type="Int64") +def increment(x: int) -> int: + return x + 1 + +query("SELECT increment(NULL)") # NULL +query("SELECT increment(5)") # 6 +``` + +### Example: Pass NULL as None {#null-pass} + +```python +@func(return_type="Int64", on_null="pass") +def null_to_zero(x): + return 0 if x is None else x + 1 + +query("SELECT null_to_zero(NULL)") # 0 +query("SELECT null_to_zero(5)") # 6 +``` + +### With Multiple Arguments {#null-multiple-args} + +```python +@func(arg_types=["Int64", "Int64"], return_type="Int64", on_null="pass") +def add_or_zero(a, b): + return (a or 0) + (b or 0) + +query("SELECT add_or_zero(NULL, 5)") # 5 +query("SELECT add_or_zero(NULL, NULL)") # 0 +query("SELECT add_or_zero(3, 7)") # 10 +``` + +## Exception Handling {#exception-handling} + +The `on_error` parameter controls behavior when the Python function raises an exception. + +| Value | Behavior | +|-------|----------| +| `"propagate"` (default) | Raise the exception as a SQL error | +| `"ignore"` | Catch the exception and return NULL for that row | + +You can also use the enum: `chdb.ExceptionHandling.PROPAGATE` / `chdb.ExceptionHandling.IGNORE`. 
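The per-row effect of `on_error` can be sketched the same way. `apply_on_error` below is a hypothetical helper, not chDB code; it catches `Exception` under `"ignore"`, which is an assumption of the sketch, since the docs do not say which exception classes chDB itself catches.

```python
def apply_on_error(fn, args, on_error="propagate"):
    """Hypothetical model of on_error for one row."""
    if on_error == "propagate":
        return fn(*args)  # any exception reaches the caller (a SQL error in chDB)
    if on_error == "ignore":
        try:
            return fn(*args)
        except Exception:
            return None  # only this row becomes NULL
    raise ValueError(f"unknown on_error value: {on_error!r}")


def divide(a, b):
    return a // b


# Applied row by row, one failing row does not affect the others under "ignore".
rows = [(10, 2), (1, 0), (9, 3)]
ignored = [apply_on_error(divide, r, on_error="ignore") for r in rows]
```

Here `ignored` is `[5, None, 3]`, while calling `apply_on_error(divide, (1, 0))` with the default policy raises `ZeroDivisionError`.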
+ +### Example: Default (propagate) {#exception-propagate} + +```python +@func(arg_types=["Int64", "Int64"], return_type="Int64") +def divide(a, b): + return a // b + +query("SELECT divide(10, 2)") # 5 +query("SELECT divide(1, 0)") # Error: ZeroDivisionError +``` + +### Example: Ignore errors {#exception-ignore} + +```python +@func(arg_types=["Int64", "Int64"], return_type="Int64", on_error="ignore") +def safe_divide(a, b): + return a // b + +query("SELECT safe_divide(10, 2)") # 5 +query("SELECT safe_divide(1, 0)") # NULL +``` + +## Combining NULL and Exception Handling {#combining-null-and-exception} + +The `on_null` and `on_error` options can be combined: + +| on_null | on_error | NULL input | Exception | +|---------|----------|------------|-----------| +| `"skip"` | `"propagate"` | Return NULL | Raise error | +| `"skip"` | `"ignore"` | Return NULL | Return NULL | +| `"pass"` | `"propagate"` | Call with `None` | Raise error | +| `"pass"` | `"ignore"` | Call with `None` | Return NULL | + +```python +@func( + arg_types=["Int64", "Int64"], + return_type="Int64", + on_null="pass", + on_error="ignore", +) +def robust_divide(a, b): + if a is None or b is None: + return -1 + return a // b + +query("SELECT robust_divide(10, 2)") # 5 +query("SELECT robust_divide(NULL, 2)") # -1 +query("SELECT robust_divide(1, 0)") # NULL (exception caught) +``` + +## DateTime and Timezone Support {#datetime-and-timezone} + +UDFs fully support date and time types with timezone awareness. 
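UDF bodies are plain Python, so their date and timezone behavior can be checked without chDB. The sketch below runs the `add_one_hour` and `next_day` bodies from this guide on values shaped like what the guide says chDB passes in: an aware `datetime` for `DateTime('UTC')` and a `date` for `Date`. That input shape is an assumption taken from the description, not from chDB's source.

```python
from datetime import date, datetime, timedelta, timezone


def add_one_hour(dt):
    # Same body as the UDF in this guide; timedelta arithmetic keeps dt's tzinfo.
    return dt + timedelta(hours=1)


def next_day(d):
    return d + timedelta(days=1)


# Assumption: a DateTime('UTC') value arrives as an aware datetime in UTC.
incoming = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
shifted = add_one_hour(incoming)

# The same instant viewed from a UTC+2 offset still compares equal.
same_instant = shifted.astimezone(timezone(timedelta(hours=2)))
```

`shifted` is 13:00 with a zero UTC offset, and `same_instant` shows 15:00 but compares equal to it, because aware datetimes compare by instant.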
+ +### Date Types {#date-types} + +```python +from datetime import date, timedelta + +@func() +def next_day(d: date) -> date: + return d + timedelta(days=1) + +@func() +def get_year(d: date) -> int: + return d.year + +query("SELECT next_day(toDate('2024-06-15'))") # 2024-06-16 +query("SELECT get_year(toDate('2024-06-15'))") # 2024 +``` + +### DateTime with Timezones {#datetime-with-timezones} + +```python +from datetime import timedelta + +@func(arg_types=["DateTime('UTC')"], return_type="DateTime('UTC')") +def add_one_hour(dt): + return dt + timedelta(hours=1) + +query("SELECT add_one_hour(toDateTime('2024-01-01 12:00:00', 'UTC'))") # 2024-01-01 13:00:00 +``` + +### DateTime64 (High Precision) {#datetime64} + +The `DATETIME64` constant from `chdb.sqltypes` defaults to scale 6 (microseconds). To also pin a timezone, use a parameterized type string: + +```python +from datetime import timedelta + +@func(arg_types=["DateTime64(6, 'UTC')"], return_type="DateTime64(6, 'UTC')") +def add_microsecond(dt): + return dt + timedelta(microseconds=1) + +query("SELECT add_microsecond(toDateTime64('2024-01-01 12:00:00.000000', 6, 'UTC'))") # 2024-01-01 12:00:00.000001 +``` + +:::note +- Input `DateTime`/`DateTime64` values carry timezone info from ClickHouse +- Output `datetime` objects preserve timezone info +- Timezone conversion is handled automatically +::: + +## Using UDFs with Sessions {#using-udfs-with-sessions} + +UDFs are registered globally and available across all sessions in the same process: + +```python +from chdb import session as chs, func +from chdb.sqltypes import INT64 + +@func([INT64], INT64) +def double(x): + return x * 2 + +sess = chs.Session() +sess.query("CREATE TABLE t (x Int64) ENGINE = Memory") +sess.query("INSERT INTO t VALUES (1), (2), (3)") +result = sess.query("SELECT double(x) FROM t ORDER BY x", "CSV") +print(result) # 2, 4 and 6, one value per line +``` diff --git a/docs/chdb/install/python.md b/docs/chdb/install/python.md index 1ae5ef533ab..7a6af98d17b 100644 --- a/docs/chdb/install/python.md +++ b/docs/chdb/install/python.md @@ -327,114 +327,23 @@
cursor.executemany( ) ``` -### User defined functions (UDF) {#user-defined-functions} +### Python UDF (User-Defined Functions) {#user-defined-functions} -Extend SQL with custom Python functions: - -#### Basic UDF usage {#basic-udf-usage} +chDB supports native in-process Python UDFs with full type safety, automatic type inference, and configurable NULL/exception handling. ```python -from chdb.udf import chdb_udf -from chdb import query - -# Simple mathematical function -@chdb_udf() -def add_numbers(a, b): - return int(a) + int(b) - -# String processing function -@chdb_udf() -def reverse_string(text): - return text[::-1] - -# JSON processing function -@chdb_udf() -def extract_json_field(json_str, field): - import json - try: - data = json.loads(json_str) - return str(data.get(field, '')) - except: - return '' - -# Use UDFs in queries -result = query(""" - SELECT - add_numbers('10', '20') as sum_result, - reverse_string('hello') as reversed, - extract_json_field('{"name": "John", "age": 30}', 'name') as name -""") -print(result) -``` +from chdb import query, func +from chdb.sqltypes import INT64 -#### Advanced UDF with custom return types {#advanced-udf-custom-return-types} +@func([INT64, INT64], INT64) +def add(a, b): + return a + b -```python -# UDF with specific return type -@chdb_udf(return_type="Float64") -def calculate_bmi(height_str, weight_str): - height = float(height_str) / 100 # Convert cm to meters - weight = float(weight_str) - return weight / (height * height) - -# UDF for data validation -@chdb_udf(return_type="UInt8") -def is_valid_email(email): - import re - pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$' - return 1 if re.match(pattern, email) else 0 - -# Use in complex queries -result = query(""" - SELECT - name, - calculate_bmi(height, weight) as bmi, - is_valid_email(email) as has_valid_email - FROM ( - SELECT - 'John' as name, '180' as height, '75' as weight, 'john@example.com' as email - UNION ALL - SELECT - 'Jane' as name, '165' as 
height, '60' as weight, 'invalid-email' as email - ) -""", "Pretty") -print(result) +result = query("SELECT add(2, 3)") +print(result) # 5 ``` -#### UDF best practices {#udf-best-practices} - -1. **Stateless Functions**: UDFs should be pure functions without side effects -2. **Import Inside Functions**: All required modules must be imported within the UDF -3. **String Input/Output**: All UDF parameters are strings (TabSeparated format) -4. **Error Handling**: Include try-catch blocks for robust UDFs -5. **Performance**: UDFs are called for each row, so optimize for performance - -```python -# Well-structured UDF with error handling -@chdb_udf(return_type="String") -def safe_json_extract(json_str, path): - import json - try: - data = json.loads(json_str) - keys = path.split('.') - result = data - for key in keys: - if isinstance(result, dict) and key in result: - result = result[key] - else: - return 'null' - return str(result) - except Exception as e: - return f'error: {str(e)}' - -# Use with complex nested JSON -query(""" - SELECT safe_json_extract( - '{"user": {"profile": {"name": "Alice", "age": 25}}}', - 'user.profile.name' - ) as extracted_name -""") -``` +For the complete guide covering registration methods, type system, NULL handling, exception handling, and DateTime support, see the [Python UDF Guide](/chdb/guides/python-udf). For the full API reference, see the [Python UDF API reference](/chdb/api/python#user-defined-functions). The older `@chdb_udf` decorator is still available but superseded by this API — see [Legacy API](/chdb/api/python#legacy-udf). ### Streaming query processing {#streaming-queries} diff --git a/sidebars.js b/sidebars.js index df021f2bd44..656548e7bcc 100644 --- a/sidebars.js +++ b/sidebars.js @@ -1706,6 +1706,7 @@ const sidebars = { 'chdb/guides/querying-parquet', 'chdb/guides/query-remote-clickhouse', 'chdb/guides/clickhouse-local', + 'chdb/guides/python-udf', ], }, {