---
title: 'chDB Python API reference'
sidebar_label: 'Python API'
slug: /chdb/api/python
description: 'Complete Python API reference for chDB'
keywords: ['chdb', 'embedded', 'clickhouse-lite', 'python', 'api', 'reference']
doc_type: 'reference'
---

Python API Reference

Core Query Functions {#core-query-functions}

chdb.query {#chdb-query}

Execute SQL query using chDB engine.

This is the main query function that executes SQL statements using the embedded ClickHouse engine. Supports various output formats and can work with in-memory or file-based databases.

Syntax

chdb.query(sql, output_format='CSV', path='', udf_path='')

Parameters

Parameter Type Default Description
sql str required SQL query string to execute
output_format str "CSV" Output format for results. Supported formats:
"CSV" - Comma-separated values
"JSON" - JSON format
"Arrow" - Apache Arrow format
"Parquet" - Parquet format
"DataFrame" - Pandas DataFrame
"ArrowTable" - PyArrow Table
"Debug" - Enable verbose logging
path str "" Database file path. Defaults to in-memory database.
Can be a file path or ":memory:" for in-memory database
udf_path str "" Path to legacy subprocess-based UDF directory. Not needed for native Python UDFs (@func / create_function)

Returns

Returns the query result in the specified format:

Return Type Condition
str For text formats like CSV, JSON
pd.DataFrame When output_format is "DataFrame" or "dataframe"
pa.Table When output_format is "ArrowTable" or "arrowtable"
chdb result object For other formats

Raises

Exception Condition
ChdbError If the SQL query execution fails
ImportError If required dependencies are missing for DataFrame/Arrow formats

Examples

>>> # Basic CSV query
>>> result = chdb.query("SELECT 1, 'hello'")
>>> print(result)
"1,hello"
>>> # Query with DataFrame output
>>> df = chdb.query("SELECT 1 as id, 'hello' as msg", "DataFrame")
>>> print(df)
   id    msg
0   1  hello
>>> # Query with file-based database
>>> result = chdb.query("CREATE TABLE test (id INT) ENGINE = Memory", path="mydb.chdb")

chdb.sql {#chdb_sql}

Execute SQL query using chDB engine. Alias of chdb.query().

This function is equivalent to chdb.query() and executes SQL statements using the embedded ClickHouse engine. Supports various output formats and can work with in-memory or file-based databases.

Syntax

chdb.sql(sql, output_format='CSV', path='', udf_path='')

Parameters

Parameter Type Default Description
sql str required SQL query string to execute
output_format str "CSV" Output format for results. Supported formats:
"CSV" - Comma-separated values
"JSON" - JSON format
"Arrow" - Apache Arrow format
"Parquet" - Parquet format
"DataFrame" - Pandas DataFrame
"ArrowTable" - PyArrow Table
"Debug" - Enable verbose logging
path str "" Database file path. Defaults to in-memory database.
Can be a file path or ":memory:" for in-memory database
udf_path str "" Path to legacy subprocess-based UDF directory. Not needed for native Python UDFs (@func / create_function)

Returns

Returns the query result in the specified format:

Return Type Condition
str For text formats like CSV, JSON
pd.DataFrame When output_format is "DataFrame" or "dataframe"
pa.Table When output_format is "ArrowTable" or "arrowtable"
chdb result object For other formats

Raises

Exception Condition
ChdbError If the SQL query execution fails
ImportError If required dependencies are missing for DataFrame/Arrow formats

Examples

>>> # Basic CSV query
>>> result = chdb.sql("SELECT 1, 'hello'")
>>> print(result)
1,"hello"
>>> # Query with DataFrame output
>>> df = chdb.sql("SELECT 1 as id, 'hello' as msg", "DataFrame")
>>> print(df)
   id    msg
0   1  hello
>>> # Query with file-based database
>>> result = chdb.sql("CREATE TABLE test (id INT) ENGINE = Memory", path="mydb.chdb")

chdb.to_arrowTable {#chdb-state-sqlitelike-to_arrowtable}

Convert query result to PyArrow Table.

Converts a chDB query result to a PyArrow Table for efficient columnar data processing. Returns an empty table if the result is empty.

Syntax

chdb.to_arrowTable(res)

Parameters

Parameter Description
res chDB query result object containing binary Arrow data

Returns

Return type Description
pa.Table PyArrow Table containing the query results

Raises

Error type Description
ImportError If pyarrow or pandas aren't installed

Example

>>> result = chdb.query("SELECT 1 as id, 'hello' as msg", "Arrow")
>>> table = chdb.to_arrowTable(result)
>>> print(table.to_pandas())
   id    msg
0   1  hello

chdb.to_df {#chdb_to_df}

Convert query result to pandas DataFrame.

Converts a chDB query result to a pandas DataFrame by first converting to PyArrow Table and then to pandas using multi-threading for better performance.

Syntax

chdb.to_df(r)

Parameters

Parameter Description
r chDB query result object containing binary Arrow data

Returns

Return Type Description
pd.DataFrame pandas DataFrame containing the query results

Raises

Exception Condition
ImportError If pyarrow or pandas aren't installed

Example

>>> result = chdb.query("SELECT 1 as id, 'hello' as msg", "Arrow")
>>> df = chdb.to_df(result)
>>> print(df)
   id    msg
0   1  hello

Connection and Session Management {#connection-session-management}

The following Session Functions are available:

chdb.connect {#chdb-connect}

Create a connection to chDB background server.

This function establishes a Connection to the chDB (ClickHouse) database engine. Only one open connection is allowed per process. Multiple calls with the same connection string will return the same connection object.

Syntax

chdb.connect(connection_string: str = ':memory:') → Connection

Parameters

Parameter Type Default Description
connection_string str ":memory:" Database connection string. See formats below.

Basic formats

Format Description
":memory:" In-memory database (default)
"test.db" Relative path database file
"file:test.db" Same as relative path
"/path/to/test.db" Absolute path database file
"file:/path/to/test.db" Same as absolute path

With query parameters

Format Description
"file:test.db?param1=value1&param2=value2" Relative path with params
"file::memory:?verbose&log-level=test" In-memory with params
"///path/to/test.db?param1=value1&param2=value2" Absolute path with params

Query parameter handling

Query parameters are passed to ClickHouse engine as startup arguments. Special parameter handling:

Special Parameter Becomes Description
mode=ro --readonly=1 Read-only mode
verbose (flag) Enables verbose logging
log-level=test (setting) Sets logging level

For a complete parameter list, see clickhouse local --help --verbose

Returns

Return Type Description
Connection Database connection object that supports:
• Creating cursors with Connection.cursor()
• Direct queries with Connection.query()
• Streaming queries with Connection.send_query()
• Context manager protocol for automatic cleanup

Raises

Exception Condition
RuntimeError If connection to database fails

:::warning Only one connection per process is supported. Creating a new connection will close any existing connection. :::

Examples

>>> # In-memory database
>>> conn = connect()
>>> conn = connect(":memory:")
>>>
>>> # File-based database
>>> conn = connect("my_data.db")
>>> conn = connect("/path/to/data.db")
>>>
>>> # With parameters
>>> conn = connect("data.db?mode=ro")  # Read-only mode
>>> conn = connect(":memory:?verbose&log-level=debug")  # Debug logging
>>>
>>> # Using context manager for automatic cleanup
>>> with connect("data.db") as conn:
...     result = conn.query("SELECT 1")
...     print(result)
>>> # Connection automatically closed

See also

  • Connection - Database connection class
  • Cursor - Database cursor for DB-API 2.0 operations

Exception Handling {#chdb-exceptions}

class chdb.ChdbError {#chdb_chdbError}

Bases: Exception

Base exception class for chDB-related errors.

This exception is raised when chDB query execution fails or encounters an error. It inherits from the standard Python Exception class and provides error information from the underlying ClickHouse engine.


class chdb.session.Session {#chdb_session_session}

Bases: object

Session will keep the state of queries. If path is None, it will create a temporary directory and use it as the database path; the temporary directory will be removed when the session is closed. You can also pass in a path to create a database at that path, where your data will be kept.

You can also use a connection string to pass in the path and other parameters.

class chdb.session.Session(path=None)

Examples

Connection String Description
":memory:" In-memory database
"test.db" Relative path
"file:test.db" Same as above
"/path/to/test.db" Absolute path
"file:/path/to/test.db" Same as above
"file:test.db?param1=value1&param2=value2" Relative path with query params
"file::memory:?verbose&log-level=test" In-memory database with query params
"///path/to/test.db?param1=value1&param2=value2" Absolute path with query params

:::note Connection string args handling
For connection strings containing query params, such as "file:test.db?param1=value1&param2=value2", each param (e.g. "param1=value1") is passed to the ClickHouse engine as a startup argument.

For more details, see clickhouse local --help --verbose

Some special args handling:

  • "mode=ro" becomes "--readonly=1" for clickhouse (read-only mode) :::

:::warning Important

  • There can be only one session at a time. If you want to create a new session, you need to close the existing one.
  • Creating a new session will close the existing one. :::

cleanup {#cleanup}

Cleanup session resources with exception handling.

This method attempts to close the session while suppressing any exceptions that might occur during the cleanup process. It’s particularly useful in error handling scenarios or when you need to ensure cleanup happens regardless of the session state.

Syntax

cleanup()

:::note This method will never raise an exception, making it safe to call in finally blocks or destructors. :::

Examples

>>> session = Session("test.db")
>>> try:
...     session.query("INVALID SQL")
... finally:
...     session.cleanup()  # Safe cleanup regardless of errors

See also

  • close() - For explicit session closing with error propagation

close {#close}

Close the session and cleanup resources.

This method closes the underlying connection and resets the global session state. After calling this method, the session becomes invalid and can't be used for further queries.

Syntax

close()

:::note This method is automatically called when the session is used as a context manager or when the session object is destroyed. :::

:::warning Important Any attempt to use the session after calling close() will result in an error. :::

Examples

>>> session = Session("test.db")
>>> session.query("SELECT 1")
>>> session.close()  # Explicitly close the session

query {#chdb-session-session-query}

Execute a SQL query and return the results.

This method executes a SQL query against the session’s database and returns the results in the specified format. The method supports various output formats and maintains session state between queries.

Syntax

query(sql, fmt='CSV', udf_path='')

Parameters

Parameter Type Default Description
sql str required SQL query string to execute
fmt str "CSV" Output format for results. Available formats:
"CSV" - Comma-separated values
"JSON" - JSON format
"TabSeparated" - Tab-separated values
"Pretty" - Pretty-printed table format
"JSONCompact" - Compact JSON format
"Arrow" - Apache Arrow format
"Parquet" - Parquet format
udf_path str "" Path to legacy subprocess-based UDF directory. Not needed for native Python UDFs (@func / create_function). If not specified, uses the UDF path from session initialization

Returns

Returns query results in the specified format. The exact return type depends on the format parameter:

  • String formats (CSV, JSON, etc.) return str
  • Binary formats (Arrow, Parquet) return bytes

Raises

Exception Condition
RuntimeError If the session is closed or invalid
ValueError If the SQL query is malformed

:::note The “Debug” format isn't supported and will be automatically converted to “CSV” with a warning. For debugging, use connection string parameters instead. :::

:::warning Warning This method executes the query synchronously and loads all results into memory. For large result sets, consider using send_query() for streaming results. :::

Examples

>>> session = Session("test.db")
>>>
>>> # Basic query with default CSV format
>>> result = session.query("SELECT 1 as number")
>>> print(result)
number
1
>>> # Query with JSON format
>>> result = session.query("SELECT 1 as number", fmt="JSON")
>>> print(result)
{"number": "1"}
>>> # Complex query with table creation
>>> session.query("CREATE TABLE test (id INT, name String) ENGINE = Memory")
>>> session.query("INSERT INTO test VALUES (1, 'Alice'), (2, 'Bob')")
>>> result = session.query("SELECT * FROM test ORDER BY id")
>>> print(result)
id,name
1,Alice
2,Bob

See also

  • send_query() - For streaming query execution

send_query {#chdb-session-session-send_query}

Execute a SQL query and return a streaming result iterator.

This method executes a SQL query against the session’s database and returns a streaming result object that allows you to iterate over the results without loading everything into memory at once. This is particularly useful for large result sets.

Syntax

send_query(sql, fmt='CSV') → StreamingResult

Parameters

Parameter Type Default Description
sql str required SQL query string to execute
fmt str "CSV" Output format for results. Available formats:
"CSV" - Comma-separated values
"JSON" - JSON format
"TabSeparated" - Tab-separated values
"JSONCompact" - Compact JSON format
"Arrow" - Apache Arrow format
"Parquet" - Parquet format

Returns

Return Type Description
StreamingResult A streaming result iterator that yields query results incrementally. The iterator can be used in for loops or converted to other data structures

Raises

Exception Condition
RuntimeError If the session is closed or invalid
ValueError If the SQL query is malformed

:::note The “Debug” format isn't supported and will be automatically converted to “CSV” with a warning. For debugging, use connection string parameters instead. :::

:::warning The returned StreamingResult object should be consumed promptly or stored appropriately, as it maintains a connection to the database. :::

Examples

>>> session = Session("test.db")
>>> session.query("CREATE TABLE big_table (id INT, data String) ENGINE = MergeTree() order by id")
>>>
>>> # Insert large dataset
>>> for i in range(1000):
...     session.query(f"INSERT INTO big_table VALUES ({i}, 'data_{i}')")
>>>
>>> # Stream results to avoid memory issues
>>> streaming_result = session.send_query("SELECT * FROM big_table ORDER BY id")
>>> for chunk in streaming_result:
...     print(f"Processing chunk: {len(chunk)} bytes")
...     # Process chunk without loading entire result set
>>> # Using with context manager
>>> with session.send_query("SELECT COUNT(*) FROM big_table") as stream:
...     for result in stream:
...         print(f"Count result: {result}")

See also

  • query() - For non-streaming query execution
  • chdb.state.sqlitelike.StreamingResult - Streaming result iterator

sql {#chdb-session-session-sql}

Execute a SQL query and return the results.

This method executes a SQL query against the session’s database and returns the results in the specified format. The method supports various output formats and maintains session state between queries.

Syntax

sql(sql, fmt='CSV', udf_path='')

Parameters

Parameter Type Default Description
sql str required SQL query string to execute
fmt str "CSV" Output format for results. Available formats:
"CSV" - Comma-separated values
"JSON" - JSON format
"TabSeparated" - Tab-separated values
"Pretty" - Pretty-printed table format
"JSONCompact" - Compact JSON format
"Arrow" - Apache Arrow format
"Parquet" - Parquet format
udf_path str "" Path to legacy subprocess-based UDF directory. Not needed for native Python UDFs (@func / create_function). If not specified, uses the UDF path from session initialization

Returns

Returns query results in the specified format. The exact return type depends on the format parameter:

  • String formats (CSV, JSON, etc.) return str
  • Binary formats (Arrow, Parquet) return bytes

Raises

Exception Condition
RuntimeError If the session is closed or invalid
ValueError If the SQL query is malformed

:::note The “Debug” format isn't supported and will be automatically converted to “CSV” with a warning. For debugging, use connection string parameters instead. :::

:::warning Warning This method executes the query synchronously and loads all results into memory. For large result sets, consider using send_query() for streaming results. :::

Examples

>>> session = Session("test.db")
>>>
>>> # Basic query with default CSV format
>>> result = session.query("SELECT 1 as number")
>>> print(result)
number
1
>>> # Query with JSON format
>>> result = session.query("SELECT 1 as number", fmt="JSON")
>>> print(result)
{"number": "1"}
>>> # Complex query with table creation
>>> session.query("CREATE TABLE test (id INT, name String) ENGINE = MergeTree() order by id")
>>> session.query("INSERT INTO test VALUES (1, 'Alice'), (2, 'Bob')")
>>> result = session.query("SELECT * FROM test ORDER BY id")
>>> print(result)
id,name
1,Alice
2,Bob

See also

  • send_query() - For streaming query execution

State Management {#chdb-state-management}

chdb.state.connect {#chdb_state_connect}

Create a Connection to the chDB background server.

This function establishes a connection to the chDB (ClickHouse) database engine. Only one open connection is allowed per process. Multiple calls with the same connection string will return the same connection object.

Syntax

chdb.state.connect(connection_string: str = ':memory:') → Connection

Parameters

Parameter Type Default Description
connection_string str ":memory:" Database connection string. See formats below.

Basic formats

Supported connection string formats:

Format Description
":memory:" In-memory database (default)
"test.db" Relative path database file
"file:test.db" Same as relative path
"/path/to/test.db" Absolute path database file
"file:/path/to/test.db" Same as absolute path

With query parameters

Format Description
"file:test.db?param1=value1&param2=value2" Relative path with params
"file::memory:?verbose&log-level=test" In-memory with params
"///path/to/test.db?param1=value1&param2=value2" Absolute path with params

Query parameter handling

Query parameters are passed to ClickHouse engine as startup arguments. Special parameter handling:

Special Parameter Becomes Description
mode=ro --readonly=1 Read-only mode
verbose (flag) Enables verbose logging
log-level=test (setting) Sets logging level

For a complete parameter list, see clickhouse local --help --verbose

Returns

Return Type Description
Connection Database connection object that supports:
• Creating cursors with Connection.cursor()
• Direct queries with Connection.query()
• Streaming queries with Connection.send_query()
• Context manager protocol for automatic cleanup

Raises

Exception Condition
RuntimeError If connection to database fails

:::warning Warning Only one connection per process is supported. Creating a new connection will close any existing connection. :::

Examples

>>> # In-memory database
>>> conn = connect()
>>> conn = connect(":memory:")
>>>
>>> # File-based database
>>> conn = connect("my_data.db")
>>> conn = connect("/path/to/data.db")
>>>
>>> # With parameters
>>> conn = connect("data.db?mode=ro")  # Read-only mode
>>> conn = connect(":memory:?verbose&log-level=debug")  # Debug logging
>>>
>>> # Using context manager for automatic cleanup
>>> with connect("data.db") as conn:
...     result = conn.query("SELECT 1")
...     print(result)
>>> # Connection automatically closed

See also

  • Connection - Database connection class
  • Cursor - Database cursor for DB-API 2.0 operations

class chdb.state.sqlitelike.Connection {#chdb-state-sqlitelike-connection}

Bases: object

Syntax

class chdb.state.sqlitelike.Connection(connection_string: str)

close {#chdb-session-session-close}

Close the connection and cleanup resources.

This method closes the database connection and cleans up any associated resources including active cursors. After calling this method, the connection becomes invalid and can't be used for further operations.

Syntax

close() → None

:::note This method is idempotent - calling it multiple times is safe. :::

:::warning Warning Any ongoing streaming queries will be cancelled when the connection is closed. Ensure all important data is processed before closing. :::

Examples

>>> conn = connect("test.db")
>>> # Use connection for queries
>>> conn.query("CREATE TABLE test (id INT) ENGINE = Memory")
>>> # Close when done
>>> conn.close()
>>> # Using with context manager (automatic cleanup)
>>> with connect("test.db") as conn:
...     conn.query("SELECT 1")
...     # Connection automatically closed

cursor {#chdb-state-sqlitelike-connection-cursor}

Create a Cursor object for executing queries.

This method creates a database cursor that provides the standard DB-API 2.0 interface for executing queries and fetching results. The cursor allows for fine-grained control over query execution and result retrieval.

Syntax

cursor() → Cursor

Returns

Return Type Description
Cursor A cursor object for database operations

:::note Creating a new cursor will replace any existing cursor associated with this connection. Only one cursor per connection is supported. :::

Examples

>>> conn = connect(":memory:")
>>> cursor = conn.cursor()
>>> cursor.execute("CREATE TABLE test (id INT, name String) ENGINE = Memory")
>>> cursor.execute("INSERT INTO test VALUES (1, 'Alice')")
>>> cursor.execute("SELECT * FROM test")
>>> rows = cursor.fetchall()
>>> print(rows)
((1, 'Alice'),)

See also

  • Cursor - Database cursor implementation

query {#chdb-state-sqlitelike-connection-query}

Execute a SQL query and return the complete results.

This method executes a SQL query synchronously and returns the complete result set. It supports various output formats and automatically applies format-specific post-processing.

Syntax

query(query: str, format: str = 'CSV') → Any

Parameters

Parameter Type Default Description
query str required SQL query string to execute
format str "CSV" Output format for results. Supported formats:
"CSV" - Comma-separated values (string)
"JSON" - JSON format (string)
"Arrow" - Apache Arrow format (bytes)
"Dataframe" - Pandas DataFrame (requires pandas)
"Arrowtable" - PyArrow Table (requires pyarrow)

Returns

Return Type Description
str For string formats (CSV, JSON)
bytes For Arrow format
pandas.DataFrame For dataframe format
pyarrow.Table For arrowtable format

Raises

Exception Condition
RuntimeError If query execution fails
ImportError If required packages for format aren't installed

:::warning Warning This method loads the entire result set into memory. For large results, consider using send_query() for streaming. :::

Examples

>>> conn = connect(":memory:")
>>>
>>> # Basic CSV query
>>> result = conn.query("SELECT 1 as num, 'hello' as text")
>>> print(result)
num,text
1,hello
>>> # DataFrame format
>>> df = conn.query("SELECT number FROM numbers(5)", "dataframe")
>>> print(df)
   number
0       0
1       1
2       2
3       3
4       4

See also

  • send_query() - For streaming query execution

send_query {#chdb-state-sqlitelike-connection-send_query}

Execute a SQL query and return a streaming result iterator.

This method executes a SQL query and returns a StreamingResult object that allows you to iterate over the results without loading everything into memory at once. This is ideal for processing large result sets.

Syntax

send_query(query: str, format: str = 'CSV') → StreamingResult

Parameters

Parameter Type Default Description
query str required SQL query string to execute
format str "CSV" Output format for results. Supported formats:
"CSV" - Comma-separated values
"JSON" - JSON format
"Arrow" - Apache Arrow format (enables record_batch() method)
"dataframe" - Pandas DataFrame chunks
"arrowtable" - PyArrow Table chunks

Returns

Return Type Description
StreamingResult A streaming iterator for query results that supports:
• Iterator protocol (for loops)
• Context manager protocol (with statements)
• Manual fetching with fetch() method
• PyArrow RecordBatch streaming (Arrow format only)

Raises

Exception Condition
RuntimeError If query execution fails
ImportError If required packages for format aren't installed

:::note Only the “Arrow” format supports the record_batch() method on the returned StreamingResult. :::

Examples

>>> conn = connect(":memory:")
>>>
>>> # Basic streaming
>>> stream = conn.send_query("SELECT number FROM numbers(1000)")
>>> for chunk in stream:
...     print(f"Processing chunk: {len(chunk)} bytes")
>>> # Using context manager for cleanup
>>> with conn.send_query("SELECT * FROM large_table") as stream:
...     chunk = stream.fetch()
...     while chunk:
...         process_data(chunk)
...         chunk = stream.fetch()
>>> # Arrow format with RecordBatch streaming
>>> stream = conn.send_query("SELECT * FROM data", "Arrow")
>>> reader = stream.record_batch(rows_per_batch=10000)
>>> for batch in reader:
...     print(f"Batch shape: {batch.num_rows} x {batch.num_columns}")

See also

  • query() - For non-streaming query execution
  • StreamingResult - Streaming result iterator

class chdb.state.sqlitelike.StreamingResult {#chdb-state-sqlitelike-streamingresult}

Bases: object

Streaming result iterator for processing large query results.

This class provides an iterator interface for streaming query results without loading the entire result set into memory. It supports various output formats and provides methods for manual result fetching and PyArrow RecordBatch streaming.

class chdb.state.sqlitelike.StreamingResult

fetch {#streamingresult-fetch}

Fetch the next chunk of streaming results.

This method retrieves the next available chunk of data from the streaming query result. The format of the returned data depends on the format specified when the streaming query was initiated.

Syntax

fetch() → Any

Returns

Return Type Description
str For text formats (CSV, JSON)
bytes For binary formats (Arrow, Parquet)
None When the result stream is exhausted

Examples

>>> stream = conn.send_query("SELECT * FROM large_table")
>>> chunk = stream.fetch()
>>> while chunk is not None:
...     process_data(chunk)
...     chunk = stream.fetch()

cancel {#streamingresult-cancel}

Cancel the streaming query and cleanup resources.

This method cancels any ongoing streaming query and releases associated resources. It should be called when you want to stop processing results before the stream is exhausted.

Syntax

cancel() → None

Examples

>>> stream = conn.send_query("SELECT * FROM very_large_table")
>>> for i, chunk in enumerate(stream):
...     if i >= 10:  # Only process first 10 chunks
...         stream.cancel()
...         break
...     process_data(chunk)

close {#streamingresult-close}

Close the streaming result and cleanup resources.

Alias for cancel(). Closes the streaming result iterator and releases any associated resources.

Syntax

close() → None

record_batch {#streamingresult-record_batch}

Create a PyArrow RecordBatchReader for efficient batch processing.

This method creates a PyArrow RecordBatchReader that allows efficient iteration over the query results in Arrow format. This is the most efficient way to process large result sets when using PyArrow.

Syntax

record_batch(rows_per_batch: int = 1000000) → pa.RecordBatchReader

Parameters

Parameter Type Default Description
rows_per_batch int 1000000 Number of rows per batch

Returns

Return Type Description
pa.RecordBatchReader PyArrow RecordBatchReader for iterating over batches

:::note This method is only available when the streaming query was initiated with format="Arrow". Using it with other formats will raise an error. :::

Examples

>>> stream = conn.send_query("SELECT * FROM data", format="Arrow")
>>> reader = stream.record_batch(rows_per_batch=10000)
>>> for batch in reader:
...     print(f"Processing batch: {batch.num_rows} rows")
...     df = batch.to_pandas()
...     process_dataframe(df)

Iterator Protocol {#streamingresult-iterator}

StreamingResult supports the Python iterator protocol, allowing it to be used directly in for loops:

>>> stream = conn.send_query("SELECT number FROM numbers(1000000)")
>>> for chunk in stream:
...     print(f"Chunk size: {len(chunk)} bytes")

Context Manager Protocol {#streamingresult-context-manager}

StreamingResult supports the context manager protocol for automatic resource cleanup:

>>> with conn.send_query("SELECT * FROM data") as stream:
...     for chunk in stream:
...         process(chunk)
>>> # Stream automatically closed

class chdb.state.sqlitelike.Cursor {#chdb-state-sqlitelike-cursor}

Bases: object

class chdb.state.sqlitelike.Cursor(connection)

close {#cursor-close-none}

Close the cursor and cleanup resources.

This method closes the cursor and cleans up any associated resources. After calling this method, the cursor becomes invalid and can't be used for further operations.

Syntax

close() → None

:::note This method is idempotent - calling it multiple times is safe. The cursor is also automatically closed when the connection is closed. :::

Examples

>>> cursor = conn.cursor()
>>> cursor.execute("SELECT 1")
>>> result = cursor.fetchone()
>>> cursor.close()  # Cleanup cursor resources

column_names {#chdb-state-sqlitelike-cursor-column_names}

Return a list of column names from the last executed query.

This method returns the column names from the most recently executed SELECT query. The names are returned in the same order as they appear in the result set.

Syntax

column_names() → list

Returns

Return Type Description
list List of column name strings, or empty list if no query has been executed or the query returned no columns

Examples

>>> cursor = conn.cursor()
>>> cursor.execute("SELECT id, name, email FROM users LIMIT 1")
>>> print(cursor.column_names())
['id', 'name', 'email']

See also

  • column_types() - Get column type information

column_types {#chdb-state-sqlitelike-cursor-column_types}

Return a list of column types from the last executed query.

This method returns the ClickHouse column type names from the most recently executed SELECT query. The types are returned in the same order as they appear in the result set.

Syntax

column_types() → list

Returns

Return Type Description
list List of ClickHouse type name strings, or empty list if no query has been executed or the query returned no columns

Examples

>>> cursor = conn.cursor()
>>> cursor.execute("SELECT toInt32(1), toString('hello')")
>>> print(cursor.column_types())
['Int32', 'String']

See also

  • column_names() - Get column name information

commit {#commit}

Commit any pending transaction.

This method commits any pending database transaction. In ClickHouse, most operations are auto-committed, but this method is provided for DB-API 2.0 compatibility.

:::note ClickHouse typically auto-commits operations, so explicit commits are usually not necessary. This method is provided for compatibility with standard DB-API 2.0 workflow. :::

Syntax

commit() → None

Examples

>>> cursor = conn.cursor()
>>> cursor.execute("INSERT INTO test VALUES (1, 'data')")
>>> cursor.commit()

property description : list {#chdb-state-sqlitelike-cursor-description}

Return column description as per DB-API 2.0 specification.

This property returns a list of 7-item tuples describing each column in the result set of the last executed SELECT query. Each tuple contains: (name, type_code, display_size, internal_size, precision, scale, null_ok)

Currently, only name and type_code are provided, with other fields set to None.

Returns

Return Type Description
list List of 7-tuples describing each column, or empty list if no SELECT query has been executed

:::note This follows the DB-API 2.0 specification for cursor.description. Only the first two elements (name and type_code) contain meaningful data in this implementation. :::

Examples

>>> cursor = conn.cursor()
>>> cursor.execute("SELECT id, name FROM users LIMIT 1")
>>> for desc in cursor.description:
...     print(f"Column: {desc[0]}, Type: {desc[1]}")
Column: id, Type: Int32
Column: name, Type: String


execute {#execute}

Execute a SQL query and prepare results for fetching.

This method executes a SQL query and prepares the results for retrieval using the fetch methods. It handles the parsing of result data and automatic type conversion for ClickHouse data types.

Syntax

execute(query: str) → None

Parameters:

Parameter Type Description
query str SQL query string to execute

Raises

Exception Condition
Exception If query execution fails or result parsing fails

:::note This method follows DB-API 2.0 specifications for cursor.execute(). After execution, use fetchone(), fetchmany(), or fetchall() to retrieve results. :::

:::note The method automatically converts ClickHouse data types to appropriate Python types:

  • Int/UInt types → int
  • Float types → float
  • String/FixedString → str
  • DateTime → datetime.datetime
  • Date → datetime.date
  • Bool → bool :::

Examples

>>> cursor = conn.cursor()
>>>
>>> # Execute DDL
>>> cursor.execute("CREATE TABLE test (id INT, name String) ENGINE = Memory")
>>>
>>> # Execute DML
>>> cursor.execute("INSERT INTO test VALUES (1, 'Alice')")
>>>
>>> # Execute SELECT and fetch results
>>> cursor.execute("SELECT * FROM test")
>>> rows = cursor.fetchall()
>>> print(rows)
((1, 'Alice'),)


fetchall {#chdb-state-sqlitelike-cursor-fetchall}

Fetch all remaining rows from the query result.

This method retrieves all remaining rows from the current query result set starting from the current cursor position. It returns a tuple of row tuples with appropriate Python type conversion applied.

Syntax

fetchall() → tuple

Returns:

Return Type Description
tuple Tuple containing all remaining row tuples from the result set. Returns empty tuple if no rows are available

:::warning Warning This method loads all remaining rows into memory at once. For large result sets, consider using fetchmany() to process results in batches. :::

Examples

>>> cursor = conn.cursor()
>>> cursor.execute("SELECT id, name FROM users")
>>> all_users = cursor.fetchall()
>>> for user_id, user_name in all_users:
...     print(f"User {user_id}: {user_name}")


fetchmany {#chdb-state-sqlitelike-cursor-fetchmany}

Fetch multiple rows from the query result.

This method retrieves up to 'size' rows from the current query result set. It returns a tuple of row tuples, with each row containing column values with appropriate Python type conversion.

Syntax

fetchmany(size: int = 1) → tuple

Parameters

Parameter Type Default Description
size int 1 Maximum number of rows to fetch

Returns

Return Type Description
tuple Tuple containing up to 'size' row tuples. May contain fewer rows if the result set is exhausted

:::note This method follows DB-API 2.0 specifications. It will return fewer than 'size' rows if the result set is exhausted. :::

Examples

>>> cursor = conn.cursor()
>>> cursor.execute("SELECT * FROM large_table")
>>>
>>> # Process results in batches
>>> while True:
...     batch = cursor.fetchmany(100)  # Fetch 100 rows at a time
...     if not batch:
...         break
...     process_batch(batch)


fetchone {#chdb-state-sqlitelike-cursor-fetchone}

Fetch the next row from the query result.

This method retrieves the next available row from the current query result set. It returns a tuple containing the column values with appropriate Python type conversion applied.

Syntax

fetchone() → tuple | None

Returns:

Return Type Description
Optional[tuple] Next row as a tuple of column values, or None if no more rows are available

:::note This method follows DB-API 2.0 specifications. Column values are automatically converted to appropriate Python types based on ClickHouse column types. :::

Examples

>>> cursor = conn.cursor()
>>> cursor.execute("SELECT id, name FROM users")
>>> row = cursor.fetchone()
>>> while row is not None:
...     user_id, user_name = row
...     print(f"User {user_id}: {user_name}")
...     row = cursor.fetchone()


chdb.state.sqlitelike.to_arrowTable {#state-sqlitelike-to_arrowtable}

Convert query result to PyArrow Table.

This function converts chdb query results to a PyArrow Table format, which provides efficient columnar data access and interoperability with other data processing libraries.

Syntax

chdb.state.sqlitelike.to_arrowTable(res)

Parameters:

Parameter Type Description
res - Query result object from chdb containing Arrow format data

Returns

Return Type Description
pyarrow.Table PyArrow Table containing the query results

Raises

Exception Condition
ImportError If pyarrow or pandas packages aren't installed

:::note This function requires both pyarrow and pandas to be installed. Install them with: pip install pyarrow pandas :::

:::warning Warning Empty results return an empty PyArrow Table with no schema. :::

Examples

>>> import chdb
>>> result = chdb.query("SELECT 1 as num, 'hello' as text", "Arrow")
>>> table = to_arrowTable(result)
>>> print(table.schema)
num: int64
text: string
>>> print(table.to_pandas())
   num   text
0    1  hello

chdb.state.sqlitelike.to_df {#state-sqlitelike-to_df}

Convert query result to Pandas DataFrame.

This function converts chdb query results to a Pandas DataFrame format by first converting to PyArrow Table and then to DataFrame. This provides convenient data analysis capabilities with Pandas API.

Syntax

chdb.state.sqlitelike.to_df(r)

Parameters:

Parameter Type Description
r - Query result object from chdb containing Arrow format data

Returns:

Return Type Description
pandas.DataFrame DataFrame containing the query results with appropriate column names and data types

Raises

Exception Condition
ImportError If pyarrow or pandas packages aren't installed

:::note This function uses multi-threading for the Arrow to Pandas conversion to improve performance on large datasets. :::

Examples

>>> import chdb
>>> result = chdb.query("SELECT 1 as num, 'hello' as text", "Arrow")
>>> df = to_df(result)
>>> print(df)
   num   text
0    1  hello
>>> print(df.dtypes)
num      int64
text    object
dtype: object

DataFrame Integration {#dataframe-integration}

class chdb.dataframe.Table {#chdb-dataframe-table}


class chdb.dataframe.Table(*args: Any, **kwargs: Any)

Database API (DBAPI) 2.0 Interface {#database-api-interface}

chDB provides a Python DB-API 2.0 compatible interface for database connectivity, allowing you to use chDB with tools and frameworks that expect standard database interfaces.

The chDB DB-API 2.0 interface includes:

  • Connections: Database connection management with connection strings
  • Cursors: Query execution and result retrieval
  • Type System: DB-API 2.0 compliant type constants and converters
  • Error Handling: Standard database exception hierarchy
  • Thread Safety: Level 1 thread safety (threads may share modules but not connections)

Core Functions {#core-functions}

The Database API (DBAPI) 2.0 Interface implements the following core functions:

chdb.dbapi.connect {#dbapi-connect}

Initialize a new database connection.

Syntax

chdb.dbapi.connect(*args, **kwargs)

Parameters

Parameter Type Default Description
path str None Database file path. None for in-memory database

Raises

Exception Condition
err.Error If connection can't be established

chdb.dbapi.get_client_info() {#dbapi-get-client-info}

Get client version information.

Returns the chDB client version as a string for MySQLdb compatibility.

Syntax

chdb.dbapi.get_client_info()

Returns

Return Type Description
str Version string in format 'major.minor.patch'

Type constructors {#type-constructors}

chdb.dbapi.Binary(x) {#dbapi-binary}

Return x as a binary type.

This function converts the input to bytes type for use with binary database fields, following the DB-API 2.0 specification.

Syntax

chdb.dbapi.Binary(x)

Parameters

Parameter Type Description
x - Input data to convert to binary

Returns

Return Type Description
bytes The input converted to bytes

Connection Class {#connection-class}

class chdb.dbapi.connections.Connection(path=None) {#chdb-dbapi-connections-connection}

Bases: object

DB-API 2.0 compliant connection to chDB database.

This class provides a standard DB-API interface for connecting to and interacting with chDB databases. It supports both in-memory and file-based databases.

The connection manages the underlying chDB engine and provides methods for executing queries, managing transactions (no-op for ClickHouse), and creating cursors.

class chdb.dbapi.connections.Connection(path=None)

Parameters

Parameter Type Default Description
path str None Database file path. If None, uses in-memory database. Can be a file path like 'database.db' or None for ':memory:'

Variables

Variable Type Description
encoding str Character encoding for queries, defaults to 'utf8'
open bool True if connection is open, False if closed

Examples

>>> # In-memory database
>>> conn = Connection()
>>> cursor = conn.cursor()
>>> cursor.execute("SELECT 1")
>>> result = cursor.fetchall()
>>> conn.close()
>>> # File-based database
>>> conn = Connection('mydata.db')
>>> with conn.cursor() as cur:
...     cur.execute("CREATE TABLE users (id INT, name STRING) ENGINE = MergeTree() order by id")
...     cur.execute("INSERT INTO users VALUES (1, 'Alice')")
>>> conn.close()
>>> # Context manager usage
>>> with Connection() as cur:
...     cur.execute("SELECT version()")
...     version = cur.fetchone()

:::note ClickHouse doesn't support traditional transactions, so commit() and rollback() operations are no-ops but provided for DB-API compliance. :::


close {#dbapi-connection-close}

Close the database connection.

Closes the underlying chDB connection and marks this connection as closed. Subsequent operations on this connection will raise an Error.

Syntax

close()

Raises

Exception Condition
err.Error If connection is already closed

commit {#dbapi-commit}

Commit the current transaction.

Syntax

commit()

:::note This is a no-op for chDB/ClickHouse as it doesn’t support traditional transactions. Provided for DB-API 2.0 compliance. :::


cursor {#dbapi-cursor}

Create a new cursor for executing queries.

Syntax

cursor(cursor=None)

Parameters

Parameter Type Description
cursor - Ignored, provided for compatibility

Returns

Return Type Description
Cursor New cursor object for this connection

Raises

Exception Condition
err.Error If connection is closed

Example

>>> conn = Connection()
>>> cur = conn.cursor()
>>> cur.execute("SELECT 1")
>>> result = cur.fetchone()

escape {#escape}

Escape a value for safe inclusion in SQL queries.

Syntax

escape(obj, mapping=None)

Parameters

Parameter Type Description
obj - Value to escape (string, bytes, number, etc.)
mapping - Optional character mapping for escaping

Returns

Return Type Description
- Escaped version of the input suitable for SQL queries

Example

>>> conn = Connection()
>>> safe_value = conn.escape("O'Reilly")
>>> query = f"SELECT * FROM users WHERE name = {safe_value}"

escape_string {#escape-string}

Escape a string value for SQL queries.

Syntax

escape_string(s)

Parameters

Parameter Type Description
s str String to escape

Returns

Return Type Description
str Escaped string safe for SQL inclusion

property open {#property-open}

Check if the connection is open.

Returns

Return Type Description
bool True if connection is open, False if closed

query {#dbapi-query}

Execute a SQL query directly and return raw results.

This method bypasses the cursor interface and executes queries directly. For standard DB-API usage, prefer using cursor() method.

Syntax

query(sql, fmt='CSV')

Parameters:

Parameter Type Default Description
sql str or bytes required SQL query to execute
fmt str "CSV" Output format. Supported formats include "CSV", "JSON", "Arrow", "Parquet", etc.

Returns

Return Type Description
- Query result in the specified format

Raises

Exception Condition
err.InterfaceError If connection is closed or query fails

Example

>>> conn = Connection()
>>> result = conn.query("SELECT 1, 'hello'", "CSV")
>>> print(result)
"1,hello\n"

property resp {#property-resp}

Get the last query response.

Returns

Return Type Description
- The raw response from the last query() call

:::note This property is updated each time query() is called directly. It doesn't reflect queries executed through cursors. :::


rollback {#rollback}

Roll back the current transaction.

Syntax

rollback()

:::note This is a no-op for chDB/ClickHouse as it doesn’t support traditional transactions. Provided for DB-API 2.0 compliance. :::


Cursor Class {#cursor-class}

class chdb.dbapi.cursors.Cursor {#chdb-dbapi-cursors-cursor}

Bases: object

DB-API 2.0 cursor for executing queries and fetching results.

The cursor provides methods for executing SQL statements, managing query results, and navigating through result sets. It supports parameter binding, bulk operations, and follows DB-API 2.0 specifications.

Don't create Cursor instances directly. Use Connection.cursor() instead.

class chdb.dbapi.cursors.Cursor(connection)
Variable Type Description
description tuple Column metadata for the last query result
rowcount int Number of rows affected by the last query (-1 if unknown)
arraysize int Default number of rows to fetch at once (default: 1)
lastrowid - ID of the last inserted row (if applicable)
max_stmt_length int Maximum statement size for executemany() (default: 1024000)

Examples

>>> conn = Connection()
>>> cur = conn.cursor()
>>> cur.execute("SELECT 1 as id, 'test' as name")
>>> result = cur.fetchone()
>>> print(result)  # (1, 'test')
>>> cur.close()

:::note See DB-API 2.0 Cursor Objects for complete specification details. :::


callproc {#callproc}

Execute a stored procedure (placeholder implementation).

Syntax

callproc(procname, args=())

Parameters

Parameter Type Description
procname str Name of stored procedure to execute
args sequence Parameters to pass to the procedure

Returns

Return Type Description
sequence The original args parameter (unmodified)

:::note chDB/ClickHouse doesn't support stored procedures in the traditional sense. This method is provided for DB-API 2.0 compliance but doesn't perform any actual operation. Use execute() for all SQL operations. :::

:::warning Compatibility This is a placeholder implementation. Traditional stored procedure features like OUT/INOUT parameters, multiple result sets, and server variables aren't supported by the underlying ClickHouse engine. :::


close {#dbapi-cursor-close}

Close the cursor and free associated resources.

After closing, the cursor becomes unusable and any operation will raise an exception. Closing a cursor exhausts all remaining data and releases the underlying cursor.

Syntax

close()

execute {#dbapi-execute}

Execute a SQL query with optional parameter binding.

This method executes a single SQL statement with optional parameter substitution. It supports multiple parameter placeholder styles for flexibility.

Syntax

execute(query, args=None)

Parameters

Parameter Type Default Description
query str required SQL query to execute
args tuple/list/dict None Parameters to bind to placeholders

Returns

Return Type Description
int Number of affected rows (-1 if unknown)

Parameter Styles

Style Example
Question mark style "SELECT * FROM users WHERE id = ?"
Named style "SELECT * FROM users WHERE name = %(name)s"
Format style "SELECT * FROM users WHERE age = %s" (legacy)

Examples

>>> # Question mark parameters
>>> cur.execute("SELECT * FROM users WHERE id = ? AND age > ?", (123, 18))
>>>
>>> # Named parameters
>>> cur.execute("SELECT * FROM users WHERE name = %(name)s", {'name': 'Alice'})
>>>
>>> # No parameters
>>> cur.execute("SELECT COUNT(*) FROM users")

Raises

Exception Condition
ProgrammingError If cursor is closed or query is malformed
InterfaceError If database error occurs during execution

executemany(query, args) {#chdb-dbapi-cursors-cursor-executemany}

Execute a query multiple times with different parameter sets.

This method efficiently executes the same SQL query multiple times with different parameter values. It’s particularly useful for bulk INSERT operations.

Syntax

executemany(query, args)

Parameters

Parameter Type Description
query str SQL query to execute multiple times
args sequence Sequence of parameter tuples/dicts/lists for each execution

Returns

Return Type Description
int Total number of affected rows across all executions

Examples

>>> # Bulk insert with question mark parameters
>>> users_data = [(1, 'Alice'), (2, 'Bob'), (3, 'Charlie')]
>>> cur.executemany("INSERT INTO users VALUES (?, ?)", users_data)
>>>
>>> # Bulk insert with named parameters
>>> users_data = [
...     {'id': 1, 'name': 'Alice'},
...     {'id': 2, 'name': 'Bob'}
... ]
>>> cur.executemany(
...     "INSERT INTO users VALUES (%(id)s, %(name)s)",
...     users_data
... )

:::note This method improves performance for multiple-row INSERT and UPDATE operations by optimizing the query execution process. :::


fetchall() {#dbapi-fetchall}

Fetch all remaining rows from the query result.

Syntax

fetchall()

Returns

Return Type Description
list List of tuples representing all remaining rows

Raises

Exception Condition
ProgrammingError If execute() hasn't been called first

:::warning Warning This method can consume large amounts of memory for big result sets. Consider using fetchmany() for large datasets. :::

Example

>>> cursor.execute("SELECT id, name FROM users")
>>> all_rows = cursor.fetchall()
>>> print(len(all_rows))  # Number of total rows

fetchmany {#dbapi-fetchmany}

Fetch multiple rows from the query result.

Syntax

fetchmany(size=1)

Parameters

Parameter Type Default Description
size int 1 Number of rows to fetch. If not specified, uses cursor.arraysize

Returns

Return Type Description
list List of tuples representing the fetched rows

Raises

Exception Condition
ProgrammingError If execute() hasn't been called first

Example

>>> cursor.execute("SELECT id, name FROM users")
>>> rows = cursor.fetchmany(3)
>>> print(rows)  # [(1, 'Alice'), (2, 'Bob'), (3, 'Charlie')]

fetchone {#dbapi-fetchone}

Fetch the next row from the query result.

Syntax

fetchone()

Returns

Return Type Description
tuple or None Next row as a tuple, or None if no more rows available

Raises

Exception Condition
ProgrammingError If execute() hasn't been called first

Example

>>> cursor.execute("SELECT id, name FROM users LIMIT 3")
>>> row = cursor.fetchone()
>>> print(row)  # (1, 'Alice')
>>> row = cursor.fetchone()
>>> print(row)  # (2, 'Bob')

max_stmt_length = 1024000 {#max-stmt-length}

Maximum statement size that executemany() will generate when batching parameter sets.

Default value is 1024000.


mogrify {#mogrify}

Return the exact query string that would be sent to the database.

This method shows the final SQL query after parameter substitution, which is useful for debugging and logging purposes.

Syntax

mogrify(query, args=None)

Parameters

Parameter Type Default Description
query str required SQL query with parameter placeholders
args tuple/list/dict None Parameters to substitute

Returns

Return Type Description
str The final SQL query string with parameters substituted

Example

>>> cur.mogrify("SELECT * FROM users WHERE id = ?", (123,))
"SELECT * FROM users WHERE id = 123"

:::note This method follows the extension to DB-API 2.0 used by Psycopg. :::


nextset {#nextset}

Move to the next result set (not supported).

Syntax

nextset()

Returns

Return Type Description
None Always returns None as multiple result sets aren't supported

:::note chDB/ClickHouse doesn't support multiple result sets from a single query. This method is provided for DB-API 2.0 compliance but always returns None. :::


setinputsizes {#setinputsizes}

Set input sizes for parameters (no-op implementation).

Syntax

setinputsizes(*args)

Parameters

Parameter Type Description
*args - Parameter size specifications (ignored)

:::note This method does nothing but is required by DB-API 2.0 specification. chDB automatically handles parameter sizing internally. :::


setoutputsizes {#setoutputsizes}

Set output column sizes (no-op implementation).

Syntax

setoutputsizes(*args)

Parameters

Parameter Type Description
*args - Column size specifications (ignored)

:::note This method does nothing but is required by DB-API 2.0 specification. chDB automatically handles output sizing internally. :::


Error Classes {#error-classes}

Exception classes for chdb database operations.

This module provides a complete hierarchy of exception classes for handling database-related errors in chdb, following the Python Database API Specification v2.0.

The exception hierarchy is structured as follows:

StandardError
├── Warning
└── Error
    ├── InterfaceError
    └── DatabaseError
        ├── DataError
        ├── OperationalError
        ├── IntegrityError
        ├── InternalError
        ├── ProgrammingError
        └── NotSupportedError

Each exception class represents a specific category of database errors:

Exception Description
Warning Non-fatal warnings during database operations
InterfaceError Problems with the database interface itself
DatabaseError Base class for all database-related errors
DataError Problems with data processing (invalid values, type errors)
OperationalError Database operational issues (connectivity, resources)
IntegrityError Constraint violations (foreign keys, uniqueness)
InternalError Database internal errors and corruption
ProgrammingError SQL syntax errors and API misuse
NotSupportedError Unsupported features or operations

:::note These exception classes are compliant with Python DB API 2.0 specification and provide consistent error handling across different database operations. :::

Examples

>>> try:
...     cursor.execute("SELECT * FROM nonexistent_table")
... except ProgrammingError as e:
...     print(f"SQL Error: {e}")
...
SQL Error: Table 'nonexistent_table' doesn't exist
>>> try:
...     cursor.execute("INSERT INTO users (id) VALUES (1), (1)")
... except IntegrityError as e:
...     print(f"Constraint violation: {e}")
...
Constraint violation: Duplicate entry '1' for key 'PRIMARY'

exception chdb.dbapi.err.DataError {#chdb-dbapi-err-dataerror}

Bases: DatabaseError

Exception raised for errors that are due to problems with the processed data.

This exception is raised when database operations fail due to issues with the data being processed, such as:

  • Division by zero operations
  • Numeric values out of range
  • Invalid date/time values
  • String truncation errors
  • Type conversion failures
  • Invalid data format for column type

Raises

Exception Condition
DataError When data validation or processing fails

Examples

>>> # Division by zero in SQL
>>> cursor.execute("SELECT 1/0")
DataError: Division by zero
>>> # Invalid date format
>>> cursor.execute("INSERT INTO table VALUES ('invalid-date')")
DataError: Invalid date format

exception chdb.dbapi.err.DatabaseError {#chdb-dbapi-err-databaseerror}

Bases: Error

Exception raised for errors that are related to the database.

This is the base class for all database-related errors. It encompasses all errors that occur during database operations and are related to the database itself rather than the interface.

Common scenarios include:

  • SQL execution errors
  • Database connectivity issues
  • Transaction-related problems
  • Database-specific constraints violations

:::note This serves as the parent class for more specific database error types such as DataError, OperationalError, etc. :::


exception chdb.dbapi.err.Error {#chdb-dbapi-err-error}

Bases: StandardError

Exception that is the base class of all other error exceptions (not Warning).

This is the base class for all error exceptions in chdb, excluding warnings. It serves as the parent class for all database error conditions that prevent successful completion of operations.

:::note This exception hierarchy follows the Python DB API 2.0 specification. :::

See also

  • Warning - For non-fatal warnings that don’t prevent operation completion

exception chdb.dbapi.err.IntegrityError {#chdb-dbapi-err-integrityerror}

Bases: DatabaseError

Exception raised when the relational integrity of the database is affected.

This exception is raised when database operations violate integrity constraints, including:

  • Foreign key constraint violations
  • Primary key or unique constraint violations (duplicate keys)
  • Check constraint violations
  • NOT NULL constraint violations
  • Referential integrity violations

Raises

Exception Condition
IntegrityError When database integrity constraints are violated

Examples

>>> # Duplicate primary key
>>> cursor.execute("INSERT INTO users (id, name) VALUES (1, 'John')")
>>> cursor.execute("INSERT INTO users (id, name) VALUES (1, 'Jane')")
IntegrityError: Duplicate entry '1' for key 'PRIMARY'
>>> # Foreign key violation
>>> cursor.execute("INSERT INTO orders (user_id) VALUES (999)")
IntegrityError: Cannot add or update a child row: foreign key constraint fails

exception chdb.dbapi.err.InterfaceError {#chdb-dbapi-err-interfaceerror}

Bases: Error

Exception raised for errors that are related to the database interface rather than the database itself.

This exception is raised when there are problems with the database interface implementation, such as:

  • Invalid connection parameters
  • API misuse (calling methods on closed connections)
  • Interface-level protocol errors
  • Module import or initialization failures

Raises

Exception Condition
InterfaceError When database interface encounters errors unrelated to database operations

:::note These errors are typically programming errors or configuration issues that can be resolved by fixing the client code or configuration. :::


exception chdb.dbapi.err.InternalError {#chdb-dbapi-err-internalerror}

Bases: DatabaseError

Exception raised when the database encounters an internal error.

This exception is raised when the database system encounters internal errors that aren't caused by the application, such as:

  • Invalid cursor state (cursor isn't valid anymore)
  • Transaction state inconsistencies (transaction is out of sync)
  • Database corruption issues
  • Internal data structure corruption
  • System-level database errors

Raises

Exception Condition
InternalError When database encounters internal inconsistencies

:::warning Warning Internal errors may indicate serious database problems that require database administrator attention. These errors are typically not recoverable through application-level retry logic. :::

:::note These errors are generally outside the control of the application and may require database restart or repair operations. :::


exception chdb.dbapi.err.NotSupportedError {#chdb-dbapi-err-notsupportederror}

Bases: DatabaseError

Exception raised when a method or database API isn't supported.

This exception is raised when the application attempts to use database features or API methods that aren't supported by the current database configuration or version, such as:

  • Requesting rollback() on connections without transaction support
  • Using advanced SQL features not supported by the database version
  • Calling methods not implemented by the current driver
  • Attempting to use disabled database features

Raises

Exception Condition
NotSupportedError When unsupported database features are accessed

Examples

>>> # Transaction rollback on non-transactional connection
>>> connection.rollback()
NotSupportedError: Transactions are not supported
>>> # Using unsupported SQL syntax
>>> cursor.execute("SELECT * FROM table WITH (NOLOCK)")
NotSupportedError: WITH clause not supported in this database version

:::note Check database documentation and driver capabilities to avoid these errors. Consider graceful fallbacks where possible. :::


exception chdb.dbapi.err.OperationalError {#chdb-dbapi-err-operationalerror}

Bases: DatabaseError

Exception raised for errors that are related to the database’s operation.

This exception is raised for errors that occur during database operation and aren't necessarily under the control of the programmer, including:

  • Unexpected disconnection from database
  • Database server not found or unreachable
  • Transaction processing failures
  • Memory allocation errors during processing
  • Disk space or resource exhaustion
  • Database server internal errors
  • Authentication or authorization failures

Raises

Exception Condition
OperationalError When database operations fail due to operational issues

:::note These errors are typically transient and may be resolved by retrying the operation or addressing system-level issues. :::

:::warning Warning Some operational errors may indicate serious system problems that require administrative intervention. :::


exception chdb.dbapi.err.ProgrammingError {#chdb-dbapi-err-programmingerror}

Bases: DatabaseError

Exception raised for programming errors in database operations.

This exception is raised when there are programming errors in the application’s database usage, including:

  • Table or column not found
  • Table or index already exists when creating
  • SQL syntax errors in statements
  • Wrong number of parameters specified in prepared statements
  • Invalid SQL operations (e.g., DROP on non-existent objects)
  • Incorrect usage of database API methods

Raises

Exception Condition
ProgrammingError When SQL statements or API usage contains errors

Examples

>>> # Table not found
>>> cursor.execute("SELECT * FROM nonexistent_table")
ProgrammingError: Table 'nonexistent_table' doesn't exist
>>> # SQL syntax error
>>> cursor.execute("SELCT * FROM users")
ProgrammingError: You have an error in your SQL syntax
>>> # Wrong parameter count
>>> cursor.execute("INSERT INTO users (name, age) VALUES (%s)", ('John',))
ProgrammingError: Column count doesn't match value count

exception chdb.dbapi.err.StandardError {#chdb-dbapi-err-standarderror}

Bases: Exception

Exception related to operation with chdb.

This is the base class for all chdb-related exceptions. It inherits from Python’s built-in Exception class and serves as the root of the exception hierarchy for database operations.

:::note This exception class follows the Python DB API 2.0 specification for database exception handling. :::


exception chdb.dbapi.err.Warning {#chdb-dbapi-err-warning}

Bases: StandardError

Exception raised for important warnings like data truncations while inserting, etc.

This exception is raised when the database operation completes but with important warnings that should be brought to the attention of the application. Common scenarios include:

  • Data truncation during insertion
  • Precision loss in numeric conversions
  • Character set conversion warnings

:::note This follows the Python DB API 2.0 specification for warning exceptions. :::


Module Constants {#module-constants}

chdb.dbapi.apilevel = '2.0' {#apilevel}

String constant indicating the level of the DB-API specification supported. chDB implements DB-API 2.0.

chdb.dbapi.threadsafety = 1 {#threadsafety}

Integer constant indicating the thread-safety level defined by DB-API 2.0. Level 1 means threads may share the module, but not connections.

chdb.dbapi.paramstyle = 'format' {#paramstyle}

String constant stating the type of parameter marker formatting expected by the interface. chDB uses format style, where placeholders are written as %s (similar to printf-style string formatting).


Type Constants {#type-constants}

chdb.dbapi.STRING = frozenset({247, 253, 254}) {#string-type}

Set of type codes identifying string database fields, implemented as a DBAPISet: an extended frozenset supporting DB-API 2.0 type comparison semantics. Individual type codes can be compared against the set with the equality and inequality operators, enabling comparisons like field_type == STRING where field_type is a single type value. The same semantics apply to BINARY, NUMBER, and the other type constants below.

Examples

>>> string_types = DBAPISet([FIELD_TYPE.STRING, FIELD_TYPE.VAR_STRING])
>>> FIELD_TYPE.STRING == string_types  # Returns True
>>> FIELD_TYPE.INT != string_types     # Returns True
>>> FIELD_TYPE.BLOB in string_types    # Returns False

chdb.dbapi.BINARY = frozenset({249, 250, 251, 252}) {#binary-type}

Set of type codes identifying binary database fields (BLOB types). A DBAPISet with the same comparison semantics as STRING.

chdb.dbapi.NUMBER = frozenset({0, 1, 3, 4, 5, 8, 9, 13}) {#number-type}

Set of type codes identifying numeric database fields. A DBAPISet with the same comparison semantics as STRING.

chdb.dbapi.DATE = frozenset({10, 14}) {#date-type}

Set of type codes identifying date database fields. A DBAPISet with the same comparison semantics as STRING.

chdb.dbapi.TIME = frozenset({11}) {#time-type}

Set of type codes identifying time database fields. A DBAPISet with the same comparison semantics as STRING.

chdb.dbapi.TIMESTAMP = frozenset({7, 12}) {#timestamp-type}

Set of type codes identifying timestamp database fields. A DBAPISet with the same comparison semantics as STRING.

chdb.dbapi.DATETIME = frozenset({7, 12}) {#datetime-type}

Alias of TIMESTAMP: the set of type codes identifying datetime database fields. A DBAPISet with the same comparison semantics as STRING.

chdb.dbapi.ROWID = frozenset({}) {#rowid-type}

Set of type codes identifying row ID fields. Empty in chDB. A DBAPISet with the same comparison semantics as STRING.

Usage Examples

Basic Query Example:

import chdb.dbapi as dbapi

print("chdb driver version: {0}".format(dbapi.get_client_info()))

# Create connection and cursor
conn = dbapi.connect()
cur = conn.cursor()

# Execute query
cur.execute('SELECT version()')
print("description:", cur.description)
print("data:", cur.fetchone())

# Clean up
cur.close()
conn.close()

Working with Data:

import chdb.dbapi as dbapi

conn = dbapi.connect()
cur = conn.cursor()

# Create table
cur.execute("""
    CREATE TABLE employees (
        id UInt32,
        name String,
        department String,
        salary Decimal(10,2)
    ) ENGINE = Memory
""")

# Insert data
cur.execute("""
    INSERT INTO employees VALUES
    (1, 'Alice', 'Engineering', 75000.00),
    (2, 'Bob', 'Marketing', 65000.00),
    (3, 'Charlie', 'Engineering', 80000.00)
""")

# Query data
cur.execute("SELECT * FROM employees WHERE department = 'Engineering'")

# Fetch results
print("Column names:", [desc[0] for desc in cur.description])
for row in cur.fetchall():
    print(row)

conn.close()

Connection Management:

import chdb.dbapi as dbapi

# In-memory database (default)
conn1 = dbapi.connect()

# Persistent database file
conn2 = dbapi.connect("./my_database.chdb")

# Connection with parameters
conn3 = dbapi.connect("./my_database.chdb?log-level=debug&verbose")

# Read-only connection
conn4 = dbapi.connect("./my_database.chdb?mode=ro")

# Automatic connection cleanup
with dbapi.connect("test.chdb") as conn:
    cur = conn.cursor()
    cur.execute("SELECT count() FROM numbers(1000)")
    result = cur.fetchone()
    print(f"Count: {result[0]}")
    cur.close()

Best Practices

  1. Connection Management: Always close connections and cursors when done
  2. Context Managers: Use with statements for automatic cleanup
  3. Batch Processing: Use fetchmany() for large result sets
  4. Error Handling: Wrap database operations in try-except blocks
  5. Parameter Binding: Use parameterized queries when possible
  6. Memory Management: Avoid fetchall() for very large datasets

:::note

  • chDB’s DB-API 2.0 interface is compatible with most Python database tools
  • The interface provides Level 1 thread safety (threads may share modules but not connections)
  • Connection strings support the same parameters as chDB sessions
  • All standard DB-API 2.0 exceptions are supported :::

:::warning Warning

  • Always close cursors and connections to avoid resource leaks
  • Large result sets should be processed in batches
  • Parameter binding syntax follows format style: %s :::

Python UDF (User-Defined Functions) {#user-defined-functions}

chDB supports native Python UDFs that run in-process with full type safety, automatic type inference, and configurable NULL/exception handling. Python functions registered as UDFs can be called directly from SQL queries.

chdb.create_function {#create-function}

Register a Python function as a chDB SQL function.

Syntax

chdb.create_function(name, func, arg_types=None, return_type=None, *, on_null=None, on_error=None)

Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| name | str | (required) | Name of the SQL function to register |
| func | callable | (required) | Python function to register |
| arg_types | list or None | None | List of argument types. If None, inferred from type annotations |
| return_type | type or None | None | Return type. If None, inferred from the function's return annotation |
| on_null | str or NullHandling | None (skip) | How to handle NULL inputs: "skip" or "pass". Keyword-only |
| on_error | str or ExceptionHandling | None (propagate) | How to handle exceptions: "propagate" or "ignore". Keyword-only |

Each type parameter (arg_types elements and return_type) accepts:

  • A ChdbType constant: INT64, STRING, FLOAT64, etc.
  • A ClickHouse type string: "Int64", "String", "DateTime64(6)", "DateTime('UTC')", etc.

Example

from chdb import create_function, drop_function, query
from chdb.sqltypes import INT64, STRING

create_function("strlen", len, arg_types=[STRING], return_type=INT64)
print(query("SELECT strlen('hello')"))  # 5

drop_function("strlen")

chdb.drop_function {#drop-function}

Remove a previously registered Python UDF.

Syntax

chdb.drop_function(name)

Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| name | str | Name of the SQL function to remove |

@func Decorator {#func-decorator}

Decorator to register a Python function as a chDB SQL function. The function remains callable as normal Python and is simultaneously available in SQL queries by its __name__.

Syntax

from chdb import func

@func(arg_types=None, return_type=None, *, on_null=None, on_error=None)
def my_function(...):
    ...

Parameters

Same as create_function (excluding name and func, which are derived from the decorated function).

Examples

from chdb import func, query
from chdb.sqltypes import INT64, STRING

# Explicit types
@func([INT64, INT64], INT64)
def add(a, b):
    return a + b

# Types inferred from annotations
@func()
def multiply(a: int, b: int) -> int:
    return a * b

# Mixed: explicit return_type, inferred arg_types
@func(return_type=STRING)
def greet(name: str):
    return f"Hello, {name}!"

print(query("SELECT add(12, 22)"))       # 34
print(query("SELECT multiply(3, 7)"))    # 21
print(query("SELECT greet('world')"))    # Hello, world!

Type System {#udf-type-system}

Available Types {#udf-available-types}

Import types from chdb.sqltypes:

from chdb.sqltypes import (
    BOOL,
    INT8, INT16, INT32, INT64, INT128, INT256,
    UINT8, UINT16, UINT32, UINT64, UINT128, UINT256,
    FLOAT32, FLOAT64,
    STRING,
    DATE, DATE32, DATETIME, DATETIME64,
)

Automatic Type Mapping {#udf-automatic-type-mapping}

When types are inferred from Python annotations, the following mapping is used:

| Python Type | ClickHouse Type |
|-------------|-----------------|
| bool | Bool |
| int | Int64 |
| float | Float64 |
| str | String |
| bytes | String |
| bytearray | String |
| datetime.date | Date |
| datetime.datetime | DateTime64(6) |

Type Specification Methods {#udf-type-specification-methods}

Types can be specified in multiple ways:

from chdb import create_function
from chdb.sqltypes import INT64

# 1. ChdbType constants
create_function("f1", lambda x: x, arg_types=[INT64], return_type=INT64)

# 2. ClickHouse type strings
create_function("f2", lambda x: x, arg_types=["Int64"], return_type="Int64")

# 3. Parameterized type strings
create_function("f3", lambda x: x, arg_types=["DateTime('UTC')"], return_type="DateTime('UTC')")

# 4. Python types (inferred from annotations)
@func()
def f4(x: int) -> int:
    return x

NULL Handling {#udf-null-handling}

Control how NULL values are handled with the on_null parameter.

| Value | Enum | Behavior |
|-------|------|----------|
| "skip" | NullHandling.SKIP | Return NULL without calling the function (default) |
| "pass" | NullHandling.PASS | Convert NULL to None and call the function |

from chdb import func, query, NullHandling

# Default: NULL in → NULL out, function not called
@func(return_type="Int64")
def add_one(x: int) -> int:
    return x + 1

print(query("SELECT add_one(NULL)"))  # NULL

# Pass NULL as None
@func(return_type="Int64", on_null="pass")
def null_safe(x):
    return 0 if x is None else x + 1

print(query("SELECT null_safe(NULL)"))  # 0

Exception Handling {#udf-exception-handling}

Control how exceptions are handled with the on_error parameter.

| Value | Enum | Behavior |
|-------|------|----------|
| "propagate" | ExceptionHandling.PROPAGATE | Raise the exception as a SQL error (default) |
| "ignore" | ExceptionHandling.IGNORE | Catch the exception and return NULL |

from chdb import func, query

# Default: exception propagates
@func(arg_types=["Int64", "Int64"], return_type="Int64")
def divide(a, b):
    return a // b

print(query("SELECT divide(1, 0)"))  # Error: division by zero

# Ignore: exception → NULL
@func(arg_types=["Int64", "Int64"], return_type="Int64", on_error="ignore")
def safe_divide(a, b):
    return a // b

print(query("SELECT safe_divide(1, 0)"))   # NULL
print(query("SELECT safe_divide(10, 2)"))  # 5

DateTime and Timezone Support {#udf-datetime}

UDFs fully support Date, Date32, DateTime, and DateTime64 types with timezone awareness.

from chdb import func, query
from datetime import datetime, timedelta, date

@func(arg_types=["DateTime('UTC')"], return_type="DateTime('UTC')")
def add_one_hour(dt):
    return dt + timedelta(hours=1)

@func()
def get_year(d: date) -> int:
    return d.year

print(query("SELECT add_one_hour(toDateTime('2024-01-01 12:00:00', 'UTC'))"))  # 2024-01-01 13:00:00
print(query("SELECT get_year(toDate('2024-06-15'))"))  # 2024

  • Input ClickHouse DateTime/DateTime64 values are converted to Python datetime objects with timezone info
  • Output Python datetime objects preserve timezone info when returned to ClickHouse
  • The DATETIME64 type from chdb.sqltypes defaults to scale 6 (microseconds), equivalent to DateTime64(6)

Legacy API {#legacy-udf}

:::warning Deprecated The @chdb_udf decorator is the legacy subprocess-based UDF mechanism. It remains available, but the native Python UDF API (@func / create_function) is recommended for its simpler, more ergonomic interface. See the Python UDF Guide for the recommended approach. :::

chdb.udf.chdb_udf {#chdb-udf}

Decorator for chDB Python UDF(User Defined Function).

Syntax

chdb.udf.chdb_udf(return_type='String')

Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| return_type | str | "String" | Return type of the function. Should be one of the ClickHouse data types |

Notes

  1. The function must be stateless. Only UDFs are supported, not UDAFs (User Defined Aggregation Functions).
  2. The default return type is String. The return type should be one of the ClickHouse data types.
  3. All arguments are passed in as strings.
  4. The function is called once for each line of input.
  5. The function must be a pure Python function. Import all modules used INSIDE THE FUNCTION.
  6. The Python interpreter used is the same one that runs the script.

Example

@chdb_udf()
def sum_udf(lhs, rhs):
    return int(lhs) + int(rhs)

@chdb_udf()
def func_use_json(arg):
    import json
    # ... use json module

chdb.udf.generate_udf {#generate-udf}

Generate UDF configuration and executable script files.

This function creates the necessary files for a User Defined Function (UDF) in chDB:

  1. A Python executable script that processes input data
  2. An XML configuration file that registers the UDF with ClickHouse

Syntax

chdb.udf.generate_udf(func_name, args, return_type, udf_body)

Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| func_name | str | Name of the UDF function |
| args | list | List of argument names for the function |
| return_type | str | ClickHouse return type for the function |
| udf_body | str | Python source code body of the UDF function |

:::note This function is typically called by the @chdb_udf decorator and shouldn't be called directly by users. :::


Utilities {#utilities}

Utility functions and helpers for chDB.

This module contains various utility functions for working with chDB, including data type inference, data conversion helpers, and debugging utilities.


chdb.utils.convert_to_columnar {#convert-to-columnar}

Converts a list of dictionaries into a columnar format.

This function takes a list of dictionaries and converts it into a dictionary where each key corresponds to a column and each value is a list of column values. Missing values in the dictionaries are represented as None.

Syntax

chdb.utils.convert_to_columnar(items: List[Dict[str, Any]]) → Dict[str, List[Any]]

Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| items | List[Dict[str, Any]] | A list of dictionaries to convert |

Returns

| Return Type | Description |
|-------------|-------------|
| Dict[str, List[Any]] | A dictionary with keys as column names and values as lists of column values |

Example

>>> items = [
...     {"name": "Alice", "age": 30, "city": "New York"},
...     {"name": "Bob", "age": 25},
...     {"name": "Charlie", "city": "San Francisco"}
... ]
>>> convert_to_columnar(items)
{
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [30, 25, None],
    'city': ['New York', None, 'San Francisco']
}

chdb.utils.flatten_dict {#flatten-dict}

Flattens a nested dictionary.

This function takes a nested dictionary and flattens it, concatenating nested keys with a separator. Lists of dictionaries are serialized to JSON strings.

Syntax

chdb.utils.flatten_dict(d: Dict[str, Any], parent_key: str = '', sep: str = '_') → Dict[str, Any]

Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| d | Dict[str, Any] | required | The dictionary to flatten |
| parent_key | str | "" | The base key to prepend to each key |
| sep | str | "_" | The separator to use between concatenated keys |

Returns

| Return Type | Description |
|-------------|-------------|
| Dict[str, Any] | A flattened dictionary |

Example

>>> nested_dict = {
...     "a": 1,
...     "b": {
...         "c": 2,
...         "d": {
...             "e": 3
...         }
...     },
...     "f": [4, 5, {"g": 6}],
...     "h": [{"i": 7}, {"j": 8}]
... }
>>> flatten_dict(nested_dict)
{
    'a': 1,
    'b_c': 2,
    'b_d_e': 3,
    'f_0': 4,
    'f_1': 5,
    'f_2_g': 6,
    'h': '[{"i": 7}, {"j": 8}]'
}

chdb.utils.infer_data_type {#infer-data-type}

Infers the most suitable data type for a list of values.

This function examines a list of values and determines the most appropriate data type that can represent all the values in the list. It considers integer, unsigned integer, decimal, and float types, and defaults to “string” if the values can't be represented by any numeric type or if all values are None.

Syntax

chdb.utils.infer_data_type(values: List[Any]) → str

Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| values | List[Any] | A list of values to analyze. The values can be of any type |

Returns

| Return Type | Description |
|-------------|-------------|
| str | The inferred data type: "int8", "int16", "int32", "int64", "int128", "int256", "uint8", "uint16", "uint32", "uint64", "uint128", "uint256", "decimal128", "decimal256", "float32", "float64", or "string" |

:::note

  • If all values in the list are None, the function returns “string”.
  • If any value in the list is a string, the function immediately returns “string”.
  • The function assumes that numeric values can be represented as integers, decimals, or floats based on their range and precision. :::

chdb.utils.infer_data_types {#infer-data-types}

Infers data types for each column in a columnar data structure.

This function analyzes the values in each column and infers the most suitable data type for each column, based on a sample of the data.

Syntax

chdb.utils.infer_data_types(column_data: Dict[str, List[Any]], n_rows: int = 10000) → List[tuple]

Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| column_data | Dict[str, List[Any]] | required | A dictionary where keys are column names and values are lists of column values |
| n_rows | int | 10000 | The number of rows to sample for type inference |

Returns

| Return Type | Description |
|-------------|-------------|
| List[tuple] | A list of (column name, inferred data type) tuples |

Abstract Base Classes {#abstract-base-classes}

class chdb.rwabc.PyReader {#pyreader}

Bases: ABC

class chdb.rwabc.PyReader(data: Any)

abstractmethod read {#read}

Read a specified number of rows from the given columns and return a list of objects, where each object is a sequence of values for a column.

abstractmethod read(col_names: List[str], count: int) → List[Any]

Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| col_names | List[str] | List of column names to read |
| count | int | Maximum number of rows to read |

Returns

| Return Type | Description |
|-------------|-------------|
| List[Any] | List of sequences, one for each column |

class chdb.rwabc.PyWriter {#pywriter}

Bases: ABC

class chdb.rwabc.PyWriter(col_names: List[str], types: List[type], data: Any)

abstractmethod finalize {#finalize}

Assemble and return the final data from blocks. Must be implemented by subclasses.

abstractmethod finalize() → bytes

Returns

| Return Type | Description |
|-------------|-------------|
| bytes | The final serialized data |

abstractmethod write {#write}

Save columns of data to blocks. Must be implemented by subclasses.

abstractmethod write(col_names: List[str], columns: List[List[Any]]) → None

Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| col_names | List[str] | List of column names that are being written |
| columns | List[List[Any]] | List of column data, each column represented as a list |

Exception Handling {#exception-handling}

class chdb.ChdbError {#chdberror}

Bases: Exception

Base exception class for chDB-related errors.

This exception is raised when chDB query execution fails or encounters an error. It inherits from the standard Python Exception class and provides error information from the underlying ClickHouse engine.

The exception message typically contains detailed error information from ClickHouse, including syntax errors, type mismatches, missing tables/columns, and other query execution issues.

Variables

| Variable | Type | Description |
|----------|------|-------------|
| args | tuple | Tuple containing the error message and any additional arguments |

Examples

>>> try:
...     result = chdb.query("SELECT * FROM non_existent_table")
... except chdb.ChdbError as e:
...     print(f"Query failed: {e}")
Query failed: Table 'non_existent_table' doesn't exist
>>> try:
...     result = chdb.query("SELECT invalid_syntax FROM")
... except chdb.ChdbError as e:
...     print(f"Syntax error: {e}")
Syntax error: Syntax error near 'FROM'

:::note This exception is automatically raised by chdb.query() and related functions when the underlying ClickHouse engine reports an error. You should catch this exception when handling potentially failing queries to provide appropriate error handling in your application. :::

Version Information {#version-information}

chdb.chdb_version = ('3', '6', '0') {#chdb-version}

Tuple of version component strings (major, minor, patch) for the installed chdb package.


chdb.engine_version = '25.5.2.1' {#engine-version}

Version string of the embedded ClickHouse engine.

chdb.__version__ = '3.6.0' {#version}

Version string of the chdb package, e.g. '3.6.0'.