49 changes: 35 additions & 14 deletions mindsdb/integrations/handlers/duckdb_handler/README.md
@@ -1,41 +1,62 @@
# DuckDB Handler
# DuckDB Handler
This is the implementation of the DuckDB handler for MindsDB.

## DuckDB
DuckDB is an open-source analytical database system. DuckDB is designed for fast execution of analytical queries.
There are no external dependencies and the DBMS runs completly embedded within a host process, similar to SQLite.
There are no external dependencies, and the DBMS runs completely embedded within a host process, similar to SQLite.
DuckDB provides a rich SQL dialect with support for complex queries with transactional guarantees (ACID).

## Implementation
This handler was implemented using the `duckdb` python client library.
## Implementation
This handler was implemented using the `duckdb` Python client library.

### DuckDB version
The DuckDB handler is currently using the `0.7.1.dev187` pre-relase version of the python client library. In case of issues, make sure your DuckDB database is compatible with this version. See the DuckDB handler [requirements.txt](requirements.txt) for details.

The DuckDB handler is currently using the `1.1.3` release version of the Python client library. In case of issues, make sure your DuckDB or MotherDuck database is compatible with this version. See the DuckDB handler [requirements.txt](requirements.txt) for details.

The required arguments to establish a connection are:

* `database`: the name of the DuckDB database file. May also be set to `:memory:`, which will create an in-memory database.
* `database`: the name of the DuckDB or MotherDuck database file.
- Set to `:memory:` to create an in-memory database.
- For MotherDuck, specify the database and motherduck_token.

The optional arguments are:
Additional optional arguments include:

* `motherduck_token`: a token to authenticate with MotherDuck.
* `read_only`: a flag that specifies if the connection should be made in read-only mode.
This is required if multiple processes want to access the same database file at the same time.

- This is required if multiple processes want to access the same database file simultaneously.

## Usage
In order to make use of this handler and connect to a DuckDB database in MindsDB, the following syntax can be used:
To connect to a DuckDB or MotherDuck database in MindsDB, the following syntax can be used:

### DuckDB Example
```sql
CREATE DATABASE duckdb_datasource
WITH
engine='duckdb',
parameters={
"database":"db.duckdb"
"database": "db.duckdb"
};
```

Now, you can use this established connection to query your database as follows:
### MotherDuck Example
```sql
CREATE DATABASE md_datasource
WITH
engine='duckdb',
parameters={
"database": "sample_data",
"motherduck_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9."
};
```

Once the connection is established, you can query the database:

```sql
SELECT * FROM duckdb_datasource.my_table;
```
```

For MotherDuck:
```sql
SELECT * FROM md_datasource.movies;
```

By leveraging these features, MindsDB provides powerful integrations with DuckDB and MotherDuck for scalable analytics.
22 changes: 16 additions & 6 deletions mindsdb/integrations/handlers/duckdb_handler/connection_args.py
@@ -2,16 +2,26 @@

from mindsdb.integrations.libs.const import HANDLER_CONNECTION_ARG_TYPE as ARG_TYPE


connection_args = OrderedDict(
database={
'type': ARG_TYPE.STR,
'description': 'The database file to read and write from. The special value :memory: (default) can be used to create an in-memory database.',
"type": ARG_TYPE.STR,
"description": (
"The database file to read and write from. The special value :memory: (default) "
"can be used to create an in-memory database."
),
},
motherduck_token={
"type": ARG_TYPE.STR,
"description": "Motherduck access token if want to connect motherduck database.",
},
read_only={
'type': ARG_TYPE.BOOL,
'description': 'A flag that specifies if the connection should be made in read-only mode.',
"type": ARG_TYPE.BOOL,
"description": ("A flag that specifies if the connection should be made in read-only mode."),
},
)

connection_args_example = OrderedDict(database='db.duckdb', read_only=True)
connection_args_example = OrderedDict(
database="sample_data",
read_only=True,
motherduck_token="ey...enKoT.SsEcCa......",
)
37 changes: 19 additions & 18 deletions mindsdb/integrations/handlers/duckdb_handler/duckdb_handler.py
@@ -19,14 +19,14 @@
class DuckDBHandler(DatabaseHandler):
"""This handler handles connection and execution of the DuckDB statements."""

name = 'duckdb'
name = "duckdb"

def __init__(self, name: str, **kwargs):
super().__init__(name)
self.parser = parse_sql
self.dialect = 'postgresql'
self.connection_data = kwargs.get('connection_data')
self.renderer = SqlalchemyRender('postgres')
self.dialect = "postgresql"
self.connection_data = kwargs.get("connection_data")
self.renderer = SqlalchemyRender("postgres")

self.connection = None
self.is_connected = False
@@ -44,10 +44,17 @@ def connect(self) -> DuckDBPyConnection:

if self.is_connected is True:
return self.connection
motherduck_token = self.connection_data.get("motherduck_token")
if motherduck_token:
database = (
f"md:{self.connection_data.get('database')}?motherduck_token={motherduck_token}&attach_mode=single"
Comment on lines +47 to +50

Correctness: The `motherduck_token` is embedded directly in the database connection string as a URL query parameter (`?motherduck_token=...`). Any code path that surfaces that string can therefore expose the token in plaintext: for example, an exception message raised by `duckdb.connect`, which `check_connection` logs and returns as `error_message`, or any tracing/debugging output. This leaks a sensitive credential.

In mindsdb/integrations/handlers/duckdb_handler/duckdb_handler.py, lines 47-50, the motherduck_token is embedded in the database connection string URL. The token can then surface wherever that string, or an exception derived from it, is logged or returned (check_connection logs the exception on error, and both check_connection and native_query return `str(e)` as the error message). Consider passing the token via the duckdb `config` parameter (e.g., `duckdb.connect(database='md:dbname', config={'motherduck_token': token})`) instead of embedding it in the URL, to avoid credential exposure in logs and error messages.
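
The suggestion above can be sketched as a small helper that builds the `duckdb.connect` keyword arguments with the token kept out of the path. This is a minimal sketch, not the handler's code: `build_connect_args` is a hypothetical name, and it assumes, as the comment proposes, that `duckdb.connect` accepts the token through its `config` dict under the key `motherduck_token`. The `attach_mode=single` option from the original string is not secret, so it stays in the path.

```python
from typing import Any, Dict


def build_connect_args(connection_data: Dict[str, Any]) -> Dict[str, Any]:
    """Build keyword arguments for duckdb.connect() without the token in the path.

    Hypothetical helper: the name and shape are illustrative only.
    """
    database = connection_data.get("database")
    token = connection_data.get("motherduck_token")

    args: Dict[str, Any] = {
        "database": database,
        "read_only": bool(connection_data.get("read_only", False)),
    }
    if token:
        # MotherDuck databases use the "md:" prefix; non-secret options stay in the path.
        args["database"] = f"md:{database}?attach_mode=single"
        # Assumption: the token is accepted through the `config` dict.
        args["config"] = {"motherduck_token": token}
    return args


args = build_connect_args({"database": "sample_data", "motherduck_token": "secret-token"})
print(args["database"])  # the printed path does not contain the token
```

With this shape, a log line that prints `args["database"]` shows only the database name and attach mode, while the credential travels separately in `args["config"]`.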

)
else:
database = self.connection_data.get("database")

args = {
'database': self.connection_data.get('database'),
'read_only': self.connection_data.get('read_only'),
"database": database,
"read_only": self.connection_data.get("read_only"),
}

self.connection = duckdb.connect(**args)
@@ -78,9 +85,7 @@ def check_connection(self) -> StatusResponse:
self.connect()
response.success = True
except Exception as e:
logger.error(
f'Error connecting to DuckDB {self.connection_data["database"]}, {e}!'
)
logger.error(f"Error connecting to DuckDB {self.connection_data['database']}, {e}!")
response.error_message = str(e)
finally:
if response.success is True and need_to_close:
Comment on lines 85 to 91

Duplicate Code: ⚠️ Duplicate Code Detected (Similarity: 93%)

This function check_connection duplicates existing code.

📍 Original Location:

mindsdb/integrations/handlers/databend_handler/databend_handler.py:85-107

Function: check_connection

💡 Recommendation:
Implement a default check_connection() on the DatabaseHandler base class that follows this exact try/except/finally template, calling the subclass's connect(). Subclasses that need extra checks (e.g., SQLiteHandler's file existence guard) can override or call super().check_connection() and add pre-conditions. This would eliminate the boilerplate from every SQL handler.

Consider importing and reusing the existing function instead of duplicating the logic.
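
The recommendation above could look roughly like the following. This is a sketch under stated assumptions: `StatusResponse` and `DatabaseHandler` here are reduced stand-ins written for this example, not the MindsDB classes, and `InMemoryHandler` is a hypothetical subclass. Only the try/except/finally shape is taken from the diff; the `need_to_close` initialisation and the `disconnect()` body are assumed.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass
class StatusResponse:
    """Stand-in for the real response type, reduced to the fields used here."""

    success: bool = False
    error_message: Optional[str] = None


class DatabaseHandler:
    """Stand-in base class carrying the shared check_connection() template."""

    def __init__(self, name: str):
        self.name = name
        self.is_connected = False

    def connect(self):
        raise NotImplementedError

    def disconnect(self):
        self.is_connected = False

    def check_connection(self) -> StatusResponse:
        response = StatusResponse(False)
        # Assumption: only close afterwards if this call opened the connection.
        need_to_close = self.is_connected is False
        try:
            self.connect()
            response.success = True
        except Exception as e:
            logger.error(f"Error connecting to {self.name}, {e}!")
            response.error_message = str(e)
        finally:
            if response.success is True and need_to_close:
                self.disconnect()
        return response


class InMemoryHandler(DatabaseHandler):
    """Hypothetical subclass: it only has to supply connect()."""

    def connect(self):
        self.is_connected = True
        return object()


status = InMemoryHandler("demo").check_connection()
print(status.success)
```

A handler that needs a pre-condition (such as a file-existence guard) would override `check_connection()`, run its check, and then delegate to `super().check_connection()`.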

@@ -111,17 +116,13 @@ def native_query(self, query: str) -> Response:
if result:
response = Response(
RESPONSE_TYPE.TABLE,
data_frame=pd.DataFrame(
result, columns=[x[0] for x in cursor.description]
),
data_frame=pd.DataFrame(result, columns=[x[0] for x in cursor.description]),
)
else:
connection.commit()
response = Response(RESPONSE_TYPE.OK)
except Exception as e:
logger.error(
f'Error running query: {query} on {self.connection_data["database"]}!'
)
logger.error(f"Error running query: {query} on {self.connection_data['database']}!")
response = Response(RESPONSE_TYPE.ERROR, error_message=str(e))

cursor.close()
@@ -150,10 +151,10 @@ def get_tables(self) -> Response:
Response: Names of the tables in the database.
"""

q = 'SHOW TABLES;'
q = "SHOW TABLES;"
result = self.native_query(q)
df = result.data_frame
result.data_frame = df.rename(columns={df.columns[0]: 'table_name'})
result.data_frame = df.rename(columns={df.columns[0]: "table_name"})
return result

def get_columns(self, table_name: str) -> Response:
@@ -166,5 +167,5 @@ def get_columns(self, table_name: str) -> Response:
Response: Details of the table.
"""

query = f'DESCRIBE {table_name};'
query = f"DESCRIBE {table_name};"
Comment on lines 167 to +170

Duplicate Code: ⚠️ Duplicate Code Detected (Similarity: 95%)

This function get_columns duplicates existing code.

📍 Original Location:

mindsdb/integrations/handlers/clickhouse_handler/clickhouse_handler.py:161-167

Function: get_columns

💡 Recommendation:
Consider a base-class or mixin get_columns() that issues a configurable describe query (e.g., a per-dialect template). For DuckDB and ClickHouse, the template is DESCRIBE {table_name}. Handlers that need extra column renaming (e.g., DatabendHandler renames Field→column_name) can override with a super() call followed by a rename step.

Consider importing and reusing the existing function instead of duplicating the logic.
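
One way to read the recommendation above is a mixin whose `get_columns()` formats a per-dialect template and hands it to `native_query()`. The sketch below is illustrative only: `DescribeColumnsMixin`, `describe_template`, `EchoHandler`, and `RenamingHandler` are hypothetical names, and `native_query()` returns a plain dict here rather than a MindsDB `Response`.

```python
from typing import Any, Dict


class DescribeColumnsMixin:
    """Hypothetical mixin: get_columns() driven by a per-dialect template."""

    # DuckDB and ClickHouse would keep this default; other dialects override it.
    describe_template = "DESCRIBE {table_name};"

    def native_query(self, query: str) -> Dict[str, Any]:
        raise NotImplementedError

    def get_columns(self, table_name: str) -> Dict[str, Any]:
        query = self.describe_template.format(table_name=table_name)
        return self.native_query(query)


class EchoHandler(DescribeColumnsMixin):
    """Stand-in handler: native_query() echoes the SQL and fakes column headers."""

    def native_query(self, query: str) -> Dict[str, Any]:
        return {"query": query, "columns": ["Field", "Type"]}


class RenamingHandler(EchoHandler):
    """Override pattern: call super(), then apply a rename step."""

    def get_columns(self, table_name: str) -> Dict[str, Any]:
        result = super().get_columns(table_name)
        result["columns"] = [
            "column_name" if col == "Field" else col for col in result["columns"]
        ]
        return result


print(EchoHandler().get_columns("my_table")["query"])
print(RenamingHandler().get_columns("my_table")["columns"])
```

Like the handler code in the diff, the template interpolates `table_name` directly into the SQL, so the sketch inherits the same assumption that the name is trusted.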

return self.native_query(query)