
Commit e1bb698

Authored by Maximiliaan72, Maximiliaan and pre-commit-ci[bot]
VDK-DuckDB: Introducing a new database plugin (#2561)
VDK plugin for the DuckDB database.

# Why?

This plugin was created to examine the differences in how databases are implemented in a VDK plugin. The aim of this change is to add DuckDB to the list of database plugins.

# What?

I made a new DuckDB plugin based on the SQL plugin. Making DuckDB work required changes in several areas:

- Plugin: module imports, configuration, connection factory method, ingester method, and library-specific code.
- Ingestion: import statements, configuration and connection, target path, data type conversion, query compatibility, general naming and references, and connection handling.
- Configuration: constants, class names, the method for the database file, the configuration key, and descriptions.
- Connection: classes, class documentation, file name, import statement, connection method, and isolation level.

# How has this been tested?

I wrote a test program that initially relied on the plugin's auto-create feature. After several errors, I concluded that auto-create was probably not working as expected. The test therefore no longer relies on it: it does not create the table in advance, disables auto-create, checks that the table is missing, and checks that the ingestion logic still works.

# What type of change are you making?

New feature: users can now choose DuckDB from the list of VDK database plugins.

---------

Co-authored-by: Maximiliaan <maxsobry@gmai.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 55ceb1a commit e1bb698


10 files changed: +519 additions, -0 deletions

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
```yaml
# Copyright 2021-2023 VMware, Inc.
# SPDX-License-Identifier: Apache-2.0

image: "python:3.7"

.build-vdk-duckdb:
  variables:
    PLUGIN_NAME: vdk-duckdb
  extends: .build-plugin

build-py37-vdk-duckdb:
  extends: .build-vdk-duckdb
  image: "python:3.7"

build-py311-vdk-duckdb:
  extends: .build-vdk-duckdb
  image: "python:3.11"

release-vdk-duckdb:
  variables:
    PLUGIN_NAME: vdk-duckdb
  extends: .release-plugin
```
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
````markdown
# duckdb

A VDK plugin that adds DuckDB as a database for data jobs.

It registers a `DUCKDB` connection for data jobs and adds a `vdk duckdb-query` command for running queries against the local DuckDB database file.


## Usage

```
pip install vdk-duckdb
```

### Configuration

(`vdk config-help` is a useful command for browsing all config options of your vdk installation.)

| Name | Description | (example) Value |
|---|---|---|
| DUCKDB_FILE | The file of the DuckDB database. | "/tmp/vdk-duckdb.db" |
| DUCKDB_INGEST_AUTO_CREATE_TABLE_ENABLED | If set to true, auto create table if it does not exist during ingestion. Only applicable when ingesting data into DuckDB (ingest method is DuckDB). | True |

### Example

```
vdk duckdb-query -q "SELECT 42 AS answer"
```

### Build and testing

```
pip install -r requirements.txt
pip install -e .
pytest
```

In the VDK repo, the [../build-plugin.sh](https://github.com/vmware/versatile-data-kit/tree/main/projects/vdk-plugins/build-plugin.sh) script can also be used.


#### Note about the CICD:

.plugin-ci.yaml is needed only for plugins that are part of the [Versatile Data Kit Plugin repo](https://github.com/vmware/versatile-data-kit/tree/main/projects/vdk-plugins).

The CI/CD is separated into two stages, a build stage and a release stage.
The build stage is made up of a few jobs, all of which inherit from the same
job configuration and only differ in the Python version they use (3.7 and 3.11).
They run according to rules, which are ordered in such a way that changes to a
plugin's directory trigger the plugin CI, but changes to a different plugin do not.
````
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
```text
# this file is used to provide testing requirements
# for requirements (dependencies) needed during and after installation of the plugin see (and update) setup.py install_requires section

click
duckdb
pytest
pytest-cov

vdk-core
vdk-test-utils
```
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
```python
# Copyright 2021-2023 VMware, Inc.
# SPDX-License-Identifier: Apache-2.0
import pathlib

import setuptools

"""
Builds a package with the help of setuptools in order for this package to be imported in other projects
"""

__version__ = "0.1.0"

setuptools.setup(
    name="vdk-duckdb",
    version=__version__,
    url="https://github.com/vmware/versatile-data-kit",
    description="DuckDB Plugin for VDK.",
    long_description=pathlib.Path("README.md").read_text(),
    long_description_content_type="text/markdown",
    # duckdb is imported by the plugin at runtime, so it must be installed with it.
    install_requires=["vdk-core", "duckdb", "tabulate"],
    package_dir={"": "src"},
    packages=setuptools.find_namespace_packages(where="src"),
    # This is the only vdk plugin specific part
    # Define entry point called "vdk.plugin.run" with name of plugin and module to act as entry point.
    entry_points={"vdk.plugin.run": ["vdk-duckdb = vdk.plugin.duckdb.duckdb_plugin"]},
    classifiers=[
        "Development Status :: 2 - Pre-Alpha",
        "License :: OSI Approved :: Apache Software License",
        "Programming Language :: Python :: 3.7",
        "Programming Language :: Python :: 3.8",
        "Programming Language :: Python :: 3.9",
        "Programming Language :: Python :: 3.10",
        "Programming Language :: Python :: 3.11",
    ],
    project_urls={
        "Documentation": "https://github.com/vmware/versatile-data-kit/tree/main/projects/vdk-plugins/vdk-duckdb",
        "Source Code": "https://github.com/vmware/versatile-data-kit/tree/main/projects/vdk-plugins/vdk-duckdb",
        "Bug Tracker": "https://github.com/vmware/versatile-data-kit/issues/new/choose",
    },
)
```
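The `entry_points` argument is what makes the package discoverable as a VDK plugin: vdk-core, which builds on pluggy, loads the modules registered under the `vdk.plugin.run` group. The sketch below uses only the standard library's `importlib.metadata` (Python 3.8+) to list which plugins are registered under that group in the current environment. It illustrates the discovery mechanism and is not vdk-core's own loading code; the helper name `registered_plugins` is hypothetical.

```python
from importlib import metadata
from typing import List


def registered_plugins(group: str = "vdk.plugin.run") -> List[str]:
    """Return the sorted names of entry points registered under `group`."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: the result supports .select(group=...).
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group name.
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


if __name__ == "__main__":
    # With vdk-duckdb installed, "vdk-duckdb" is expected to appear here.
    print(registered_plugins())
```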
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
```python
# Copyright 2021-2023 VMware, Inc.
# SPDX-License-Identifier: Apache-2.0
import pathlib
import tempfile

from vdk.internal.core.config import Configuration
from vdk.internal.core.config import ConfigurationBuilder

DUCKDB_FILE = "DUCKDB_FILE"
DUCKDB_INGEST_AUTO_CREATE_TABLE_ENABLED = "DUCKDB_INGEST_AUTO_CREATE_TABLE_ENABLED"


class DuckDBConfiguration:
    def __init__(self, configuration: Configuration):
        self.__config = configuration

    def get_auto_create_table_enabled(self) -> bool:
        return self.__config.get_value(DUCKDB_INGEST_AUTO_CREATE_TABLE_ENABLED)

    def get_duckdb_file(self) -> pathlib.Path:
        duckdb_file_path = self.__config.get_value(DUCKDB_FILE) or "default_path.duckdb"
        return pathlib.Path(duckdb_file_path)


def add_definitions(config_builder: ConfigurationBuilder):
    config_builder.add(
        key=DUCKDB_FILE,
        default_value=str(
            pathlib.Path(tempfile.gettempdir()).joinpath("vdk-duckdb.db")
        ),
        description="The file of the DuckDB database.",
    )
    config_builder.add(
        key=DUCKDB_INGEST_AUTO_CREATE_TABLE_ENABLED,
        default_value=True,
        # Note the trailing space: the two literals are concatenated.
        description="If set to true, auto create table if it does not exist during ingestion. "
        "This is only applicable when ingesting data into DuckDB (ingest method is DuckDB).",
    )
```
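`DUCKDB_INGEST_AUTO_CREATE_TABLE_ENABLED` controls whether ingestion creates a missing target table. The ingester itself is not among the files shown here, so the following is only a hypothetical sketch of what auto-creation typically involves: inferring DuckDB column types from one sample payload row and emitting a `CREATE TABLE IF NOT EXISTS` statement. The function names and the Python-to-DuckDB type mapping are assumptions, not the plugin's code.

```python
from typing import Any, Dict, List, Tuple

# Assumed mapping from Python value types to DuckDB column types.
# bool is checked before int because bool is a subclass of int.
_TYPE_MAP: List[Tuple[type, str]] = [
    (bool, "BOOLEAN"),
    (int, "BIGINT"),
    (float, "DOUBLE"),
    (str, "VARCHAR"),
]


def duckdb_type_for(value: Any) -> str:
    for py_type, sql_type in _TYPE_MAP:
        if isinstance(value, py_type):
            return sql_type
    return "VARCHAR"  # fall back to text for unrecognised values (e.g. None)


def create_table_ddl(table_name: str, row: Dict[str, Any]) -> str:
    """Build a CREATE TABLE statement from one sample payload row."""
    columns = ", ".join(
        f'"{name}" {duckdb_type_for(value)}' for name, value in row.items()
    )
    return f'CREATE TABLE IF NOT EXISTS "{table_name}" ({columns})'
```

With auto-create disabled, an ingester following this design would skip the DDL step, so writing to a missing table would fail instead of silently creating it.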
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
```python
# Copyright 2021-2023 VMware, Inc.
# SPDX-License-Identifier: Apache-2.0
import logging
import pathlib
import tempfile
from typing import List

import duckdb
from vdk.internal.util.decorators import closing_noexcept_on_close

log = logging.getLogger(__name__)


class DuckDBConnection:
    """
    Creates connections to a file-based DuckDB database.
    """

    def __init__(
        self,
        duckdb_file: pathlib.Path = pathlib.Path(tempfile.gettempdir()).joinpath(
            "vdk-duckdb.db"
        ),
    ):
        self.__db_file = duckdb_file

    def new_connection(self):
        log.info(
            f"Creating new connection against local file database located at: {self.__db_file}"
        )
        return duckdb.connect(f"{self.__db_file}")

    def execute_query(self, query: str) -> List[List]:
        conn = self.new_connection()
        try:
            with closing_noexcept_on_close(conn.cursor()) as cursor:
                cursor.execute(query)
                return cursor.fetchall()
        finally:
            # Release the database file after each query instead of leaking the connection.
            conn.close()
```
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
```python
# Copyright 2021-2023 VMware, Inc.
# SPDX-License-Identifier: Apache-2.0
import logging

import click
import duckdb
from tabulate import tabulate
from vdk.api.plugin.hook_markers import hookimpl
from vdk.internal.builtin_plugins.run.job_context import JobContext
from vdk.internal.core.config import ConfigurationBuilder
from vdk.internal.util.decorators import closing_noexcept_on_close

log = logging.getLogger(__name__)
"""
Hook implementations of the DuckDB plugin: configuration, the DUCKDB
connection factory, and the duckdb-query command.
"""


@hookimpl
def vdk_configure(config_builder: ConfigurationBuilder) -> None:
    """Define the configuration settings needed for duckdb"""
    config_builder.add(
        "DUCKDB_FILE",
        default_value="mydb.duckdb",
        description="The file of the DuckDB database.",
    )


@hookimpl
def initialize_job(context: JobContext) -> None:
    conf = context.core_context.configuration
    duckdb_file = conf.get_value("DUCKDB_FILE")

    # Register a factory, not an open connection: it is called when a
    # "DUCKDB" connection is requested.
    context.connections.add_open_connection_factory_method(
        "DUCKDB", lambda: duckdb.connect(database=duckdb_file)
    )


@click.command(
    name="duckdb-query", help="Execute a DuckDB query against a local DuckDB database."
)
@click.option("-q", "--query", type=click.STRING, required=True)
@click.pass_context
def duckdb_query(ctx: click.Context, query):
    conf = ctx.obj.configuration
    duckdb_file = conf.get_value("DUCKDB_FILE")
    conn = duckdb.connect(database=duckdb_file)

    try:
        with closing_noexcept_on_close(conn.cursor()) as cursor:
            cursor.execute(query)
            column_names = (
                [column_info[0] for column_info in cursor.description]
                if cursor.description
                else ()  # same as the default value of tabulate's headers parameter
            )
            res = cursor.fetchall()
            click.echo(tabulate(res, headers=column_names))
    finally:
        conn.close()


@hookimpl
def vdk_command_line(root_command: click.Group):
    """Extend vdk with a new command called "duckdb-query",
    enabling users to execute DuckDB queries from the command line."""
    root_command.add_command(duckdb_query)
```
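`initialize_job` registers a zero-argument factory under the name `DUCKDB` rather than an already-open connection, which presumably lets VDK open the database file only when a job actually requests that connection. The standard-library sketch below illustrates that lazy-factory idea in isolation; `LazyConnections`, `add_factory` and `get` are hypothetical names and are not the VDK `context.connections` API.

```python
from typing import Any, Callable, Dict


class LazyConnections:
    """Hypothetical registry: stores factories and opens each connection on first use."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], Any]] = {}
        self._open: Dict[str, Any] = {}

    def add_factory(self, name: str, factory: Callable[[], Any]) -> None:
        # Registering does not call the factory; nothing is opened yet.
        self._factories[name] = factory

    def get(self, name: str) -> Any:
        if name not in self._open:
            # The factory runs only the first time this name is requested.
            self._open[name] = self._factories[name]()
        return self._open[name]


if __name__ == "__main__":
    def open_fake_connection() -> object:
        print("opening connection")
        return object()  # stands in for duckdb.connect(database=duckdb_file)

    registry = LazyConnections()
    registry.add_factory("DUCKDB", open_fake_connection)
    first = registry.get("DUCKDB")  # prints "opening connection" once
    print(first is registry.get("DUCKDB"))  # the cached connection is reused
```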
