
Commit 2ab66fb

Authored by haritamar, jc00ke, claude, and devin-ai-integration[bot]
feat: add Vertica adapter support (#963)
* Remove extra newlines that Vertica could not parse

  Without trimming the leading and trailing newlines, Vertica would fail to parse the compiled SQL. For example, `models/edr/dbt_artifacts/dbt_columns` compiles the following SQL, via `elementary.get_dbt_columns_empty_table_query`, `empty_table`, and `empty_column`:

  ```sql
  select * from (
  select
  cast('dummy_string' as varchar(4096)) as unique_id
  ,
  cast('dummy_string' as varchar(4096)) as parent_unique_id
  ,
  cast('dummy_string' as varchar(4096)) as name
  ,
  cast('dummy_string' as varchar(4096)) as data_type
  ,
  cast('this_is_just_a_long_dummy_string' as varchar(4096)) as tags
  ,
  cast('this_is_just_a_long_dummy_string' as varchar(4096)) as meta
  ,
  cast('dummy_string' as varchar(4096)) as database_name
  ,
  cast('dummy_string' as varchar(4096)) as schema_name
  ,
  cast('dummy_string' as varchar(4096)) as table_name
  ,
  cast('this_is_just_a_long_dummy_string' as varchar(4096)) as description
  ,
  cast('dummy_string' as varchar(4096)) as resource_type
  ,
  cast('dummy_string' as varchar(4096)) as generated_at
  ,
  cast('dummy_string' as varchar(4096)) as metadata_hash
  ) as empty_table
  where 1 = 0
  ```

  which would cause:

  ```
  SQL Error [4856] [42601]: [Vertica][VJDBC](4856) ERROR: Syntax error at or near ")" at character 1
  ```

  By trimming the newlines, the SQL is much tighter:

  ```sql
  select * from (select
  cast('dummy_string' as varchar(4096)) as unique_id,
  cast('dummy_string' as varchar(4096)) as parent_unique_id,
  cast('dummy_string' as varchar(4096)) as name,
  cast('dummy_string' as varchar(4096)) as data_type,
  cast('this_is_just_a_long_dummy_string' as varchar(4096)) as tags,
  cast('this_is_just_a_long_dummy_string' as varchar(4096)) as meta,
  cast('dummy_string' as varchar(4096)) as database_name,
  cast('dummy_string' as varchar(4096)) as schema_name,
  cast('dummy_string' as varchar(4096)) as table_name,
  cast('this_is_just_a_long_dummy_string' as varchar(4096)) as description,
  cast('dummy_string' as varchar(4096)) as resource_type,
  cast('dummy_string' as varchar(4096)) as generated_at,
  cast('dummy_string' as varchar(4096)) as metadata_hash
  ) as empty_table where 1 = 0
  ```

  and this runs in Vertica just fine.

* Add Vertica-specific escape macro

  This fixed 4 or 5 errors when running in my test project.

* Add Vertica-specific timeadd macro

* Attempt to set up Vertica in CI

* Debug missing port

* Add more missing env vars for CI

  I thought I might have to add these and not just `VERTICA_PORT`.

* Try opentext namespace for CI image

* Use Ratio's Vertica-CE

  I can't tell if OpenText pulled Vertica or what, but both the vertica and opentext namespaces were failing. Luckily I had the image pulled locally.

* Add dbt-vertica-version

  dbt-vertica versions match dbt-core versions, and they are a bit behind, which is why we default to the latest available: 1.8.5.

* Start Vertica after schema has been determined

* Use Ratio's GitHub package for vertica-ce

  This should be a lot faster than pulling from docker.io.

* Set Vertica env vars & persist across steps

* Forgot VERTICA_HOST

* Address CodeRabbit nit

* Try a healthcheck before moving on with Vertica

  I'm seeing `Database Error: [Errno 32] Broken pipe` in the `Check DWH connection` step.

* Use env vars for Vertica healthcheck

* Add test/CI profiles.yml fixture file

  I use this for local dev via `DBT_PROFILES_DIR="path/to/.github/fixtures/"` and for the GitHub Actions secret `CI_PROFILES_YML`.

  Linux + Wayland: `base64 .github/fixtures/profiles.yml | wl-copy`
  macOS: `base64 .github/fixtures/profiles.yml | pbcopy`

* Ignore the .user.yml in the fixtures dir

* fix: export SCHEMA_NAME to GITHUB_ENV for Vertica docker-compose

  The Vertica docker-compose and env vars steps need SCHEMA_NAME to be available across GitHub Actions steps.

* refactor: inline Vertica credentials instead of using env vars

  Hardcode Vertica connection values directly in docker-compose-vertica.yml (matching the pattern of other local adapters like sqlserver) and remove the "Set Vertica environment variables" CI step.

* revert: remove unnecessary SCHEMA_NAME export to GITHUB_ENV

  No longer needed since the Vertica env vars step was removed.

* refactor: remove dbt-vertica-version input parameter

  Just install the latest dbt-vertica instead of pinning a specific version.

* fix: Vertica adapter compatibility fixes for integration tests

  - Add VerticaDirectSeeder with a direct vertica_python connection for atomic DDL+DML+COMMIT
  - Add vertica__get_normalized_data_type macro to normalize VARCHAR/INT types
  - Add vertica__get_default_config with query_max_size=250000
  - Truncate the message field in on_run_result_query_exceed to handle long error messages
  - Add vertica__edr_type_string (varchar(16000)) and vertica__edr_type_long_string (varchar(32000))
  - Add vertica__full_name_split using split_part instead of array subscripts
  - Add vertica__buckets_cte, vertica__target_database, vertica__day_of_week, vertica__hour_of_week
  - Add vertica__get_relation_max_length with the Vertica identifier limit (128 chars)

* fix: upgrade dbt-core for Vertica CI to support the 'arguments' test property

  dbt-vertica pins dbt-core~=1.8, which lacks native support for the 'arguments' test property used by the integration-test framework. This caused all Vertica tests to fail in CI with:

  ```
  macro 'dbt_macro__test_volume_anomalies' takes no keyword argument 'arguments'
  ```

  Upgrade dbt-core after installing dbt-vertica (dbt-vertica 1.8.5 works fine with newer dbt-core versions, as verified locally).

* fix: install dbt-vertica with --no-deps to allow latest dbt-core

  The previous approach (pip install dbt-core after dbt-vertica) didn't upgrade, because pip saw 1.8.5 as satisfying the bare requirement. Install dbt-vertica with --no-deps, then install vertica-python and dbt-core separately so the latest dbt-core is used.

* fix: override dbt-vertica seed macro to use a unique reject table per seed

* fix: address Vertica CI workflow, schema cleanup, and stddev precision

  1. CI workflow: honor the dbt-version input for Vertica installs and reject unsupported Vertica+Fusion combinations with an explicit error message.
  2. Schema cleanup: add Vertica dispatches for edr_create_schema, edr_drop_schema, edr_schema_exists, and edr_list_schemas using v_catalog.schemata (Vertica lacks information_schema) and without adapter.commit() (Vertica DDL is auto-committed).
  3. Anomaly detection: add an edr_normalize_stddev dispatched macro to round(training_stddev, 6) on Vertica, fixing floating-point artifacts where STDDEV returns ~4e-08 for identical values.

* fix: add empty-seed guard and clarify query_max_size comment

  - VerticaDirectSeeder.seed() now raises ValueError on empty data instead of IndexError (consistent with other seeders)
  - Updated the vertica__get_default_config comment to clarify that query_max_size controls batch INSERT size, not per-column limits

* style: address CodeRabbit nitpicks

  - buckets_cte.sql: lowercase SQL keywords (select/union all) for consistency with the rest of the file
  - buckets_cte.sql: add an explicit ORDER BY to row_number() for deterministic numbering
  - data_type.sql: move the T-SQL comment next to the fabric macro and the Vertica comment next to the vertica macro
  - dbt_project.py: use _read_profile_schema() for Vertica instead of _get_query_runner() (Vertica uses a direct connection, not a dbt adapter)
  - data_seeder.py: make query_runner Optional in BaseSqlInsertSeeder since Vertica passes None

* fix: use column references in row_number() ORDER BY for Vertica

  ORDER BY 1 in a window function causes Vertica to misinterpret the sort key, breaking bucket generation. Use explicit column references (t1.v, t2.v, t3.v, t4.v) instead, matching the Dremio implementation.

* revert: undo risky nitpick changes to isolate CI regression

  Reverts the buckets_cte.sql, dbt_project.py, and data_seeder.py changes from the nitpick commit to determine whether the 40 Vertica test failures are caused by those changes or by a timezone/timing issue (tests ran around 01:30 UTC, when the Vertica container may have a different date).

* style: re-apply CodeRabbit nitpick fixes (confirmed not causing CI failures)

  The 40 Vertica test failures are timing-related (a midnight-UTC timezone mismatch between the CI runner and the Vertica container), not caused by these changes; verified by reverting all changes in f5c11ef, which still had the same 40 failures.

  Changes:
  - buckets_cte.sql: lowercase SQL keywords and deterministic ORDER BY using column references (matches the Dremio implementation)
  - dbt_project.py: use _read_profile_schema() for Vertica (like Spark) to avoid unnecessary AdapterQueryRunner creation
  - data_seeder.py: make query_runner Optional since Vertica passes None

---------

Co-authored-by: Jesse Cooke <jesse@ratiopbc.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Itamar Hartstein <haritamar@gmail.com>
1 parent e0637b0 · commit 2ab66fb

File tree

28 files changed: +446 −19 lines changed


.github/workflows/test-all-warehouses.yml

Lines changed: 12 additions & 1 deletion
```diff
@@ -49,7 +49,16 @@ jobs:
           ${{ inputs.dbt-version && fromJSON(format('["{0}"]', inputs.dbt-version)) ||
           fromJSON('["latest_official", "latest_pre"]') }}
         warehouse-type:
-          [postgres, clickhouse, trino, dremio, spark, duckdb, sqlserver]
+          [
+            postgres,
+            clickhouse,
+            trino,
+            dremio,
+            spark,
+            duckdb,
+            sqlserver,
+            vertica,
+          ]
       exclude:
         # latest_pre is only tested on postgres
         - dbt-version: latest_pre
@@ -64,6 +73,8 @@ jobs:
           warehouse-type: duckdb
         - dbt-version: latest_pre
           warehouse-type: sqlserver
+        - dbt-version: latest_pre
+          warehouse-type: vertica
     uses: ./.github/workflows/test-warehouse.yml
     with:
       warehouse-type: ${{ matrix.warehouse-type }}
```

.github/workflows/test-warehouse.yml

Lines changed: 32 additions & 1 deletion
Original file line numberDiff line numberDiff line change
```diff
@@ -20,6 +20,7 @@ on:
           - duckdb
           - sqlserver
           - fabric
+          - vertica
       elementary-ref:
         type: string
         required: false
@@ -151,8 +152,26 @@ jobs:
         if: startsWith(inputs.warehouse-type, 'databricks') && inputs.dbt-version < '1.7.0'
         run: pip install databricks-sql-connector==2.9.3

+      - name: Reject unsupported Vertica + Fusion combination
+        if: inputs.warehouse-type == 'vertica' && inputs.dbt-version == 'fusion'
+        run: |
+          echo "::error::dbt Fusion does not support third-party adapters such as dbt-vertica."
+          exit 1
+
+      - name: Install dbt-vertica
+        if: inputs.warehouse-type == 'vertica' && inputs.dbt-version != 'fusion'
+        run: |
+          # dbt-vertica pins dbt-core~=1.8 which lacks native support for the
+          # "arguments" test property used by the integration-test framework.
+          # Install dbt-vertica without deps, then install the requested
+          # dbt-core version separately (dbt-vertica works fine with newer
+          # dbt-core versions).
+          pip install dbt-vertica --no-deps
+          pip install vertica-python \
+            "dbt-core${{ (!startsWith(inputs.dbt-version, 'latest') && format('=={0}', inputs.dbt-version)) || '' }}"
+
       - name: Install dbt
-        if: ${{ inputs.dbt-version != 'fusion' }}
+        if: ${{ inputs.dbt-version != 'fusion' && inputs.warehouse-type != 'vertica' }}
         run:
           pip install${{ (inputs.dbt-version == 'latest_pre' && ' --pre') || '' }}
           "dbt-core${{ (!startsWith(inputs.dbt-version, 'latest') && format('=={0}', inputs.dbt-version)) || '' }}"
@@ -198,6 +217,18 @@ jobs:
           ln -sfn ${{ github.workspace }}/dbt-data-reliability dbt_project/dbt_packages/elementary
           pip install -r requirements.txt

+      - name: Start Vertica
+        if: inputs.warehouse-type == 'vertica'
+        working-directory: ${{ env.TESTS_DIR }}
+        run: docker compose -f docker-compose-vertica.yml up -d
+
+      - name: Wait for Vertica to be ready
+        if: inputs.warehouse-type == 'vertica'
+        run: |
+          echo "Waiting for Vertica to be healthy..."
+          timeout 60 bash -c 'until [ "$(docker inspect --format="{{.State.Health.Status}}" vertica)" == "healthy" ]; do echo "Waiting..."; sleep 5; done'
+          echo "Vertica is ready!"
+
       - name: Check DWH connection
         working-directory: ${{ env.TESTS_DIR }}
         run: |
```

.gitignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -4,6 +4,7 @@ dbt_internal_packages/
 logs/
 scripts/

+.github/fixtures/.user.yml
 .idea
 .DS_Store
```
integration_tests/dbt_project/macros/ci_schemas_cleanup/test_drop_stale_ci_schemas.sql

Lines changed: 6 additions & 0 deletions
```diff
@@ -98,3 +98,9 @@
   {% set safe_schema = schema_name | replace("`", "``") %}
   {% do run_query("CREATE DATABASE IF NOT EXISTS `" ~ safe_schema ~ "`") %}
 {% endmacro %}
+
+{% macro vertica__edr_create_schema(database, schema_name) %}
+  {#- Vertica DDL is auto-committed; an explicit adapter.commit() would
+      fail with "no transaction in progress". -#}
+  {% do run_query("CREATE SCHEMA IF NOT EXISTS " ~ schema_name) %}
+{% endmacro %}
```

integration_tests/dbt_project/macros/clear_env.sql

Lines changed: 6 additions & 0 deletions
```diff
@@ -82,3 +82,9 @@
   {% do run_query("DROP SCHEMA IF EXISTS " ~ schema_name ~ " CASCADE") %}
   {% do adapter.commit() %}
 {% endmacro %}
+
+{% macro vertica__edr_drop_schema(database_name, schema_name) %}
+  {#- Vertica DDL is auto-committed; an explicit adapter.commit() would
+      fail with "no transaction in progress". -#}
+  {% do run_query("DROP SCHEMA IF EXISTS " ~ schema_name ~ " CASCADE") %}
+{% endmacro %}
```

integration_tests/dbt_project/macros/schema_utils/list_schemas.sql

Lines changed: 9 additions & 0 deletions
```diff
@@ -54,3 +54,12 @@
   {% for row in results %} {% do schemas.append(row[0]) %} {% endfor %}
   {% do return(schemas) %}
 {% endmacro %}
+
+{% macro vertica__edr_list_schemas(database) %}
+  {#- Vertica's v_catalog.schemata is scoped to the current database and
+      does not have a database_name filter column. -#}
+  {% set results = run_query("SELECT schema_name FROM v_catalog.schemata") %}
+  {% set schemas = [] %}
+  {% for row in results %} {% do schemas.append(row[0]) %} {% endfor %}
+  {% do return(schemas) %}
+{% endmacro %}
```

integration_tests/dbt_project/macros/schema_utils/schema_exists.sql

Lines changed: 11 additions & 0 deletions
```diff
@@ -64,3 +64,14 @@
   {% set result = run_query("SHOW DATABASES LIKE '" ~ safe_schema ~ "'") %}
   {% do return(result | length > 0) %}
 {% endmacro %}
+
+{% macro vertica__edr_schema_exists(database, schema_name) %}
+  {#- Vertica's v_catalog.schemata is scoped to the current database. -#}
+  {% set safe_schema = schema_name | replace("'", "''") %}
+  {% set result = run_query(
+      "SELECT schema_name FROM v_catalog.schemata WHERE lower(schema_name) = lower('"
+      ~ safe_schema
+      ~ "')"
+  ) %}
+  {% do return(result | length > 0) %}
+{% endmacro %}
```
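The `replace("'", "''")` in the macro above applies standard SQL string-literal escaping (an embedded single quote is doubled) before the schema name is interpolated into the query. A minimal Python sketch of the same rule; the helper names here are ours for illustration, not from the codebase:

```python
def escape_sql_string_literal(value: str) -> str:
    # Standard SQL escapes an embedded single quote by doubling it.
    return value.replace("'", "''")

def schema_exists_query(schema_name: str) -> str:
    # Mirrors the lookup built by vertica__edr_schema_exists.
    safe = escape_sql_string_literal(schema_name)
    return (
        "SELECT schema_name FROM v_catalog.schemata "
        f"WHERE lower(schema_name) = lower('{safe}')"
    )

print(schema_exists_query("o'reilly_schema"))
```

A name like `o'reilly_schema` therefore arrives in the query as `'o''reilly_schema'` instead of prematurely terminating the literal.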
Lines changed: 23 additions & 0 deletions
```diff
@@ -0,0 +1,23 @@
+{#- Override the dbt-vertica seed helper so that each seed file uses a
+    unique reject-table name. The upstream macro hardcodes
+    ``seed_rejects`` for every seed, which causes "Object already exists"
+    errors when ``dbt seed`` processes more than one file. -#}
+{% macro copy_local_load_csv_rows(model, agate_table) %}
+  {% set cols_sql = get_seed_column_quoted_csv(model, agate_table.column_names) %}
+
+  {#- Build a per-seed reject table name so concurrent seeds don't clash. -#}
+  {% set reject_table = model["alias"] ~ "_rejects" %}
+
+  {% set sql %}
+    copy {{ this.render() }}
+    ({{ cols_sql }})
+    from local '{{ agate_table.original_abspath }}'
+    delimiter ','
+    enclosed by '"'
+    skip 1
+    abort on error
+    rejected data as table {{ this.without_identifier() }}.{{ reject_table }};
+  {% endset %}
+
+  {{ return(sql) }}
+{% endmacro %}
```
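The collision the override avoids can be sketched in a few lines of Python: with a hardcoded reject-table name, every seed maps to the same object, while the per-seed `<alias>_rejects` scheme yields one table per seed. This is a toy illustration of the naming design, not code from the project:

```python
# Hypothetical seed aliases, standing in for CSV files processed by `dbt seed`.
seeds = ["customers", "orders", "payments"]

# Upstream behavior: every seed targets the single hardcoded "seed_rejects"
# table, so the second seed hits "Object already exists".
hardcoded = {"seed_rejects" for _ in seeds}

# Overridden behavior: one reject table per seed alias, no clash.
per_seed = {f"{alias}_rejects" for alias in seeds}

print(len(hardcoded), len(per_seed))  # prints: 1 3
```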
Lines changed: 40 additions & 0 deletions
```diff
@@ -0,0 +1,40 @@
+services:
+  vertica:
+    environment:
+      VERTICA_USER: dbadmin
+      VERTICA_PASS: vertica
+      VERTICA_HOST: localhost
+      VERTICA_PORT: 5433
+      VERTICA_DATABASE: elementary_tests
+      VERTICA_SCHEMA: ${SCHEMA_NAME}
+      APP_DB_USER: dbadmin
+      APP_DB_PASSWORD: vertica
+      TZ: "America/Los_Angeles"
+      VERTICA_DB_NAME: elementary_tests
+      VMART_ETL_SCRIPT: ""
+    container_name: vertica
+    image: ghcr.io/ratiopbc/vertica-ce
+    ports:
+      - "5433:5433"
+      - "5444:5444"
+    deploy:
+      mode: global
+    ulimits:
+      nofile:
+        soft: 65536
+        hard: 65536
+    volumes:
+      - type: volume
+        source: vertica-data
+        target: /data
+    healthcheck:
+      test:
+        [
+          "CMD-SHELL",
+          "/opt/vertica/bin/vsql -U dbadmin -w vertica -c 'SELECT 1;'",
+        ]
+      interval: 5s
+      timeout: 5s
+      retries: 10
+volumes:
+  vertica-data:
```

integration_tests/profiles/profiles.yml.j2

Lines changed: 13 additions & 1 deletion
```diff
@@ -75,6 +75,18 @@ elementary_tests:
     trust_cert: true
     threads: 4

+  vertica: &vertica
+    type: vertica
+    host: localhost
+    port: 5433
+    username: dbadmin
+    password: vertica
+    database: elementary_tests
+    schema: {{ schema_name }}
+    connection_load_balance: false
+    retries: 2
+    threads: 4
+
   # ── Cloud targets (secrets substituted at CI time) ─────────────────

   snowflake: &snowflake
@@ -150,7 +162,7 @@ elementary_tests:
 elementary:
   target: postgres
   outputs:
-    {%- set targets = ['postgres', 'clickhouse', 'trino', 'dremio', 'spark', 'duckdb', 'sqlserver', 'snowflake', 'bigquery', 'redshift', 'databricks_catalog', 'athena', 'fabric'] %}
+    {%- set targets = ['postgres', 'clickhouse', 'trino', 'dremio', 'spark', 'duckdb', 'sqlserver', 'vertica', 'snowflake', 'bigquery', 'redshift', 'databricks_catalog', 'athena', 'fabric'] %}
     {%- for t in targets %}
     {{ t }}:
       <<: *{{ t }}
```
