Merged
33 commits
- 8b84461 Remove extra newlines that Vertica could not parse (jc00ke, Oct 8, 2025)
- 7146da5 Add Vertica-specific escape macro (jc00ke, Nov 17, 2025)
- 27e924d Add Vertica-specific timeadd macro (jc00ke, Nov 17, 2025)
- 667054b Attempt to set up Vertica in CI (jc00ke, Nov 17, 2025)
- 2130a7a Debug missing port (jc00ke, Nov 17, 2025)
- d7a9c0f Add more missing env vars for CI (jc00ke, Nov 17, 2025)
- 8d1975d Try opentext namespace for CI image (jc00ke, Nov 17, 2025)
- fe2585e Use Ratio's Vertica-CE (jc00ke, Nov 17, 2025)
- 2154163 Add dbt-vertica-version (jc00ke, Nov 18, 2025)
- 19e41b5 Start Vertica after schema has been determined (jc00ke, Nov 18, 2025)
- c585dae Use Ratio's GitHub package for vertica-ce (jc00ke, Nov 18, 2025)
- d198a9e Set Vertica env vars & persist across steps (jc00ke, Nov 18, 2025)
- 570dcd0 Forgot VERTICA_HOST (jc00ke, Nov 18, 2025)
- bd69507 Address CodeRabbit nit (jc00ke, Nov 18, 2025)
- c886e4c Try a healthcheck before moving on with Vertica (jc00ke, Nov 18, 2025)
- bcc9d80 Use env vars for Vertica healthcheck (jc00ke, Nov 18, 2025)
- 9a2bf01 Add test/CI profiles.yml fixture file (jc00ke, Nov 18, 2025)
- 2a996ef Ignore the .user.yml in the fixtures dir (jc00ke, Nov 18, 2025)
- 795fe85 Merge master into vertica-compat and migrate profiles to .j2 template (haritamar, Mar 11, 2026)
- b671822 fix: export SCHEMA_NAME to GITHUB_ENV for Vertica docker-compose (haritamar, Mar 11, 2026)
- a26598c refactor: inline Vertica credentials instead of using env vars (haritamar, Mar 11, 2026)
- 4326960 revert: remove unnecessary SCHEMA_NAME export to GITHUB_ENV (haritamar, Mar 11, 2026)
- 731ca2f refactor: remove dbt-vertica-version input parameter (haritamar, Mar 11, 2026)
- cf6aed7 fix: Vertica adapter compatibility fixes for integration tests (devin-ai-integration[bot], Mar 11, 2026)
- 899f146 fix: upgrade dbt-core for Vertica CI to support 'arguments' test prop… (devin-ai-integration[bot], Mar 11, 2026)
- 139799c fix: install dbt-vertica with --no-deps to allow latest dbt-core (devin-ai-integration[bot], Mar 11, 2026)
- 71b930e fix: override dbt-vertica seed macro to use unique reject table per seed (devin-ai-integration[bot], Mar 11, 2026)
- 326e20c fix: address Vertica CI workflow, schema cleanup, and stddev precision (devin-ai-integration[bot], Mar 11, 2026)
- 7d77e11 fix: add empty-seed guard and clarify query_max_size comment (devin-ai-integration[bot], Mar 12, 2026)
- e0900da style: address CodeRabbit nitpicks (devin-ai-integration[bot], Mar 12, 2026)
- 4edd6f3 fix: use column references in row_number() ORDER BY for Vertica (devin-ai-integration[bot], Mar 12, 2026)
- f5c11ef revert: undo risky nitpick changes to isolate CI regression (devin-ai-integration[bot], Mar 12, 2026)
- 02ae168 style: re-apply CodeRabbit nitpick fixes (confirmed not causing CI fa… (devin-ai-integration[bot], Mar 12, 2026)
13 changes: 12 additions & 1 deletion .github/workflows/test-all-warehouses.yml
@@ -49,7 +49,16 @@ jobs:
${{ inputs.dbt-version && fromJSON(format('["{0}"]', inputs.dbt-version)) ||
fromJSON('["latest_official", "latest_pre"]') }}
warehouse-type:
[postgres, clickhouse, trino, dremio, spark, duckdb, sqlserver]
[
postgres,
clickhouse,
trino,
dremio,
spark,
duckdb,
sqlserver,
vertica,
]
exclude:
# latest_pre is only tested on postgres
- dbt-version: latest_pre
@@ -64,6 +73,8 @@ jobs:
warehouse-type: duckdb
- dbt-version: latest_pre
warehouse-type: sqlserver
- dbt-version: latest_pre
warehouse-type: vertica
uses: ./.github/workflows/test-warehouse.yml
with:
warehouse-type: ${{ matrix.warehouse-type }}
33 changes: 32 additions & 1 deletion .github/workflows/test-warehouse.yml
@@ -20,6 +20,7 @@ on:
- duckdb
- sqlserver
- fabric
- vertica
elementary-ref:
type: string
required: false
@@ -151,8 +152,26 @@ jobs:
if: startsWith(inputs.warehouse-type, 'databricks') && inputs.dbt-version < '1.7.0'
run: pip install databricks-sql-connector==2.9.3

- name: Reject unsupported Vertica + Fusion combination
if: inputs.warehouse-type == 'vertica' && inputs.dbt-version == 'fusion'
run: |
echo "::error::dbt Fusion does not support third-party adapters such as dbt-vertica."
exit 1

- name: Install dbt-vertica
if: inputs.warehouse-type == 'vertica' && inputs.dbt-version != 'fusion'
run: |
# dbt-vertica pins dbt-core~=1.8 which lacks native support for the
# "arguments" test property used by the integration-test framework.
# Install dbt-vertica without deps, then install the requested
# dbt-core version separately (dbt-vertica works fine with newer
# dbt-core versions).
pip install dbt-vertica --no-deps
pip install vertica-python \
"dbt-core${{ (!startsWith(inputs.dbt-version, 'latest') && format('=={0}', inputs.dbt-version)) || '' }}"

- name: Install dbt
if: ${{ inputs.dbt-version != 'fusion' }}
if: ${{ inputs.dbt-version != 'fusion' && inputs.warehouse-type != 'vertica' }}
run:
pip install${{ (inputs.dbt-version == 'latest_pre' && ' --pre') || '' }}
"dbt-core${{ (!startsWith(inputs.dbt-version, 'latest') && format('=={0}', inputs.dbt-version)) || '' }}"
@@ -198,6 +217,18 @@ jobs:
ln -sfn ${{ github.workspace }}/dbt-data-reliability dbt_project/dbt_packages/elementary
pip install -r requirements.txt

- name: Start Vertica
if: inputs.warehouse-type == 'vertica'
working-directory: ${{ env.TESTS_DIR }}
run: docker compose -f docker-compose-vertica.yml up -d

- name: Wait for Vertica to be ready
if: inputs.warehouse-type == 'vertica'
run: |
echo "Waiting for Vertica to be healthy..."
timeout 60 bash -c 'until [ "$(docker inspect --format="{{.State.Health.Status}}" vertica)" == "healthy" ]; do echo "Waiting..."; sleep 5; done'
echo "Vertica is ready!"

- name: Check DWH connection
working-directory: ${{ env.TESTS_DIR }}
run: |
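The "Wait for Vertica to be ready" step above is a standard poll-until-healthy loop. The same pattern can be sketched in Python (the `check` callable is a hypothetical stand-in for `docker inspect --format='{{.State.Health.Status}}' vertica`):

```python
import time

def wait_until_healthy(check, timeout_s=60.0, interval_s=5.0):
    """Poll check() until it reports "healthy" or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check() == "healthy":
            return True
        time.sleep(interval_s)
    return False

# Simulated container that becomes healthy on the second poll.
statuses = iter(["starting", "healthy"])
print(wait_until_healthy(lambda: next(statuses), timeout_s=5, interval_s=0))
# → True
```

In the workflow, the outer `timeout 60` makes the step fail (nonzero exit) if the container never reports healthy, which is what aborts the job before the connection check runs.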
1 change: 1 addition & 0 deletions .gitignore
@@ -4,6 +4,7 @@ dbt_internal_packages/
logs/
scripts/

.github/fixtures/.user.yml
.idea
.DS_Store

@@ -98,3 +98,9 @@
{% set safe_schema = schema_name | replace("`", "``") %}
{% do run_query("CREATE DATABASE IF NOT EXISTS `" ~ safe_schema ~ "`") %}
{% endmacro %}

{% macro vertica__edr_create_schema(database, schema_name) %}
{#- Vertica DDL is auto-committed; an explicit adapter.commit() would
fail with "no transaction in progress". -#}
{% do run_query("CREATE SCHEMA IF NOT EXISTS " ~ schema_name) %}
{% endmacro %}
6 changes: 6 additions & 0 deletions integration_tests/dbt_project/macros/clear_env.sql
@@ -82,3 +82,9 @@
{% do run_query("DROP SCHEMA IF EXISTS " ~ schema_name ~ " CASCADE") %}
{% do adapter.commit() %}
{% endmacro %}

{% macro vertica__edr_drop_schema(database_name, schema_name) %}
{#- Vertica DDL is auto-committed; an explicit adapter.commit() would
fail with "no transaction in progress". -#}
{% do run_query("DROP SCHEMA IF EXISTS " ~ schema_name ~ " CASCADE") %}
{% endmacro %}
@@ -54,3 +54,12 @@
{% for row in results %} {% do schemas.append(row[0]) %} {% endfor %}
{% do return(schemas) %}
{% endmacro %}

{% macro vertica__edr_list_schemas(database) %}
{#- Vertica's v_catalog.schemata is scoped to the current database and
does not have a database_name filter column. -#}
{% set results = run_query("SELECT schema_name FROM v_catalog.schemata") %}
{% set schemas = [] %}
{% for row in results %} {% do schemas.append(row[0]) %} {% endfor %}
{% do return(schemas) %}
{% endmacro %}
@@ -64,3 +64,14 @@
{% set result = run_query("SHOW DATABASES LIKE '" ~ safe_schema ~ "'") %}
{% do return(result | length > 0) %}
{% endmacro %}

{% macro vertica__edr_schema_exists(database, schema_name) %}
{#- Vertica's v_catalog.schemata is scoped to the current database. -#}
{% set safe_schema = schema_name | replace("'", "''") %}
{% set result = run_query(
"SELECT schema_name FROM v_catalog.schemata WHERE lower(schema_name) = lower('"
~ safe_schema
~ "')"
) %}
{% do return(result | length > 0) %}
{% endmacro %}
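The existence check above hinges on two details: single quotes in the schema name are doubled so the name is safe inside a SQL string literal, and the comparison is case-insensitive because Vertica folds unquoted identifiers. A standalone sketch of the query builder (hypothetical helper mirroring the macro's string assembly):

```python
def schema_exists_sql(schema_name: str) -> str:
    # Double single quotes to escape them inside the SQL string literal,
    # then compare case-insensitively against v_catalog.schemata.
    safe = schema_name.replace("'", "''")
    return (
        "SELECT schema_name FROM v_catalog.schemata "
        f"WHERE lower(schema_name) = lower('{safe}')"
    )

print(schema_exists_sql("test_o'brien"))
```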
23 changes: 23 additions & 0 deletions integration_tests/dbt_project/macros/vertica_seed_override.sql
@@ -0,0 +1,23 @@
{#- Override the dbt-vertica seed helper so that each seed file uses a
unique reject-table name. The upstream macro hardcodes
``seed_rejects`` for every seed, which causes "Object already exists"
errors when ``dbt seed`` processes more than one file. -#}
{% macro copy_local_load_csv_rows(model, agate_table) %}
{% set cols_sql = get_seed_column_quoted_csv(model, agate_table.column_names) %}

{#- Build a per-seed reject table name so concurrent seeds don't clash. -#}
{% set reject_table = model["alias"] ~ "_rejects" %}

{% set sql %}
copy {{ this.render() }}
({{ cols_sql }})
from local '{{ agate_table.original_abspath }}'
delimiter ','
enclosed by '"'
skip 1
abort on error
rejected data as table {{ this.without_identifier() }}.{{ reject_table }};
{% endset %}

{{ return(sql) }}
{% endmacro %}
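The rendered COPY statement can be previewed outside dbt with a small helper (hypothetical, mirroring the Jinja template above; the double-quote identifier style is assumed). The key point is that the reject table name is derived from the seed's alias instead of a shared constant, so multiple seed files no longer collide:

```python
def copy_seed_sql(schema: str, table: str, columns, csv_path: str) -> str:
    """Render a Vertica COPY ... FROM LOCAL statement with a per-seed
    reject table (alias + "_rejects"), as the override macro does."""
    cols = ", ".join(f'"{c}"' for c in columns)
    return (
        f'copy "{schema}"."{table}" ({cols}) '
        f"from local '{csv_path}' "
        "delimiter ',' enclosed by '\"' skip 1 abort on error "
        f'rejected data as table "{schema}"."{table}_rejects";'
    )

print(copy_seed_sql("test_schema", "users", ["id", "name"], "/tmp/users.csv"))
```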
40 changes: 40 additions & 0 deletions integration_tests/docker-compose-vertica.yml
@@ -0,0 +1,40 @@
services:
vertica:
environment:
VERTICA_USER: dbadmin
VERTICA_PASS: vertica
VERTICA_HOST: localhost
VERTICA_PORT: 5433
VERTICA_DATABASE: elementary_tests
VERTICA_SCHEMA: ${SCHEMA_NAME}
APP_DB_USER: dbadmin
APP_DB_PASSWORD: vertica
TZ: "America/Los_Angeles"
VERTICA_DB_NAME: elementary_tests
VMART_ETL_SCRIPT: ""
container_name: vertica
image: ghcr.io/ratiopbc/vertica-ce
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if the image is accessible and get basic info
# Note: This verifies the image exists and is pullable

# Check GitHub Container Registry for the image
curl -s "https://ghcr.io/v2/ratiopbc/vertica-ce/tags/list" | head -c 500

# Alternative: check if there's an official Vertica CE image
echo "---"
echo "Searching for official Vertica images..."
curl -s "https://hub.docker.com/v2/search/repositories/?query=vertica&page_size=5" | jq '.results[] | {name: .repo_name, description: .short_description}'

Repository: elementary-data/dbt-data-reliability

Length of output: 638


🏁 Script executed:

# Search for references to the Vertica image in the codebase
rg "vertica-ce" -i --type yaml --type json

# Check git history for this file
git log --oneline --all -- integration_tests/docker-compose-vertica.yml | head -20

# Search for any documentation about image choice
rg "ratiopbc" -i

Repository: elementary-data/dbt-data-reliability

Length of output: 309


🌐 Web query:

Vertica Community Edition official Docker image repository

💡 Result:

  • Official source repository (to build the CE image yourself): vertica/vertica-containers, one-node-ce directory. [1][2]
  • Prebuilt “official” CE Docker Hub image: OpenText’s current docs state the CE single-node image is no longer available (so there isn’t an active official Docker Hub repo for vertica-ce anymore). [3]

Sources:
[1] (raw.githubusercontent.com)
[2] (raw.githubusercontent.com)
[3] (docs.vertica.com)



🏁 Script executed:

# Search GitHub for the ratiopbc/vertica-ce repository info
curl -s "https://api.github.com/repos/ratiopbc/vertica-ce" | jq '{name, description, updated_at, watchers_count, forks_count, open_issues_count}' 2>/dev/null || echo "Not found or private"

# Also check if the file has any comments or documentation about why this image was chosen
rg -B 5 -A 5 "vertica-ce" integration_tests/docker-compose-vertica.yml

Repository: elementary-data/dbt-data-reliability

Length of output: 500


Replace or document the container image source.

The image ghcr.io/ratiopbc/vertica-ce appears to have no public repository presence. Since the official Vertica CE prebuilt image is no longer available, either:

  1. Build from the official source: vertica/vertica-containers repository (one-node-ce directory), or
  2. Document and justify the choice of this third-party image and verify its maintenance status.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration_tests/docker-compose-vertica.yml` at line 16, The docker-compose
entry uses the unverified image string "ghcr.io/ratiopbc/vertica-ce" in
docker-compose-vertica.yml; either replace that value with a reproducible build
from the official vertica/vertica-containers one-node-ce (build and reference
your local image or a trusted registry tag) or add a clear comment and
documentation (in the compose file header and project README) justifying the
third-party image choice, its source, how it was vetted, and how to
reproduce/build it; update the image line and accompanying docs accordingly to
ensure provenance and maintainability.

ports:
- "5433:5433"
- "5444:5444"
deploy:
mode: global
ulimits:
nofile:
soft: 65536
hard: 65536
volumes:
- type: volume
source: vertica-data
target: /data
healthcheck:
test:
[
"CMD-SHELL",
"/opt/vertica/bin/vsql -U dbadmin -w vertica -c 'SELECT 1;'",
]
interval: 5s
timeout: 5s
retries: 10
volumes:
vertica-data:
14 changes: 13 additions & 1 deletion integration_tests/profiles/profiles.yml.j2
@@ -75,6 +75,18 @@ elementary_tests:
trust_cert: true
threads: 4

vertica: &vertica
type: vertica
host: localhost
port: 5433
username: dbadmin
password: vertica
database: elementary_tests
schema: {{ schema_name }}
connection_load_balance: false
retries: 2
threads: 4

# ── Cloud targets (secrets substituted at CI time) ─────────────────

snowflake: &snowflake
@@ -150,7 +162,7 @@ elementary_tests:
elementary:
target: postgres
outputs:
{%- set targets = ['postgres', 'clickhouse', 'trino', 'dremio', 'spark', 'duckdb', 'sqlserver', 'snowflake', 'bigquery', 'redshift', 'databricks_catalog', 'athena', 'fabric'] %}
{%- set targets = ['postgres', 'clickhouse', 'trino', 'dremio', 'spark', 'duckdb', 'sqlserver', 'vertica', 'snowflake', 'bigquery', 'redshift', 'databricks_catalog', 'athena', 'fabric'] %}
{%- for t in targets %}
{{ t }}:
<<: *{{ t }}
120 changes: 118 additions & 2 deletions integration_tests/tests/data_seeder.py
@@ -4,7 +4,7 @@
from contextlib import contextmanager
from pathlib import Path
from types import MappingProxyType
from typing import TYPE_CHECKING, ClassVar, Dict, Generator, List, Mapping
from typing import TYPE_CHECKING, ClassVar, Dict, Generator, List, Mapping, Optional

from elementary.clients.dbt.base_dbt_runner import BaseDbtRunner
from logger import get_logger
@@ -121,7 +121,7 @@ class BaseSqlInsertSeeder(ABC):

def __init__(
self,
query_runner: "AdapterQueryRunner",
query_runner: Optional["AdapterQueryRunner"],
schema: str,
seeds_dir_path: Path,
) -> None:
@@ -454,3 +454,119 @@ def _create_table_sql(self, fq_table: str, col_defs: str) -> str:
f"CREATE TABLE {fq_table} ({col_defs}) "
f"ENGINE = MergeTree() ORDER BY tuple()"
)


class VerticaDirectSeeder(BaseSqlInsertSeeder):
"""Fast seeder for Vertica: executes CREATE TABLE + INSERT directly.

Bypasses ``dbt seed`` (which uses Vertica's COPY command) because COPY
rejects empty CSV fields for non-string columns instead of treating them
as NULL. Direct INSERT statements handle NULL correctly.

Uses a *direct* ``vertica_python`` connection (rather than dbt's adapter
connection pool) so that all DDL + DML runs in a single session and can
be committed atomically. dbt's ``connection_named`` context manager
releases (and effectively rolls back) the connection after each
``execute_sql`` call, which caused INSERT data to be invisible to
subsequent ``dbt test`` sessions.

Vertica uses double-quote identifiers (not backticks), so this class
overrides the ``seed`` method to use ``"col"`` quoting.
"""

def _type_string(self) -> str:
# Must match edr_type_string (varchar(16000)) so that schema-change
# detection sees a consistent type between seeded tables and
# elementary metadata columns.
return "VARCHAR(16000)"

def _type_boolean(self) -> str:
return "BOOLEAN"

def _type_integer(self) -> str:
return "INTEGER"

def _type_float(self) -> str:
return "FLOAT"

def _format_value(self, value: object, col_type: str) -> str:
if value is None or (isinstance(value, str) and value == ""):
return "NULL"
if isinstance(value, bool):
return "true" if value else "false"
if isinstance(value, (int, float)):
return str(value)
text = str(value)
text = text.replace("'", "''")
return f"'{text}'"

def _create_table_sql(self, fq_table: str, col_defs: str) -> str:
return f"CREATE TABLE {fq_table} ({col_defs})"

@staticmethod
def _vertica_connection():
"""Open a direct vertica_python connection from env / defaults."""
import vertica_python # available in the test venv

conn_info = {
"host": os.environ.get("VERTICA_HOST", "localhost"),
"port": int(os.environ.get("VERTICA_PORT", "5433")),
"user": os.environ.get("VERTICA_USER", "dbadmin"),
"password": os.environ.get("VERTICA_PASSWORD", "vertica"),
"database": os.environ.get("VERTICA_DATABASE", "elementary_tests"),
}
return vertica_python.connect(**conn_info)

@contextmanager
def seed(self, data: List[dict], table_name: str) -> Generator[None, None, None]:
"""Override base seed to use double-quote identifiers for Vertica."""
if not data:
raise ValueError(f"Seed data for '{table_name}' must not be empty")
columns = list(data[0].keys())
col_types: Dict[str, str] = {
col: self._infer_column_type([row.get(col) for row in data])
for col in columns
}
# Vertica uses double-quote identifiers, not backticks.
col_defs = ", ".join(f'"{col}" {col_types[col]}' for col in columns)
fq_table = f'"{self._schema}"."{table_name}"'

seed_path = self._write_csv(data, table_name)

try:
# Use a direct connection so DDL + DML share the same session
# and the COMMIT is guaranteed to persist the data.
conn = self._vertica_connection()
try:
cur = conn.cursor()
cur.execute(f"DROP TABLE IF EXISTS {fq_table}")
cur.execute(self._create_table_sql(fq_table, col_defs))

for batch_start in range(0, len(data), _INSERT_BATCH_SIZE):
batch = data[batch_start : batch_start + _INSERT_BATCH_SIZE]
rows_sql = ", ".join(
"("
+ ", ".join(
self._format_value(row.get(c), col_types[c])
for c in columns
)
+ ")"
for row in batch
)
cur.execute(f"INSERT INTO {fq_table} VALUES {rows_sql}")

conn.commit()
finally:
conn.close()

logger.info(
"%s: loaded %d rows into %s (%s)",
type(self).__name__,
len(data),
fq_table,
", ".join(f"{c}: {t}" for c, t in col_types.items()),
)

yield
finally:
seed_path.unlink(missing_ok=True)
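The NULL handling that motivated this seeder is easiest to see in isolation. A simplified, standalone version of `_format_value` (dropping the `col_type` parameter, which the body above never reads):

```python
def format_value(value) -> str:
    """Render a Python value as a Vertica SQL literal. Empty strings map
    to NULL (the case Vertica's COPY rejects for non-string columns),
    and single quotes are doubled to stay inside the literal."""
    if value is None or (isinstance(value, str) and value == ""):
        return "NULL"
    if isinstance(value, bool):  # must precede the numeric check: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"

row = {"id": 1, "name": "O'Brien", "active": True, "score": ""}
print(", ".join(format_value(v) for v in row.values()))
# → 1, 'O''Brien', true, NULL
```

The last column shows the difference: an empty CSV field becomes a SQL NULL in the generated INSERT, whereas COPY would have routed that row to the reject table.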