Merged
Changes from 4 commits
Commits
24 commits
213d12a
feat: add weekly cleanup workflow for stale CI schemas
devin-ai-integration[bot] Feb 28, 2026
4605b7f
fix: remove unused ns.errors variable
devin-ai-integration[bot] Feb 28, 2026
fbe4059
test: add integration test for drop_stale_ci_schemas macro
devin-ai-integration[bot] Feb 28, 2026
a124fbe
fix: guard recent schema cleanup with existence check
devin-ai-integration[bot] Feb 28, 2026
fb87fe6
Merge remote-tracking branch 'origin/master' into devin/1772278180-cl…
devin-ai-integration[bot] Feb 28, 2026
f327572
fix: add ClickHouse-compatible list_ci_schemas dispatch and use eleme…
devin-ai-integration[bot] Feb 28, 2026
c373f8a
fix: add clickhouse__schema_exists implementation
devin-ai-integration[bot] Feb 28, 2026
a9f398d
fix: return dict instead of tojson (double-encoding) and use exact-ma…
devin-ai-integration[bot] Feb 28, 2026
fe9ff60
test: remove Dremio skip to let CI validate schema cleanup
devin-ai-integration[bot] Feb 28, 2026
5c58b2e
fix: address CodeRabbit review comments
devin-ai-integration[bot] Feb 28, 2026
572eeca
fix: revert to utcnow() - modules.datetime.timezone not available in …
devin-ai-integration[bot] Feb 28, 2026
dcd447a
fix: use SQL queries instead of adapter methods in run-operation context
devin-ai-integration[bot] Feb 28, 2026
1c6b4db
fix: add BigQuery-specific list_ci_schemas and ci_schema_exists imple…
devin-ai-integration[bot] Feb 28, 2026
f85599c
test: re-add Dremio skip - DremioRelation path model incompatible wit…
devin-ai-integration[bot] Feb 28, 2026
5adb93f
fix: increase test max_age_hours to 8760 to avoid race with parallel …
devin-ai-integration[bot] Feb 28, 2026
dbd9047
docs: fix comment - year 2020, not year 2000
devin-ai-integration[bot] Feb 28, 2026
3265525
refactor: move cleanup macros to integration tests, use regex parser,…
devin-ai-integration[bot] Feb 28, 2026
1001681
refactor: split helpers into schema_utils/, named regex groups, renam…
devin-ai-integration[bot] Feb 28, 2026
357d9a2
rename: schema_utils/ -> ci_schema_utils/ for clarity
devin-ai-integration[bot] Feb 28, 2026
14a0995
refactor: split into schema_utils/ (generic) + ci_schemas_cleanup/ (C…
devin-ai-integration[bot] Feb 28, 2026
6913c74
fix: rename macros to edr_ prefix to avoid shadowing dbt-databricks i…
devin-ai-integration[bot] Feb 28, 2026
3ca9f64
fix: remove duplicate edr_drop_schema - already exists in clear_env.sql
devin-ai-integration[bot] Feb 28, 2026
45c65ef
fix: add SQL escaping for defense-in-depth (CodeRabbit suggestion)
devin-ai-integration[bot] Feb 28, 2026
320a250
fix: add BigQuery backtick escaping for database identifiers (CodeRab…
devin-ai-integration[bot] Feb 28, 2026
71 changes: 71 additions & 0 deletions .github/workflows/cleanup-stale-schemas.yml
@@ -0,0 +1,71 @@
name: Cleanup stale CI schemas

on:
  schedule:
    # Every Sunday at 03:00 UTC
    - cron: "0 3 * * 0"
  workflow_dispatch:
    inputs:
      max-age-hours:
        type: string
        required: false
        default: "24"
        description: Drop schemas older than this many hours

env:
  TESTS_DIR: ${{ github.workspace }}/dbt-data-reliability/integration_tests

jobs:
  cleanup:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        warehouse-type:
          - snowflake
          - bigquery
          - redshift
          - databricks_catalog
          - athena
    steps:
      - name: Checkout dbt package
        uses: actions/checkout@v4
        with:
          path: dbt-data-reliability

      - name: Setup Python
        uses: actions/setup-python@v6
        with:
          python-version: "3.10"
          cache: "pip"

      - name: Install dbt
        run: >
          pip install
          "dbt-core"
          "dbt-${{ (matrix.warehouse-type == 'databricks_catalog' && 'databricks') || (matrix.warehouse-type == 'athena' && 'athena-community') || matrix.warehouse-type }}"

      - name: Write dbt profiles
        env:
          CI_WAREHOUSE_SECRETS: ${{ secrets.CI_WAREHOUSE_SECRETS || '' }}
        run: |
          # The cleanup job doesn't create schemas, but generate_profiles.py
          # requires --schema-name. Use a dummy value.
          python "${{ github.workspace }}/dbt-data-reliability/integration_tests/profiles/generate_profiles.py" \
            --template "${{ github.workspace }}/dbt-data-reliability/integration_tests/profiles/profiles.yml.j2" \
            --output ~/.dbt/profiles.yml \
            --schema-name "cleanup_placeholder"

      - name: Install dbt deps
        working-directory: ${{ env.TESTS_DIR }}/dbt_project
        run: dbt deps

      - name: Symlink local elementary package
        run: ln -sfn ${{ github.workspace }}/dbt-data-reliability ${{ env.TESTS_DIR }}/dbt_project/dbt_packages/elementary

      - name: Drop stale CI schemas
        working-directory: ${{ env.TESTS_DIR }}/dbt_project
        run: >
          dbt run-operation elementary.drop_stale_ci_schemas
          --args '{prefixes: ["dbt_"], max_age_hours: ${{ inputs.max-age-hours || '24' }}}'
          -t "${{ matrix.warehouse-type }}"
integration_tests/dbt_project/macros/test_drop_stale_ci_schemas.sql
@@ -0,0 +1,73 @@
{#
  Integration-test helper for elementary.drop_stale_ci_schemas.

  Creates two CI-style schemas (one with an old timestamp, one recent),
  runs the cleanup macro, checks which schemas survived, cleans up,
  and returns a JSON result dict.
#}

{% macro test_drop_stale_ci_schemas() %}
  {% set database = elementary.target_database() %}
  {% set now = modules.datetime.datetime.utcnow() %}
⚠️ Potential issue | 🟡 Minor

Use timezone-aware UTC instead of utcnow() in the test macro.

datetime.utcnow() is deprecated since Python 3.12: it emits a DeprecationWarning and is scheduled for removal in a future version (still present as of the Python 3.15.0a0 docs). The documented replacement is a timezone-aware UTC datetime, datetime.now(timezone.utc). Current dbt-core releases require Python >= 3.10, so modules.datetime.timezone is available in the Jinja context.

♻️ Proposed fix
-  {% set now = modules.datetime.datetime.utcnow() %}
+  {% set now = modules.datetime.datetime.now(modules.datetime.timezone.utc) %}

  {# Old schema: timestamp in the past (2020-01-01 00:00:00) #}
  {% set old_schema = 'dbt_200101_000000_citest_00000000' %}
  {# Recent schema: timestamp = now #}
  {% set recent_ts = now.strftime('%y%m%d_%H%M%S') %}
  {% set recent_schema = 'dbt_' ~ recent_ts ~ '_citest_11111111' %}

  {{ log("TEST: creating old schema: " ~ old_schema, info=true) }}
  {{ log("TEST: creating recent schema: " ~ recent_schema, info=true) }}

  {# ── Create both schemas ───────────────────────────────────────────── #}
  {% do elementary_tests.edr_create_schema(database, old_schema) %}
  {% do elementary_tests.edr_create_schema(database, recent_schema) %}

  {# ── Verify both exist before running cleanup ──────────────────────── #}
  {% set old_exists_before = adapter.check_schema_exists(database, old_schema) %}
  {% set recent_exists_before = adapter.check_schema_exists(database, recent_schema) %}
  {{ log("TEST: old_exists_before=" ~ old_exists_before ~ ", recent_exists_before=" ~ recent_exists_before, info=true) }}

  {# ── Run cleanup with 1-hour threshold ─────────────────────────────── #}
  {% do elementary.drop_stale_ci_schemas(prefixes=['dbt_'], max_age_hours=1) %}

  {# ── Check which schemas survived ─────────────────────────────────── #}
  {% set old_exists_after = adapter.check_schema_exists(database, old_schema) %}
  {% set recent_exists_after = adapter.check_schema_exists(database, recent_schema) %}
  {{ log("TEST: old_exists_after=" ~ old_exists_after ~ ", recent_exists_after=" ~ recent_exists_after, info=true) }}

  {# ── Cleanup: drop any remaining test schemas ─────────────────────── #}
  {% if old_exists_after %}
    {% do elementary.drop_ci_schema(database, old_schema) %}
  {% endif %}
  {% if recent_exists_after %}
    {% do elementary.drop_ci_schema(database, recent_schema) %}
  {% endif %}

  {# ── Return results ────────────────────────────────────────────────── #}
  {% set results = {
    "old_exists_before": old_exists_before,
    "recent_exists_before": recent_exists_before,
    "old_dropped": not old_exists_after,
    "recent_kept": recent_exists_after
  } %}
  {% do return(tojson(results)) %}
{% endmacro %}


{# ── Per-adapter schema creation ─────────────────────────────────────── #}

{% macro edr_create_schema(database, schema_name) %}
  {% do return(adapter.dispatch('edr_create_schema', 'elementary_tests')(database, schema_name)) %}
{% endmacro %}

{% macro default__edr_create_schema(database, schema_name) %}
  {% set schema_relation = api.Relation.create(database=database, schema=schema_name) %}
  {% do dbt.create_schema(schema_relation) %}
  {% do adapter.commit() %}
{% endmacro %}

{% macro clickhouse__edr_create_schema(database, schema_name) %}
  {% do run_query("CREATE DATABASE IF NOT EXISTS " ~ schema_name) %}
  {% do adapter.commit() %}
{% endmacro %}
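For reference, the recent-schema name the test macro builds can be reproduced in plain Python with a fixed, illustrative timestamp:

```python
from datetime import datetime

# Illustrative timestamp; the macro uses the current UTC time instead
now = datetime(2026, 2, 28, 11, 23, 45)

# Same format string as the macro: YYMMDD_HHMMSS
recent_ts = now.strftime('%y%m%d_%H%M%S')
recent_schema = 'dbt_' + recent_ts + '_citest_11111111'

assert recent_ts == '260228_112345'
assert recent_schema == 'dbt_260228_112345_citest_11111111'
```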
23 changes: 23 additions & 0 deletions integration_tests/tests/test_ci_cleanup.py
@@ -0,0 +1,23 @@
"""Integration test for elementary.drop_stale_ci_schemas macro."""

import json

import pytest
from dbt_project import DbtProject


# Dremio does not support DROP SCHEMA reliably; skip it.
Collaborator:
I see that Dremio has "DROP FOLDER", should we perhaps use that?

Contributor Author:
Good point! I looked into it — Dremio's DROP FOLDER is specifically for Arctic/Nessie catalog folders, not general schemas/spaces.

Looking at the existing codebase, clear_env.sql doesn't have a Dremio-specific dispatch — it falls through to the default dbt.drop_schema, which the dbt-dremio adapter handles internally. So the cleanup macro's default dbt.drop_schema path should already work for Dremio.

The skip was a conservative choice since I wasn't sure if Dremio could create arbitrary schemas via dbt.create_schema (which the test needs). I can try removing the skip and let CI tell us if it works — would you like me to do that?

Collaborator:
yes let's try

Contributor Author:
Done — removed the Dremio skip in fe9ff60. CI will tell us if it works. Also fixed two other issues in the same push:
- Double-encoding bug (test macro was returning tojson(results) but log_macro_results already calls tojson)
- ClickHouse schema_exists now uses exact-match (system.databases WHERE name = ...) instead of SHOW DATABASES LIKE which treats _ as wildcard
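The double-encoding bug mentioned here is a plain JSON round-trip issue; a minimal Python sketch:

```python
import json

results = {"old_dropped": True, "recent_kept": True}

encoded_once = json.dumps(results)        # what a single tojson produces
encoded_twice = json.dumps(encoded_once)  # the double-encoding bug: JSON of a JSON string

# One decode of the single encoding recovers the dict...
assert json.loads(encoded_once) == results
# ...but one decode of the double encoding only recovers the inner string
assert json.loads(encoded_twice) == encoded_once
```

A caller doing a single json.loads on the double-encoded value gets a string instead of a dict, which is why returning the dict and letting the logging layer serialize once is the fix.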

# ClickHouse uses a custom dispatch (CREATE/DROP DATABASE) which is tested implicitly.
@pytest.mark.skip_targets(["dremio"])
def test_drop_stale_ci_schemas(dbt_project: DbtProject):
    """Verify that old CI schemas are dropped and recent ones are kept."""
    result = dbt_project.dbt_runner.run_operation(
        "elementary_tests.test_drop_stale_ci_schemas",
    )
    assert result, "run_operation returned no output"
    data = json.loads(result[0])

    assert data["old_exists_before"], "Setup failed: old schema was not created"
    assert data["recent_exists_before"], "Setup failed: recent schema was not created"
    assert data["old_dropped"], "Old schema should have been dropped by cleanup"
    assert data["recent_kept"], "Recent schema should have been kept by cleanup"
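The SHOW DATABASES LIKE pitfall called out in the discussion comes from SQL LIKE treating `_` as a single-character wildcard. A minimal demonstration using sqlite3 as a stand-in (the underscore semantics are the same across SQL dialects):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# In SQL LIKE, '_' matches any single character, so a schema name used as a
# LIKE pattern can match other names that differ where its underscores sit.
match = conn.execute("SELECT 'dbtX260228' LIKE 'dbt_260228'").fetchone()[0]

# An exact-match comparison (as in system.databases WHERE name = ...) does not.
exact = conn.execute(
    "SELECT 1 FROM (SELECT 'dbtX260228' AS name) WHERE name = 'dbt_260228'"
).fetchone()

assert match == 1    # LIKE matched despite the differing character
assert exact is None # equality did not
```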
80 changes: 80 additions & 0 deletions macros/utils/ci_cleanup/drop_stale_ci_schemas.sql
@@ -0,0 +1,80 @@
{#
drop_stale_ci_schemas – clean up timestamped CI schemas.
Collaborator:
Could this macro be in the integration tests project?

Contributor Author:
Done — moved the entire macro to integration_tests/dbt_project/macros/drop_stale_ci_schemas.sql. The old macros/utils/ci_cleanup/ directory is deleted. Workflow now calls drop_stale_ci_schemas without the elementary. prefix.


  Schema naming convention produced by CI:
    <prefix><YYMMDD_HHMMSS>_<branch>_<hash>
  Examples:
    dbt_260228_112345_master_abcd1234
    py_260228_112345_master_abcd1234
    dbt_260228_112345_master_abcd1234_elementary (suffixed variant)

  Call from a GitHub Actions workflow via:
    dbt run-operation elementary.drop_stale_ci_schemas \
      --args '{prefixes: ["dbt_", "py_"], max_age_hours: 24}'
#}
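The age check the macro performs on these names can be sketched in plain Python; schema_age_hours is a hypothetical helper mirroring the macro's arithmetic, not part of the package:

```python
from datetime import datetime

def schema_age_hours(schema_name: str, prefix: str, now: datetime) -> float:
    # Strip the prefix, take the leading YYMMDD_HHMMSS timestamp (13 chars),
    # parse it, and return the schema's age in hours.
    ts_str = schema_name[len(prefix):][:13]
    schema_ts = datetime.strptime(ts_str, '%y%m%d_%H%M%S')
    return (now - schema_ts).total_seconds() / 3600

# Illustrative "now": one day after the schema's embedded timestamp
now = datetime(2026, 2, 28, 12, 0, 0)
age = schema_age_hours('dbt_260227_120000_master_abcd1234', 'dbt_', now)

assert age == 24.0  # a day-old schema exceeds a max_age_hours=1 threshold
```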

{% macro drop_stale_ci_schemas(prefixes=None, max_age_hours=24) %}
haritamar marked this conversation as resolved.
  {% if prefixes is none %}
    {% set prefixes = ['dbt_', 'py_'] %}
Collaborator:
let's make prefixes a mandatory input, and fail if it is not specified

Contributor Author:
Done — prefixes now defaults to none and raises a CompilationError with a clear message if not provided or empty.

  {% endif %}

  {% set database = elementary.target_database() %}
  {% set all_schemas = adapter.list_schemas(database) %}
  {% set now = modules.datetime.datetime.utcnow() %}
⚠️ Potential issue | 🟠 Major

datetime.utcnow() is deprecated in Python 3.12+.

Use datetime.now(timezone.utc) instead. The fix should avoid stripping the timezone information with .replace(tzinfo=None), as this defeats the purpose of using timezone-aware datetimes. If the code elsewhere compares with naive datetimes, those should be updated to be timezone-aware as well.

♻️ Suggested fix
-  {% set now = modules.datetime.datetime.utcnow() %}
+  {% set now = modules.datetime.datetime.now(modules.datetime.timezone.utc) %}

  {% set max_age_seconds = max_age_hours * 3600 %}
  {% set ns = namespace(dropped=0) %}

  {{ log("CI schema cleanup: scanning " ~ all_schemas | length ~ " schema(s) in database '" ~ database ~ "' for prefixes " ~ prefixes | string, info=true) }}

  {% for schema_name in all_schemas | sort %}
    {% set schema_lower = schema_name.lower() %}
    {% for prefix in prefixes %}
      {% if schema_lower.startswith(prefix.lower()) %}
        {% set remainder = schema_lower[prefix | length :] %}
        {# Timestamp format: YYMMDD_HHMMSS (13 chars) followed by _ #}
        {% if remainder | length >= 14 and remainder[6:7] == '_' and remainder[13:14] == '_' %}
Collaborator:
1. could we use a regex with groups to match the branch name parts? it's a bit cleaner no? modules.re is available in dbt.
2. let's make a parse_ci_branch_name macro and use it here to simplify the main macro.

Contributor Author:
Done — replaced the manual character-by-character validation with modules.re.match('^(\\d{6})_(\\d{6})_.+', remainder) and extracted it into a parse_ci_schema_name macro. The main loop is now much cleaner — just calls parse_ci_schema_name(schema_name, prefixes) which returns a datetime or none.

Named it parse_ci_schema_name (instead of parse_ci_branch_name) since it takes a schema name as input and returns the parsed timestamp. Let me know if you'd prefer a different name.
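A plain-Python sketch of the regex approach described here, using named groups for readability (the group names are illustrative; the actual macro may differ):

```python
import re
from datetime import datetime

# Same pattern as in the discussion, with named groups added
CI_TS_RE = re.compile(r'^(?P<date>\d{6})_(?P<time>\d{6})_.+')

remainder = '260228_112345_master_abcd1234'
m = CI_TS_RE.match(remainder)
assert m is not None

# Reassemble the timestamp from the captured groups and parse it
schema_ts = datetime.strptime(m.group('date') + '_' + m.group('time'), '%y%m%d_%H%M%S')
assert schema_ts == datetime(2026, 2, 28, 11, 23, 45)

# Non-CI schema names simply fail to match, so they are skipped
assert CI_TS_RE.match('analytics_prod') is None
```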

          {% set ts_str = remainder[:13] %}
          {# Validate: positions 0-5 and 7-12 must be digits #}
          {% set digits = ts_str[:6] ~ ts_str[7:] %}
          {% set ns_valid = namespace(ok=true) %}
          {% for c in digits %}
            {% if c not in '0123456789' %}
              {% set ns_valid.ok = false %}
            {% endif %}
          {% endfor %}
          {% if ns_valid.ok %}
            {% set schema_ts = modules.datetime.datetime.strptime(ts_str, '%y%m%d_%H%M%S') %}
⚠️ Potential issue | 🟡 Minor

strptime can raise ValueError for invalid date values.

The digit validation ensures characters are numeric but doesn't validate actual date ranges. For example, 260232_112345 (day 32) would pass digit validation but cause strptime to raise ValueError, aborting the entire cleanup run.

Note: Jinja2 doesn't have native try/except, so you may need to validate month (01-12) and day (01-31) ranges explicitly before calling strptime and skip malformed schemas, or accept the current behavior given CI schemas are auto-generated with valid timestamps.
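Jinja has no try/except, but the failure mode is easy to see in plain Python; parse_ts is a hypothetical helper showing the graceful-skip behavior the comment asks for:

```python
from datetime import datetime

def parse_ts(ts_str: str):
    # Digits alone don't guarantee a valid date: day 32 passes a digit check
    # but strptime rejects it. Returning None lets the caller skip the schema
    # instead of aborting the whole cleanup run.
    try:
        return datetime.strptime(ts_str, '%y%m%d_%H%M%S')
    except ValueError:
        return None

assert parse_ts('260228_112345') == datetime(2026, 2, 28, 11, 23, 45)
assert parse_ts('260232_112345') is None  # day 32 raises ValueError internally
```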

            {% set age_seconds = (now - schema_ts).total_seconds() %}
            {% if age_seconds > max_age_seconds %}
              {{ log("  DROP " ~ schema_name ~ "  (age: " ~ (age_seconds / 3600) | round(1) ~ " h)", info=true) }}
              {% do elementary.drop_ci_schema(database, schema_name) %}
              {% set ns.dropped = ns.dropped + 1 %}
            {% else %}
              {{ log("  keep " ~ schema_name ~ "  (age: " ~ (age_seconds / 3600) | round(1) ~ " h)", info=true) }}
            {% endif %}
          {% endif %}
        {% endif %}
      {% endif %}
    {% endfor %}
  {% endfor %}

  {{ log("CI schema cleanup complete. Dropped " ~ ns.dropped ~ " stale schema(s).", info=true) }}
{% endmacro %}


{# ── Per-adapter schema drop ─────────────────────────────────────────── #}

{% macro drop_ci_schema(database, schema_name) %}
  {% do return(adapter.dispatch('drop_ci_schema', 'elementary')(database, schema_name)) %}
{% endmacro %}

{% macro default__drop_ci_schema(database, schema_name) %}
  {% set schema_relation = api.Relation.create(database=database, schema=schema_name) %}
  {% do dbt.drop_schema(schema_relation) %}
  {% do adapter.commit() %}
{% endmacro %}

{% macro clickhouse__drop_ci_schema(database, schema_name) %}
  {% do run_query("DROP DATABASE IF EXISTS " ~ schema_name) %}
  {% do adapter.commit() %}
⚠️ Potential issue | 🟠 Major

Quote the ClickHouse database identifier before executing DROP DATABASE.

The macro concatenates schema_name directly into SQL, which is brittle for non-simple identifiers and leaves no defense against injection or syntax errors.

♻️ Proposed fix
 {% macro clickhouse__drop_ci_schema(database, schema_name) %}
-  {% do run_query("DROP DATABASE IF EXISTS " ~ schema_name) %}
+  {% do run_query("DROP DATABASE IF EXISTS " ~ adapter.quote(schema_name)) %}
   {% do adapter.commit() %}
 {% endmacro %}

{% endmacro %}
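For illustration, backtick-quoting an identifier (the kind of escaping the proposed adapter.quote call would provide on ClickHouse) can be sketched in plain Python; quote_backtick is a hypothetical helper, not part of the package or the adapter:

```python
def quote_backtick(identifier: str) -> str:
    # Escape backslashes first, then embedded backticks, then wrap the
    # whole identifier in backticks so special characters are inert.
    escaped = identifier.replace('\\', '\\\\').replace('`', '\\`')
    return '`' + escaped + '`'

# Simple CI schema names pass through with just the wrapping backticks
assert quote_backtick('dbt_260228_112345_master_abcd1234') == '`dbt_260228_112345_master_abcd1234`'
# An embedded backtick is escaped rather than terminating the identifier
assert quote_backtick('weird`name') == '`weird\\`name`'
```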