
Commit 3c6e7bb

haritamar, cedric-orange, and claude authored
feat: opt-in config to filter artifacts to current project only (#968)
* fix: filter dbt artifacts to upload only current project

  When using tools like dbt-loom, graph.nodes.values() returns nodes from multiple projects. Filter by package_name == project_name to avoid uploading unnecessary information from dependency packages.

  Co-Authored-By: Cédric OLIVIER <76560097+cedric-orange@users.noreply.github.com>

* feat: make project-scoped artifact filtering opt-in via config var

  Add `upload_only_current_project_artifacts` config var (default: false) that, when enabled, filters graph entities by `package_name` so only artifacts from the current dbt project are uploaded -- excluding those from dependency packages.

  A shared `filter_to_current_project` helper macro centralizes the filtering logic and is called by all 7 upload_dbt_*.sql macros.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add integration tests for upload_only_current_project_artifacts

  Tests cover:
  - Default behavior includes dependency package artifacts
  - Filtering excludes dependency artifacts from dbt_models
  - Filtering applies to sources
  - Filtering applies to tests

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: rename filter_to_current_project to filter_to_current_project_if_needed

  The macro conditionally filters based on a config flag, so the name should reflect that the filtering is not always applied.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use post_hook full-replace in project filtering tests

  The on-run-end hook uses diff-based upload when artifact hashes are available, which doesn't remove pre-existing rows from dependency packages. Switch tests to run dbt_models/dbt_tests models directly, triggering their post_hook which does a full table replace, ensuring a clean slate for assertions.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use DELETE + on-run-end diff for project filtering tests

  replace_table_data is broken on multiple adapters when called from a post_hook context (Trino: invalid kwarg, Redshift: multi-statement prepared statement, etc.). These are pre-existing bugs masked by cache_artifacts defaulting to true.

  Instead, clear the table with DELETE first, then use the on-run-end hook with diff-based upload. The diff sees an empty table and inserts only the filtered (current-project) artifacts.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove unused tmp_path parameter from test

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Cédric OLIVIER <76560097+cedric-orange@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent a4143a5 commit 3c6e7bb

10 files changed: +137 -13 lines changed

Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
+"""
+Integration tests for the upload_only_current_project_artifacts config var.
+
+When enabled, artifact uploads should only include resources from the current
+project (by package_name), excluding artifacts from dependency packages.
+When disabled (default), all artifacts including dependencies should be uploaded.
+"""
+
+import uuid
+
+from dbt_project import DbtProject
+
+TEST_MODEL = "one"
+
+
+def test_default_includes_dependency_artifacts(dbt_project: DbtProject):
+    """
+    By default (upload_only_current_project_artifacts=false), artifacts from
+    dependency packages (like 'elementary') should be present in dbt_models.
+    """
+    dbt_project.dbt_runner.vars["disable_dbt_artifacts_autoupload"] = False
+    dbt_project.dbt_runner.vars["cache_artifacts"] = False
+
+    dbt_project.dbt_runner.run(select=TEST_MODEL)
+
+    all_models = dbt_project.read_table("dbt_models", raise_if_empty=True)
+    package_names = {row["package_name"] for row in all_models}
+
+    assert "elementary" in package_names, (
+        "Expected 'elementary' package artifacts to be present by default, "
+        f"but only found packages: {package_names}"
+    )
+    assert "elementary_tests" in package_names, (
+        "Expected 'elementary_tests' package artifacts to be present, "
+        f"but only found packages: {package_names}"
+    )
+
+
+def test_filtering_excludes_dependency_artifacts(dbt_project: DbtProject):
+    """
+    When upload_only_current_project_artifacts=true, only artifacts from the
+    current project should be uploaded — dependency packages like 'elementary'
+    should be excluded.
+
+    We first clear the dbt_models table, then run with filtering enabled via
+    the on-run-end hook. The diff upload sees an empty table and inserts all
+    (filtered) artifacts, so only current-project rows end up in the table.
+    """
+    # Clear existing rows so the diff upload will insert fresh filtered data.
+    dbt_project.run_query("DELETE FROM {{ ref('dbt_models') }} WHERE 1=1")
+
+    dbt_project.dbt_runner.vars["disable_dbt_artifacts_autoupload"] = False
+    dbt_project.dbt_runner.vars["upload_only_current_project_artifacts"] = True
+
+    dbt_project.dbt_runner.run(select=TEST_MODEL)
+
+    all_models = dbt_project.read_table("dbt_models", raise_if_empty=True)
+    package_names = {row["package_name"] for row in all_models}
+
+    assert package_names == {"elementary_tests"}, (
+        "Expected only 'elementary_tests' artifacts when filtering is enabled, "
+        f"but found packages: {package_names}"
+    )
+
+
+def test_filtering_applies_to_tests(dbt_project: DbtProject):
+    """
+    When upload_only_current_project_artifacts=true, tests from the current
+    project should still be uploaded.
+    """
+    unique_id = str(uuid.uuid4()).replace("-", "_")
+    model_name = f"filter_test_model_{unique_id}"
+    model_sql = "select 1 as col"
+    schema_yaml = {
+        "version": 2,
+        "models": [
+            {
+                "name": model_name,
+                "columns": [{"name": "col", "tests": ["unique"]}],
+            }
+        ],
+    }
+
+    with dbt_project.write_yaml(
+        schema_yaml, name=f"schema_filter_test_{unique_id}.yml"
+    ):
+        dbt_model_path = dbt_project.models_dir_path / "tmp" / f"{model_name}.sql"
+        dbt_model_path.parent.mkdir(parents=True, exist_ok=True)
+        dbt_model_path.write_text(model_sql)
+        try:
+            # Clear existing rows so the diff upload inserts fresh filtered data.
+            dbt_project.run_query("DELETE FROM {{ ref('dbt_tests') }} WHERE 1=1")
+
+            dbt_project.dbt_runner.vars["disable_dbt_artifacts_autoupload"] = False
+            dbt_project.dbt_runner.vars["upload_only_current_project_artifacts"] = True
+
+            dbt_project.dbt_runner.run(select=model_name)
+
+            tests = dbt_project.read_table("dbt_tests", raise_if_empty=True)
+            test_packages = {row["package_name"] for row in tests}
+            assert test_packages == {"elementary_tests"}, (
+                "Expected only 'elementary_tests' tests when filtering is enabled, "
+                f"but found packages: {test_packages}"
+            )
+        finally:
+            if dbt_model_path.exists():
+                dbt_model_path.unlink()
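
The tests above clear the artifact table before running because, per the commit message, the diff-based upload does not remove pre-existing dependency rows. The following toy model sketches why that matters. It assumes the diff only inserts rows whose hash is not yet stored and never deletes anything; `diff_upload` and the row dictionaries are hypothetical stand-ins, not the package's actual implementation.

```python
def diff_upload(table_rows, new_rows):
    """Toy insert-only diff: add rows whose metadata_hash is not stored yet.

    Assumption for illustration: existing rows are never deleted.
    """
    stored_hashes = {row["metadata_hash"] for row in table_rows}
    inserted = [row for row in new_rows if row["metadata_hash"] not in stored_hashes]
    return table_rows + inserted


# Hypothetical rows: one from a dependency package, one from the current project.
dependency_row = {"package_name": "elementary", "metadata_hash": "h-dep"}
project_row = {"package_name": "elementary_tests", "metadata_hash": "h-proj"}

# With filtering enabled, only current-project rows reach the upload.
filtered_upload = [project_row]

# Table left over from an earlier unfiltered run: the dependency row survives.
stale = diff_upload([dependency_row, project_row], filtered_upload)

# Table cleared first (the DELETE in the tests): only filtered rows are present.
clean = diff_upload([], filtered_upload)

stale_packages = {row["package_name"] for row in stale}
clean_packages = {row["package_name"] for row in clean}
```

Under this assumption, an exact-equality assertion on the set of package names can only pass when the table starts empty, which matches the DELETE step in the tests.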
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+{%- macro filter_to_current_project_if_needed(entities) -%}
+    {%- if elementary.get_config_var("upload_only_current_project_artifacts") -%}
+        {% set project_name = elementary.get_project_name() %}
+        {% do return(
+            entities | selectattr("package_name", "==", project_name) | list
+        ) %}
+    {%- else -%} {% do return(entities | list) %}
+    {%- endif -%}
+{%- endmacro -%}
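
For readers less familiar with Jinja filters, here is a rough Python analogue of the macro above. It is only a sketch: the `project_name` and `only_current_project` parameters stand in for the values the macro obtains from `elementary.get_project_name()` and the config var, and the sample node dictionaries are made up for illustration.

```python
from typing import Dict, Iterable, List


def filter_to_current_project_if_needed(
    entities: Iterable[Dict],
    project_name: str,
    only_current_project: bool = False,
) -> List[Dict]:
    """Illustrative Python counterpart of the Jinja macro.

    When the flag is on, keep only entities whose package_name equals the
    current project name; otherwise return every entity as a list.
    """
    if only_current_project:
        return [e for e in entities if e.get("package_name") == project_name]
    return list(entities)


# Hypothetical graph nodes: one from the current project, one from a dependency.
nodes = [
    {"unique_id": "model.elementary_tests.one", "package_name": "elementary_tests"},
    {"unique_id": "model.elementary.dbt_models", "package_name": "elementary"},
]

kept = filter_to_current_project_if_needed(
    nodes, "elementary_tests", only_current_project=True
)
everything = filter_to_current_project_if_needed(nodes, "elementary_tests")
```

Both branches return a list, which is why callers such as `upload_dbt_columns` can concatenate two results with `+`.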

macros/edr/dbt_artifacts/upload_dbt_columns.sql

Lines changed: 5 additions & 1 deletion
@@ -1,7 +1,11 @@
 {%- macro upload_dbt_columns(should_commit=false, metadata_hashes=none) -%}
     {% set relation = elementary.get_elementary_relation("dbt_columns") %}
     {% if execute and relation %}
-        {% set tables = graph.nodes.values() | list + graph.sources.values() | list %}
+        {% set tables = elementary.filter_to_current_project_if_needed(
+            graph.nodes.values()
+        ) + elementary.filter_to_current_project_if_needed(
+            graph.sources.values()
+        ) %}
         {% do elementary.upload_artifacts_to_table(
             relation,
             tables,

macros/edr/dbt_artifacts/upload_dbt_exposures.sql

Lines changed: 3 additions & 2 deletions
@@ -1,8 +1,9 @@
 {%- macro upload_dbt_exposures(should_commit=false, metadata_hashes=none) -%}
     {% set relation = elementary.get_elementary_relation("dbt_exposures") %}
     {% if execute and relation %}
-        {% set exposures = graph.exposures.values() | selectattr(
-            "resource_type", "==", "exposure"
+        {% set exposures = elementary.filter_to_current_project_if_needed(
+            graph.exposures.values()
+            | selectattr("resource_type", "==", "exposure")
         ) %}
         {% do elementary.upload_artifacts_to_table(
             relation,

macros/edr/dbt_artifacts/upload_dbt_models.sql

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
 {%- macro upload_dbt_models(should_commit=false, metadata_hashes=none) -%}
     {% set relation = elementary.get_elementary_relation("dbt_models") %}
     {% if execute and relation %}
-        {% set models = graph.nodes.values() | selectattr(
-            "resource_type", "==", "model"
+        {% set models = elementary.filter_to_current_project_if_needed(
+            graph.nodes.values() | selectattr("resource_type", "==", "model")
         ) %}
         {% do elementary.upload_artifacts_to_table(
             relation,

macros/edr/dbt_artifacts/upload_dbt_seeds.sql

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
 {%- macro upload_dbt_seeds(should_commit=false, metadata_hashes=none) -%}
     {% set relation = elementary.get_elementary_relation("dbt_seeds") %}
     {% if execute and relation %}
-        {% set seeds = graph.nodes.values() | selectattr(
-            "resource_type", "==", "seed"
+        {% set seeds = elementary.filter_to_current_project_if_needed(
+            graph.nodes.values() | selectattr("resource_type", "==", "seed")
         ) %}
         {% do elementary.upload_artifacts_to_table(
             relation,

macros/edr/dbt_artifacts/upload_dbt_snapshots.sql

Lines changed: 3 additions & 2 deletions
@@ -1,8 +1,9 @@
 {%- macro upload_dbt_snapshots(should_commit=false, metadata_hashes=none) -%}
     {% set relation = elementary.get_elementary_relation("dbt_snapshots") %}
     {% if execute and relation %}
-        {% set snapshots = graph.nodes.values() | selectattr(
-            "resource_type", "==", "snapshot"
+        {% set snapshots = elementary.filter_to_current_project_if_needed(
+            graph.nodes.values()
+            | selectattr("resource_type", "==", "snapshot")
         ) %}
         {% do elementary.upload_artifacts_to_table(
             relation,

macros/edr/dbt_artifacts/upload_dbt_sources.sql

Lines changed: 3 additions & 2 deletions
@@ -1,8 +1,9 @@
 {%- macro upload_dbt_sources(should_commit=false, metadata_hashes=none) -%}
     {% set relation = elementary.get_elementary_relation("dbt_sources") %}
     {% if execute and relation %}
-        {% set sources = graph.sources.values() | selectattr(
-            "resource_type", "==", "source"
+        {% set sources = elementary.filter_to_current_project_if_needed(
+            graph.sources.values()
+            | selectattr("resource_type", "==", "source")
         ) %}
         {% do elementary.upload_artifacts_to_table(
             relation,

macros/edr/dbt_artifacts/upload_dbt_tests.sql

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
 {%- macro upload_dbt_tests(should_commit=false, metadata_hashes=none) -%}
     {% set relation = elementary.get_elementary_relation("dbt_tests") %}
     {% if execute and relation %}
-        {% set tests = graph.nodes.values() | selectattr(
-            "resource_type", "==", "test"
+        {% set tests = elementary.filter_to_current_project_if_needed(
+            graph.nodes.values() | selectattr("resource_type", "==", "test")
         ) %}
         {% do elementary.upload_artifacts_to_table(
             relation,

macros/edr/system/system_utils/get_config_var.sql

Lines changed: 1 addition & 0 deletions
@@ -144,6 +144,7 @@
         "disable_samples_on_pii_tags": false,
         "pii_tags": ["pii"],
         "bigquery_disable_partitioning": false,
+        "upload_only_current_project_artifacts": false,
     } %}
     {{- return(default_config) -}}
 {%- endmacro -%}
