
Commit 17aaf7f

joostboon and claude authored
feat: add show_sample_rows tag and extend PII protection to model, column, and test level (#973)
* feat: add show_sample_rows tag to override PII-based sample hiding

  Adds a new tag mechanism that is the inverse of the existing PII tag behavior. When enable_samples_on_show_sample_rows_tags is true, models or columns tagged with show_sample_rows will have their samples shown even when PII tags would otherwise suppress them.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: apply sqlfmt formatting to get_pii_columns_from_parent_model.sql
* chore: add .claude/ to .gitignore
* fix: use string check for show_sample_rows_tags normalization to avoid char-splitting
* fix: pii tag takes precedence over show_sample_rows tag
* fix: hide samples by default when enable_samples_on_show_sample_rows_tags is true
* feat: expand show_sample_rows to model, column, and test level independently of PII
* feat: add test-level pii tag support for consistent behavior with show_sample_rows
* docs(code): add explanatory comments to show_sample_rows and PII sampling macros
* fix: address review findings in show_sample_rows and PII macros

  - Restructure empty elif branch in test.sql with explicit comments
  - Column-level show_sample_rows now checks model-level PII tags too
  - Add explicit parens for Jinja operator precedence in get_pii_columns
  - Replace 'is iterable' with 'is string' guard in is_pii_table.sql (strings are iterable in Jinja, causing single tags to be split)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: apply sqlfmt formatting

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: handle Vertica Decimal special values (Infinity/NaN) in query runner

  Vertica's adapter returns Decimal special values where as_tuple().exponent is a string ('F'/'n') instead of int, causing a TypeError on comparison.

  Cherry-picked from worktree-fix-dimension-alerts (0eecf7d).

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent f4e8e6e commit 17aaf7f


8 files changed: +198 -12 lines


.gitignore

Lines changed: 3 additions & 0 deletions
@@ -27,6 +27,9 @@ __pycache__/
 
 # vscode
 .vscode/
+
+# Claude Code
+.claude/
 dbt_internal_packages/
 
 /package-lock.yml

integration_tests/tests/adapter_query_runner.py

Lines changed: 7 additions & 2 deletions
@@ -52,9 +52,14 @@ def _serialize_value(val: Any) -> Any:
     * Everything else is returned unchanged.
     """
     if isinstance(val, Decimal):
-        # Match the Jinja macro: normalize, then int or float
+        # Match the Jinja macro: normalize, then int or float.
+        # Note: for special values (Infinity, NaN), as_tuple().exponent is a
+        # string ('F' or 'n'), not an int — convert those directly to float.
         normalized = val.normalize()
-        if normalized.as_tuple().exponent >= 0:
+        exponent = normalized.as_tuple().exponent
+        if isinstance(exponent, str):
+            return float(normalized)
+        if exponent >= 0:
             return int(normalized)
         return float(normalized)
     if isinstance(val, (datetime, date, time)):
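To see why the guard is needed, the sketch below isolates the Decimal branch as a standalone function. The name `serialize_decimal` is hypothetical, chosen for illustration; in the repository the logic lives inside `_serialize_value`. For `Decimal("Infinity")` the exponent is the string `'F'`, so the old `exponent >= 0` comparison raised `TypeError`; checking `isinstance(exponent, str)` first routes special values straight to `float`.

```python
from decimal import Decimal


def serialize_decimal(val: Decimal):
    """Hypothetical standalone copy of the Decimal branch of _serialize_value."""
    normalized = val.normalize()
    exponent = normalized.as_tuple().exponent
    # Infinity reports exponent 'F' and NaN reports 'n'; these are strings,
    # so comparing them with 0 would raise TypeError.
    if isinstance(exponent, str):
        return float(normalized)
    # A non-negative exponent after normalize() means the value is integral.
    if exponent >= 0:
        return int(normalized)
    return float(normalized)


for raw in ["42", "1.50", "1E+2", "Infinity", "-Infinity", "NaN"]:
    print(raw, "->", repr(serialize_decimal(Decimal(raw))))
```

Note that `normalize()` is what turns `Decimal("1E+2")` into an integral value with a positive exponent, which is why it is serialized as `100` rather than `100.0`.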

macros/edr/materializations/test/test.sql

Lines changed: 17 additions & 0 deletions
@@ -75,8 +75,25 @@
         {% set disable_test_samples = flattened_test["meta"]["disable_test_samples"] %}
     {% endif %}
 
+    {#
+        Sampling control precedence (highest to lowest):
+        1. disable_test_samples meta flag — explicit per-test kill switch, always wins.
+        2. show_sample_rows tag (model/test/column) — opt-in when
+           enable_samples_on_show_sample_rows_tags is true. If the tag is present,
+           skip all further checks and keep the sample_limit.
+        3. enable_samples_on_show_sample_rows_tags — hide-by-default mode: if the
+           feature is on but no show_sample_rows tag was found, disable samples.
+        4. PII tag detection (model/test/column) — hide when disable_samples_on_pii_tags
+           is true and a PII tag is detected at any level.
+    #}
     {% if disable_test_samples %} {% set sample_limit = 0 %}
+    {% elif elementary.should_show_sample_rows(flattened_test) %}
+        {# Tag explicitly opts in — keep sample_limit as-is #}
+    {% elif elementary.get_config_var("enable_samples_on_show_sample_rows_tags") %}
+        {# Feature is on but no show_sample_rows tag found — hide by default #}
+        {% set sample_limit = 0 %}
     {% elif elementary.is_pii_table(flattened_test) %} {% set sample_limit = 0 %}
+    {% elif elementary.is_pii_test(flattened_test) %} {% set sample_limit = 0 %}
     {% elif elementary.should_disable_sampling_for_pii(flattened_test) %}
         {% set sample_limit = 0 %}
     {% endif %}
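The precedence rules in the comment amount to a first-match-wins chain. A minimal Python sketch, assuming each macro's result has already been computed into a boolean; `SamplingInputs` and `resolve_sample_limit` are hypothetical names for illustration, not part of Elementary:

```python
from dataclasses import dataclass


@dataclass
class SamplingInputs:
    # Hypothetical flattened inputs; in test.sql each comes from a separate
    # meta flag, config var, or macro call.
    disable_test_samples: bool = False       # meta.disable_test_samples
    show_sample_rows: bool = False           # should_show_sample_rows(...)
    show_tags_feature_enabled: bool = False  # enable_samples_on_show_sample_rows_tags
    pii_table: bool = False                  # is_pii_table(...)
    pii_test: bool = False                   # is_pii_test(...)
    pii_column: bool = False                 # should_disable_sampling_for_pii(...)


def resolve_sample_limit(inputs: SamplingInputs, sample_limit: int) -> int:
    """Mirror the if/elif chain in test.sql: the first matching branch wins."""
    if inputs.disable_test_samples:
        return 0
    if inputs.show_sample_rows:
        return sample_limit  # explicit opt-in keeps the limit unchanged
    if inputs.show_tags_feature_enabled:
        return 0  # hide-by-default mode and no opt-in tag was found
    if inputs.pii_table or inputs.pii_test or inputs.pii_column:
        return 0
    return sample_limit


print(resolve_sample_limit(SamplingInputs(show_tags_feature_enabled=True), 5))
print(resolve_sample_limit(SamplingInputs(pii_test=True), 5))
```

Because the chain stops at the first match, the PII branches are only evaluated when `should_show_sample_rows` is false and `enable_samples_on_show_sample_rows_tags` is off; with the feature on, an untagged test already lands in the hide-by-default branch.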

macros/edr/system/system_utils/get_config_var.sql

Lines changed: 2 additions & 0 deletions
@@ -143,6 +143,8 @@
         "anomaly_exclude_metrics": none,
         "disable_samples_on_pii_tags": false,
         "pii_tags": ["pii"],
+        "enable_samples_on_show_sample_rows_tags": false,
+        "show_sample_rows_tags": ["show_sample_rows"],
         "bigquery_disable_partitioning": false,
         "bigquery_disable_clustering": false,
         "upload_only_current_project_artifacts": false,

macros/edr/system/system_utils/get_pii_columns_from_parent_model.sql

Lines changed: 27 additions & 0 deletions
@@ -38,9 +38,36 @@
     {% set column_nodes = parent_model.get("columns") %}
     {% if not column_nodes %} {% do return(pii_columns) %} {% endif %}
 
+    {#
+        A column tagged show_sample_rows (without pii) should still appear in samples
+        even when disable_samples_on_pii_tags is active — it is intentionally opted in.
+        We only skip it from the PII columns list if it does NOT also carry a PII tag,
+        since PII always takes precedence over show_sample_rows.
+    #}
+    {% set enable_show_tags = elementary.get_config_var(
+        "enable_samples_on_show_sample_rows_tags"
+    ) %}
+    {% set raw_show_tags = elementary.get_config_var("show_sample_rows_tags") %}
+    {% if raw_show_tags is string %} {% set show_tags = [raw_show_tags | lower] %}
+    {% else %} {% set show_tags = (raw_show_tags or []) | map("lower") | list %}
+    {% endif %}
+
     {% for column_node in column_nodes.values() %}
         {% set all_column_tags_lower = elementary.get_column_tags(column_node) %}
 
+        {# Skip column from PII list only if show_sample_rows is set and pii is not #}
+        {% set has_show_tag = enable_show_tags and (
+            elementary.lists_intersection(all_column_tags_lower, show_tags)
+            | length
+            > 0
+        ) %}
+        {% set has_pii_tag = (
+            elementary.lists_intersection(all_column_tags_lower, pii_tags)
+            | length
+            > 0
+        ) %}
+        {% if has_show_tag and not has_pii_tag %} {% continue %} {% endif %}
+
         {% for pii_tag in pii_tags %}
             {% if pii_tag in all_column_tags_lower %}
                 {% do pii_columns.append(column_node.get("name")) %} {% break %}
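The per-column logic can be sketched in Python as an illustrative mirror, not the package's code. Here `columns` is a hypothetical name-to-tags mapping standing in for the parent model's column nodes plus `get_column_tags`, and tags are assumed to be lowercased already.

```python
def get_pii_columns(columns, pii_tags, show_tags, enable_show_tags):
    """Hypothetical Python mirror of the loop in get_pii_columns_from_parent_model."""
    pii_columns = []
    for name, tags in columns.items():
        has_show_tag = enable_show_tags and bool(set(tags) & set(show_tags))
        has_pii_tag = bool(set(tags) & set(pii_tags))
        # New check: skip show-tagged columns unless they also carry a PII tag.
        if has_show_tag and not has_pii_tag:
            continue
        # Existing loop: collect the column if any PII tag is present.
        if any(tag in tags for tag in pii_tags):
            pii_columns.append(name)
    return pii_columns


columns = {
    "email": ["pii"],
    "country": ["show_sample_rows"],
    "ssn": ["pii", "show_sample_rows"],
    "amount": [],
}
print(get_pii_columns(columns, ["pii"], ["show_sample_rows"], enable_show_tags=True))
print(get_pii_columns(columns, ["pii"], ["show_sample_rows"], enable_show_tags=False))
```

Both calls return `['email', 'ssn']`. As far as the diff shows, the new skip only fires for columns with no PII tag, which the existing PII loop would not collect anyway, so in this sketch the result is the same with the feature on or off.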

macros/edr/system/system_utils/is_pii_table.sql

Lines changed: 6 additions & 10 deletions
@@ -5,20 +5,16 @@
     {% if not disable_samples_on_pii_tags %} {% do return(false) %} {% endif %}
 
     {% set raw_pii_tags = elementary.get_config_var("pii_tags") %}
-    {% set pii_tags = (
-        (raw_pii_tags if raw_pii_tags is iterable else [raw_pii_tags])
-        | map("lower")
-        | list
-    ) %}
+    {% if raw_pii_tags is string %} {% set pii_tags = [raw_pii_tags | lower] %}
+    {% else %} {% set pii_tags = (raw_pii_tags or []) | map("lower") | list %}
+    {% endif %}
 
     {% set raw_model_tags = elementary.insensitive_get_dict_value(
         flattened_test, "model_tags", []
     ) %}
-    {% set model_tags = (
-        (raw_model_tags if raw_model_tags is iterable else [raw_model_tags])
-        | map("lower")
-        | list
-    ) %}
+    {% if raw_model_tags is string %} {% set model_tags = [raw_model_tags | lower] %}
+    {% else %} {% set model_tags = (raw_model_tags or []) | map("lower") | list %}
+    {% endif %}
 
     {% set intersection = elementary.lists_intersection(model_tags, pii_tags) %}
     {% set is_pii = intersection | length > 0 %}
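Python strings are iterable in the same way Jinja strings are, so the char-splitting bug fixed here is easy to reproduce outside dbt. In this sketch `is_iterable` mimics Jinja's built-in `iterable` test (it tries `iter()`), and both normalizers are hypothetical helpers written for illustration:

```python
def is_iterable(value):
    # Same idea as Jinja's `iterable` test: anything iter() accepts.
    try:
        iter(value)
    except TypeError:
        return False
    return True


def normalize_tags_old(raw):
    # Old pattern: `raw if raw is iterable else [raw]`, then lowercase each item.
    items = raw if is_iterable(raw) else [raw]
    return [str(item).lower() for item in items]


def normalize_tags_new(raw):
    # New pattern: a bare string becomes a one-element list; None becomes [].
    if isinstance(raw, str):
        return [raw.lower()]
    return [str(item).lower() for item in (raw or [])]


print(normalize_tags_old("PII"))                   # ['p', 'i', 'i']
print(normalize_tags_new("PII"))                   # ['pii']
print(normalize_tags_new(["PII", "Sensitive"]))    # ['pii', 'sensitive']
```

With the old pattern, a project that sets `pii_tags` to the single string `"pii"` would end up matching the one-letter tags `p` and `i` instead of `pii`.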
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+{#
+    Complements is_pii_table (model-level) and should_disable_sampling_for_pii
+    (column-level) by adding test-level PII tag support. A test tagged with a PII
+    tag will have its samples disabled, consistent with the other two levels.
+#}
+{% macro is_pii_test(flattened_test) %}
+    {% if not elementary.get_config_var("disable_samples_on_pii_tags") %}
+        {% do return(false) %}
+    {% endif %}
+
+    {% set raw_pii_tags = elementary.get_config_var("pii_tags") %}
+    {% if raw_pii_tags is string %} {% set pii_tags = [raw_pii_tags | lower] %}
+    {% else %} {% set pii_tags = (raw_pii_tags or []) | map("lower") | list %}
+    {% endif %}
+
+    {% set raw_test_tags = elementary.insensitive_get_dict_value(
+        flattened_test, "tags", []
+    ) %}
+    {% if raw_test_tags is string %} {% set test_tags = [raw_test_tags | lower] %}
+    {% else %} {% set test_tags = (raw_test_tags or []) | map("lower") | list %}
+    {% endif %}
+
+    {% do return(elementary.lists_intersection(test_tags, pii_tags) | length > 0) %}
+{% endmacro %}
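The test-level check reduces to a gated, case-insensitive set intersection. A minimal sketch, assuming `lists_intersection` returns the items of its first argument that also appear in the second; this is a hypothetical stand-in, since Elementary's macro is not shown in this diff:

```python
def lists_intersection(first, second):
    # Hypothetical stand-in for elementary.lists_intersection.
    return [item for item in first if item in second]


def to_lower_list(raw):
    # Same normalization as the macro: a bare string is one tag, None is no tags.
    if isinstance(raw, str):
        return [raw.lower()]
    return [str(item).lower() for item in (raw or [])]


def is_pii_test(test_tags, pii_tags, disable_samples_on_pii_tags):
    # The config flag gates the whole check, as in the macro.
    if not disable_samples_on_pii_tags:
        return False
    matches = lists_intersection(to_lower_list(test_tags), to_lower_list(pii_tags))
    return len(matches) > 0


print(is_pii_test(["PII", "nightly"], ["pii"], True))  # True
print(is_pii_test("pii", "pii", True))                  # True
print(is_pii_test(["pii"], ["pii"], False))             # False
```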
Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
+{#
+    Inverse of PII protection: when enable_samples_on_show_sample_rows_tags is true,
+    samples are hidden by default and only shown when the show_sample_rows tag is present.
+
+    Checks three levels in order: model → test → column (test's target column only).
+    Returns true if any level has a matching show_sample_rows tag.
+
+    PII precedence: if disable_samples_on_pii_tags is also enabled and the model
+    or column has a PII tag, PII wins and this returns false. A model-level PII
+    tag blocks show_sample_rows at every level (model, test, and column).
+
+    All tag matching is case-insensitive (tags are normalized to lowercase).
+#}
+{% macro should_show_sample_rows(flattened_test) %}
+    {% if not elementary.get_config_var("enable_samples_on_show_sample_rows_tags") %}
+        {% do return(false) %}
+    {% endif %}
+
+    {% set raw_show_tags = elementary.get_config_var("show_sample_rows_tags") %}
+    {% if raw_show_tags is string %} {% set show_tags = [raw_show_tags | lower] %}
+    {% else %} {% set show_tags = (raw_show_tags or []) | map("lower") | list %}
+    {% endif %}
+
+    {#
+        Resolve PII tags once upfront. We use `is string` (not `is iterable`) because
+        strings are iterable in Jinja — iterating a string gives individual characters.
+    #}
+    {% set check_pii = elementary.get_config_var("disable_samples_on_pii_tags") %}
+    {% if check_pii %}
+        {% set raw_pii_tags = elementary.get_config_var("pii_tags") %}
+        {% if raw_pii_tags is string %} {% set pii_tags = [raw_pii_tags | lower] %}
+        {% else %} {% set pii_tags = (raw_pii_tags or []) | map("lower") | list %}
+        {% endif %}
+    {% else %} {% set pii_tags = [] %}
+    {% endif %}
+
+    {# Model-level: show_sample_rows on the model applies to all its tests #}
+    {% set raw_model_tags = elementary.insensitive_get_dict_value(
+        flattened_test, "model_tags", []
+    ) %}
+    {% if raw_model_tags is string %} {% set model_tags = [raw_model_tags | lower] %}
+    {% else %} {% set model_tags = (raw_model_tags or []) | map("lower") | list %}
+    {% endif %}
+    {% if elementary.lists_intersection(model_tags, show_tags) | length > 0 %}
+        {# PII on the model takes precedence over show_sample_rows on the same model #}
+        {% if check_pii and elementary.lists_intersection(
+            model_tags, pii_tags
+        ) | length > 0 %}
+            {% do return(false) %}
+        {% endif %}
+        {% do return(true) %}
+    {% endif %}
+
+    {# Test-level: show_sample_rows on the test definition itself #}
+    {% set raw_test_tags = elementary.insensitive_get_dict_value(
+        flattened_test, "tags", []
+    ) %}
+    {% if raw_test_tags is string %} {% set test_tags = [raw_test_tags | lower] %}
+    {% else %} {% set test_tags = (raw_test_tags or []) | map("lower") | list %}
+    {% endif %}
+    {% if elementary.lists_intersection(test_tags, show_tags) | length > 0 %}
+        {# If the model itself is PII-tagged, respect that even for test-level overrides #}
+        {% if check_pii and elementary.lists_intersection(
+            model_tags, pii_tags
+        ) | length > 0 %}
+            {% do return(false) %}
+        {% endif %}
+        {% do return(true) %}
+    {% endif %}
+
+    {#
+        Column-level: only checks the specific column the test targets (test_column_name),
+        not all columns on the model. This avoids showing samples for unrelated columns.
+    #}
+    {% set test_column_name = elementary.insensitive_get_dict_value(
+        flattened_test, "test_column_name"
+    ) %}
+    {% if test_column_name %}
+        {% set parent_model_unique_id = elementary.insensitive_get_dict_value(
+            flattened_test, "parent_model_unique_id"
+        ) %}
+        {% set parent_model = elementary.get_node(parent_model_unique_id) %}
+        {% if parent_model %}
+            {% set column_nodes = parent_model.get("columns", {}) %}
+            {% for col_name, col_node in column_nodes.items() %}
+                {% if col_name | lower == test_column_name | lower %}
+                    {% set col_tags = elementary.get_column_tags(col_node) %}
+                    {% if elementary.lists_intersection(
+                        col_tags, show_tags
+                    ) | length > 0 %}
+                        {# PII on the column or model takes precedence over show_sample_rows #}
+                        {% if check_pii and (
+                            elementary.lists_intersection(col_tags, pii_tags)
+                            | length
+                            > 0
+                            or elementary.lists_intersection(
+                                model_tags, pii_tags
+                            )
+                            | length
+                            > 0
+                        ) %}
+                            {% do return(false) %}
+                        {% endif %}
+                        {% do return(true) %}
+                    {% endif %}
+                {% endif %}
+            {% endfor %}
+        {% endif %}
+    {% endif %}
+
+    {% do return(false) %}
+{% endmacro %}
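The three-level lookup can be sketched in Python to make the precedence easier to test. This is a hypothetical mirror, not Elementary code: plain `dict.get` replaces `insensitive_get_dict_value`, and `columns` (a name-to-tags mapping) stands in for `get_node` plus `get_column_tags`.

```python
def _norm(raw):
    # A bare string is a single tag; None means no tags; everything is lowercased.
    if isinstance(raw, str):
        return [raw.lower()]
    return [str(tag).lower() for tag in (raw or [])]


def _overlap(tags, wanted):
    return bool(set(tags) & set(wanted))


def should_show_sample_rows(test, columns, config):
    """Hypothetical Python mirror of the should_show_sample_rows macro."""
    if not config.get("enable_samples_on_show_sample_rows_tags"):
        return False
    show_tags = _norm(config.get("show_sample_rows_tags"))
    check_pii = bool(config.get("disable_samples_on_pii_tags"))
    pii_tags = _norm(config.get("pii_tags")) if check_pii else []

    model_tags = _norm(test.get("model_tags", []))
    model_is_pii = check_pii and _overlap(model_tags, pii_tags)

    # Model level, then test level; a PII tag on the model blocks both.
    if _overlap(model_tags, show_tags):
        return not model_is_pii
    if _overlap(_norm(test.get("tags", [])), show_tags):
        return not model_is_pii

    # Column level: only the column the test targets, compared case-insensitively.
    column_name = test.get("test_column_name")
    if column_name:
        for name, raw_tags in columns.items():
            if name.lower() != column_name.lower():
                continue
            col_tags = _norm(raw_tags)
            if _overlap(col_tags, show_tags):
                col_is_pii = check_pii and _overlap(col_tags, pii_tags)
                return not (col_is_pii or model_is_pii)
    return False


config = {
    "enable_samples_on_show_sample_rows_tags": True,
    "show_sample_rows_tags": ["show_sample_rows"],
    "disable_samples_on_pii_tags": True,
    "pii_tags": ["pii"],
}
columns = {"Email": ["pii", "show_sample_rows"], "country": ["show_sample_rows"]}

print(should_show_sample_rows({"test_column_name": "country"}, columns, config))  # True
print(should_show_sample_rows({"test_column_name": "email"}, columns, config))  # False
print(should_show_sample_rows({"model_tags": ["PII"], "tags": ["show_sample_rows"]}, columns, config))  # False
print(should_show_sample_rows({"tags": ["SHOW_SAMPLE_ROWS", "pii"]}, columns, config))  # True
```

The last call returns `True` even though the test itself carries a `pii` tag. The macro consults PII tags only on the model and on the target column, and in `test.sql` the `is_pii_test` branch is not reached once `should_show_sample_rows` returns true.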
