Fix BigQuery scan for test result rows by tlangton3 · Pull Request #1940 · elementary-data/elementary

tlangton3 · 2025-06-20T09:37:30Z

For my test result rows, this reduced the scan from 26GB to 35.7MB on a 1 day lookback, as the current implementation omits partition pruning.

Summary by CodeRabbit

New Features
- Added a dispatcher and adapter-specific implementations for fetching and grouping recent test result rows, including a BigQuery-tailored path for timestamp handling.
Bug Fixes
- Standardized date-based filtering and optional ID filtering to ensure consistent selection and grouping of result rows across platforms.

coderabbitai · 2025-06-20T09:37:36Z

Walkthrough

Added an adapter-dispatched macro get_result_rows_agate(days_back, valid_ids_query) with default and BigQuery-specific implementations; both apply date-based filtering, optionally filter by valid_ids_query, and return results grouped by elementary_test_results_id.

Changes

Cohort / File(s)	Change Summary
Macros — get_result_rows_agate `elementary/monitor/dbt_project/macros/get_result_rows_agate.sql`	Added dispatcher macro `get_result_rows_agate(days_back, valid_ids_query = none)` and two implementations: `default__get_result_rows_agate` (DATEDIFF-style filtering) and `bigquery__get_result_rows_agate` (uses `edr_timeadd`/timestamp logic). Both group by `elementary_test_results_id` and support optional `valid_ids_query` filtering.

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Macro caller
    participant Adapter as adapter.dispatch
    participant Default as default__get_result_rows_agate
    participant BigQuery as bigquery__get_result_rows_agate

    Note right of Caller: call get_result_rows_agate(days_back, valid_ids_query)

    Caller->>Adapter: adapter.dispatch("get_result_rows_agate", args)
    alt Adapter selects BigQuery
        Adapter->>BigQuery: invoke bigquery__get_result_rows_agate
        BigQuery-->>Caller: return grouped rows (by elementary_test_results_id)
    else Adapter selects other
        Adapter->>Default: invoke default__get_result_rows_agate
        Default-->>Caller: return grouped rows (by elementary_test_results_id)
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Review SQL date/timestamp expressions in both implementations.
Verify adapter.dispatch usage and default parameter handling.
Check grouping logic and integration of valid_ids_query for correctness and injection safety.

Poem

In a burrow of macros I hop and I sing,
I route days and timestamps with a joyful spring.
Default counts days, BigQuery counts time,
Grouped rows return in a neat little rhyme.
Hooray for small hops that make queries fine! 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Fix BigQuery scan for test result rows' directly aligns with the PR's main objective of addressing BigQuery scan issues and optimizing query performance through partition pruning.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 60fa3db and df4693f.

📒 Files selected for processing (1)

elementary/monitor/dbt_project/macros/get_result_rows_agate.sql (2 hunks)

🔇 Additional comments (3)

elementary/monitor/dbt_project/macros/get_result_rows_agate.sql (3)

1-3: Dispatcher macro correctly routes to adapter-specific implementations.

The use of adapter.dispatch() follows dbt conventions and enables the appropriate implementation to be selected at runtime.

5-21: Default implementation looks good.

The datediff-based approach with empty result handling is appropriate for databases that don't have partition pruning constraints. The conditional filtering by valid_ids_query is correctly implemented.

23-39: BigQuery implementation correctly enables partition pruning.

Line 29's direct timestamp comparison (detected_at > edr_timeadd(...)) avoids function calls on the column, which enables BigQuery's native partition pruning—aligning with the PR objective. The operator difference from the default implementation (> vs. <) is semantically correct: both filters represent the same logical window ("rows within the last N days"), just expressed differently due to the distinct comparison forms. Empty result handling is properly included (lines 34-38), addressing the concern from the previous review.

Tip

📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

Provide your own instructions using the high_level_summary_instructions setting.
Format the summary however you like (bullet lists, tables, multi-section layouts, contributor stats, etc.).
Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example instruction:

"Divide the high-level summary into five sections:

📝 Description — Summarize the main change in 50–60 words, explaining what was done.

📓 References — List relevant issues, discussions, documentation, or related PRs.

📦 Dependencies & Requirements — Mention any new/updated dependencies, environment variable changes, or configuration updates.

📊 Contributor Summary — Include a Markdown table showing contributions:
| Contributor | Lines Added | Lines Removed | Files Changed |

✔️ Additional Notes — Add any extra reviewer context.
Keep each section concise (under 200 words) and use bullet or numbered lists for clarity."

Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2025-06-20T09:37:39Z

👋 @tlangton3
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in this pull request.

coderabbitai

Actionable comments posted: 0

🔭 Outside diff range comments (1)

elementary/monitor/dbt_project/macros/get_result_rows_agate.sql (1)

11-14: Inconsistent date-filter boundaries across implementations.

The default macro uses edr_datediff(...) < days_back, which excludes rows exactly days_back old, while the BigQuery implementation uses detected_at >= now - days_back, including them. Align the boundary logic to guarantee uniform behavior across adapters.

Also applies to: 25-28

🧹 Nitpick comments (1)

elementary/monitor/dbt_project/macros/get_result_rows_agate.sql (1)

5-17: Default implementation may hinder partition pruning.

Applying edr_datediff on the detected_at column can prevent partition pruning in partitioned tables for adapters that support it. Consider using a direct timestamp range filter (e.g., detected_at >= …) in the default implementation or providing adapter-specific overrides to ensure optimal performance.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 95999ac and 62f8fe9.

📒 Files selected for processing (1)

elementary/monitor/dbt_project/macros/get_result_rows_agate.sql (2 hunks)

🔇 Additional comments (2)

elementary/monitor/dbt_project/macros/get_result_rows_agate.sql (2)

1-3: Use adapter.dispatch for clearer adapter-specific logic.

Switching to adapter.dispatch cleanly separates implementations and paves the way for other adapter support.

19-31: Direct timestamp comparison enables partition pruning.

The BigQuery-specific macro’s use of detected_at >= edr_timeadd(...) correctly leverages partition pruning and drastically reduces scanned data. This change addresses the core performance issue.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

elementary/monitor/dbt_project/macros/get_result_rows_agate.sql (1)

5-35: Consider extracting common SELECT logic to reduce duplication.

The SELECT statement and conditional filtering logic are identical between the default and BigQuery implementations. While acceptable given the different WHERE clauses, extracting the common parts into a helper macro could improve maintainability. This is optional since the duplication is minimal.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 62f8fe9 and 60fa3db.

📒 Files selected for processing (1)

elementary/monitor/dbt_project/macros/get_result_rows_agate.sql (2 hunks)

🔇 Additional comments (2)

elementary/monitor/dbt_project/macros/get_result_rows_agate.sql (2)

1-3: Dispatcher macro correctly routes to adapter-specific implementations.

The adapter dispatch pattern is correctly implemented and will route to the appropriate implementation based on the database adapter.

5-21: Default implementation logic is sound and consistent.

The use of edr_datediff with empty result handling is appropriate for non-BigQuery adapters. The optional filtering by valid_ids_query and grouping by elementary_test_results_id are correct.

michael-myaskovsky · 2025-11-18T14:34:20Z

+    elementary_test_results_id,
+    result_row
+  from {{ ref("elementary", "test_result_rows") }}
+  where detected_at >= {{ elementary.edr_timeadd('day', -1 * days_back, elementary.edr_current_timestamp()) }}


Suggested change

where detected_at >= {{ elementary.edr_timeadd('day', -1 * days_back, elementary.edr_current_timestamp()) }}

where detected_at > {{ elementary.edr_timeadd('day', -1 * days_back, elementary.edr_current_timestamp()) }}

for consistency with the default macro

Add new macros for retrieving result rows in Agate

62f8fe9

tlangton3 had a problem deploying to elementary_test_env June 20, 2025 09:37 — with GitHub Actions Error

coderabbitai Bot reviewed Jun 20, 2025

View reviewed changes

ng-oliver approved these changes Jun 20, 2025

View reviewed changes

Merge branch 'master' into fix-bigquery-test_result_rows_scan

d5234ea

tlangton3 had a problem deploying to elementary_test_env June 24, 2025 08:53 — with GitHub Actions Failure

Merge branch 'master' into fix-bigquery-test_result_rows_scan

60fa3db

arbiv temporarily deployed to elementary_test_env November 11, 2025 09:36 — with GitHub Actions Inactive

coderabbitai Bot reviewed Nov 11, 2025

View reviewed changes

Comment thread elementary/monitor/dbt_project/macros/get_result_rows_agate.sql

michael-myaskovsky reviewed Nov 18, 2025

View reviewed changes

pr comments

df4693f

tlangton3 had a problem deploying to elementary_test_env November 18, 2025 14:52 — with GitHub Actions Error

Merge branch 'master' into fix-bigquery-test_result_rows_scan

0b46e8b

michael-myaskovsky temporarily deployed to elementary_test_env November 18, 2025 14:59 — with GitHub Actions Inactive

michael-myaskovsky approved these changes Nov 18, 2025

View reviewed changes

michael-myaskovsky merged commit 9ffe357 into elementary-data:master Nov 18, 2025
5 checks passed

elazarlachkar mentioned this pull request Nov 24, 2025

Fix namespace of macro call #2061

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix BigQuery scan for test result rows#1940

Fix BigQuery scan for test result rows#1940
michael-myaskovsky merged 5 commits intoelementary-data:masterfrom
tlangton3:fix-bigquery-test_result_rows_scan

tlangton3 commented Jun 20, 2025 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 20, 2025 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 20, 2025

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

michael-myaskovsky Nov 18, 2025

Uh oh!

tlangton3 Nov 18, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

	where detected_at >= {{ elementary.edr_timeadd('day', -1 * days_back, elementary.edr_current_timestamp()) }}
	where detected_at > {{ elementary.edr_timeadd('day', -1 * days_back, elementary.edr_current_timestamp()) }}

Conversation

tlangton3 commented Jun 20, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

github-actions Bot commented Jun 20, 2025

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

michael-myaskovsky Nov 18, 2025

Choose a reason for hiding this comment

Uh oh!

tlangton3 Nov 18, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

tlangton3 commented Jun 20, 2025 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 20, 2025 •

edited

Loading