Skip to content

Fix BigQuery scan for test result rows#1940

Merged
michael-myaskovsky merged 5 commits intoelementary-data:masterfrom
tlangton3:fix-bigquery-test_result_rows_scan
Nov 18, 2025
Merged

Fix BigQuery scan for test result rows#1940
michael-myaskovsky merged 5 commits intoelementary-data:masterfrom
tlangton3:fix-bigquery-test_result_rows_scan

Conversation

@tlangton3
Copy link
Copy Markdown
Contributor

@tlangton3 tlangton3 commented Jun 20, 2025

For my test result rows, this reduced the scan from 26GB to 35.7MB on a 1 day lookback, as the current implementation omits partition pruning.

Summary by CodeRabbit

  • New Features
    • Added a dispatcher and adapter-specific implementations for fetching and grouping recent test result rows, including a BigQuery-tailored path for timestamp handling.
  • Bug Fixes
    • Standardized date-based filtering and optional ID filtering to ensure consistent selection and grouping of result rows across platforms.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 20, 2025

Walkthrough

Added an adapter-dispatched macro get_result_rows_agate(days_back, valid_ids_query) with default and BigQuery-specific implementations; both apply date-based filtering, optionally filter by valid_ids_query, and return results grouped by elementary_test_results_id.

Changes

Cohort / File(s) Change Summary
Macros — get_result_rows_agate
elementary/monitor/dbt_project/macros/get_result_rows_agate.sql
Added dispatcher macro get_result_rows_agate(days_back, valid_ids_query = none) and two implementations: default__get_result_rows_agate (DATEDIFF-style filtering) and bigquery__get_result_rows_agate (uses edr_timeadd/timestamp logic). Both group by elementary_test_results_id and support optional valid_ids_query filtering.

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Macro caller
    participant Adapter as adapter.dispatch
    participant Default as default__get_result_rows_agate
    participant BigQuery as bigquery__get_result_rows_agate

    Note right of Caller: call get_result_rows_agate(days_back, valid_ids_query)

    Caller->>Adapter: adapter.dispatch("get_result_rows_agate", args)
    alt Adapter selects BigQuery
        Adapter->>BigQuery: invoke bigquery__get_result_rows_agate
        BigQuery-->>Caller: return grouped rows (by elementary_test_results_id)
    else Adapter selects other
        Adapter->>Default: invoke default__get_result_rows_agate
        Default-->>Caller: return grouped rows (by elementary_test_results_id)
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Review SQL date/timestamp expressions in both implementations.
  • Verify adapter.dispatch usage and default parameter handling.
  • Check grouping logic and integration of valid_ids_query for correctness and injection safety.

Poem

In a burrow of macros I hop and I sing,
I route days and timestamps with a joyful spring.
Default counts days, BigQuery counts time,
Grouped rows return in a neat little rhyme.
Hooray for small hops that make queries fine! 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Fix BigQuery scan for test result rows' directly aligns with the PR's main objective of addressing BigQuery scan issues and optimizing query performance through partition pruning.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 60fa3db and df4693f.

📒 Files selected for processing (1)
  • elementary/monitor/dbt_project/macros/get_result_rows_agate.sql (2 hunks)
🔇 Additional comments (3)
elementary/monitor/dbt_project/macros/get_result_rows_agate.sql (3)

1-3: Dispatcher macro correctly routes to adapter-specific implementations.

The use of adapter.dispatch() follows dbt conventions and enables the appropriate implementation to be selected at runtime.


5-21: Default implementation looks good.

The datediff-based approach with empty result handling is appropriate for databases that don't have partition pruning constraints. The conditional filtering by valid_ids_query is correctly implemented.


23-39: BigQuery implementation correctly enables partition pruning.

Line 29's direct timestamp comparison (detected_at > edr_timeadd(...)) avoids function calls on the column, which enables BigQuery's native partition pruning—aligning with the PR objective. The operator difference from the default implementation (> vs. <) is semantically correct: both filters represent the same logical window ("rows within the last N days"), just expressed differently due to the distinct comparison forms. Empty result handling is properly included (lines 34-38), addressing the concern from the previous review.

Tip

📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

  • Provide your own instructions using the high_level_summary_instructions setting.
  • Format the summary however you like (bullet lists, tables, multi-section layouts, contributor stats, etc.).
  • Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example instruction:

"Divide the high-level summary into five sections:

  1. 📝 Description — Summarize the main change in 50–60 words, explaining what was done.
  2. 📓 References — List relevant issues, discussions, documentation, or related PRs.
  3. 📦 Dependencies & Requirements — Mention any new/updated dependencies, environment variable changes, or configuration updates.
  4. 📊 Contributor Summary — Include a Markdown table showing contributions:
    | Contributor | Lines Added | Lines Removed | Files Changed |
  5. ✔️ Additional Notes — Add any extra reviewer context.
    Keep each section concise (under 200 words) and use bullet or numbered lists for clarity."

Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

👋 @tlangton3
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in this pull request.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🔭 Outside diff range comments (1)
elementary/monitor/dbt_project/macros/get_result_rows_agate.sql (1)

11-14: Inconsistent date-filter boundaries across implementations.

The default macro uses edr_datediff(...) < days_back, which excludes rows exactly days_back old, while the BigQuery implementation uses detected_at >= now - days_back, including them. Align the boundary logic to guarantee uniform behavior across adapters.

Also applies to: 25-28

🧹 Nitpick comments (1)
elementary/monitor/dbt_project/macros/get_result_rows_agate.sql (1)

5-17: Default implementation may hinder partition pruning.

Applying edr_datediff on the detected_at column can prevent partition pruning in partitioned tables for adapters that support it. Consider using a direct timestamp range filter (e.g., detected_at >= …) in the default implementation or providing adapter-specific overrides to ensure optimal performance.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 95999ac and 62f8fe9.

📒 Files selected for processing (1)
  • elementary/monitor/dbt_project/macros/get_result_rows_agate.sql (2 hunks)
🔇 Additional comments (2)
elementary/monitor/dbt_project/macros/get_result_rows_agate.sql (2)

1-3: Use adapter.dispatch for clearer adapter-specific logic.

Switching to adapter.dispatch cleanly separates implementations and paves the way for other adapter support.


19-31: Direct timestamp comparison enables partition pruning.

The BigQuery-specific macro’s use of detected_at >= edr_timeadd(...) correctly leverages partition pruning and drastically reduces scanned data. This change addresses the core performance issue.

@tlangton3 tlangton3 had a problem deploying to elementary_test_env June 24, 2025 08:53 — with GitHub Actions Failure
@arbiv arbiv temporarily deployed to elementary_test_env November 11, 2025 09:36 — with GitHub Actions Inactive
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
elementary/monitor/dbt_project/macros/get_result_rows_agate.sql (1)

5-35: Consider extracting common SELECT logic to reduce duplication.

The SELECT statement and conditional filtering logic are identical between the default and BigQuery implementations. While acceptable given the different WHERE clauses, extracting the common parts into a helper macro could improve maintainability. This is optional since the duplication is minimal.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 62f8fe9 and 60fa3db.

📒 Files selected for processing (1)
  • elementary/monitor/dbt_project/macros/get_result_rows_agate.sql (2 hunks)
🔇 Additional comments (2)
elementary/monitor/dbt_project/macros/get_result_rows_agate.sql (2)

1-3: Dispatcher macro correctly routes to adapter-specific implementations.

The adapter dispatch pattern is correctly implemented and will route to the appropriate implementation based on the database adapter.


5-21: Default implementation logic is sound and consistent.

The use of edr_datediff with empty result handling is appropriate for non-BigQuery adapters. The optional filtering by valid_ids_query and grouping by elementary_test_results_id are correct.

Comment thread elementary/monitor/dbt_project/macros/get_result_rows_agate.sql
elementary_test_results_id,
result_row
from {{ ref("elementary", "test_result_rows") }}
where detected_at >= {{ elementary.edr_timeadd('day', -1 * days_back, elementary.edr_current_timestamp()) }}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
where detected_at >= {{ elementary.edr_timeadd('day', -1 * days_back, elementary.edr_current_timestamp()) }}
where detected_at > {{ elementary.edr_timeadd('day', -1 * days_back, elementary.edr_current_timestamp()) }}

for consistency with the default macro

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

@michael-myaskovsky michael-myaskovsky merged commit 9ffe357 into elementary-data:master Nov 18, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants