Fix BigQuery scan for test result rows#1940
Fix BigQuery scan for test result rows#1940michael-myaskovsky merged 5 commits intoelementary-data:masterfrom
Conversation
WalkthroughAdded an adapter-dispatched macro Changes
Sequence Diagram(s)sequenceDiagram
participant Caller as Macro caller
participant Adapter as adapter.dispatch
participant Default as default__get_result_rows_agate
participant BigQuery as bigquery__get_result_rows_agate
Note right of Caller: call get_result_rows_agate(days_back, valid_ids_query)
Caller->>Adapter: adapter.dispatch("get_result_rows_agate", args)
alt Adapter selects BigQuery
Adapter->>BigQuery: invoke bigquery__get_result_rows_agate
BigQuery-->>Caller: return grouped rows (by elementary_test_results_id)
else Adapter selects other
Adapter->>Default: invoke default__get_result_rows_agate
Default-->>Caller: return grouped rows (by elementary_test_results_id)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro Disabled knowledge base sources:
📒 Files selected for processing (1)
🔇 Additional comments (3)
Tip 📝 Customizable high-level summaries are now available in beta!You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.
Example instruction:
Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
👋 @tlangton3 |
There was a problem hiding this comment.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
elementary/monitor/dbt_project/macros/get_result_rows_agate.sql (1)
11-14: Inconsistent date-filter boundaries across implementations.The default macro uses
edr_datediff(...) < days_back, which excludes rows exactlydays_backold, while the BigQuery implementation usesdetected_at >= now - days_back, including them. Align the boundary logic to guarantee uniform behavior across adapters.Also applies to: 25-28
🧹 Nitpick comments (1)
elementary/monitor/dbt_project/macros/get_result_rows_agate.sql (1)
5-17: Default implementation may hinder partition pruning.Applying
edr_datediffon thedetected_atcolumn can prevent partition pruning in partitioned tables for adapters that support it. Consider using a direct timestamp range filter (e.g.,detected_at >= …) in the default implementation or providing adapter-specific overrides to ensure optimal performance.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
elementary/monitor/dbt_project/macros/get_result_rows_agate.sql(2 hunks)
🔇 Additional comments (2)
elementary/monitor/dbt_project/macros/get_result_rows_agate.sql (2)
1-3: Use adapter.dispatch for clearer adapter-specific logic.Switching to
adapter.dispatchcleanly separates implementations and paves the way for other adapter support.
19-31: Direct timestamp comparison enables partition pruning.The BigQuery-specific macro’s use of
detected_at >= edr_timeadd(...)correctly leverages partition pruning and drastically reduces scanned data. This change addresses the core performance issue.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
elementary/monitor/dbt_project/macros/get_result_rows_agate.sql (1)
5-35: Consider extracting common SELECT logic to reduce duplication.The SELECT statement and conditional filtering logic are identical between the default and BigQuery implementations. While acceptable given the different WHERE clauses, extracting the common parts into a helper macro could improve maintainability. This is optional since the duplication is minimal.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
elementary/monitor/dbt_project/macros/get_result_rows_agate.sql(2 hunks)
🔇 Additional comments (2)
elementary/monitor/dbt_project/macros/get_result_rows_agate.sql (2)
1-3: Dispatcher macro correctly routes to adapter-specific implementations.The adapter dispatch pattern is correctly implemented and will route to the appropriate implementation based on the database adapter.
5-21: Default implementation logic is sound and consistent.The use of
edr_datediffwith empty result handling is appropriate for non-BigQuery adapters. The optional filtering byvalid_ids_queryand grouping byelementary_test_results_idare correct.
| elementary_test_results_id, | ||
| result_row | ||
| from {{ ref("elementary", "test_result_rows") }} | ||
| where detected_at >= {{ elementary.edr_timeadd('day', -1 * days_back, elementary.edr_current_timestamp()) }} |
There was a problem hiding this comment.
| where detected_at >= {{ elementary.edr_timeadd('day', -1 * days_back, elementary.edr_current_timestamp()) }} | |
| where detected_at > {{ elementary.edr_timeadd('day', -1 * days_back, elementary.edr_current_timestamp()) }} |
for consistency with the default macro
For my test result rows, this reduced the scan from 26GB to 35.7MB on a 1 day lookback, as the current implementation omits partition pruning.
Summary by CodeRabbit