Commit 8accf9f

docs: add pre-aggregates reference for materialized query optimization (#455)
1 parent 4c408f3 commit 8accf9f

2 files changed

Lines changed: 160 additions & 0 deletions

docs.json

Lines changed: 1 addition & 0 deletions
@@ -196,6 +196,7 @@
   "references/dimensions",
   "references/tables",
   "references/joins",
+  "references/pre-aggregates",
   "references/lightdash-cli",
   "references/lightdash-config-yml",
   "references/sql-variables"

references/pre-aggregates.mdx

Lines changed: 159 additions & 0 deletions
@@ -0,0 +1,159 @@
---
title: "Pre-aggregates reference"
description: "Pre-aggregates materialize aggregated data inside Lightdash so queries can be served from pre-computed results instead of hitting your warehouse."
sidebarTitle: "Pre-aggregates"
---

<Info>
**Availability:** Pre-aggregates are an [Early Access](/references/workspace/feature-maturity-levels) feature available on **Enterprise plans** only.
</Info>

## What are pre-aggregates?

Pre-aggregates let you define materialized summaries of your data directly in your dbt YAML. When a user runs a query in Lightdash, the system checks whether the query can be answered from a pre-aggregate instead of querying your warehouse. If it can, the query is served from the pre-computed results, which makes it significantly faster and reduces warehouse load.

This is especially useful for high-traffic dashboards or expensive aggregations that don't need real-time data.

### How it works

1. You define a pre-aggregate on a model, specifying which dimensions and metrics to include
2. Lightdash materializes the aggregated data on a schedule
3. When a user runs a query, Lightdash checks if all requested dimensions, metrics, and filters are covered by a pre-aggregate
4. If a match is found, the query is served from the materialized data instead of your warehouse

## Defining pre-aggregates

Pre-aggregates are defined in your dbt model's YAML file under the `pre_aggregates` key in the model's `meta` (or `config.meta` for dbt v1.10+).

<Tabs>
  <Tab title="dbt v1.9 and earlier">
    ```yaml
    models:
      - name: orders
        meta:
          pre_aggregates:
            - name: orders_daily_by_status
              dimensions:
                - status
              metrics:
                - total_order_amount
                - average_order_size
              time_dimension: order_date
              granularity: day
    ```
  </Tab>
  <Tab title="dbt v1.10+ and Fusion">
    ```yaml
    models:
      - name: orders
        config:
          meta:
            pre_aggregates:
              - name: orders_daily_by_status
                dimensions:
                  - status
                metrics:
                  - total_order_amount
                  - average_order_size
                time_dimension: order_date
                granularity: day
    ```
  </Tab>
</Tabs>

### Configuration reference

| Property | Required | Description |
|---|---|---|
| `name` | Yes | Unique identifier for the pre-aggregate. Must contain only letters, numbers, and underscores. |
| `dimensions` | Yes | List of dimension names to include. Must contain at least one dimension. |
| `metrics` | Yes | List of metric names to include. Must contain at least one metric. |
| `time_dimension` | No | A time-based dimension to use for date grouping. Must be paired with `granularity`. |
| `granularity` | No | Time granularity for the `time_dimension`. Must be paired with `time_dimension`. Valid values: `hour`, `day`, `week`, `month`, `year`. |

<Note>
If you specify `time_dimension`, you **must** also specify `granularity`, and vice versa.
</Note>
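
Because `time_dimension` and `granularity` are optional as a pair, a pre-aggregate can also omit both. The following is a minimal sketch, assuming the same `orders` model and `total_order_amount` metric used elsewhere on this page; the pre-aggregate name `orders_by_status` is hypothetical.

```yaml
models:
  - name: orders
    config:
      meta:
        pre_aggregates:
          # Hypothetical name. Neither time_dimension nor granularity is set,
          # which satisfies the "both or neither" pairing rule.
          - name: orders_by_status
            dimensions:
              - status
            metrics:
              - total_order_amount
```

Under the matching rules, a pre-aggregate like this would presumably only serve queries that group or filter by `status` alone, since a query that also groups by `order_date` would reference a dimension the pre-aggregate does not include.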

## Query matching

When a user runs a query, Lightdash automatically checks if a pre-aggregate can serve the results. A pre-aggregate matches when **all** of the following are true:

- Every dimension in the query is included in the pre-aggregate
- Every metric in the query is included in the pre-aggregate
- Every dimension used in filters is included in the pre-aggregate
- All metrics use [supported metric types](#supported-metric-types)
- The query does not contain custom dimensions, custom metrics, or table calculations
- If the query uses a time dimension, the requested granularity is **equal to or coarser** than the pre-aggregate's granularity (e.g., a `day` pre-aggregate can serve `day`, `week`, `month`, or `year` queries, but not `hour`)

When multiple pre-aggregates match a query, Lightdash picks the smallest one (fewest dimensions, then fewest metrics as tiebreaker).
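
To illustrate the tiebreaker, here is a sketch of one model with two pre-aggregates that can both serve the same query. The second pre-aggregate's name and the `shipping_country` dimension are hypothetical and only there to make it larger than the first.

```yaml
models:
  - name: orders
    config:
      meta:
        pre_aggregates:
          # One dimension, two metrics
          - name: orders_daily_by_status
            dimensions:
              - status
            metrics:
              - total_order_amount
              - average_order_size
            time_dimension: order_date
            granularity: day
          # Hypothetical: two dimensions, same metrics and time settings
          - name: orders_daily_by_status_and_country
            dimensions:
              - status
              - shipping_country
            metrics:
              - total_order_amount
              - average_order_size
            time_dimension: order_date
            granularity: day
```

A query for `total_order_amount` by `status`, grouped by month, is covered by both definitions. Following the rule above, Lightdash would pick `orders_daily_by_status` because it has fewer dimensions. A query that also groups by `shipping_country` is covered only by the second one.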

### Dimensions from joined tables

Pre-aggregates support dimensions from joined tables. Reference them by their full name (e.g., `customers.first_name`) in the `dimensions` list.
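
As a sketch, a pre-aggregate that includes a joined dimension might look like the following. It assumes `orders` is joined to a `customers` model that exposes a `first_name` dimension; the pre-aggregate name is hypothetical.

```yaml
models:
  - name: orders
    config:
      meta:
        joins:
          - join: customers
            sql_on: ${customers.customer_id} = ${orders.customer_id}
        pre_aggregates:
          # Hypothetical name
          - name: orders_daily_by_customer_name
            dimensions:
              # Joined dimension, referenced as <table>.<dimension>
              - customers.first_name
            metrics:
              - total_order_amount
            time_dimension: order_date
            granularity: day
```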

## Supported metric types

Pre-aggregates support metrics that can be re-aggregated from pre-computed results. The following metric types are supported:

- `sum`
- `count`
- `min`
- `max`
- `average`

Queries that include metrics with other types (e.g., `count_distinct`, `median`, `number`) will not match a pre-aggregate and will query the warehouse directly.
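
The difference shows up in the metric definitions themselves. This sketch uses the `config.meta` form for dbt v1.10+ and places a supported metric next to an unsupported one; the `unique_customers` metric is hypothetical, and `customer_id` is assumed to be a column on `orders`.

```yaml
models:
  - name: orders
    columns:
      - name: amount
        config:
          meta:
            metrics:
              total_order_amount:
                type: sum             # supported: can be re-aggregated
      - name: customer_id
        config:
          meta:
            metrics:
              unique_customers:       # hypothetical metric name
                type: count_distinct  # not supported: queries using it go to the warehouse
```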

## Example

Here's a complete example showing a model with a pre-aggregate:

```yaml
models:
  - name: orders
    config:
      meta:
        joins:
          - join: customers
            sql_on: ${customers.customer_id} = ${orders.customer_id}
        pre_aggregates:
          - name: orders_daily_by_status
            dimensions:
              - status
            metrics:
              - total_order_amount
              - average_order_size
            time_dimension: order_date
            granularity: day
    columns:
      - name: order_date
        config:
          meta:
            dimension:
              type: date
      - name: status
        config:
          meta:
            dimension:
              type: string
      - name: amount
        config:
          meta:
            metrics:
              total_order_amount:
                type: sum
              average_order_size:
                type: average
```

With this pre-aggregate, the following queries would be served from materialized data:

- Total order amount by status, grouped by day/week/month/year
- Average order size by status, grouped by month
- Total order amount filtered by status

These queries would **not** match and would query the warehouse directly:

- Queries including `count_distinct` metrics
- Queries grouped by a dimension not in the pre-aggregate (e.g., `customer_id`)
- Queries with hourly granularity (finer than the pre-aggregate's `day`)
