Commit 971cd55

Update data modeling guide with balanced trade-offs discussion
Generated-By: mintlify-agent
1 parent b9ff109 commit 971cd55

1 file changed

Lines changed: 45 additions & 18 deletions

guides/developer/dbt-model-best-practices.mdx

@@ -6,13 +6,23 @@ description: Learn how to structure your dbt models for optimal performance and
66

77
When building your dbt models for Lightdash, following these best practices will help you create a better experience in Lightdash for your end users and improve query performance.
88

9-
## Use wide, flat tables in the BI layer
9+
## Choosing your data modeling approach
1010

11-
We recommend using **wide, flat tables** in the BI layer because this minimizes complex joins that need to be handled at runtime.
11+
When deciding between wide materialized tables and smaller normalized tables that require multiple joins, you're balancing trade-offs across three dimensions: **performance**, **self-serve usability**, and **analytical depth**.
1212

13-
### Why wide tables work better
13+
There's no one-size-fits-all answer—the right approach depends on your team's needs and how you want to empower your users.
1414

15-
Modern columnar data warehouses (like Snowflake, BigQuery, and Redshift) are optimized for wide table formats. The star schema was initially introduced to optimize performance for row-based data warehouses, but with today's columnar warehouses, wide and flat is the way to go.
15+
### Wide tables vs. multi-join models
16+
17+
| Factor | Wide tables | Multi-join models |
18+
|--------|-------------|-------------------|
19+
| **Performance** | Generally faster—materialized at the grain you need for specific analyses | Joins computed at query time; mitigated by fanout protection |
20+
| **Self-serve usability** | Easier for business users—all fields in one place, less overwhelming | Can be intimidating—10 joined tables on one page is a lot for most users |
21+
| **Analytical depth** | May require multiple wide tables for different use cases | More flexible for complex analyses across multiple dimensions |
22+
23+
### When wide tables work best
24+
25+
Wide tables are ideal when you want to **empower self-serve analytics**. Business users can navigate the "New chart" page without confusion, and all related fields appear together in the Lightdash sidebar.
1626

1727
Wide tables offer several advantages:
1828

@@ -21,6 +31,24 @@ Wide tables offer several advantages:
2131
- **Simpler to understand**: End users don't need to understand complex relationships between multiple tables
2232
- **More accurate [AI agents](/guides/ai-agents)**: AI agents have more context when working with wide tables, so they provide more accurate answers
2333

34+
Modern columnar data warehouses (like Snowflake, BigQuery, and Redshift) are optimized for wide table formats. The star schema was initially introduced to optimize performance for row-based data warehouses, but with today's columnar warehouses, wide and flat often performs better.
35+
36+
### When multi-join models are necessary
37+
38+
Some analyses genuinely require multiple joins. For example, analyzing Bookings while filtering by an upstream dimension (like organic SEO traffic source) and breaking down by a downstream dimension (like a specific location) will likely require joins to produce correct results.
39+
40+
Lightdash handles multi-join explores well technically—[fanout protection](/references/joins#many-to-many-or-one-to-many-with-fanout-protection) ensures accuracy. However, looking at a page with many joined tables can be overwhelming for business users who aren't familiar with the data model.
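
To make the fanout problem concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `bookings` and `booking_guests` tables and their values are hypothetical, and the second query only illustrates the general idea of aggregating each table at its own grain. It is not a description of how Lightdash implements fanout protection.

```python
import sqlite3

# Hypothetical tables: one booking can have many guests (one-to-many).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bookings (booking_id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE booking_guests (guest_id INTEGER PRIMARY KEY, booking_id INTEGER);
    INSERT INTO bookings VALUES (1, 100.0), (2, 50.0);
    -- Booking 1 has two guests, so it appears twice after the join.
    INSERT INTO booking_guests VALUES (10, 1), (11, 1), (12, 2);
""")

# Naive aggregate over the joined rows: booking 1's amount is counted twice.
naive_total = conn.execute("""
    SELECT SUM(b.amount)
    FROM bookings AS b
    JOIN booking_guests AS g ON g.booking_id = b.booking_id
""").fetchone()[0]

# Aggregating each table at its own grain avoids the inflation.
safe_total, guest_count = conn.execute("""
    SELECT
        (SELECT SUM(amount) FROM bookings),
        (SELECT COUNT(*) FROM booking_guests)
""").fetchone()

print("naive total:", naive_total)
print("safe total:", safe_total, "guests:", guest_count)
```

The naive query overstates revenue because the join repeats each booking once per guest. This is the class of error that fanout protection is meant to guard against.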
41+
42+
### Common approaches
43+
44+
Most Lightdash customers land on one of these patterns:
45+
46+
- **Wide tables for self-serve**: Build wide, materialized tables for common business use cases. This is the most popular approach when the goal is enabling business users to explore data independently.
47+
48+
- **Both pathways**: Maintain both wide tables (for business users) and complex multi-join explores (for power users or the data team). You can [hide the complex version](/references/workspace/user-attributes) from business users to reduce confusion.
49+
50+
- **Multi-join only**: Some teams use only the normalized, multi-join approach. This works well technically, but the data team (or AI agents) often remains a bottleneck because the ad hoc query page can be intimidating for business users. AI agents are getting better at handling multi-join explores, so this may become less of a concern over time.
51+
2452
### How to implement wide tables
2553

2654
If your data is already modeled in a star schema upstream, you can maintain that structure in your transformation layer, then combine the models into wide tables that you surface in the BI layer.
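
As a rough illustration of combining star schema models into one wide table, the sketch below uses Python's built-in `sqlite3` module. The table names (`fct_orders`, `dim_customers`, `dim_products`, `orders_wide`) and the sample rows are hypothetical. In a dbt project, a similar `SELECT` would typically live in a model file that refers to upstream models with `ref()` and is materialized as a table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical star schema: one fact table and two dimensions.
    CREATE TABLE fct_orders (
        order_id INTEGER PRIMARY KEY, customer_id INTEGER,
        product_id INTEGER, amount REAL
    );
    CREATE TABLE dim_customers (
        customer_id INTEGER PRIMARY KEY, customer_name TEXT, region TEXT
    );
    CREATE TABLE dim_products (
        product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT
    );
    INSERT INTO dim_customers VALUES (1, 'Acme', 'EMEA'), (2, 'Globex', 'APAC');
    INSERT INTO dim_products VALUES (1, 'Widget', 'Hardware'), (2, 'Support plan', 'Services');
    INSERT INTO fct_orders VALUES (1, 1, 1, 120.0), (2, 1, 2, 30.0), (3, 2, 1, 80.0);

    -- Pre-join once, at the order grain, into a single wide table.
    CREATE TABLE orders_wide AS
    SELECT
        o.order_id      AS order_id,
        o.amount        AS amount,
        c.customer_name AS customer_name,
        c.region        AS region,
        p.product_name  AS product_name,
        p.category      AS category
    FROM fct_orders AS o
    LEFT JOIN dim_customers AS c ON c.customer_id = o.customer_id
    LEFT JOIN dim_products  AS p ON p.product_id  = o.product_id;
""")

# Every field an analyst needs now sits in one table.
wide_columns = [row[1] for row in conn.execute("PRAGMA table_info(orders_wide)")]

# Queries against the wide table need no joins at query time.
revenue_by_region = conn.execute("""
    SELECT region, SUM(amount) FROM orders_wide GROUP BY region ORDER BY region
""").fetchall()

print(wide_columns)
print(revenue_by_region)
```

Because both joins are many-to-one from the fact table, the wide table keeps one row per order, so sums over `amount` are not inflated.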
@@ -59,27 +87,17 @@ models/
5987

6088
While Lightdash supports having all model definitions in a single `schema.yml` file at the directory level, we've found that using a separate file per model scales better as your project grows.
6189

62-
## What about star schema?
63-
64-
While we recommend wide flat tables, **we do support joins in Lightdash** and via [AI agents](/guides/ai-agents), so you have the flexibility to build out your semantic layer in a way that works best for your team.
65-
66-
If you're using a star schema, keep in mind:
6790

68-
- Fields get split into multiple sections in the Lightdash sidebar, which can be less intuitive for business users
69-
- Cross-model references in underlying values become more complex to manage
70-
- Now that Lightdash has fanout protection, the main performance concern with joins is mitigated
71-
72-
One approach is to maintain your star schema upstream for data modeling purposes, then materialize wide summary tables for specific business use cases as needed. This gives you the best of both worlds: clean data modeling practices upstream and optimized tables for BI consumption.
7391

7492
## Optimizing query performance and warehouse costs
7593

76-
All Lightdash queries run against your data warehouse. Beyond using wide, flat tables (covered above), these additional strategies help improve performance and reduce costs.
94+
All Lightdash queries run against your data warehouse. These strategies help improve performance and reduce costs.
7795

7896
| Strategy | Performance impact | Cost impact |
7997
|----------|-------------------|-------------|
8098
| [Materialize as tables](#materialize-models-as-tables) | High | High |
8199
| [Index and partition data](#index-and-partition-your-data) | High | High |
82-
| [Minimize joins](#minimize-joins-at-query-time) | High | Medium |
100+
| [Use pre-aggregates](#use-pre-aggregates) | High | High |
83101
| [Enable caching](#leverage-caching) | Medium | High |
84102
| [Limit exposed models](#limit-models-exposed-to-the-bi-layer) | Low | Medium |
85103
| [Monitor usage](#monitor-query-usage) || Visibility |
@@ -132,9 +150,18 @@ Best practices:
132150
- Cluster by columns frequently used in `WHERE` clauses or `GROUP BY`
133151
- Review your warehouse's query history to identify high-cost queries that could benefit from partitioning
134152

135-
### Minimize joins at query time
153+
### Use pre-aggregates
154+
155+
Pre-aggregates are summary tables that compute metrics at a coarser grain ahead of time. They're separate from the wide-vs-normalized architecture decision, but they can provide significant performance improvements regardless of which modeling approach you choose.
156+
157+
For example, if users frequently query daily revenue by region, a pre-aggregated `daily_revenue_by_region` table will be much faster than computing it from raw transaction data at query time.
158+
159+
Pre-aggregates are especially useful for:
160+
- Dashboard queries that aggregate large datasets
161+
- Commonly used metric combinations
162+
- Time-series data at standard intervals (daily, weekly, monthly)
136163

137-
Pre-join data in your dbt models rather than joining at query time. As discussed in [wide, flat tables](#use-wide-flat-tables-in-the-bi-layer), this approach outperforms runtime joins.
164+
The trade-off is maintenance overhead—you need to keep pre-aggregates in sync with your source data and ensure users understand when to use them vs. the detailed tables.
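
The `daily_revenue_by_region` example can be sketched with Python's built-in `sqlite3` module. The `transactions` table, its columns, and the sample rows are assumptions made for illustration. In practice the summary table would be built in the warehouse, for example as a dbt model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical raw data at the transaction grain.
    CREATE TABLE transactions (
        transaction_id INTEGER PRIMARY KEY, sold_at TEXT, region TEXT, amount REAL
    );
    INSERT INTO transactions VALUES
        (1, '2024-01-01 09:15:00', 'EMEA', 40.0),
        (2, '2024-01-01 17:30:00', 'EMEA', 60.0),
        (3, '2024-01-01 11:00:00', 'APAC', 25.0),
        (4, '2024-01-02 08:45:00', 'EMEA', 10.0);

    -- Pre-aggregate: one row per day and region instead of one per transaction.
    CREATE TABLE daily_revenue_by_region AS
    SELECT
        date(sold_at) AS revenue_date,
        region        AS region,
        SUM(amount)   AS revenue,
        COUNT(*)      AS transaction_count
    FROM transactions
    GROUP BY date(sold_at), region;
""")

# Dashboard queries read the small summary table instead of the raw rows.
daily_rows = conn.execute("""
    SELECT revenue_date, region, revenue
    FROM daily_revenue_by_region
    ORDER BY revenue_date, region
""").fetchall()

# Sanity check for keeping the pre-aggregate in sync with its source.
raw_total = conn.execute("SELECT SUM(amount) FROM transactions").fetchone()[0]
agg_total = conn.execute("SELECT SUM(revenue) FROM daily_revenue_by_region").fetchone()[0]

print(daily_rows)
print(raw_total, agg_total)
```

Comparing totals between the raw and summary tables, as in the last step, is one simple way to detect a pre-aggregate that has drifted out of sync.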
138165

139166
### Leverage caching
140167
