Commit 73ffa5d

[refactor](next) performance tuning (#3603)
1 parent 91b8bd5 commit 73ffa5d

70 files changed

Lines changed: 2229 additions & 1734 deletions

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+---
+{
+"title": "Caching",
+"language": "en",
+"description": "Navigate Apache Doris caching capabilities: SQL Cache for query result reuse, Condition Cache for repeated filter acceleration, and Data Cache for external table file caching."
+}
+---
+
+import GettingStartedCard from '@site/src/components/getting-started-card/getting-started-card';
+
+Apache Doris provides multiple layers of caching to accelerate queries: result-level caching reuses query outputs across identical SQL, segment-level caching reuses filter evaluations across queries, and file-level caching brings remote lakehouse data closer to the compute layer. Pick the cache that matches your workload pattern.
+
+## Result & Filter Cache
+
+<div className="cards-grid">
+<GettingStartedCard
+title="SQL Cache"
+description="Cache full query results keyed by SQL text and table versions to skip repeated computation in T+1 and low-update scenarios."
+link="sql-cache-manual"
+/>
+
+<GettingStartedCard
+title="Condition Cache"
+description="Cache the filtering result of a condition on each segment so subsequent queries with the same predicate can skip redundant scans and filtering."
+link="condition-cache"
+/>
+</div>
+
+## External Table File Cache
+
+<div className="cards-grid">
+<GettingStartedCard
+title="Data Cache"
+description="Cache files from HDFS and object storage on local disks to accelerate Hive, Iceberg, Hudi, and Paimon queries with cache warmup and admission control."
+link="../lakehouse/data-cache"
+/>
+</div>
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+---
+{
+"title": "Distinct Counts",
+"language": "en",
+"description": "Navigate Apache Doris distinct count solutions: precise deduplication with BITMAP and approximate deduplication with HLL."
+}
+---
+
+import GettingStartedCard from '@site/src/components/getting-started-card/getting-started-card';
+
+Distinct count (deduplication) is one of the most resource-intensive operations in analytics. Apache Doris provides two purpose-built data types to replace `COUNT DISTINCT` with far lower memory and latency cost: choose **BITMAP** when you need exact results, and **HLL** when a 1%–2% error is acceptable in exchange for even smaller storage.
+
+## Precise Deduplication
+
+<div className="cards-grid">
+<GettingStartedCard
+title="BITMAP Precise Deduplication"
+description="Replace COUNT DISTINCT with the BITMAP type for exact deduplication, faster queries, and lower memory and disk usage."
+link="bitmap-precise-deduplication"
+/>
+</div>
+
+## Approximate Deduplication
+
+<div className="cards-grid">
+<GettingStartedCard
+title="HLL Approximate Deduplication"
+description="Use HyperLogLog for high-cardinality UV and distinct-count workloads with controlled 1%–2% error and minimal storage footprint."
+link="hll-approximate-deduplication"
+/>
+</div>
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+---
+{
+"title": "High Concurrency & Point Queries",
+"language": "en",
+"description": "Navigate Apache Doris capabilities for high-concurrency, low-latency workloads: primary-key point query optimization with row store and short-circuit execution, and dictionary tables for in-memory key-value lookups."
+}
+---
+
+import GettingStartedCard from '@site/src/components/getting-started-card/getting-started-card';
+
+Apache Doris is built for analytical scans, but with the right table design and execution paths it can also serve high-QPS point queries and key-value lookups. Use primary-key point query optimization for short-path retrieval on Unique tables, and dictionary tables to replace dimension-table joins with in-memory lookups.
+
+## Point Queries
+
+<div className="cards-grid">
+<GettingStartedCard
+title="High-Concurrency Point Query"
+description="Enable row store, short-circuit execution, and PreparedStatement on Unique-key tables to push primary-key lookups to high QPS with low latency."
+link="high-concurrent-point-query"
+/>
+</div>
+
+## Key-Value Lookup
+
+<div className="cards-grid">
+<GettingStartedCard
+title="Dictionary Table"
+description="Pre-load dimension columns into memory and replace LEFT OUTER JOIN with dict_get function calls for fast key-value lookups in JOIN-heavy workloads."
+link="dictionary"
+/>
+</div>

docs-next/query-acceleration/hints/hints-overview.md

Lines changed: 0 additions & 90 deletions
This file was deleted.
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+---
+{
+"title": "Join Optimization",
+"language": "en",
+"description": "Navigate Apache Doris Join optimization techniques: Colocation Join, Distribute Hint for shuffle/broadcast, and Leading Hint for join order."
+}
+---
+
+import GettingStartedCard from '@site/src/components/getting-started-card/getting-started-card';
+
+Apache Doris adaptively optimizes most Join queries out of the box, but for performance-critical scenarios you can guide the planner with table colocation and hints. Start with Colocation Join to remove network shuffle for bucket-aligned tables, then use Distribute and Leading hints to fine-tune shuffle method and join order when the optimizer's choice is suboptimal.
+
+## Colocation
+
+<div className="cards-grid">
+<GettingStartedCard
+title="Colocation Join"
+description="Co-locate data of related tables on the same BE so bucket-column joins run locally, eliminating cross-node shuffle for the most demanding multi-table joins."
+link="colocation-join"
+/>
+</div>
+
+## Hint-Based Tuning
+
+<div className="cards-grid">
+<GettingStartedCard
+title="Adjusting Join Shuffle Mode"
+description="Use the [shuffle] and [broadcast] Distribute Hints to override the optimizer's data distribution choice for the right table in a Join."
+link="tuning/tuning-plan/adjusting-join-shuffle"
+/>
+
+<GettingStartedCard
+title="Reordering Join with Leading Hint"
+description="Use Leading Hint to manually specify the join order of multiple tables — left-deep, right-deep, or bushy — when CBO does not pick the ideal plan."
+link="tuning/tuning-plan/reordering-join-with-leading-hint"
+/>
+</div>

docs-next/query-acceleration/materialized-view/async-materialized-view/use-advice.md

Lines changed: 0 additions & 144 deletions
This file was deleted.
