Skip to content

Commit 9c3de31

Browse files
authored
[refactor](next) opt key features (#3634)
1 parent bc2589c commit 9c3de31

54 files changed

Lines changed: 1017 additions & 436 deletions

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

docs-next/key-features/analytic-functions.mdx

Lines changed: 19 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -20,42 +20,44 @@ featureCard:
2020
- performance
2121
---
2222

23-
> **TL;DR** Analytic functions (window functions) compute a value for every row of a result set without collapsing rows the way `GROUP BY` does. Doris supports the standard set, `ROW_NUMBER`, `RANK`, `DENSE_RANK`, `NTILE`, `LAG`, `LEAD`, `FIRST_VALUE`, `LAST_VALUE`, `NTH_VALUE`, `PERCENT_RANK`, `CUME_DIST`, plus every aggregate (`SUM`, `AVG`, `COUNT`, `MIN`, `MAX`) used with `OVER`. The Nereids optimizer also rewrites `WHERE row_num <= K` into a partition-aware Top-N below the window, so a "top 10 per category" query never sorts the whole partition.
23+
> **TL;DR** Apache Doris analytic functions (window functions) compute a value for every row of a result set without collapsing rows the way `GROUP BY` does. Apache Doris supports the standard set, `ROW_NUMBER`, `RANK`, `DENSE_RANK`, `NTILE`, `LAG`, `LEAD`, `FIRST_VALUE`, `LAST_VALUE`, `NTH_VALUE`, `PERCENT_RANK`, `CUME_DIST`, plus every aggregate (`SUM`, `AVG`, `COUNT`, `MIN`, `MAX`) used with `OVER`. The Nereids optimizer also rewrites `WHERE row_num <= K` into a partition-aware Top-N below the window, so a "top 10 per category" query never sorts the whole partition.
2424
2525
![Apache Doris Analytic Functions: SQL window functions in Doris rank, accumulate, lag, and average across rows; the planner pushes Top-N filters down into the window operator.](/images/next/key-features/analytic-functions.jpg)
26-
## Why it matters {#why}
26+
## Why use analytic functions in Apache Doris? {#why}
2727

28-
Three questions show up in almost every analytics workload, and `GROUP BY` answers none of them on its own.
28+
Apache Doris analytic functions answer three questions that show up in almost every analytics workload and that `GROUP BY` cannot answer on its own.
2929

3030
- "Number every order per customer in time order, then keep the first 10." A self-join on `(customer_id, order_date)` works but reads the table twice.
3131
- "What was the running total of sales by day, and the 7-day moving average?" `GROUP BY` collapses the rows you still want to see.
3232
- "How does each row compare to the previous row?" Year-over-year, period-over-period, gap-between-events, this is the lag/lead pattern, and writing it without window functions means a self-join with an offset predicate that the planner cannot optimize.
3333

34-
Analytic functions answer all three in one pass over the data, with no self-join and the original rows preserved.
34+
Apache Doris analytic functions answer all three in one pass over the data, with no self-join and the original rows preserved.
3535

36-
## What it is {#what}
36+
## What are Apache Doris analytic functions? {#what}
3737

38-
An analytic function is a SQL function that, for each input row, computes a value over a *window* of related rows defined by an `OVER` clause. The window can be the whole partition, a fixed range around the current row, or anything in between. The output has the same number of rows as the input.
38+
An Apache Doris analytic function is a SQL function that, for each input row, computes a value over a *window* of related rows defined by an `OVER` clause. The window can be the whole partition, a fixed range around the current row, or anything in between. The output has the same number of rows as the input.
3939

4040
```
4141
function(args) OVER ( [PARTITION BY ...] [ORDER BY ...] [<frame>] )
4242
```
4343

4444
**Key terms**
4545

46-
- **`OVER` clause**: tells Doris this call is a window function rather than a regular aggregate. Required.
46+
- **`OVER` clause**: tells Apache Doris this call is a window function rather than a regular aggregate. Required.
4747
- **`PARTITION BY`**: splits the input into independent groups. Each partition is computed on its own. Different from table partitions, this is a runtime concept.
4848
- **`ORDER BY` (inside `OVER`)**: orders rows within each partition. `LAG`, `LEAD`, `ROW_NUMBER`, `RANK`, and any frame with `PRECEDING`/`FOLLOWING` need it.
49-
- **Window frame**: the slice of the partition the function reads for the current row. Doris supports `ROWS BETWEEN ... PRECEDING/FOLLOWING/CURRENT ROW/UNBOUNDED ...` and a restricted form of `RANGE`.
49+
- **Window frame**: the slice of the partition the function reads for the current row. Apache Doris supports `ROWS BETWEEN ... PRECEDING/FOLLOWING/CURRENT ROW/UNBOUNDED ...` and a restricted form of `RANGE`.
5050
- **PartitionTopN**: an internal operator the planner inserts when it can prove a `WHERE rank <= K` filter only needs the top K rows per partition. Source: `CreatePartitionTopNFromWindow.java`.
5151

52-
## How it works {#how}
52+
## How do Apache Doris analytic functions work? {#how}
53+
54+
Apache Doris analytic functions run through a five-step pipeline: plan, shuffle, sort, evaluate, and push down.
5355

5456
1. **Plan.** The optimizer parses the `OVER` clause into a `WindowExpression`, normalizes the frame (`CheckAndStandardizeWindowFunctionAndFrame`), and groups window calls that share the same `PARTITION BY` plus `ORDER BY` into a single physical window operator. Calls that share a partition and order do not pay for an extra sort.
5557
2. **Shuffle by partition.** If `PARTITION BY` is present, the engine shuffles rows so all rows for the same partition land on the same backend. With no `PARTITION BY`, the whole window runs in a single pipeline (the parallelism upper bound).
5658
3. **Sort within partition.** Each backend sorts its partitions by the `ORDER BY` columns. Ties produce a non-deterministic row order unless the `ORDER BY` is unique, which is why the docs warn that `SUM() OVER (ORDER BY date_col)` can return different results on tied dates.
57-
4. **Evaluate per row.** Doris walks each partition once, maintaining the window frame as it goes. Ranking functions emit one integer per row; aggregate functions over `ROWS UNBOUNDED PRECEDING` keep a running total; sliding-window aggregates add the new row and drop the row that fell off the back.
58-
5. **Push down filters and Top-N.** `CreatePartitionTopNFromWindow` turns `WHERE row_number() OVER (PARTITION BY a ORDER BY b) <= K` into a `PartitionTopN(K)` operator below the window, so each partition only carries the top K rows into the window operator. `PushDownFilterThroughWindow` lifts filters on `PARTITION BY` columns past the window, so Doris filters before sorting instead of after.
59+
4. **Evaluate per row.** Apache Doris walks each partition once, maintaining the window frame as it goes. Ranking functions emit one integer per row; aggregate functions over `ROWS UNBOUNDED PRECEDING` keep a running total; sliding-window aggregates add the new row and drop the row that fell off the back.
60+
5. **Push down filters and Top-N.** `CreatePartitionTopNFromWindow` turns `WHERE row_number() OVER (PARTITION BY a ORDER BY b) <= K` into a `PartitionTopN(K)` operator below the window, so each partition only carries the top K rows into the window operator. `PushDownFilterThroughWindow` lifts filters on `PARTITION BY` columns past the window, so Apache Doris filters before sorting instead of after.
5961

6062
## Quick start {#quick-start}
6163

@@ -92,11 +94,13 @@ FROM orders;
9294

9395
One pass, three windowed columns, no self-joins. The original five rows are preserved. `LAG(..., 1, 0)` returns `0` instead of `NULL` for the first row of each partition.
9496

95-
## When to use it {#when}
97+
## When should you use Apache Doris analytic functions? {#when}
98+
99+
Apache Doris analytic functions fit any row-preserving computation that depends on neighboring rows, including Top-N per group, running totals, lag/lead comparisons, percentile bucketing, and share-of-total ratios.
96100

97101
**Good fit**
98102

99-
- Top-N per group: "top 10 orders per customer," "best-selling product per region." Add a `WHERE rn <= 10` filter and Doris pushes a partition Top-N below the window.
103+
- Top-N per group: "top 10 orders per customer," "best-selling product per region." Add a `WHERE rn <= 10` filter and Apache Doris pushes a partition Top-N below the window.
100104
- Running totals, moving averages, and centered moving averages with `ROWS BETWEEN n PRECEDING AND m FOLLOWING`.
101105
- Year-over-year, day-over-day, and gap-between-events analysis with `LAG` and `LEAD`. One pass over the table replaces a self-join.
102106
- Bucketing for percentile or quartile reports with `NTILE`.
@@ -105,7 +109,7 @@ One pass, three windowed columns, no self-joins. The original five rows are pres
105109
**Not a good fit**
106110

107111
- A pure aggregation that collapses rows. `SELECT category, SUM(amount) FROM t GROUP BY category` does the same work without sorting and without keeping every row in memory. Reach for `GROUP BY` first, and only switch to `OVER` when you also need the unaggregated columns.
108-
- `RANGE` frames with numeric offsets, like `RANGE BETWEEN 5 PRECEDING AND CURRENT ROW`. Doris's `RANGE` frame is restricted to `UNBOUNDED` boundaries or `CURRENT ROW`; arbitrary `RANGE n PRECEDING/FOLLOWING` is not supported. Use `ROWS` if you need a numeric offset.
112+
- `RANGE` frames with numeric offsets, like `RANGE BETWEEN 5 PRECEDING AND CURRENT ROW`. The Apache Doris `RANGE` frame is restricted to `UNBOUNDED` boundaries or `CURRENT ROW`; arbitrary `RANGE n PRECEDING/FOLLOWING` is not supported. Use `ROWS` if you need a numeric offset.
109113
- A window with no `PARTITION BY`. The whole result set lands on one pipeline, and that pipeline becomes the bottleneck on large inputs. Add a partition key whenever the workload allows it.
110114
- `ORDER BY` on a non-unique column when you care about deterministic output. `SUM(x) OVER (ORDER BY day)` can return different cumulative totals across runs when several rows share the same day. Add a tie-breaker; see [Window Functions Overview](../sql-manual/sql-functions/window-functions/overview).
111115
- Recomputing the same window result on every refresh. If the same `ROW_NUMBER()` query runs every minute against a slow-moving table, an [async materialized view](../query-acceleration/materialized-view/async-materialized-view/overview) with a partial refresh is cheaper than re-windowing each time.
@@ -118,3 +122,4 @@ One pass, three windowed columns, no self-joins. The original five rows are pres
118122
- [Pipeline Execution Engine: how partitioned windows get parallelized across BEs](./pipeline-execution-engine)
119123
- [Vectorized Execution](./vectorized-execution): the engine that runs each window's batch through SIMD-accelerated operators.
120124
- [Async Materialized View: precompute window results that re-run every minute](../query-acceleration/materialized-view/async-materialized-view/overview)
125+
- [MPP Architecture](./mpp): how window/analytic operators are shuffled and partitioned across BEs.

docs-next/key-features/batch-load.mdx

Lines changed: 11 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -20,22 +20,22 @@ featureCard:
2020
- bulk-ingestion
2121
---
2222

23-
> **TL;DR** Batch Load is how you move large files from S3, HDFS, or another warehouse into Doris in one shot. You submit a `LOAD LABEL` statement, the FE plans the work and fans it out to BEs, and you check progress with `SHOW LOAD`. The job is asynchronous, so the client can disconnect; the label is what lets you find the result and what dedupes a retry. For [external catalogs](./multi-catalog) and TVF reads, the synchronous `INSERT INTO ... SELECT` form covers the same ground with simpler ETL.
23+
> **TL;DR** Apache Doris Batch Load moves large files from S3, HDFS, or another warehouse into Doris in one shot. You submit a `LOAD LABEL` statement, the FE plans the work and fans it out to BEs, and you check progress with `SHOW LOAD`. The job is asynchronous, so the client can disconnect; the label is what lets you find the result and what dedupes a retry. For [external catalogs](./multi-catalog) and TVF reads, the synchronous `INSERT INTO ... SELECT` form covers the same ground with simpler ETL.
2424
2525
![Apache Doris Batch Load: Asynchronous bulk ingestion from S3, HDFS, and external catalogs into Doris via LOAD LABEL or INSERT INTO SELECT, with label-based dedup.](/images/next/key-features/batch-load.jpg)
26-
## Why it matters {#why}
26+
## Why use batch load in Apache Doris? {#why}
2727

28-
Real-time loaders are the right tool for events trickling in, but they fall apart when you need to move a TB of historical data, or rebuild a fact table from a Hive snapshot, or backfill last quarter from Parquet on S3. A [Stream Load](./stream-load) over HTTP times out. A per-row `INSERT` would take days. The job has to run server-side, retry on its own, and survive the client losing its connection.
28+
Apache Doris Batch Load is the right tool when a single ingestion job runs for hours and the warehouse, not the client, has to own its lifecycle. Real-time loaders are the right tool for events trickling in, but they fall apart when you need to move a TB of historical data, or rebuild a fact table from a Hive snapshot, or backfill last quarter from Parquet on S3. A [Stream Load](./stream-load) over HTTP times out. A per-row `INSERT` would take days. The job has to run server-side, retry on its own, and survive the client losing its connection.
2929

3030
- A nightly ETL has to land a few hundred GB of Parquet from S3 before the morning dashboards refresh.
3131
- A migration from Snowflake or Hive needs to copy whole partitions, with column-level filters and type casts.
3232
- A backfill of one missing day cannot block the connection that submitted it.
3333

3434
Batch Load handles all three by treating large bulk ingestion as a long-running, label-tracked job that the warehouse owns end to end.
3535

36-
## What it is {#what}
36+
## What is Apache Doris batch load? {#what}
3737

38-
Batch Load is the family of asynchronous bulk-load methods in Doris. The main entry point is the `LOAD LABEL ... WITH S3|HDFS|BROKER` statement, historically called Broker Load, which today covers S3, HDFS, and any storage system reachable through a Broker process. The synchronous `INSERT INTO ... SELECT` variant covers the same ingestion needs when the source is an external catalog (Hive, Iceberg, JDBC) or a file behind a TVF such as `S3()` or `HDFS()`. Both share the same transaction model as the rest of Doris loading: one label per job, atomic commit, replica-level publish.
38+
Apache Doris Batch Load is the family of asynchronous bulk-load methods in Doris. The main entry point is the `LOAD LABEL ... WITH S3|HDFS|BROKER` statement, historically called Broker Load, which today covers S3, HDFS, and any storage system reachable through a Broker process. The synchronous `INSERT INTO ... SELECT` variant covers the same ingestion needs when the source is an external catalog (Hive, Iceberg, JDBC) or a file behind a TVF such as `S3()` or `HDFS()`. Both share the same transaction model as the rest of Doris loading: one label per job, atomic commit, replica-level publish.
3939

4040
**Key terms**
4141

@@ -45,7 +45,9 @@ Batch Load is the family of asynchronous bulk-load methods in Doris. The main en
4545
- **TVF (Table-Valued Function)**: `S3(...)`, `HDFS(...)`, `LOCAL(...)`, `iceberg_meta(...)`, and the like. They expose files or external metadata as a table you can `SELECT` from, then pipe into `INSERT INTO ... SELECT` for synchronous ingestion.
4646
- **`max_filter_ratio`**: the per-job tolerance for malformed rows. Default is zero, so a single bad row cancels the load.
4747

48-
## How it works {#how}
48+
## How does Apache Doris batch load work? {#how}
49+
50+
Apache Doris Batch Load registers a labeled job on the FE, parallelizes file scans across BEs, and atomically commits the result once every replica publishes the new version.
4951

5052
1. **Submit and label.** A `LOAD LABEL my_db.daily_orders ...` request lands on the FE. The FE registers the label, opens a transaction, and the load enters `PENDING`. Resubmitting the same label inside the retention window short-circuits to the original job.
5153
2. **Plan and fan out.** The FE estimates file sizes (one BE handles between `min_bytes_per_broker_scanner` and `max_bytes_per_broker_scanner`, default 64 MB to 500 GB), splits the work, and dispatches scanner tasks to BEs. Job state moves through `LOADING`.
@@ -87,7 +89,9 @@ PROPERTIES ("timeout" = "3600", "max_filter_ratio" = "0.01");
8789

8890
`SHOW LOAD ORDER BY CreateTime DESC LIMIT 1` returns when the job is `FINISHED`. Up to 1% of malformed rows are tolerated; anything more cancels the job and the URL field points at a sample of the offending rows.
8991

90-
## When to use it {#when}
92+
## When should you use Apache Doris batch load? {#when}
93+
94+
Use Apache Doris Batch Load when a single ingestion run handles tens of GB or more from object storage, an external warehouse, or a Hive-partitioned directory.
9195

9296
**Good fit**
9397

0 commit comments

Comments
 (0)