
Commit 9f54c3b (2 parents bd292af + 7684dbb)
Merge pull request #5906 from ClickHouse/blog-to-docs-async-inserts
Blog to Docs: Async inserts

File tree: 1 file changed (+51 −12 lines)


docs/best-practices/_snippets/_async_inserts.md

Lines changed: 51 additions & 12 deletions
@@ -13,35 +13,55 @@ The core behavior is controlled via the [`async_insert`](/operations/settings/se
<Image img={async_inserts} size="lg" alt="Async inserts"/>

Asynchronous inserts are supported over both the HTTP and native TCP interfaces.

When enabled (`async_insert = 1`), inserts are buffered and only written to disk once one of the flush conditions is met:

- The buffer reaches a specified data size ([`async_insert_max_data_size`](/operations/settings/settings#async_insert_max_data_size), default 10 MiB).
- A time threshold elapses ([`async_insert_busy_timeout_ms`](/operations/settings/settings#async_insert_busy_timeout_max_ms), default 200 ms, or 1000 ms on Cloud).
- A maximum number of insert queries accumulates ([`async_insert_max_query_number`](/operations/settings/settings#async_insert_max_query_number), default 450).

Whichever threshold is reached first triggers the flush.
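As a sketch, the flush thresholds above can be tuned per query via `SETTINGS` (the `logs` table and the threshold values here are illustrative; only the setting names come from the list above):

```sql
-- Illustrative only: `logs` is a hypothetical table; values are examples.
INSERT INTO logs (ts, message) VALUES (now(), 'hello')
SETTINGS
    async_insert = 1,                      -- buffer this insert server-side
    async_insert_max_data_size = 1048576,  -- flush once the buffer holds ~1 MiB
    async_insert_busy_timeout_ms = 500,    -- ...or after 500 ms
    async_insert_max_query_number = 100;   -- ...or after 100 buffered queries
```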

This batching process is invisible to clients and helps ClickHouse efficiently merge insert traffic from multiple sources. However, until a flush occurs, the data can't be queried. Importantly, there are multiple buffers per insert shape and settings combination, and in clusters, buffers are maintained per node, enabling fine-grained control across multi-tenant environments. Insert mechanics are otherwise identical to those described for [synchronous inserts](/best-practices/selecting-an-insert-strategy#synchronous-inserts-by-default).

### Choosing a return mode {#choosing-a-return-mode}

The behavior of asynchronous inserts is further refined using the [`wait_for_async_insert`](/operations/settings/settings#wait_for_async_insert) setting.

When set to 1 (the default), ClickHouse only acknowledges the insert after the data is successfully flushed to disk. This ensures strong durability guarantees and makes error handling straightforward: if something goes wrong during the flush, the error is returned to the client. This mode is recommended for most production scenarios, especially when insert failures must be tracked reliably.

[Benchmarks](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse) show it scales well with concurrency, whether you're running 200 or 500 clients, thanks to adaptive inserts and stable part creation behavior.

Setting `wait_for_async_insert = 0` enables "fire-and-forget" mode. Here, the server acknowledges the insert as soon as the data is buffered, without waiting for it to reach storage.

This offers ultra-low-latency inserts and maximal throughput, ideal for high-velocity, low-criticality data. However, this comes with trade-offs: there's no guarantee the data will be persisted, errors only surface during flush, and there is no dead-letter queue for failed inserts; tracing failures requires inspecting server logs and system tables after the fact. Use this mode only if your workload can tolerate data loss.
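The two return modes can be contrasted side by side (the `events` table is hypothetical; the settings are those discussed above):

```sql
-- Durable mode (recommended): the statement returns only after the buffer
-- holding this data has been flushed to disk, so flush errors surface here.
INSERT INTO events VALUES (1, 'checkout')
SETTINGS async_insert = 1, wait_for_async_insert = 1;

-- Fire-and-forget: the statement returns as soon as the data is buffered.
-- A later flush failure is NOT reported back to this client.
INSERT INTO events VALUES (2, 'pageview')
SETTINGS async_insert = 1, wait_for_async_insert = 0;
```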

[Benchmarks also demonstrate](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse) substantial part reduction and lower CPU usage when buffer flushes are infrequent (e.g. every 30 seconds), but the risk of silent failure remains.

Our strong recommendation is to use `async_insert = 1, wait_for_async_insert = 1` if you use asynchronous inserts. Using `wait_for_async_insert = 0` is very risky: your INSERT client may not be aware of errors, and it can also cause overload if your client keeps writing quickly while the ClickHouse server needs to slow down writes and apply backpressure to keep the service reliable.

### Adaptive async inserts {#adaptive-async-inserts}

Since version 24.2, ClickHouse uses adaptive flush timeouts by default ([`async_insert_use_adaptive_busy_timeout`](/operations/settings/settings#async_insert_use_adaptive_busy_timeout)). Instead of a fixed flush interval, the timeout dynamically adjusts between a minimum ([`async_insert_busy_timeout_min_ms`](/operations/settings/settings#async_insert_busy_timeout_min_ms), default 50 ms) and a maximum ([`async_insert_busy_timeout_max_ms`](/operations/settings/settings#async_insert_busy_timeout_max_ms), default 200 ms, or 1000 ms on Cloud) based on the incoming data rate.

When data arrives frequently, the timeout stays closer to the minimum to flush sooner and reduce end-to-end latency. When data is sparse, it grows toward the maximum to accumulate larger batches. This is especially useful in default mode (`wait_for_async_insert = 1`), where a fixed high timeout would force clients to block for the full interval even when data is ready to flush.
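For a latency-sensitive workload, a sketch of narrowing the adaptive window might look like this (the values are illustrative, not recommendations; the setting names are those above):

```sql
-- Illustrative values: keep the adaptive flush timeout between 20 ms and
-- 100 ms for this session, trading larger parts for lower latency.
SET async_insert_use_adaptive_busy_timeout = 1;  -- default since 24.2
SET async_insert_busy_timeout_min_ms = 20;
SET async_insert_busy_timeout_max_ms = 100;
```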
### Error handling {#error-handling}

Schema validation and data parsing happen during buffer flush, not when the insert is received. If any row in an insert query has a parsing or type error, **none of the data from that query is flushed**; the entire query's payload is rejected. In default mode (`wait_for_async_insert = 1`), the error is returned to the client. In fire-and-forget mode, errors are written to server logs and the [`system.asynchronous_inserts`](/operations/system-tables/asynchronous_inserts) table.

Each flush creates at least one part per distinct partition key value in the buffer. Even for tables without a partition key, a single flush can produce multiple parts if the buffered data exceeds [`max_insert_block_size`](/operations/settings/settings#max_insert_block_size) (default ~1 million rows).
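To see what is currently buffered, the system table mentioned above can be queried; the exact column set may vary by server version, so treat the columns below as indicative:

```sql
-- Inspect pending (not yet flushed) async insert buffers.
-- Column names are indicative; check your version's table schema.
SELECT query, table, total_bytes
FROM system.asynchronous_inserts;
```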

:::note
Despite using async inserts, you can still encounter ["too many parts"](/knowledgebase/exception-too-many-parts) errors if the partitioning key has high cardinality.
:::

### Deduplication and reliability {#deduplication-and-reliability}

By default, ClickHouse performs automatic deduplication for synchronous inserts, which makes retries safe in failure scenarios. However, this is disabled for asynchronous inserts unless explicitly enabled, and it shouldn't be enabled if you have dependent materialized views ([see issue](https://github.com/ClickHouse/ClickHouse/issues/66003)).

In practice, if deduplication is turned on and the same insert is retried (due to, for instance, a timeout or network drop), ClickHouse can safely ignore the duplicate. This helps maintain idempotency and avoids double-writing data.
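As a hedged sketch, explicitly enabling deduplication for an async insert might look like the following. The `async_insert_deduplicate` setting name and the `events` table are assumptions not stated in the text above; verify the setting against your version's settings reference, and recall the materialized-view caveat:

```sql
-- Assumed setting name: async_insert_deduplicate (verify for your version).
-- Do not enable this if the target table has dependent materialized views
-- (see the linked issue above).
INSERT INTO events VALUES (1, 'retry-safe row')
SETTINGS async_insert = 1,
         wait_for_async_insert = 1,
         async_insert_deduplicate = 1;  -- retries of this insert are ignored
```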
### Enabling asynchronous inserts {#enabling-asynchronous-inserts}

@@ -63,5 +83,24 @@ Asynchronous inserts can be enabled for a particular user, or for a specific query:
```

:::note
Asynchronous inserts don't apply to `INSERT INTO ... SELECT` queries. When the insert contains a `SELECT` clause, the query is always executed synchronously regardless of the `async_insert` setting.
:::
### Flushing buffers on shutdown {#flushing-buffers-on-shutdown}

To flush all pending async insert buffers (for example, during a graceful shutdown or before maintenance), run:

```sql
SYSTEM FLUSH ASYNC INSERT QUEUE
```

This ensures any buffered data is written to storage before the server stops.
### Comparison with buffer tables {#comparison-with-buffer-tables}

Asynchronous inserts are the modern replacement for [Buffer tables](/engines/table-engines/special/buffer). Key differences:

- **No DDL changes required.** Async inserts are transparent: you enable a setting rather than creating additional tables.
- **Per-shape buffering.** Async inserts maintain separate buffers per unique query shape and settings combination, enabling granular flush policies. Buffer tables use a single buffer per target table.
- **Durability.** In default mode (`wait_for_async_insert = 1`), data is confirmed on disk before the client receives acknowledgment. Buffer tables behave like fire-and-forget: buffered data is lost on crash.
- **Cluster behavior.** In clusters, async insert buffers are maintained per node. Buffer tables require explicit creation on each node.
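For contrast, here is a sketch of the legacy Buffer-table approach the bullets above compare against (table names are hypothetical; the engine parameters follow the documented `Buffer(database, table, num_layers, min_time, max_time, min_rows, max_rows, min_bytes, max_bytes)` signature):

```sql
-- Legacy approach, for contrast only (names are hypothetical).
-- This extra DDL is needed on every node, and buffered rows are
-- lost on a crash; async inserts require none of this.
CREATE TABLE events_buffer AS events
ENGINE = Buffer(
    currentDatabase(), events,  -- destination database and table
    16,                         -- num_layers
    10, 100,                    -- min_time, max_time (seconds)
    10000, 1000000,             -- min_rows, max_rows
    10000000, 100000000         -- min_bytes, max_bytes
);
```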
