Commit 161d002

Finish docs. Remove setup-guide to let that live in Fivetran side
1 parent d7ad917 commit 161d002

9 files changed

Lines changed: 184 additions & 426 deletions

File tree

docs/integrations/data-ingestion/etl-tools/fivetran/index.md

Lines changed: 20 additions & 15 deletions
@@ -43,27 +43,32 @@ The destination connector is developed and maintained together by ClickHouse and
4343
</div>
4444

4545
## Key features {#key-features}
46-
47-
- **Automatic schema creation** — destination tables and databases are created automatically based on source schema.
48-
- **History Mode (SCD Type 2)** — preserves complete history of all record versions for point-in-time analysis and audit trails.
49-
- **Retry on network failures** — transient network errors are retried with exponential backoff. Duplicates from retries are handled by `ReplacingMergeTree`.
50-
- **Configurable batch sizes** — tune write, select, mutation, and hard delete batch sizes via a JSON configuration file.
46+
- **ClickHouse Cloud compatible**: use your ClickHouse Cloud database as a Fivetran destination.
47+
- **SaaS deployment model**: fully managed by Fivetran, no need to manage your own infrastructure.
48+
- **History Mode (SCD Type 2)**: preserves complete history of all record versions for point-in-time analysis and audit trails.
49+
- **Configurable batch sizes**: You can adapt Fivetran to your particular use case by tuning write, select, mutation, and hard delete batch sizes via a JSON configuration file.
5150

5251
## Limitations {#limitations}
53-
52+
- Schema migrations are not supported yet, but we are working on them.
5453
- Adding, removing, or modifying primary key columns is not supported.
5554
- Custom ClickHouse settings on `CREATE TABLE` statements are not supported.
56-
- Role-based grants are not fully supported — the connector's grants check only queries direct user grants. Use [direct grants](/integrations/fivetran/troubleshooting#role-based-grants) instead.
55+
- Role-based grants are not fully supported. The connector's grants check only queries direct user grants. Use [direct grants](/integrations/fivetran/troubleshooting#role-based-grants) instead.
5756

5857
## Related pages {#related-pages}
58+
- [Setup Guide](/integrations/fivetran/setup-guide): step-by-step configuration instructions
59+
- [Technical Reference](/integrations/fivetran/reference): type mappings, table engines, metadata columns
60+
- [Troubleshooting & Best Practices](/integrations/fivetran/troubleshooting): common errors and optimization tips
61+
- [ClickHouse Fivetran destination on GitHub](https://github.com/ClickHouse/clickhouse-fivetran-destination)
5962

60-
- [Setup Guide](/integrations/fivetran/setup-guide) — step-by-step configuration instructions
61-
- [Technical Reference](/integrations/fivetran/reference) — type mappings, table engines, metadata columns
62-
- [Troubleshooting & Best Practices](/integrations/fivetran/troubleshooting) — common errors and optimization tips
63+
## Setup guide {#setup-guide}
64+
- If you're looking for configurations and general technical details, please refer to the [technical reference](/integrations/fivetran/reference).
65+
- For a comprehensive guide, check the [setup guide](https://fivetran.com/docs/destinations/clickhouse/setup-guide) in the Fivetran documentation.
6366

64-
## Additional resources {#additional-resources}
67+
## Contact and support {#contact-us}
6568

66-
- [Fivetran ClickHouse destination docs](https://fivetran.com/docs/destinations/clickhouse)
67-
- [Fivetran ClickHouse setup guide](https://fivetran.com/docs/destinations/clickhouse/setup-guide)
68-
- [ClickHouse Fivetran destination on GitHub](https://github.com/ClickHouse/clickhouse-fivetran-destination)
69-
- [ClickHouse Support](/about-us/support)
69+
The ClickHouse Fivetran destination has a split ownership model:
70+
71+
- **ClickHouse** develops and maintains the destination connector code.
72+
- **Fivetran** hosts the connector and is responsible for data movement, pipeline scheduling, and source connectors.
73+
74+
Both Fivetran and ClickHouse provide support for the Fivetran ClickHouse destination. For general inquiries, we recommend reaching out to Fivetran, as they are the experts on the Fivetran platform. For any ClickHouse-specific questions or issues, our support team is happy to help. Create a [support ticket](/about-us/support) to ask a question or report an issue.

docs/integrations/data-ingestion/etl-tools/fivetran/reference.md

Lines changed: 116 additions & 84 deletions
@@ -3,12 +3,79 @@ sidebar_label: 'Reference'
33
slug: /integrations/fivetran/reference
44
sidebar_position: 3
55
description: 'Type mappings, table engine details, metadata columns, and debugging queries for the Fivetran ClickHouse destination.'
6-
title: 'Fivetran ClickHouse Destination - Technical Reference'
6+
title: 'Technical Reference'
77
doc_type: 'guide'
88
keywords: ['fivetran', 'clickhouse destination', 'type mapping', 'SharedReplacingMergeTree', 'deduplication', 'FINAL', 'reference']
99
---
1010

11-
# Fivetran ClickHouse Destination - Technical Reference
11+
# Technical Reference
12+
13+
## Setup details {#setup-details}
14+
15+
### User and role management {#user-and-role-management}
16+
17+
Consider not using the `default` user; instead, create a dedicated user for this Fivetran
destination only. The following commands, executed as the `default` user, create a new `fivetran_user` with the
required privileges.
20+
21+
```sql
22+
CREATE USER fivetran_user IDENTIFIED BY '<password>'; -- use a secure password generator
23+
24+
GRANT CURRENT GRANTS ON *.* TO fivetran_user;
25+
```
26+
27+
Additionally, you can revoke access to certain databases from the `fivetran_user`.
For example, the following statement restricts access to the `default` database:
29+
30+
```sql
31+
REVOKE ALL ON default.* FROM fivetran_user;
32+
```
33+
34+
You can execute these statements in the ClickHouse SQL console.
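As a convenience, the statements above can also be generated with a small throwaway script. This is an illustrative sketch, not part of the connector: the `provision_statements` helper is hypothetical, and Python's `secrets` module serves as the secure password generator.

```python
import secrets

def provision_statements(user: str = "fivetran_user") -> tuple[str, list[str]]:
    # Generate a URL-safe random password (24 bytes of entropy, ~32 characters).
    password = secrets.token_urlsafe(24)
    statements = [
        # Mirrors the SQL shown above, with the password filled in.
        f"CREATE USER {user} IDENTIFIED BY '{password}';",
        f"GRANT CURRENT GRANTS ON *.* TO {user};",
        f"REVOKE ALL ON default.* FROM {user};",
    ]
    return password, statements

password, statements = provision_statements()
for stmt in statements:
    print(stmt)
```

Store the generated password in your secret manager before pasting the statements into the SQL console.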
35+
36+
### Advanced configuration {#advanced-configuration}
37+
38+
The ClickHouse Cloud destination supports an optional JSON configuration file for advanced use cases. This file allows you to fine-tune destination behavior by overriding the default settings that control batch sizes, parallelism, connection pools, and request timeouts.
39+
40+
> NOTE: This configuration is entirely optional. If no file is uploaded, the destination uses
41+
> sensible defaults that work well for most use cases.
42+
43+
The file must be valid JSON and conform to the schema described below.
44+
45+
If you need to modify the configuration after the initial setup, you can edit the destination configurations in the Fivetran dashboard and upload an updated file.
46+
47+
The configuration file has a top-level section:
48+
49+
```json
50+
{
51+
"destination_configurations": { ... }
52+
}
53+
```
54+
55+
Inside it, you can specify the following settings, which control the internal behavior of the ClickHouse destination connector itself.
These settings affect how the connector processes data before sending it to ClickHouse.
57+
58+
| Setting | Type | Default | Allowed Range | Description |
59+
|---------|------|---------|---------------|-------------|
60+
| `write_batch_size` | integer | `100000` | 5,000 – 100,000 | Number of rows per batch for insert, update, and replace operations. |
61+
| `select_batch_size` | integer | `1500` | 200 – 1,500 | Number of rows per batch for SELECT queries used during updates. |
62+
| `mutation_batch_size` | integer | `1500` | 200 – 1,500 | Number of rows per batch for ALTER TABLE UPDATE mutations in history mode. Lower it if the generated SQL statements become too large. |
| `hard_delete_batch_size` | integer | `1500` | 200 – 1,500 | Number of rows per batch for hard delete operations in history mode. Lower it if the generated SQL statements become too large. |
64+
65+
All fields are optional. If a field is not specified, the default value is used.
66+
If a value is outside the allowed range, the destination will report an error during sync.
67+
Unknown fields are silently ignored (a warning is logged) and do not cause errors, which allows forward compatibility when new settings are added.
68+
69+
Example:
70+
71+
```json
72+
{
73+
"destination_configurations": {
74+
"write_batch_size": 50000,
75+
"select_batch_size": 200
76+
}
77+
}
78+
```
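The defaults, ranges, and unknown-field behavior described above can be sketched in a few lines. This is an illustration of the documented rules, not the connector's actual implementation; `resolve_config` is a hypothetical helper.

```python
import json

# Documented defaults and allowed ranges: (default, min, max).
RANGES = {
    "write_batch_size": (100_000, 5_000, 100_000),
    "select_batch_size": (1_500, 200, 1_500),
    "mutation_batch_size": (1_500, 200, 1_500),
    "hard_delete_batch_size": (1_500, 200, 1_500),
}

def resolve_config(raw: str) -> dict:
    # Merge user-supplied values over the defaults; reject out-of-range values.
    supplied = json.loads(raw).get("destination_configurations", {})
    resolved = {}
    for key, (default, lo, hi) in RANGES.items():
        value = supplied.get(key, default)
        if not lo <= value <= hi:
            raise ValueError(f"{key}={value} is outside the allowed range [{lo}, {hi}]")
        resolved[key] = value
    # Fields not listed in RANGES are simply ignored, mirroring the
    # connector's forward compatibility with newly added settings.
    return resolved

print(resolve_config('{"destination_configurations": {"write_batch_size": 50000}}'))
```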
1279

1380
## Type transformation mapping {#type-mapping}
1481

@@ -35,13 +102,29 @@ The Fivetran ClickHouse destination maps [Fivetran data types](https://fivetran.
35102
\* BINARY, XML, and JSON are stored as [String](/sql-reference/data-types/string) because ClickHouse's `String` type can represent an arbitrary set of bytes. The destination adds a column comment to indicate the original data type. The ClickHouse [JSON](/sql-reference/data-types/newjson) data type is not used, as the earlier `Object('json')` implementation was marked obsolete and was never recommended for production usage.
36103
:::
37104

38-
## Destination table structure {#table-structure}
105+
## Destination tables {#table-structure}
106+
107+
The ClickHouse Cloud destination uses the
[Replacing](/engines/table-engines/mergetree-family/replacingmergetree) variant of the
[SharedMergeTree](/cloud/reference/shared-merge-tree) family
(specifically, `SharedReplacingMergeTree`), versioned by the `_fivetran_synced` column.
111+
112+
Every column except primary (ordering) keys and Fivetran metadata columns is created
113+
as [Nullable(T)](/sql-reference/data-types/nullable), where `T` is a
114+
ClickHouse type based on the [type transformation mapping](#type-mapping).
115+
116+
Every destination table includes the following metadata columns:
39117

40-
All destination tables use [SharedReplacingMergeTree](/cloud/reference/shared-merge-tree) versioned by the `_fivetran_synced` column. Every column except primary (ordering) keys and Fivetran metadata columns is created as [Nullable(T)](/sql-reference/data-types/nullable).
118+
| Column | Type | Description |
119+
|--------|------|-------------|
120+
| `_fivetran_synced` | `DateTime64(9, 'UTC')` | Timestamp when the record was synced by Fivetran. Used as the version column for `SharedReplacingMergeTree`. |
121+
| `_fivetran_deleted` | `Bool` | Soft delete marker. Set to `true` when the source record is deleted. |
122+
| `_fivetran_id` | `String` | Auto-generated unique identifier. Only present when the source table has no primary keys. |
41123

42-
### Single primary key {#single-pk}
124+
### Single primary key in the source table {#single-pk}
43125

44-
For a source table `users` with primary key `id` (`INT`) and column `name` (`STRING`):
126+
For example, a source table `users` has a primary key column `id` (`INT`) and a regular column `name` (`STRING`).
127+
The destination table will be defined as follows:
45128

46129
```sql
47130
CREATE TABLE `users`
@@ -55,9 +138,15 @@ ORDER BY id
55138
SETTINGS index_granularity = 8192
56139
```
57140

58-
### Multiple primary keys {#multiple-pk}
141+
In this case, the `id` column is chosen as the table sorting key.
142+
143+
### Multiple primary keys in the source table {#multiple-pk}
59144

60-
For a source table `items` with primary keys `id` (`INT`) and `name` (`STRING`), plus column `description` (`STRING`):
145+
If the source table has multiple primary keys, they are used in order of their appearance in the Fivetran source table
146+
definition.
147+
148+
For example, there is a source table `items` with primary key columns `id` (`INT`) and `name` (`STRING`), plus an
149+
additional regular column `description` (`STRING`). The destination table will be defined as follows:
61150

62151
```sql
63152
CREATE TABLE `items`
@@ -72,11 +161,13 @@ ORDER BY (id, name)
72161
SETTINGS index_granularity = 8192
73162
```
74163

75-
Primary keys are used in order of their appearance in the Fivetran source table definition.
164+
In this case, the `id` and `name` columns are chosen as the table sorting keys.
76165

77-
### No primary keys {#no-pk}
166+
### No primary keys in the source table {#no-pk}
78167

79-
When the source table has no primary keys, Fivetran adds a `_fivetran_id` column as the sorting key:
168+
If the source table has no primary keys, a unique identifier will be added by Fivetran as a `_fivetran_id` column.
169+
Consider an `events` table that only has the `event` (`STRING`) and `timestamp` (`LOCALDATETIME`) columns in the source.
170+
The destination table in that case is as follows:
80171

81172
```sql
82173
CREATE TABLE events
@@ -91,88 +182,29 @@ ORDER BY _fivetran_id
91182
SETTINGS index_granularity = 8192
92183
```
93184

94-
## Data deduplication {#deduplication}
185+
Since `_fivetran_id` is unique and there are no other primary key options, it is used as a table sorting key.
186+
187+
### Selecting the latest version of the data without duplicates {#deduplication}
95188

96-
`SharedReplacingMergeTree` performs background data deduplication [only during merges at an unknown time](/engines/table-engines/mergetree-family/replacingmergetree). To query the latest version of data without duplicates, use the `FINAL` keyword with [`select_sequential_consistency`](/operations/settings/settings#select_sequential_consistency):
189+
`SharedReplacingMergeTree` performs background data deduplication
190+
[only during merges at an unknown time](/engines/table-engines/mergetree-family/replacingmergetree).
191+
However, you can select the latest version of the data ad hoc, without duplicates, by combining the `FINAL` keyword with the
[select_sequential_consistency](/operations/settings/settings#select_sequential_consistency)
setting:
97194

98195
```sql
99196
SELECT *
100197
FROM example FINAL
101-
LIMIT 1000
198+
LIMIT 1000
102199
SETTINGS select_sequential_consistency = 1;
103200
```
104201

105202
See also [Duplicate records with ReplacingMergeTree](/integrations/fivetran/troubleshooting#duplicate-records) in the troubleshooting guide.
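The effect of `FINAL` on a `ReplacingMergeTree`-style table can be illustrated in plain Python. The `collapse_final` helper below is hypothetical, a sketch of the semantics only: per sorting key, keep the row with the greatest `_fivetran_synced` version.

```python
def collapse_final(rows, sorting_key="id"):
    # For each sorting key value, keep only the row with the greatest
    # _fivetran_synced version, which is what FINAL returns once all
    # duplicate versions have been collapsed.
    latest = {}
    for row in rows:
        key = row[sorting_key]
        if key not in latest or row["_fivetran_synced"] > latest[key]["_fivetran_synced"]:
            latest[key] = row
    return sorted(latest.values(), key=lambda r: r[sorting_key])

rows = [
    {"id": 1, "name": "alice",   "_fivetran_synced": "2024-05-01T10:00:00"},
    {"id": 1, "name": "alice-2", "_fivetran_synced": "2024-05-02T10:00:00"},  # newer version
    {"id": 2, "name": "bob",     "_fivetran_synced": "2024-05-01T10:00:00"},
]
print(collapse_final(rows))
```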
106203

107-
## Fivetran metadata columns {#metadata-columns}
204+
## Retries on network failures {#retries}
108205

109-
Every destination table includes the following metadata columns:
110-
111-
| Column | Type | Description |
112-
|--------|------|-------------|
113-
| `_fivetran_synced` | `DateTime64(9, 'UTC')` | Timestamp when the record was synced by Fivetran. Used as the version column for `ReplacingMergeTree`. |
114-
| `_fivetran_deleted` | `Bool` | Soft delete marker. Set to `true` when the source record is deleted. |
115-
| `_fivetran_id` | `String` | Auto-generated unique identifier. Only present when the source table has no primary keys. |
116-
117-
## Ownership and support model {#ownership}
118-
119-
The ClickHouse Fivetran destination has a split ownership model:
120-
121-
- **ClickHouse** develops and maintains the destination connector code ([GitHub](https://github.com/ClickHouse/clickhouse-fivetran-destination)).
122-
- **Fivetran** hosts the connector and is responsible for data movement, pipeline scheduling, and source connectors.
123-
124-
When diagnosing sync failures:
125-
- Check the ClickHouse `system.query_log` for server-side issues.
126-
- Request Fivetran connector process logs for client-side issues.
127-
128-
For connector bugs, [create a GitHub issue](https://github.com/ClickHouse/clickhouse-fivetran-destination/issues) or contact [ClickHouse Support](/about-us/support).
129-
130-
## Debugging Fivetran syncs {#debugging}
131-
132-
Use the following queries to diagnose sync failures on the ClickHouse side.
133-
134-
### Check recent Fivetran errors {#check-errors}
135-
136-
```sql
137-
SELECT event_time, query, exception_code, exception
138-
FROM system.query_log
139-
WHERE client_name LIKE 'fivetran-destination%'
140-
AND exception_code > 0
141-
ORDER BY event_time DESC
142-
LIMIT 50;
143-
```
206+
The ClickHouse Cloud destination retries transient network errors using exponential backoff.
Retries are safe even if a failed request already inserted some data: any resulting duplicates are handled by
the `SharedReplacingMergeTree` table engine, either during background merges
or when querying the data with `SELECT ... FINAL`.
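The retry behavior described above can be sketched as follows. This is an illustration of jittered exponential backoff, not the connector's actual code; `with_retries` and its parameters are hypothetical.

```python
import random
import time

def with_retries(op, max_attempts=5, base_delay=0.5, max_delay=30.0):
    # Retry `op` on transient network errors, doubling the delay each attempt
    # (capped at max_delay) and applying jitter to avoid thundering herds.
    # Re-running an insert batch is safe here because ReplacingMergeTree
    # collapses duplicate rows by their _fivetran_synced version.
    for attempt in range(max_attempts):
        try:
            return op()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.0))
```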
144210

145-
### Check replica health {#check-replicas}
146-
147-
```sql
148-
SELECT database, table, total_replicas, active_replicas, replica_is_active
149-
FROM system.replicas
150-
WHERE database LIKE 'ft_%'
151-
ORDER BY active_replicas ASC;
152-
```
153-
154-
### Identify orphaned replicas {#orphaned-replicas}
155-
156-
Orphaned replicas from migrated or scaled services can block DDL operations. Identify them with:
157-
158-
```sql
159-
SELECT DISTINCT arrayJoin(mapKeys(replica_is_active)) AS replica_name
160-
FROM system.replicas
161-
WHERE arrayJoin(mapValues(replica_is_active)) = 0;
162-
```
163-
164-
To remove an orphaned replica:
165-
166-
```sql
167-
SYSTEM DROP REPLICA '<old-replica-name>' FROM TABLE <db>.<table>;
168-
```
169-
170-
### Check recent Fivetran user activity {#check-activity}
171-
172-
```sql
173-
SELECT event_time, query_kind, query, exception_code, exception
174-
FROM system.query_log
175-
WHERE user = 'fivetran_user'
176-
ORDER BY event_time DESC
177-
LIMIT 100;
178-
```
