However, selecting the latest version of the data without duplicates ad-hoc is possible with the `FINAL` keyword:
```sql
SELECT *
FROM example FINAL
LIMIT 1000
-SETTINGS select_sequential_consistency = 1;
```
-See also [Duplicate records with ReplacingMergeTree](/integrations/fivetran/troubleshooting#duplicate-records) in the troubleshooting guide.
+Check out the [Optimizing reading queries](/integrations/fivetran/troubleshooting#optimizing-reading-queries) section in the troubleshooting guide for query optimization tips.
## Retries on network failures {#retries-on-network-failures}
The ClickHouse Cloud destination retries transient network errors using the exponential backoff algorithm.
This is safe even when the destination inserts the data, as any potential duplicates are handled by
-the `SharedReplacingMergeTree` table engine, either during background merges,
-ClickHouse uses `SharedReplacingMergeTree` for Fivetran destination tables. Duplicate rows with the same primary key are normal; deduplication happens asynchronously during background merges.
+ClickHouse uses `SharedReplacingMergeTree` for Fivetran destination tables, which is the version of the [`ReplacingMergeTree` table engine](/guides/replacing-merge-tree) in ClickHouse Cloud. Duplicate rows with the same primary key are normal; deduplication happens asynchronously during background merges. At read time, you need to be careful to avoid returning duplicate rows, as some rows may not have been deduplicated yet.
-**Always use the `FINAL` modifier** to get deduplicated results:
+Using the `FINAL` keyword is the simplest way to avoid duplicate rows, as it forces a merge of any not-yet-deduplicated rows at read time:
```sql
SELECT * FROM schema.table FINAL WHERE ...
```
-See the [table-structure](/integrations/fivetran/reference#table-structure) reference for more details.
+There are ways to optimize this `FINAL` operation, for example by filtering on key columns using a `WHERE` condition. For more details, see the [FINAL performance](/guides/replacing-merge-tree#final-performance) section of the ReplacingMergeTree guide.
+
+If those optimizations are not sufficient, you have additional options that avoid using `FINAL` while still handling duplicates correctly:
+- If you want to query a numeric column that is always incrementing, [you can use `max(the_column)`](/guides/developer/deduplication#avoiding-final).
+- If you need to retrieve the latest value for some columns for a particular key, you can use [`argMax(the_column, _fivetran_id)`](https://clickhouse.com/blog/10-best-practice-tips#perfecting_replacingmergetree).
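As a rough illustration of the two `FINAL`-free patterns listed above, the following sketch groups by the key and lets the aggregate pick one value per key. The names `schema.table`, `id`, `counter`, and `status` are hypothetical placeholders, and the second argument of `argMax` simply mirrors the expression in the text; which column correctly orders row versions depends on your table.

```sql
-- Sketch only: `schema.table`, `id`, `counter`, and `status` are hypothetical names.

-- 1. A numeric column that only ever increases: the largest value per key
--    is also the latest one, so duplicates collapse without FINAL.
SELECT
    id,
    max(counter) AS counter
FROM schema.table
GROUP BY id;

-- 2. Latest value of a column per key: argMax(value, ordering_column) returns
--    the `value` from the row with the largest `ordering_column`.
--    The ordering column below follows the text above (an assumption);
--    substitute whichever column orders row versions in your table.
SELECT
    id,
    argMax(status, _fivetran_id) AS status
FROM schema.table
GROUP BY id;
```

Both queries return one row per `id`. They trade the read-time merge that `FINAL` performs for an aggregation over all not-yet-merged rows of each key.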
### Primary key and ORDER BY optimization {#primary-key-optimization}
-Fivetran replicates the source table's primary key as the ClickHouse `ORDER BY` clause. When the source has no PK, `_fivetran_id` (a UUID) becomes the sorting key, which sometimes may lead to poor query performance because ClickHouse builds its [sparse primary index](/guides/best-practices/sparse-primary-indexes) from the `ORDER BY` columns.
+Fivetran replicates the source table's primary key as the ClickHouse `ORDER BY` clause. When the source has no PK, `_fivetran_id` (a UUID) becomes the sorting key, which can lead to poor query performance because ClickHouse builds its [sparse primary index](/guides/best-practices/sparse-primary-indexes) from the `ORDER BY` columns.
162
166
163
-
**Recommendations:**
167
+
**Recommendations in this case if any other optimization is not sufficient:**
1. **Treat Fivetran tables as raw staging tables.** Do not query them directly for analytics.
-2. **Create materialized views** with an `ORDER BY` optimized for your query patterns:
+2. **If queries are still not performant enough**, use a [Refreshable Materialized View](/materialized-view/refreshable-materialized-view) to create a copy of the table with an `ORDER BY` optimized for your query patterns. Unlike incremental materialized views, refreshable materialized views re-run the full query on a schedule, which correctly handles the `UPDATE` and `DELETE` operations that Fivetran issues during syncs:
```sql
CREATE MATERIALIZED VIEW schema.table_optimized
+REFRESH EVERY 1 HOUR
ENGINE = ReplacingMergeTree()
ORDER BY (user_id, event_date)
-AS SELECT * FROM schema.table_raw;
+AS SELECT * FROM schema.table_raw FINAL;
```
+:::note
+Avoid incremental (non-refreshable) materialized views for Fivetran-managed tables. Because Fivetran issues `UPDATE` and `DELETE` operations to keep data in sync, incremental materialized views will not reflect these changes and will contain stale or incorrect data.
Avoid manual DDL changes (e.g., `ALTER TABLE ... MODIFY COLUMN`) to tables managed by Fivetran. The connector expects the schema it created. Manual changes can cause [type mapping errors](#uint64-type-error) and schema mismatch failures.