Commit bf24d85

docs(dwh): add ClickHouse source (#16469)
* docs(dwh): add ClickHouse source

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(dwh): remove beta callout from ClickHouse source

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent d8037d7 commit bf24d85

1 file changed

Lines changed: 62 additions & 0 deletions

---
title: Linking ClickHouse as a source
sidebar: Docs
showTitle: true
availability:
    free: full
    selfServe: full
    enterprise: full
sourceId: ClickHouse
---

The ClickHouse connector can link your ClickHouse database tables to PostHog. ClickHouse databases are often very large, so we stream the data in Arrow batches to keep memory usage bounded.

To link ClickHouse:

1. Go to the sources tab of the [Data pipeline page](https://app.posthog.com/data-management/sources) in PostHog.
2. Click **New source** and select ClickHouse.
3. Enter your database connection details:
    - **Host:** The hostname or IP address of your ClickHouse server, such as `play.clickhouse.com` or `123.132.1.100`.
    - **Port:** The HTTP(S) port your ClickHouse server listens on. The default is `8443` for HTTPS and `8123` for HTTP.
    - **Database:** The name of the database you want to sync. The default is `default`.
    - **User:** A username with read permissions on the database.
    - **Password:** The password for the user (optional).
    - **Use HTTPS?:** Whether to connect over HTTPS. Enabled by default.
    - **Verify SSL certificate?:** Whether to verify the server's SSL certificate. Enabled by default. Disable it if your server uses a self-signed certificate.
4. If you need to connect through an SSH tunnel, enable and configure it (optional):
    - **Tunnel host:** The hostname of your SSH server.
    - **Tunnel port:** The port your SSH server listens on.
    - **Authentication type:**
        - For password authentication, enter your SSH username and password.
        - For key-based authentication, enter your SSH username, private key, and optional passphrase.
5. Click **Next**.
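
To check the connection details before entering them, you can send a trivial query to ClickHouse's HTTP interface yourself. The Python sketch below builds such a request with only the standard library. The `build_clickhouse_request` and `build_ssl_context` helpers and the example host are hypothetical, not part of PostHog; the sketch passes credentials in ClickHouse's `X-ClickHouse-User` and `X-ClickHouse-Key` headers and falls back to the default ports listed above.

```python
import ssl
import urllib.parse
import urllib.request
from typing import Optional


def build_clickhouse_request(
    host: str,
    port: Optional[int] = None,
    database: str = "default",
    user: str = "default",
    password: str = "",
    use_https: bool = True,
    query: str = "SELECT 1",
) -> urllib.request.Request:
    """Build a request for ClickHouse's HTTP interface (hypothetical helper)."""
    scheme = "https" if use_https else "http"
    if port is None:
        # Default ports: 8443 for HTTPS, 8123 for plain HTTP.
        port = 8443 if use_https else 8123
    # Percent-encode spaces as %20 rather than "+".
    params = urllib.parse.urlencode(
        {"database": database, "query": query}, quote_via=urllib.parse.quote
    )
    headers = {"X-ClickHouse-User": user}
    if password:  # the password field is optional
        headers["X-ClickHouse-Key"] = password
    return urllib.request.Request(f"{scheme}://{host}:{port}/?{params}", headers=headers)


def build_ssl_context(verify_ssl: bool = True) -> ssl.SSLContext:
    """Mirror the "Verify SSL certificate?" toggle."""
    ctx = ssl.create_default_context()
    if not verify_ssl:
        # check_hostname must be switched off before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


if __name__ == "__main__":
    req = build_clickhouse_request("clickhouse.example.com", user="posthog_reader")
    print(req.full_url)
    # Against a reachable server, send it with:
    # with urllib.request.urlopen(req, context=build_ssl_context()) as resp:
    #     print(resp.read().decode())
```

A response body of `1` means the host, port, credentials, and TLS settings work together.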

The data warehouse then starts syncing your ClickHouse data. You can see details and progress in the [sources tab](https://app.posthog.com/data-management/sources).

> **Permissions:** The ClickHouse source only requires read (`SELECT`) permissions on the database and tables you intend to sync, plus read access to `system.tables` and `system.columns` for schema discovery.
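
In SQL terms, these permissions correspond to a few `GRANT SELECT` statements that an administrator runs in ClickHouse. The Python sketch below only renders those statements as strings. The `render_readonly_grants` helper and the `posthog_reader` user name are hypothetical, and the sketch assumes the user already exists (for example, created with `CREATE USER`).

```python
from typing import List, Optional


def quote_ident(name: str) -> str:
    """Backtick-quote a ClickHouse identifier; this sketch rejects embedded backticks."""
    if not name or "`" in name:
        raise ValueError(f"unsupported identifier: {name!r}")
    return f"`{name}`"


def render_readonly_grants(
    user: str, database: str, tables: Optional[List[str]] = None
) -> List[str]:
    """Return GRANT statements for a read-only sync user (hypothetical helper)."""
    grantee = quote_ident(user)
    db = quote_ident(database)
    if tables:
        # Grant only the tables you intend to sync.
        targets = [f"{db}.{quote_ident(t)}" for t in tables]
    else:
        # Grant every table in the database.
        targets = [f"{db}.*"]
    statements = [f"GRANT SELECT ON {target} TO {grantee}" for target in targets]
    # Schema discovery reads these two system tables.
    statements += [
        f"GRANT SELECT ON system.tables TO {grantee}",
        f"GRANT SELECT ON system.columns TO {grantee}",
    ]
    return statements


if __name__ == "__main__":
    for stmt in render_readonly_grants("posthog_reader", "analytics", ["events", "users"]):
        print(stmt + ";")
```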

## Supported table engines

PostHog can sync data from any ClickHouse table engine, but row counts are not available for every engine:

- **MergeTree family** (including `ReplacingMergeTree`, `SummingMergeTree`, etc.): full support, including accurate row counts from `system.tables.total_rows`.
- **Distributed tables:** row counts come from a distributed `SELECT count()`.
- **MaterializedView:** row counts come from the underlying `TO` target table or, if there is none, the `.inner_id.<uuid>` inner table.
- **View:** synced on demand. The row count is shown as "Skipped" because counting would require a full scan.
- **Memory, Buffer, Log, Kafka, URL, and other engines without a row counter:** synced on demand. The row count is shown as "Skipped".
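
The row-count rules above can be summarized as a small decision function. This is a sketch of the documented behavior, not PostHog's code: `row_count_source` and `materialized_view_count_table` are hypothetical names, and the caller is assumed to already know the view's UUID and its `TO` target, if any (`system.tables` exposes both through its `uuid` and `create_table_query` columns).

```python
from typing import Optional

# Shown instead of a row count for engines without a cheap counter.
SKIPPED = "Skipped"


def row_count_source(engine: str) -> str:
    """Describe where the row count for a table with this engine comes from."""
    if engine.endswith("MergeTree"):
        # MergeTree, ReplacingMergeTree, ReplicatedSummingMergeTree, ...
        return "system.tables.total_rows"
    if engine == "Distributed":
        return "SELECT count()"
    if engine == "MaterializedView":
        return "target table"
    # View, Memory, Buffer, Log, Kafka, URL, and anything else.
    return SKIPPED


def materialized_view_count_table(
    database: str, view_uuid: str, to_table: Optional[str] = None
) -> str:
    """Pick the table whose row count stands in for a materialized view."""
    if to_table:
        # CREATE MATERIALIZED VIEW ... TO db.target: count the target table.
        return to_table
    # Otherwise the data lives in the hidden inner table named after the view's UUID.
    return f"`{database}`.`.inner_id.{view_uuid}`"
```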

## Incremental sync

Incremental syncs are supported on integer (`Int8` through `Int256`, `UInt8` through `UInt256`) and temporal (`Date`, `Date32`, `DateTime`, `DateTime64`) cursor fields.
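
A cursor column qualifies if its ClickHouse type is one of the types listed above. The sketch below checks a type string as it appears in `system.columns.type` and renders an incremental query that binds the last synced value with ClickHouse's `{name:Type}` query-parameter syntax. The helper names are hypothetical, and rejecting `Nullable(...)` cursors is an assumption of this sketch, since the list above does not mention them.

```python
INTEGER_TYPES = {f"{sign}Int{bits}" for sign in ("", "U") for bits in (8, 16, 32, 64, 128, 256)}
TEMPORAL_TYPES = {"Date", "Date32", "DateTime", "DateTime64"}


def is_supported_cursor_type(ch_type: str) -> bool:
    """Check a ClickHouse type string such as "DateTime64(3, 'UTC')"."""
    # Drop type parameters: "DateTime64(3, 'UTC')" -> "DateTime64".
    # Wrappers are not unwrapped, so "Nullable(Int64)" -> "Nullable" is rejected
    # (an assumption of this sketch).
    base = ch_type.split("(", 1)[0].strip()
    return base in INTEGER_TYPES or base in TEMPORAL_TYPES


def incremental_query(table: str, cursor: str, cursor_type: str) -> str:
    """Select rows past the last synced cursor value, oldest first.

    The value is bound as a query parameter (sent over HTTP as
    param_last_value=...), so it never has to be spliced into the SQL text.
    """
    if not is_supported_cursor_type(cursor_type):
        raise ValueError(f"{cursor_type} cannot be used as a cursor")
    return (
        f"SELECT * FROM {table} "
        f"WHERE `{cursor}` > {{last_value:{cursor_type}}} "
        f"ORDER BY `{cursor}`"
    )
```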

PostHog uses the sorting key from `system.columns` as the detected primary key. Because ClickHouse sorting keys are not guaranteed to be unique, every incremental sync first runs a bounded duplicate-key probe and fails the sync if it finds duplicates on the chosen primary key.
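
If an incremental sync fails on duplicate keys, you can look for the duplicates yourself. The sketch below renders two queries: one that lists the sorting-key columns through the `is_in_sorting_key` flag in `system.columns`, and a `GROUP BY ... HAVING` query for key values that occur more than once. The helpers are hypothetical and this is not PostHog's probe. The docs only say that probe is bounded; here the `LIMIT` caps the rows returned, while the `GROUP BY` still reads the key columns of the whole table.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Backtick-quote an identifier; embedded backticks are rejected in this sketch."""
    if not name or "`" in name:
        raise ValueError(f"unsupported identifier: {name!r}")
    return f"`{name}`"


# Lists the sorting-key columns; bind db and tbl as query parameters.
SORTING_KEY_COLUMNS = (
    "SELECT name FROM system.columns "
    "WHERE database = {db:String} AND table = {tbl:String} AND is_in_sorting_key"
)


def duplicate_key_probe(database: str, table: str, key_columns: List[str], limit: int = 10) -> str:
    """Return a query listing up to `limit` key values that appear more than once."""
    if not key_columns:
        raise ValueError("need at least one key column")
    keys = ", ".join(quote_ident(c) for c in key_columns)
    return (
        f"SELECT {keys}, count() AS n "
        f"FROM {quote_ident(database)}.{quote_ident(table)} "
        f"GROUP BY {keys} HAVING n > 1 "
        f"LIMIT {int(limit)}"
    )
```

An empty result from the probe query means the key columns are unique across the table.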

## Type handling

ClickHouse's Arrow output does not support every type, so PostHog serializes the following types to strings on the server side to keep the stream reliable: `UUID`, `IPv4`/`IPv6`, wide integers (`Int128`/`Int256`/`UInt128`/`UInt256`), `Enum8`/`Enum16`, `FixedString`, `Array`, `Map`, `Tuple`, `Nested`, `Variant`, `Dynamic`, `JSON`, and `Object`.

`Nullable` and `LowCardinality` wrappers, `DateTime`/`DateTime64` precision and timezones, and `Decimal` types (`Decimal32` through `Decimal256`) are all preserved natively.
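
The rule above amounts to unwrapping `Nullable` and `LowCardinality`, then checking the base type name against the stringified list. The sketch below builds the column expression for a `SELECT` on that basis. The docs do not say which ClickHouse function PostHog uses to serialize, so wrapping with `toString` is an assumption, and `select_expression` is a hypothetical helper.

```python
STRINGIFIED = {
    "UUID", "IPv4", "IPv6",
    "Int128", "Int256", "UInt128", "UInt256",
    "Enum8", "Enum16", "FixedString",
    "Array", "Map", "Tuple", "Nested",
    "Variant", "Dynamic", "JSON", "Object",
}
WRAPPERS = ("Nullable", "LowCardinality")


def unwrap(ch_type: str) -> str:
    """Strip Nullable(...) and LowCardinality(...) wrappers, in any nesting."""
    t = ch_type.strip()
    while True:
        for wrapper in WRAPPERS:
            prefix = wrapper + "("
            if t.startswith(prefix) and t.endswith(")"):
                t = t[len(prefix):-1].strip()
                break
        else:
            return t  # no wrapper matched


def select_expression(column: str, ch_type: str) -> str:
    """Column expression for the sync query: stringify the types listed above."""
    base = unwrap(ch_type).split("(", 1)[0]
    quoted = f"`{column}`"
    if base in STRINGIFIED:
        # Let the server render the value as text before it reaches Arrow.
        return f"toString({quoted}) AS {quoted}"
    # Everything else (e.g. DateTime64 with timezone, Decimal) is selected as-is.
    return quoted
```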

import InboundIpAddresses from '../_snippets/inbound-ip-addresses.mdx'

<InboundIpAddresses />
