
Commit 76e9c07

anandgupta42 and claude authored
feat: add ClickHouse warehouse driver (#574)
* feat: add ClickHouse warehouse driver with full integration

  Add first-class ClickHouse support as the 12th database driver:

  **Driver (`packages/drivers/src/clickhouse.ts`):**
  - Official `@clickhouse/client` over HTTP(S)
  - Supports ClickHouse server 23.3+ (all non-EOL versions)
  - Password, connection string, and TLS/mTLS auth
  - ClickHouse Cloud and self-hosted compatible
  - Parameterized queries for SQL injection prevention
  - DML-aware LIMIT injection (won't break `WITH...INSERT`)

  **Integration (23 touchpoints):**
  - Registry: `DRIVER_MAP`, import switch, `PASSWORD_DRIVERS`
  - Discovery: Docker containers, env vars (`CLICKHOUSE_HOST`/`CLICKHOUSE_URL`), dbt profiles (`ADAPTER_TYPE_MAP`), dbt lineage dialect
  - FinOps: `system.query_log` query history template
  - Normalization: aliases for `connectionString`, `requestTimeout`, TLS fields
  - Publish: `@clickhouse/client` in `peerDependencies`

  **Tests:**
  - 30+ E2E tests across 5 suites (latest, LTS 23.8, 24.3, 24.8, connection string)
  - 14 config normalization tests for all ClickHouse aliases
  - MergeTree variants, materialized views, Nullable columns, Array/Map/IPv4 types

  **Documentation:**
  - Full config section in `warehouses.md` (standard, Cloud, connection string)
  - Support matrix entry in `drivers.md` with auth methods
  - Dedicated guide (`guides/clickhouse.md`): MergeTree optimization, materialized view pipelines, dialect translation, LowCardinality tips, dbt integration
  - Updated README, getting-started, warehouse-tools docs

  **Engineering:**
  - `packages/drivers/ADDING_A_DRIVER.md` — 23-point checklist for adding future drivers
  - `.claude/commands/add-database-driver.md` — Claude skill to automate the process

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use `client.command()` for ClickHouse DDL/DML, fix E2E test auth

  - `execute()` now uses `client.command()` for INSERT/CREATE/DROP/ALTER queries instead of `client.query()` with JSONEachRow format, which caused parse errors on INSERT VALUES
  - Add `CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT=1` to all LTS Docker containers (required for passwordless default user)
  - Fix UInt64 assertion to handle both string and number JSON encoding

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci: add ClickHouse E2E tests to driver-e2e CI job

  - Add `clickhouse/clickhouse-server:latest` as a GitHub Actions service
  - Add test step running `drivers-clickhouse-e2e.test.ts` with CI env vars
  - Add test file to change detection paths for the `drivers` filter

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: 3 driver bugs found by adversarial testing (167 tests, 3 real failures)

  Ran 167 adversarial tests against real ClickHouse Docker containers covering SQL injection, unicode, NULLs, LIMIT edge cases, exotic types, error handling, large data, MergeTree variants, views, system tables, concurrent operations, and return value edge cases.

  **Bugs found and fixed:**

  1. **DESCRIBE/EXISTS get LIMIT appended** — `isSelectLike` regex matched DESCRIBE/EXISTS but ClickHouse doesn't support LIMIT on these statements. Fix: narrowed `supportsLimit` to only `SELECT` and `WITH` queries.
  2. **`limit=0` returns 0 rows** — truncation check `rows.length > 0` was always true, causing `slice(0, 0)` to return empty array. Fix: guard with `effectiveLimit > 0 &&` before truncation check.
  3. **`limit=0` treated as `limit=1000`** — `0 ?? 1000` returns 0 (correct) but `limit === undefined ? 1000 : limit` properly distinguishes "not provided" from "explicitly zero". Changed from `??` to explicit check.

  **Regression tests added (5 tests in main E2E suite):**
  - DESCRIBE TABLE without LIMIT error
  - EXISTS TABLE without LIMIT error
  - limit=0 returns all rows without truncation
  - INSERT uses `client.command()` not `client.query()`
  - WITH...INSERT does not get LIMIT appended

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address CodeRabbit review findings for ClickHouse driver PR

  - Remove stale ClickHouse entry from "Unsupported Databases" doc section
  - Add ClickHouse to Docker auto-discovery description in docs
  - Add blank line around ClickHouse auth table for markdownlint MD058
  - Add `text` language tag to fenced code block for markdownlint MD040
  - Fail fast when `binds` passed to ClickHouse `execute()` instead of ignoring
  - Add `tls_key`, `tls_cert`, `tls_ca_cert` to SENSITIVE_FIELDS in credential store
  - Clamp `days`/`limit` values in ClickHouse query history SQL builder
  - Add `clickhouse`, `clickhouse+http`, `clickhouse+https` to DATABASE_URL scheme map
  - Make `waitForPort` accept configurable host in E2E tests
  - Close failed connectors during `waitForDbReady` retries in E2E tests
  - Add missing TLS alias tests: `ca_cert`, `ssl_cert`, `ssl_key`

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
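The `limit=0` semantics fixed above reduce to two small pieces of logic. A minimal sketch with illustrative names — `effectiveLimit` and `truncate` are not necessarily the driver's actual identifiers:

```typescript
// Sketch of the corrected limit handling (names are illustrative).
const DEFAULT_LIMIT = 1000

// "Not provided" gets the default; an explicit 0 means "no limit".
// `limit ?? DEFAULT_LIMIT` also returns 0 here, but the explicit
// undefined check states the intent unambiguously.
function effectiveLimit(limit?: number): number {
  return limit === undefined ? DEFAULT_LIMIT : limit
}

// Guard with `limit > 0` so that limit=0 skips truncation entirely,
// instead of slice(0, 0) wiping out the result set.
function truncate<T>(rows: T[], limit: number): T[] {
  if (limit > 0 && rows.length > limit) return rows.slice(0, limit)
  return rows
}
```

Under these rules, `limit=0` flows through as "return all rows", which is what the regression tests above assert.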
1 parent 75b077f commit 76e9c07

File tree

29 files changed: +1937 −130 lines changed

.claude/commands/add-database-driver.md

Lines changed: 181 additions & 0 deletions
@@ -0,0 +1,181 @@
---
description: Add a new database driver to Altimate Code. Scaffolds the driver, registers it across all 23 integration points, writes E2E tests, and updates docs. Usage - /add-database-driver <database-name>
---

# Add Database Driver

Scaffold and fully integrate a new database/warehouse driver into Altimate Code. This command handles all 23 integration points — driver code, registry, discovery, finops, tests, and documentation.
## Input

`$ARGUMENTS` = the database name (e.g., `cockroachdb`, `timescaledb`, `cassandra`, `neo4j`).

If empty, ask: "Which database should I add support for?"
## Step 0: Research

Before writing any code, research the database:

1. **Find the official Node.js/TypeScript client package** on npm. Search for `@{database}/client`, `{database}-js`, or similar.
2. **Check supported server versions** — which versions are not EOL?
3. **Identify auth methods** — password, token, TLS/certificate, connection string, cloud-specific?
4. **Check SQL dialect** — standard SQL? Custom syntax? LIMIT vs TOP vs FETCH FIRST? System tables for schemas/tables/columns?
5. **Find Docker image** — official image on Docker Hub for E2E testing?
6. **Check if dbt adapter exists** — search for `dbt-{database}` on PyPI.

Present findings to the user before proceeding:

```
## Research: {Database}

- **npm package**: `{package}` (v{version})
- **Server versions**: {non-EOL versions}
- **Auth methods**: {list}
- **SQL dialect**: {notes on LIMIT, system tables, parameterized queries}
- **Docker image**: `{image}:{tag}`
- **dbt adapter**: {exists/not found}

Proceed with implementation?
```
## Step 1: Read Reference Document

Read the comprehensive checklist:

```bash
cat packages/drivers/ADDING_A_DRIVER.md
```

This document has all 23 integration points with exact file paths and code patterns.
## Step 2: Read Existing Driver for Pattern

Read a similar existing driver as a template. Choose based on database type:

- **SQL database with password auth** → read `packages/drivers/src/mysql.ts`
- **Cloud warehouse with token auth** → read `packages/drivers/src/databricks.ts`
- **Database with connection string support** → read `packages/drivers/src/postgres.ts`
- **HTTP-based client** → read `packages/drivers/src/clickhouse.ts`
- **Document database (non-SQL)** → read `packages/drivers/src/mongodb.ts`

Also read:

- `packages/drivers/src/normalize.ts` — for alias pattern
- `packages/opencode/src/altimate/native/connections/registry.ts` — for registration pattern
- `packages/opencode/test/altimate/drivers-docker-e2e.test.ts` — for E2E test pattern
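The alias pattern in `normalize.ts` is the simplest of the three and worth internalizing before Step 3. A hedged sketch of the idea, using the ClickHouse aliases this PR added; the canonical camelCase targets for the TLS fields are guesses, not names confirmed by the real file:

```typescript
// Hypothetical sketch of the normalize.ts alias pattern: map user-facing
// snake_case config keys (and their synonyms) onto canonical driver keys.
// The TLS target names below are assumed, not taken from the real file.
const CLICKHOUSE_ALIASES: Record<string, string> = {
  connection_string: "connectionString",
  request_timeout: "requestTimeout",
  tls_ca_cert: "tlsCaCert",
  tls_cert: "tlsCert",
  tls_key: "tlsKey",
  // Synonyms the PR's review pass added tests for:
  ca_cert: "tlsCaCert",
  ssl_cert: "tlsCert",
  ssl_key: "tlsKey",
}

function normalizeConfig(raw: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {}
  for (const [key, value] of Object.entries(raw)) {
    out[CLICKHOUSE_ALIASES[key] ?? key] = value
  }
  return out
}
```

A new driver mostly just contributes its own alias table to this machinery.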
## Step 3: Implement (23 integration points)

Work through all 9 phases from the checklist. Use parallel edits where possible.

### Phase 1: Core Driver (4 files)

1. **Create `packages/drivers/src/{database}.ts`**
   - Follow the Connector interface: `connect()`, `execute()`, `listSchemas()`, `listTables()`, `describeTable()`, `close()`
   - Lazy-import the npm package
   - Use parameterized queries for schema introspection
   - Handle LIMIT injection with DML guard: `!hasDML` check before appending LIMIT
   - Handle TLS detection from connection strings
2. **Add export to `packages/drivers/src/index.ts`**
3. **Add optionalDependency to `packages/drivers/package.json`**
4. **Add aliases to `packages/drivers/src/normalize.ts`**
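The LIMIT-injection bullet is the subtlest item in Phase 1: the ClickHouse PR's adversarial-testing pass narrowed it to SELECT/WITH statements only and kept the DML guard so `WITH...INSERT` is untouched. A sketch under those rules (the regexes are illustrative, not the driver's actual patterns):

```typescript
// Illustrative LIMIT-injection guard. Only SELECT/WITH queries support LIMIT
// (DESCRIBE/EXISTS reject it), DML must never get one appended, and a query
// that already carries a LIMIT is left alone.
const SUPPORTS_LIMIT = /^\s*(SELECT|WITH)\b/i
const HAS_DML = /\b(INSERT|UPDATE|DELETE|ALTER)\b/i
const HAS_LIMIT = /\bLIMIT\s+\d+/i

function injectLimit(sql: string, limit: number): string {
  if (limit <= 0) return sql                  // limit=0 means "no limit"
  if (!SUPPORTS_LIMIT.test(sql)) return sql   // DESCRIBE/EXISTS/SHOW etc.
  if (HAS_DML.test(sql)) return sql           // WITH...INSERT stays intact
  if (HAS_LIMIT.test(sql)) return sql         // caller already limited
  return `${sql.replace(/;\s*$/, "")} LIMIT ${limit}`
}
```

Each new driver decides which statement prefixes belong in `SUPPORTS_LIMIT` for its dialect.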
### Phase 2: Registry (4 edits in `registry.ts`)

5. Add to `DRIVER_MAP`
6. Add to import switch statement
7. Add to `PASSWORD_DRIVERS` (if applicable)
8. Remove from `KNOWN_UNSUPPORTED` (if listed)

### Phase 3: Discovery (4 files)

9. Docker discovery — `docker-discovery.ts` (IMAGE_MAP, ENV_MAP, DEFAULT_PORTS, DEFAULT_USERS)
10. Env var detection — `project-scan.ts` (detectEnvVars warehouses array)
11. dbt adapter — `dbt-profiles.ts` (ADAPTER_TYPE_MAP)
12. dbt lineage — `dbt/lineage.ts` (detectDialect dialectMap)
### Phase 4: FinOps (1 file)

13. Query history — `finops/query-history.ts` (SQL template + handler if database has system query log)
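For ClickHouse this template is built on `system.query_log`, and the PR's review pass also clamps the user-supplied `days`/`limit` before interpolating them. A hypothetical sketch — the clamping bounds and the exact builder shape are assumptions, though the columns named do exist in `system.query_log`:

```typescript
// Hypothetical query-history builder. days/limit are the only interpolated
// values, so they are clamped to sane integer ranges first (bounds assumed).
function queryHistorySql(days: number, limit: number): string {
  const safeDays = Math.min(Math.max(Math.trunc(days) || 1, 1), 90)
  const safeLimit = Math.min(Math.max(Math.trunc(limit) || 1, 1), 10000)
  return [
    "SELECT query, query_duration_ms, read_rows, memory_usage, event_time",
    "FROM system.query_log",
    "WHERE type = 'QueryFinish'",
    `  AND event_time >= now() - INTERVAL ${safeDays} DAY`,
    "ORDER BY query_duration_ms DESC",
    `LIMIT ${safeLimit}`,
  ].join("\n")
}
```

Clamping here matters precisely because these two values cannot go through query parameters in every dialect.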
### Phase 5: Build (1 file)

14. Peer deps — `script/publish.ts` (driverPeerDependencies)

### Phase 6: Tool Descriptions (1 file)

15. warehouse_add — `tools/warehouse-add.ts` (config description + error message)

### Phase 7: Tests (2 new files + 1 edit)

16. E2E tests — `test/altimate/drivers-{database}-e2e.test.ts`
17. Normalization tests — add to `test/altimate/driver-normalize.test.ts`
18. Verify existing tests pass

### Phase 8: Documentation (5 files)

19. `docs/docs/configure/warehouses.md` — config section + update count
20. `docs/docs/drivers.md` — support matrix + installation + auth + update count
21. `docs/docs/data-engineering/tools/warehouse-tools.md` — env vars + Docker
22. `README.md` — warehouse list
23. `docs/docs/getting-started/index.md` — homepage list

### Phase 9: Optional

- Guide page at `docs/docs/data-engineering/guides/{database}.md`
- Update `mkdocs.yml` nav and `guides/index.md`
- Check fingerprint regex in `fingerprint/index.ts`
## Step 4: Run Quality Gates

```bash
# Tests (from packages/opencode/)
cd packages/opencode && bun test test/altimate/driver-normalize.test.ts test/altimate/connections.test.ts test/altimate/drivers-{database}-e2e.test.ts

# Typecheck (from repo root)
cd "$(git rev-parse --show-toplevel)" && bun turbo typecheck

# Marker check (from repo root)
bun run script/upstream/analyze.ts --markers --base main --strict
```

All three must pass before proceeding.
## Step 5: Run Code Review

Run `/consensus:code-review` to get the implementation reviewed by multiple models before committing.
## Step 6: Summary

Present final summary:

```
## {Database} Driver Added

### Files Created
- packages/drivers/src/{database}.ts
- packages/opencode/test/altimate/drivers-{database}-e2e.test.ts
- docs/docs/data-engineering/guides/{database}.md (if created)

### Files Modified
- {list all modified files}

### Test Results
- {N} normalization tests pass
- {N} connection tests pass
- Typecheck: pass
- Marker check: pass

### E2E Test Coverage
- {list of test suites and server versions}

Ready to commit.
```
## Rules

1. **Read before writing.** Always read existing drivers and the reference doc before creating new code.
2. **Don't skip integration points.** All 23 points exist for a reason — missing one causes inconsistencies users will hit.
3. **Use parameterized queries** for `listTables` and `describeTable` — never interpolate user input into SQL.
4. **Test multiple server versions** — at minimum: latest stable + oldest non-EOL LTS.
5. **Run all quality gates** before presenting the summary.
6. **Don't modify finops tools** (credit-analyzer, warehouse-advisor, unused-resources) unless the database has equivalent cost/credit APIs.
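Rule 3, applied to ClickHouse: the server-side parameter syntax (`{name:Type}`) keeps user input out of the SQL text entirely, with values passed separately via `query_params`. A sketch that only builds the request object such a `client.query()` call would receive — the helper name is hypothetical, not the driver's actual code:

```typescript
// Hypothetical helper: build a parameterized introspection request instead of
// interpolating the database name into SQL. {db:String} is ClickHouse's
// server-side query-parameter placeholder syntax.
function listTablesRequest(database: string) {
  return {
    query: "SELECT name FROM system.tables WHERE database = {db:String} ORDER BY name",
    query_params: { db: database },
    format: "JSONEachRow" as const,
  }
}

// Even a hostile input never reaches the SQL text:
const req = listTablesRequest("analytics'; DROP TABLE users; --")
```

The driver then hands the object to the client and lets the server bind the value.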

.github/meta/commit.txt

Lines changed: 1 addition & 29 deletions
@@ -1,29 +1 @@
-fix: tool reliability improvements for sql-classify, edit, and webfetch (#581)
-
-**sql-classify.ts:**
-- Fix `computeSqlFingerprint` referencing undefined `core` variable after
-  safe-import refactor — extract `extractMetadata` as module-level guard
-- Invert fallback classifier to whitelist reads (`READ_PATTERN`) instead of
-  blacklisting writes — treats unknown statements as "write" for safety
-- Handle multi-statement SQL in fallback by splitting on semicolons
-- Strip `--` line comments in fallback (block comments already stripped)
-- Fix `HARD_DENY_PATTERN` trailing `\s` → `\b` to match `TRUNCATE;`
-
-**edit.ts:**
-- Add `buildNotFoundMessage` with Levenshtein nearest-match snippets for
-  LLM self-correction when `oldString` not found
-- Fix substring matching to prefer exact equality over short-line matches
-
-**webfetch.ts:**
-- Add session-level URL failure cache (404/410/451) with 5-min TTL
-- Add `buildFetchError` with actionable status-specific error messages
-- Add `sanitizeUrl` to strip query strings from error messages
-- Add URL validation via `new URL()` constructor
-- Add `MAX_CACHED_URLS = 500` size cap with oldest-entry eviction
-
-**Tests:** 12 new tests for `buildNotFoundMessage`, `replace` error
-  messages, `computeSqlFingerprint`, and updated webfetch assertions.
-
-Closes #581
-
-Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
+feat: add ClickHouse warehouse driver

.github/workflows/ci.yml

Lines changed: 21 additions & 0 deletions
@@ -46,6 +46,7 @@ jobs:
       - 'packages/opencode/test/altimate/drivers-e2e.test.ts'
       - 'packages/opencode/test/altimate/drivers-docker-e2e.test.ts'
       - 'packages/opencode/test/altimate/drivers-mongodb-e2e.test.ts'
+      - 'packages/opencode/test/altimate/drivers-clickhouse-e2e.test.ts'
       - 'packages/opencode/test/altimate/connections.test.ts'
     dbt-tools:
       - 'packages/dbt-tools/**'
@@ -198,6 +199,19 @@ jobs:
           --health-timeout 5s
           --health-retries 10
+      clickhouse:
+        image: clickhouse/clickhouse-server:latest
+        env:
+          CLICKHOUSE_DB: testdb
+          CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: 1
+        ports:
+          - 18123:8123
+        options: >-
+          --health-cmd "wget --no-verbose --tries=1 --spider http://localhost:8123/ping || exit 1"
+          --health-interval 5s
+          --health-timeout 5s
+          --health-retries 15
     steps:
       - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
@@ -245,6 +259,13 @@ jobs:
           TEST_MONGODB_HOST: 127.0.0.1
           TEST_MONGODB_PORT: "27017"

+      - name: Run ClickHouse driver E2E
+        run: bun test test/altimate/drivers-clickhouse-e2e.test.ts
+        working-directory: packages/opencode
+        env:
+          TEST_CLICKHOUSE_HOST: 127.0.0.1
+          TEST_CLICKHOUSE_PORT: "18123"

       # Cloud tests NOT included — they require real credentials
       # Run locally with:
       # ALTIMATE_CODE_CONN_SNOWFLAKE_TEST='...' bun test test/altimate/drivers-snowflake-e2e.test.ts

README.md

Lines changed: 1 addition & 1 deletion
@@ -151,7 +151,7 @@ Each mode has scoped permissions, tool access, and SQL write-access control.

 ## Supported Warehouses

-Snowflake · BigQuery · Databricks · PostgreSQL · Redshift · DuckDB · MySQL · SQL Server · Oracle · SQLite
+Snowflake · BigQuery · Databricks · PostgreSQL · Redshift · ClickHouse · DuckDB · MySQL · SQL Server · Oracle · SQLite · MongoDB

 First-class support with schema indexing, query execution, and metadata introspection. SSH tunneling available for secure connections.

bun.lock

Lines changed: 5 additions & 0 deletions

docs/docs/configure/warehouses.md

Lines changed: 62 additions & 3 deletions
@@ -1,6 +1,6 @@
11
# Warehouses
22

3-
Altimate Code connects to 9 warehouse types. Configure them in `.altimate-code/connections.json` (project-local) or `~/.altimate-code/connections.json` (global).
3+
Altimate Code connects to 10 warehouse types. Configure them in `.altimate-code/connections.json` (project-local) or `~/.altimate-code/connections.json` (global).
44

55
## Configuration
66

@@ -288,6 +288,66 @@ If you're already authenticated via `gcloud`, omit `credentials_path`:
288288
!!! info "Server compatibility"
289289
The MongoDB driver (v6.x) supports MongoDB server versions 3.6 through 8.0, covering all releases from the last 3+ years.
290290

291+
## ClickHouse
292+
293+
```json
294+
{
295+
"clickhouse-prod": {
296+
"type": "clickhouse",
297+
"host": "localhost",
298+
"port": 8123,
299+
"database": "analytics",
300+
"user": "default",
301+
"password": "{env:CLICKHOUSE_PASSWORD}"
302+
}
303+
}
304+
```
305+
306+
| Field | Required | Description |
307+
|-------|----------|-------------|
308+
| `connection_string` | No | Full URL (alternative to individual fields, e.g. `http://user:pass@host:8123`) |
309+
| `host` | No | Hostname (default: `localhost`) |
310+
| `port` | No | HTTP port (default: `8123`) |
311+
| `database` | No | Database name (default: `default`) |
312+
| `user` | No | Username (default: `default`) |
313+
| `password` | No | Password |
314+
| `protocol` | No | `http` or `https` (default: `http`) |
315+
| `request_timeout` | No | Request timeout in ms (default: `30000`) |
316+
| `tls_ca_cert` | No | Path to CA certificate for TLS |
317+
| `tls_cert` | No | Path to client certificate for mutual TLS |
318+
| `tls_key` | No | Path to client key for mutual TLS |
319+
| `clickhouse_settings` | No | Object of ClickHouse server settings |
320+
321+
### ClickHouse Cloud
322+
323+
```json
324+
{
325+
"clickhouse-cloud": {
326+
"type": "clickhouse",
327+
"host": "abc123.us-east-1.aws.clickhouse.cloud",
328+
"port": 8443,
329+
"protocol": "https",
330+
"user": "default",
331+
"password": "{env:CLICKHOUSE_CLOUD_PASSWORD}",
332+
"database": "default"
333+
}
334+
}
335+
```
336+
337+
### Using a connection string
338+
339+
```json
340+
{
341+
"clickhouse-prod": {
342+
"type": "clickhouse",
343+
"connection_string": "https://default:secret@my-ch.cloud:8443"
344+
}
345+
}
346+
```
347+
348+
!!! info "Server compatibility"
349+
The ClickHouse driver supports ClickHouse server versions 23.3 and later, covering all non-EOL releases. This includes LTS releases 23.8, 24.3, 24.8, and all stable releases through the current version.
350+
291351
## SQL Server
292352

293353
```json
@@ -320,7 +380,6 @@ The following databases are not yet natively supported, but workarounds are avai
320380

321381
| Database | Workaround |
322382
|----------|------------|
323-
| ClickHouse | Use the bash tool with `clickhouse-client` or `curl` to query directly |
324383
| Cassandra | Use the bash tool with `cqlsh` to query directly |
325384
| CockroachDB | PostgreSQL-compatible — use `type: postgres` |
326385
| TimescaleDB | PostgreSQL extension — use `type: postgres` |
@@ -362,7 +421,7 @@ The `/discover` command can automatically detect warehouse connections from:
362421
| Source | Detection |
363422
|--------|-----------|
364423
| dbt profiles | Parses `~/.dbt/profiles.yml` |
365-
| Docker containers | Finds running PostgreSQL, MySQL, and SQL Server containers |
424+
| Docker containers | Finds running PostgreSQL, MySQL, SQL Server, and ClickHouse containers |
366425
| Environment variables | Scans for `SNOWFLAKE_ACCOUNT`, `PGHOST`, `DATABRICKS_HOST`, etc. |
367426

368427
See [Warehouse Tools](../data-engineering/tools/warehouse-tools.md) for the full list of environment variable signals.
