Skip to content

feat(node): add ClickHouse database node (db_postgres clone)#1064

Merged
asclearuc merged 4 commits into
rocketride-org:developfrom
mithileshgau:feat/RR-1051-clickhouse-node
Jun 2, 2026
Merged

feat(node): add ClickHouse database node (db_postgres clone)#1064
asclearuc merged 4 commits into
rocketride-org:developfrom
mithileshgau:feat/RR-1051-clickhouse-node

Conversation

@mithileshgau
Copy link
Copy Markdown
Collaborator

@mithileshgau mithileshgau commented Jun 2, 2026

Summary

  • Add a dedicated ClickHouse node (db_clickhouse), implemented as a thin dialect clone of the existing db_mysql / db_postgres nodes over the shared ai.common.database base — only the connection params + DSN builder are ClickHouse-specific; schema reflection, NL→SQL, EXPLAIN validation, SELECT-only safety, and insertion are inherited unchanged.
  • Dual role (classType: ["database","tool"]): pipeline node (questions→SQL→execute; answers/table→insert) and agent tool (clickhouse.get_data / get_schema / get_sql).
  • ClickHouse-only tls toggle for TLS connections (ClickHouse Cloud, native port 9440) — kept distinct from the shared password-field secure attribute so the core stays aligned with MySQL/PostgreSQL.

Problem

RocketRide had no first-class ClickHouse support. Users wanting analytical Postgres-style querying or a ClickHouse-backed agent tool had no node for it.

Solution

db_clickhouse clones the proven db_postgres/db_mysql pattern: IGlobal supplies the ClickHouse connection params and a clickhouse-sqlalchemy native-TCP DSN (clickhouse+native://, port 9000), IInstance supplies the display name/dialect.
clickhouse-sqlalchemy reflects ClickHouse's empty FK list + best-effort PK, so the dialect-agnostic base works as-is. Read-only by default; raw SQL gated behind allow_execute, matching the other DB nodes.

Alternatives considered

  • Profile on a generic DB node — rejected; ClickHouse needs its own driver and a TLS field.
  • MCP-client clone (mcp-clickhouse, or the hosted mcp.clickhouse.cloud remote MCP) — a viable complementary node, the hosted MCP is OAuth-only and would need OAuth support added to tool_mcp_client.
  • allow_write/allow_drop + run_query tool surface; it's a base-class change affecting MySQL/PostgreSQL too.

Notes / caveats

  • Two pre-existing edge cases are shared with db_mysql/db_postgres and left for parity
    (a base-wide fix is the right place): .strip() on an explicit null config value, and
    IPv6-host port detection in the new TLS branch. Neither is reachable with realistic
    ClickHouse hosts (Cloud hostnames / localhost).
  • Example pipelines + seed SQL used for manual verification are intentionally not included
    to keep this PR focused on the node.

Type

Feature — new pipeline node.

Testing

  • Tests added or updated — nodes/test/db_clickhouse/ (22 DSN unit tests)
  • Tested locally — verified end-to-end against a local ClickHouse (Docker) and
    ClickHouse Cloud, both as a pipeline node and as a CrewAI agent tool
  • ./builder test passes — ran ./builder nodes:test-contracts (210 passed) + the new
    unit tests + ruff check/format; full C++ ./builder test not run locally

Checklist

  • Commit messages follow conventional commits
  • No secrets or credentials included
  • Wiki updated (n/a)
  • Breaking changes documented (none — additive node only)

Linked Issue

Fixes #1051

Summary by CodeRabbit

  • New Features

    • ClickHouse node: connection setup, TLS handling (defaults to port 9440 when secure), URL-encoded credentials, configurable validation retries (default 5, clamped 1–20), service registration and tooling (query/schema/sql) with safety gate.
  • Bug Fixes

    • Ensure user and database components are URL-encoded for MySQL and PostgreSQL connection URLs.
  • Documentation

    • Added ClickHouse usage, configuration, and cloud setup guide.
  • Tests

    • Unit tests for DSN construction, TLS parsing, credential encoding, and retry parsing.
  • Chores

    • Pinned ClickHouse client dependencies.

@github-actions github-actions Bot added docs Documentation module:nodes Python pipeline nodes labels Jun 2, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 2, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 02f6e5ad-948a-446f-a50e-7c2b3d9a91d2

📥 Commits

Reviewing files that changed from the base of the PR and between 7cc4b8c and 6f9b0ef.

📒 Files selected for processing (5)
  • nodes/src/nodes/db_clickhouse/IGlobal.py
  • nodes/src/nodes/db_clickhouse/IInstance.py
  • nodes/src/nodes/db_mysql/IGlobal.py
  • nodes/src/nodes/db_postgres/IGlobal.py
  • nodes/test/db_clickhouse/test_clickhouse_dsn.py

📝 Walkthrough

Walkthrough

Adds a ClickHouse database node: implements ClickHouse-specific IGlobal for normalized connection params and native DSN construction (with TLS/port logic), registers the node in services.json, pins ClickHouse deps, provides package/init, README, IInstance, and unit tests for DSN and retry parsing.

Changes

ClickHouse Database Node

Layer / File(s) Summary
IGlobal core implementation
nodes/src/nodes/db_clickhouse/IGlobal.py
New IGlobal implements _connection_params, _build_connection_url, _max_validation_attempts, and _db_description, handling TLS normalization, password-whitespace preservation, URL-encoding of user/password/database, TLS port 9440 defaulting when host has no explicit port (including bracketed IPv6 handling), and clamping attempts to 1..20.
IGlobal unit tests
nodes/test/db_clickhouse/test_clickhouse_dsn.py
Adds tests that dynamically load IGlobal, validate defaults, whitespace rules, None coercion, TLS truthy/falsey parsing, DSN formatting (plaintext vs TLS, explicit ports, IPv6), URL-encoding of credentials and user/database, and max_attempts parsing/clamping.
Package, service manifest, docs, instance, deps
nodes/src/nodes/db_clickhouse/__init__.py, nodes/src/nodes/db_clickhouse/requirements.txt, nodes/src/nodes/db_clickhouse/services.json, nodes/src/nodes/db_clickhouse/README.md, nodes/src/nodes/db_clickhouse/IInstance.py
Initializes package to load local requirements via depends(...), pins clickhouse-sqlalchemy==0.3.2 and clickhouse-driver==0.2.9, registers db_clickhouse:// service with field schema and pipe wiring, adds README describing node roles and safety defaults, and adds IInstance subclass returning ClickHouse display name and dialect.
MySQL/Postgres DSN encoding fixes
nodes/src/nodes/db_mysql/IGlobal.py, nodes/src/nodes/db_postgres/IGlobal.py
Encode user, password, and database components via URL-quoting when building SQLAlchemy DSNs; adds/updates brief docstrings.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I’m a rabbit with a tiny net,
I hop through configs, trim and vet.
I build the DSN with hops and cheer,
TLS, ports, tests — all clear! 🐇✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly identifies the main change: adding a ClickHouse database node as a clone of existing database nodes (db_postgres/db_mysql). It is concise, specific, and accurately reflects the primary objective.
Linked Issues check ✅ Passed The PR successfully implements all coding-related objectives from issue #1051: adds a dual-role ClickHouse node (pipeline DB + agent tool), implements native DSN logic with TLS support, handles connection params with proper URL encoding, gates raw execution, and provides tests. The README documents the architecture.
Out of Scope Changes check ✅ Passed All changes are in-scope: new ClickHouse node implementation, updated MySQL/PostgreSQL docstrings for parity/consistency, comprehensive DSN tests, and README documentation. The removal of the answers ingestion lane (commit 5f3431e) directly addresses reviewer feedback on incompatibility and is a valid scope adjustment.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 2, 2026

No description provided.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nodes/src/nodes/db_clickhouse/IGlobal.py`:
- Around line 67-85: The _build_connection_url function currently only
URL-encodes params['password'], leaving params['user'] and params['database']
unescaped which can break the DSN when they contain reserved characters; update
_build_connection_url to quote_plus both params['user'] and params['database']
(in addition to params['password']) and use the quoted values in both the TLS
branch (return in the if params.get('tls') block) and the non-TLS return path so
the constructed clickhouse+native://{user}:{password}@{host}/{database} string
is always safe; apply the same change pattern to the corresponding
_build_connection_url implementations in db_postgres/IGlobal.py and
db_mysql/IGlobal.py for parity.

In `@nodes/src/nodes/db_clickhouse/services.json`:
- Line 50: The current "description" in services.json describes the node as an
insert-only sink; update the description string for the node (the "description"
property in nodes/src/nodes/db_clickhouse/services.json) to reflect its primary
role as a query/agent tool that translates questions to SQL and executes queries
(question→SQL execution) and exposes the agent tool surface methods get_data,
get_schema and get_sql, while also noting it can insert or map fields for
ingestion—make the wording UI-friendly and accurate so users understand both
querying and optional ingestion capabilities.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4ac54fc4-c1ae-4979-a106-289d32bf2ae7

📥 Commits

Reviewing files that changed from the base of the PR and between da2a7d2 and 9153a33.

⛔ Files ignored due to path filters (1)
  • nodes/src/nodes/db_clickhouse/clickhouse.svg is excluded by !**/*.svg
📒 Files selected for processing (8)
  • nodes/src/nodes/db_clickhouse/IGlobal.py
  • nodes/src/nodes/db_clickhouse/IInstance.py
  • nodes/src/nodes/db_clickhouse/README.md
  • nodes/src/nodes/db_clickhouse/__init__.py
  • nodes/src/nodes/db_clickhouse/requirements.txt
  • nodes/src/nodes/db_clickhouse/services.json
  • nodes/test/db_clickhouse/__init__.py
  • nodes/test/db_clickhouse/test_clickhouse_dsn.py

Comment thread nodes/src/nodes/db_clickhouse/IGlobal.py Outdated
Comment thread nodes/src/nodes/db_clickhouse/services.json Outdated
Adds a dedicated ClickHouse node, structured as a thin dialect clone of the
existing db_mysql / db_postgres nodes — connection params + DSN builder are the
only ClickHouse-specific code; schema reflection, NL->SQL, EXPLAIN validation,
SELECT-only safety, and insertion are inherited unchanged from
ai.common.database.

Dual role (classType ["database","tool"]):
- pipeline node: questions -> SQL -> execute; answers/table -> insert
- agent tool: clickhouse.get_data / get_schema / get_sql

Driver: clickhouse-sqlalchemy native TCP (clickhouse-driver), port 9000.
ClickHouse-only extra: a `tls` toggle (distinct from the shared password-field
"secure" attribute) that switches the DSN to TLS and assumes the ClickHouse
Cloud native port 9440 — verified against a local server and ClickHouse Cloud.

Read-only by default; raw SQL gated behind allow_execute, matching
MySQL/PostgreSQL. Includes DSN unit tests (nodes/test/db_clickhouse).

Fixes rocketride-org#1051

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mithileshgau mithileshgau force-pushed the feat/RR-1051-clickhouse-node branch from 9153a33 to 4186f3a Compare June 2, 2026 06:47
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (2)
nodes/src/nodes/db_clickhouse/IGlobal.py (1)

67-85: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Encode user and database in the DSN too.

Only escaping password still leaves valid usernames/databases with reserved characters able to break the SQLAlchemy URL. This was already raised on an earlier revision and is still present.

Suggested fix
     def _build_connection_url(self, params: Dict[str, str]) -> str:
         # URL-encode the password so special characters (e.g. @, /, #) don't
         # break the SQLAlchemy connection string.
+        user = urllib.parse.quote_plus(params['user'])
         password = urllib.parse.quote_plus(params['password'])
+        database = urllib.parse.quote_plus(params['database'])
@@
-            return f'clickhouse+native://{params["user"]}:{password}@{host}/{params["database"]}?secure=true'
+            return f'clickhouse+native://{user}:{password}@{host}/{database}?secure=true'
@@
-        return f'clickhouse+native://{params["user"]}:{password}@{host}/{params["database"]}'
+        return f'clickhouse+native://{user}:{password}@{host}/{database}'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nodes/src/nodes/db_clickhouse/IGlobal.py` around lines 67 - 85, The DSN
builder in _build_connection_url only URL-encodes params['password'], leaving
params['user'] and params['database'] able to break the SQLAlchemy URL;
URL-encode both the username and database (e.g., via urllib.parse.quote_plus)
before interpolating them into the returned clickhouse+native:// string while
preserving existing host/port and ?secure=true behavior (i.e., replace direct
uses of params["user"] and params["database"] with their quoted equivalents
alongside the already-quoted password).
nodes/src/nodes/db_clickhouse/services.json (1)

50-50: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the description to match the node’s actual role.

This still reads like an insert-only sink, but the node’s primary surface is question-to-SQL querying plus tool functions. The UI copy will mislead users selecting it.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nodes/src/nodes/db_clickhouse/services.json` at line 50, The current
"description" string in services.json describes an insert-only ClickHouse sink;
update the "description" value to accurately reflect this node's primary role as
a question-to-SQL querying tool with tool functions (e.g., SQL generation, query
execution, and optional insertion), mentioning that it translates
natural-language questions into SQL, runs queries against ClickHouse, returns
results, and can optionally map fields for inserts; locate and edit the
"description" array entry in nodes/src/nodes/db_clickhouse/services.json to
replace the insert-focused copy with the new concise summary.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nodes/src/nodes/db_clickhouse/IGlobal.py`:
- Around line 73-78: The TLS default-port logic incorrectly treats IPv6 literals
as having a port because it checks for ':' in the host; update the check used
when params.get('tls') is true so it's bracket-aware: for hosts starting with
'[' (IPv6 literal) detect a port only if there's a ']' followed by ':' (e.g.,
'...[ ]:port'), otherwise for non-bracketed hosts keep the existing ':' check;
if no port is detected append ':9440' to host. Refer to the params.get('tls')
check and the host variable in IGlobal.py and implement the bracket-aware port
detection before adding ':9440'.
- Around line 87-91: The _max_validation_attempts function currently casts
config.get('max_attempts', 5) to int but does not enforce the documented 1–20
bounds; update _max_validation_attempts to parse the value (catching
ValueError/TypeError), then clamp the resulting int into the inclusive range
1..20 and return that clamped value (falling back to 5 on parse failure), so the
'max_attempts' config key cannot produce 0, negatives, or values >20.
- Around line 57-61: The config values for host, user, database, and table are
being .strip()'d without guarding against explicit nulls; update the code that
builds the connection dict (the block creating 'host', 'user', 'password',
'database', 'table' entries in IGlobal.py) to coerce None to the intended
default before calling .strip() — e.g., use a None-safe expression like
(config.get('host') or 'localhost').strip() and similarly for 'user',
'database', and 'table'; also ensure 'password' is coerced to '' when None (but
keep the comment about not stripping).

In `@nodes/src/nodes/db_clickhouse/services.json`:
- Around line 73-76: The manifest currently exposes a write path via the
"answers" ingestion lane in services.json (see the "lanes" object) which lets
the node mutate ClickHouse even when only clickhouse.allow_execute is used;
either remove the "answers" lane from the default manifest or add explicit
write/disposal guards (e.g., "allow_write" and "allow_drop" flags) and wire
those flags into the node's ingestion/mutation path (the code paths that handle
answers ingestion and auto-create/insert behaviour referenced in the same
service and the related entries at lines 158-179) so that write/drop operations
are blocked unless the new toggles are explicitly enabled. Ensure the chosen
approach is applied consistently to the other affected service entries mentioned
(158-179).

In `@nodes/test/db_clickhouse/test_clickhouse_dsn.py`:
- Around line 98-170: Tests are missing regression coverage for three failure
modes: explicit nulls for stripped fields, IPv6 host literals with TLS, and
out-of-range max_attempts; update the test suite to add cases exercising
g._connection_params when fields like user/password/database are None (ensure
they result in empty/stripped values rather than errors), add TLS parsing tests
for IPv6 hosts (e.g. host like "[::1]" and "[::1]:9000") to verify
_build_connection_url preserves brackets and port and appends ?secure=true when
tls is truthy, and add bounds tests for g._max_validation_attempts to assert
that values below the minimum and above the maximum (and None/non-int) fall back
to the allowed range (and are cast to int where appropriate) so implementation
regressions are caught.

---

Duplicate comments:
In `@nodes/src/nodes/db_clickhouse/IGlobal.py`:
- Around line 67-85: The DSN builder in _build_connection_url only URL-encodes
params['password'], leaving params['user'] and params['database'] able to break
the SQLAlchemy URL; URL-encode both the username and database (e.g., via
urllib.parse.quote_plus) before interpolating them into the returned
clickhouse+native:// string while preserving existing host/port and ?secure=true
behavior (i.e., replace direct uses of params["user"] and params["database"]
with their quoted equivalents alongside the already-quoted password).

In `@nodes/src/nodes/db_clickhouse/services.json`:
- Line 50: The current "description" string in services.json describes an
insert-only ClickHouse sink; update the "description" value to accurately
reflect this node's primary role as a question-to-SQL querying tool with tool
functions (e.g., SQL generation, query execution, and optional insertion),
mentioning that it translates natural-language questions into SQL, runs queries
against ClickHouse, returns results, and can optionally map fields for inserts;
locate and edit the "description" array entry in
nodes/src/nodes/db_clickhouse/services.json to replace the insert-focused copy
with the new concise summary.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c5e53d54-453f-4140-acf4-1fcdae2bd12f

📥 Commits

Reviewing files that changed from the base of the PR and between 9153a33 and 4186f3a.

⛔ Files ignored due to path filters (1)
  • nodes/src/nodes/db_clickhouse/clickhouse.svg is excluded by !**/*.svg
📒 Files selected for processing (8)
  • nodes/src/nodes/db_clickhouse/IGlobal.py
  • nodes/src/nodes/db_clickhouse/IInstance.py
  • nodes/src/nodes/db_clickhouse/README.md
  • nodes/src/nodes/db_clickhouse/__init__.py
  • nodes/src/nodes/db_clickhouse/requirements.txt
  • nodes/src/nodes/db_clickhouse/services.json
  • nodes/test/db_clickhouse/__init__.py
  • nodes/test/db_clickhouse/test_clickhouse_dsn.py

Comment thread nodes/src/nodes/db_clickhouse/IGlobal.py Outdated
Comment thread nodes/src/nodes/db_clickhouse/IGlobal.py
Comment thread nodes/src/nodes/db_clickhouse/IGlobal.py
Comment thread nodes/src/nodes/db_clickhouse/services.json
Comment thread nodes/test/db_clickhouse/test_clickhouse_dsn.py
Copy link
Copy Markdown
Collaborator

@asclearuc asclearuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mithileshgau Thanks for this — the architecture is clean and the DSN tests are nice and focused.

Value

A first-class ClickHouse node is genuinely useful, and the approach is right:
the heavy logic already lives in ai.common.database, so this PR only adds the
~120 lines of real ClickHouse code (DSN + TLS). Good.

ClickHouse specifics — one blocker

The README and PR description advertise the answers insert lane and
"auto-creates the target table on first insert." That inherited path does not
work on ClickHouse, because the shared _createTableFromData builds the table
with an auto-increment integer primary key and no table engine — neither exists
in ClickHouse.

This is complementary to CodeRabbit's note about allow_execute not gating
the write lane: CodeRabbit asks whether writes should be open; this is that
they do not work. The cleanest fix probably satisfies both at once.

Verdict

Request changes. The node's read/query/tool path looks solid, but the
insert/auto-create path is advertised and broken on ClickHouse. Either disable
it for ClickHouse and update the docs, or implement ClickHouse-correct table
creation. Plus the CodeRabbit items above.

…ported insert lane

Addresses review feedback on the ClickHouse node PR.

DSN safety (db_clickhouse + db_mysql + db_postgres, for parity):
- URL-encode user and database (not just password) in _build_connection_url
  so reserved characters can't break the SQLAlchemy URL.

db_clickhouse hardening:
- _connection_params: coerce explicit null host/user/database/table to their
  defaults before .strip() (was AttributeError on a stored null).
- _build_connection_url: bracket-aware port detection for IPv6 literals so a
  TLS host like [::1] correctly gets :9440 and isn't mis-parsed.
- _max_validation_attempts: clamp to the documented 1..20 range.

Remove the broken / ungated insert path:
- Drop the `answers` ingestion lane from the manifest. The inherited
  auto-create-table helper uses an auto-increment integer PK and no table
  engine — neither exists in ClickHouse — so the advertised insert/auto-create
  path could not work, and it also bypassed allow_execute. Removing the lane
  resolves both the "broken on ClickHouse" and "ungated write" review notes.
- Rewrite the node description (query/agent-tool focus, not insert-only) and
  update the README (remove insert lane + auto-create claims, add an Ingestion
  section explaining the deliberate exclusion).

Tests: add coverage for null coercion, IPv6+TLS, user/database encoding, and
max_attempts clamping (32 DSN unit tests pass).

Refs rocketride-org#1051
@mithileshgau
Copy link
Copy Markdown
Collaborator Author

@mithileshgau Thanks for this — the architecture is clean and the DSN tests are nice and focused.

Value

A first-class ClickHouse node is genuinely useful, and the approach is right: the heavy logic already lives in ai.common.database, so this PR only adds the ~120 lines of real ClickHouse code (DSN + TLS). Good.

ClickHouse specifics — one blocker

The README and PR description advertise the answers insert lane and "auto-creates the target table on first insert." That inherited path does not work on ClickHouse, because the shared _createTableFromData builds the table with an auto-increment integer primary key and no table engine — neither exists in ClickHouse.

This is complementary to CodeRabbit's note about allow_execute not gating the write lane: CodeRabbit asks whether writes should be open; this is that they do not work. The cleanest fix probably satisfies both at once.

Verdict

Request changes. The node's read/query/tool path looks solid, but the insert/auto-create path is advertised and broken on ClickHouse. Either disable it for ClickHouse and update the docs, or implement ClickHouse-correct table creation. Plus the CodeRabbit items above.

Thanks @alexandru — great catch on the insert path, that was the right call. Pushed 5f3431e addressing everything.

The blocker — insert/auto-create broken on ClickHouse

You're right that the inherited _createTableFromData (auto-increment int PK + no table engine) can't work on ClickHouse. Rather than gate a path that fundamentally doesn't work here, I took the "disable it and update the docs" option:

  • Removed the answers ingestion lane from the manifest → the node is now query / agent-tool only.
  • This also resolves CodeRabbit's point about the write lane not being gated by allow_execute — there's no longer an ungated write path at all.
  • Updated the README (dropped the insert lane + "auto-creates table" claims, added an Ingestion section explaining the exclusion) and rewrote the node description to reflect the query/tool role.
  • A ClickHouse-correct (MergeTree) ingestion path can follow as its own feature if we want it.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nodes/src/nodes/db_clickhouse/IGlobal.py`:
- Around line 108-109: The _db_description method can return None when config
contains "db_description": null which violates its -> str contract; update
_db_description (the method named _db_description) to coerce null to an empty
string the same way other nullable config fields are handled: read
config['db_description'], if it is None or not a string return '' (or str(value)
if you prefer) and always return a str so callers relying on a string never get
None.

In `@nodes/src/nodes/db_clickhouse/README.md`:
- Around line 69-71: The README's Ingestion section is ambiguous about the
"answers" lane; update the text to explicitly state that the node does not
expose the ingestion/input "answers" lane (used for pipeline inserts) but still
supports the output "questions -> answers" query path; specifically edit the
sentence referencing "this node intentionally does not expose a pipeline
ingestion (`answers`) lane" to add a clarifying clause like "this only removes
the ingestion/input `answers` lane (used for inserts), not the `questions ->
answers` output lane used for querying" so readers understand there is no
contradiction.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4d0ed11b-7612-4ce7-99d3-e918e941c6cc

📥 Commits

Reviewing files that changed from the base of the PR and between 4186f3a and 5f3431e.

📒 Files selected for processing (6)
  • nodes/src/nodes/db_clickhouse/IGlobal.py
  • nodes/src/nodes/db_clickhouse/README.md
  • nodes/src/nodes/db_clickhouse/services.json
  • nodes/src/nodes/db_mysql/IGlobal.py
  • nodes/src/nodes/db_postgres/IGlobal.py
  • nodes/test/db_clickhouse/test_clickhouse_dsn.py

Comment thread nodes/src/nodes/db_clickhouse/IGlobal.py Outdated
Comment thread nodes/src/nodes/db_clickhouse/README.md Outdated
…estion note

- _db_description: a stored `"db_description": null` returned None, violating
  the -> str contract. Coerce null / non-string to '' (add regression tests).
- README Ingestion: clarify that removing the `answers` lane drops only the
  ingestion/input lane (pipeline inserts), not the `questions -> answers`
  output lane used for querying.

Refs rocketride-org#1051
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
nodes/src/nodes/db_clickhouse/IGlobal.py (1)

60-64: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Whitespace-only connection fields bypass the intended defaults.

Lines 60-64 only fall back before calling .strip(), so inputs like ' ' become '' rather than 'localhost' / 'default' / 'table'. That produces invalid connection URLs from otherwise recoverable config and contradicts the normalization comment above this block. Strip first, then apply the default in one helper so whitespace-only and non-string imported values are handled consistently.

Proposed fix
 class IGlobal(DatabaseGlobalBase):
@@
+    `@staticmethod`
+    def _string_param(config: Dict[str, Any], key: str, default: str) -> str:
+        value = config.get(key)
+        if value is None:
+            return default
+        return str(value).strip() or default
+
     def _connection_params(self, config: Dict[str, Any]) -> Dict[str, str]:
@@
         return {
-            'host': (config.get('host') or 'localhost').strip(),
-            'user': (config.get('user') or 'default').strip(),
+            'host': self._string_param(config, 'host', 'localhost'),
+            'user': self._string_param(config, 'user', 'default'),
             'password': config.get('password') or '',  # Do not strip — whitespace is valid in passwords
-            'database': (config.get('database') or 'default').strip(),
-            'table': (config.get('table') or 'table').strip(),
+            'database': self._string_param(config, 'database', 'default'),
+            'table': self._string_param(config, 'table', 'table'),
             'tls': 'true' if tls else '',
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nodes/src/nodes/db_clickhouse/IGlobal.py` around lines 60 - 64, Create a
small normalization helper in IGlobal.py (e.g., normalize_field(value, default))
that coerces non-None values to str, calls .strip(), and returns the default
when the stripped result is empty; then replace the current inline expressions
for 'host', 'user', 'database', and 'table' to use
normalize_field(config.get('host'), 'localhost') /
normalize_field(config.get('user'), 'default') /
normalize_field(config.get('database'), 'default') /
normalize_field(config.get('table'), 'table'); leave 'password' as
(config.get('password') or '') without stripping (per the existing comment) so
whitespace-only passwords are preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@nodes/src/nodes/db_clickhouse/IGlobal.py`:
- Around line 60-64: Create a small normalization helper in IGlobal.py (e.g.,
normalize_field(value, default)) that coerces non-None values to str, calls
.strip(), and returns the default when the stripped result is empty; then
replace the current inline expressions for 'host', 'user', 'database', and
'table' to use normalize_field(config.get('host'), 'localhost') /
normalize_field(config.get('user'), 'default') /
normalize_field(config.get('database'), 'default') /
normalize_field(config.get('table'), 'table'); leave 'password' as
(config.get('password') or '') without stripping (per the existing comment) so
whitespace-only passwords are preserved.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8f8089b0-3107-4d36-96bd-4ac243fdc7d4

📥 Commits

Reviewing files that changed from the base of the PR and between 5f3431e and 7cc4b8c.

📒 Files selected for processing (3)
  • nodes/src/nodes/db_clickhouse/IGlobal.py
  • nodes/src/nodes/db_clickhouse/README.md
  • nodes/test/db_clickhouse/test_clickhouse_dsn.py

- Extract a _normalize_field(value, default) helper used for host/user/
  database/table: coerces non-None values via str() before .strip() (so a
  non-string config value can't raise AttributeError) and falls back to the
  default when the result is empty/whitespace-only. Password stays uncoerced
  (whitespace preserved). Adds regression coverage.
- Add docstrings to all overridden methods across the three DB nodes
  (db_clickhouse / db_mysql / db_postgres) to satisfy the docstring-coverage
  threshold (touched files now 100%).

Refs rocketride-org#1051
@mithileshgau mithileshgau requested a review from asclearuc June 2, 2026 18:41
@asclearuc asclearuc merged commit d34d7fc into rocketride-org:develop Jun 2, 2026
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs Documentation module:nodes Python pipeline nodes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(node): add ClickHouse node

2 participants