Skip to content

Commit 4b9e16e

Browse files
authored
[SEA-NodeJS] Kernel backend: connection/statement options, TLS & configurable sync/async execution (#416)
* [SEA-NodeJS] SEA connection & statement options Wire the SEA connection-level and per-statement option surfaces onto the merged-kernel napi binding (thin forwarding — the kernel owns the behaviour): Connection options (SeaAuth.buildSeaConnectionOptions): - `maxConnections` → kernel pool size, validated as a positive integer within the napi u32 range. - TLS: `checkServerCertificate` (secure-by-default — omit to keep the kernel's verify-on default; `false` opts into insecure) and `customCaCert` (PEM string or Buffer; strings are PEM-sanity-checked and normalised to a Buffer before the FFI boundary), via the new `buildSeaTlsOptions`. - `intervalsAsString: true` is always set so SEA interval/duration columns render as strings — a byte-compatible drop-in for the Thrift backend. `complexTypesAsJson` is intentionally left at the kernel default (native Arrow), which already decodes identically to Thrift via the shared converter. Statement options (SeaSessionBackend.executeStatement, via buildExecuteOptions): - `queryTimeout` → `queryTimeoutSecs`; `rowLimit` → `rowLimit` (SEA-only cap). - `queryTags` serialised JS-side (reusing Thrift's `serializeQueryTags`) into the reserved `query_tags` conf key, merged with any explicit `statementConf` — the napi `queryTags` field can't carry null-valued tags, and the kernel rejects setting both. `queryTags` / `queryTimeout` are no longer rejected. - Still rejected (genuinely unsupported on SEA): `useCloudFetch`, `useLZ4Compression`, `stagingAllowedLocalPath`. `rowLimit` / `statementConf` added to the public `ExecuteStatementOptions`; SEA-only knobs (`maxConnections` / `checkServerCertificate` / `customCaCert`) added to the internal `InternalConnectionOptions`. Validated against a live warehouse: secure-by-default connect, maxConnections, checkServerCertificate, rowLimit (caps rows), queryTimeout, queryTags, statementConf, and non-PEM customCaCert rejection. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * [SEA-NodeJS] SEA async execute (submit / poll / awaitResult) Switch the SEA query path from the blocking `executeStatement` to the kernel's async `submitStatement`, matching the Thrift backend's always-async (`runAsync: true`) model. `submitStatement` returns immediately with a pending `AsyncStatement` (kernel `wait_timeout=0s`) while the query runs server-side. SeaOperationBackend becomes dual-mode (exactly one of): - `asyncStatement` (query path): `waitUntilReady()` polls `status()` to a terminal state on a 100ms cadence (matching Thrift), firing the progress callback each tick. Polling `status()` rather than blocking on `awaitResult()` keeps `cancel()` responsive — a blocking awaitResult would hold the kernel statement mutex for the whole query and queue cancel behind it. On Succeeded it materialises the result handle (first fetch is free); on Failed it drives `awaitResult()` to surface the kernel's typed SQL-error envelope; on a server-side Cancelled/Closed/Unknown it throws a clear error. `status()` reports the real Pending/Running/Succeeded state. - `statement` (metadata path): the kernel `list*`/`get*` statement is already terminal, so `waitUntilReady()` stays the one-shot completion tick. The fetch pipeline is shared: `awaitResult()`'s `AsyncResultHandle` and the metadata `Statement` expose the same `fetchNextBatch()` / `schema()` surface, so `SeaResultsProvider` → `ArrowResultConverter` → `ResultSlicer` consume either interchangeably via a single memoised fetch handle. cancel()/close() route through a `lifecycleHandle` abstraction over whichever handle backs the op. Re-exports the kernel `AsyncStatement` / `AsyncResultHandle` types from `SeaNativeLoader`. Validated against a live warehouse: async fetchAll correctness, multi-row drain (5000 rows), long-running aggregate (count over 20M), kernel SQL-error surfacing, and cancellation mid-execution. PR1's params/metadata/getInfo all still pass through the new async path. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * [SEA-NodeJS] prettier-format SEA connection/options + async files (CI code-style) The CI "Check code style" step runs `prettier . --check` (whole repo); these files were committed without prettier formatting. Formatting-only. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * [SEA-NodeJS] Address code-review findings on async/TLS/options (#413) Code-review #413 (81/100). Validated each against the code + a live warehouse: - F1 (HIGH): the async poll loop threw plain HiveDriverError for server-driven Cancelled/Closed/Unknown. The DBSQLOperation facade only mirrors its cancelled/closed flags when `err instanceof OperationStateError` (and OperationStateError extends HiveDriverError, not the reverse), so a server-side cancel/close/admin-kill left the facade desynced. Now throws OperationStateError(Canceled/Closed/Unknown) — matching the Thrift backend. The Failed branch still surfaces the kernel SQL-error envelope via awaitResult. - F2 (MED): the server-Cancelled test asserted only instanceOf(HiveDriverError), which passes for both the correct and incorrect type — it couldn't catch F1. Now asserts instanceOf(OperationStateError) + errorCode, plus a new Closed test. - F3 (MED): queryTimeout was forwarded to submitStatement but the kernel ignores queryTimeoutSecs on submit (always wait_timeout=0s), so the documented public option was a silent no-op, and the poll loop had no client-side deadline (a stalled Running statement polled forever). Now enforced client-side: the poll loop tracks a deadline, best-effort cancels the statement on expiry, and throws OperationStateError(Timeout) — matching Thrift's server TIMEDOUT outcome. Stopped forwarding the ignored queryTimeoutSecs to the napi options. Validated live: a 2s timeout interrupts a slow cross-join with TIMEOUT. - F4 (LOW): customCaCert PEM string check now requires the END marker too (a truncated/headerless cert no longer passes), consistent with the Buffer path. - F5 (LOW): SeaAuth reads the SEA-only fields (checkServerCertificate / customCaCert / maxConnections) through `InternalConnectionOptions` instead of ad-hoc inline casts, so a typo'd key fails to compile. - F6 (LOW): corrected the poll-loop comment — the prior text justified polling by an incorrect "blocking awaitResult holds the mutex and queues cancel" claim; cancel() is documented lock-free. The real rationale (real-time status to the progress callback + cancel observed between ticks) is now stated. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * [SEA-NodeJS] add TIMESTAMP_NTZ / TIMESTAMP_LTZ bound-param types Consolidates the last net-new bit of the superseded #408: two SEA-path DBSQLParameterType variants for binding timezone-explicit timestamps. The type name flows through the existing param codec (toSparkParameter → sqlType), which the kernel accepts — validated live (SELECT ? with TIMESTAMP_NTZ/LTZ returns the bound values). On the Thrift backend they degrade to a plain TIMESTAMP bind. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * [SEA-NodeJS] Address #413 review: TIMESTAMP_LTZ wire type + Int64 queryTimeout coercion Code-review #413 (gopalldb). Two P1s: - TIMESTAMP_LTZ was sent verbatim on the wire, but Spark has no distinct TIMESTAMP_LTZ type (TIMESTAMP already carries LTZ semantics) — so a Thrift caller got an opaque server bind error, and the enum comment falsely claimed NTZ/LTZ "degrade to a plain TIMESTAMP bind" (there was no such logic). `toSparkParameter` now maps TIMESTAMP_LTZ → `TIMESTAMP` (valid on both Thrift and kernel); TIMESTAMP_NTZ stays native (a real Spark type). Comment corrected. Added DBSQLParameter tests for both wire types (the Thrift behaviour the review flagged as untested) and updated the kernel positional-params test. - queryTimeout (`number | bigint | Int64`) was coerced with `Number(...)`, which yields NaN for an Int64 (node-int64 has no valueOf) → the client-side deadline was silently disabled for Int64 inputs. Now uses `numberToInt64(...).toNumber()`, matching the Thrift backend. Added a regression test that an `Int64(1)` queryTimeout actually fires the deadline (OperationStateError(Timeout)) rather than polling forever. (P1 "queryTimeout silently dropped on submit" and the unbounded-poll note were already resolved earlier by the client-side deadline fix; doc comment updated to match. P2 polarity/Date-NTZ items noted for the public-surface follow-up.) Validated live: NTZ binds natively and LTZ binds as TIMESTAMP on the kernel path. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * [SEA-NodeJS] configurable sync/async execute via runAsync (default sync) Make the SEA execution path user-configurable between sync and async, toggled by the EXISTING `runAsync` option — no new public field, exactly mirroring the Thrift backend's `runAsync` distinction. Default is SYNC (`runAsync: false`): faster, and with the kernel sync canceller fully cancellable mid-compute. SeaSessionBackend.executeStatement now branches on `options.runAsync`: - false/undefined (DEFAULT) -> Connection.executeStatementCancellable: the kernel blocks on execute() (poll-to-terminal server-side), driven lazily in the operation backend's result(). `queryTimeoutSecs` IS forwarded (the kernel execute() honours it). - true -> Connection.submitStatement (submit + poll), unchanged. `queryTimeoutSecs` stays client-side (kernel ignores it on submit). SeaOperationBackend gains a third dual-mode handle kind, `cancellableExecution`, alongside `asyncStatement` and `statement`: - waitUntilReadyCancellable drives result() to the terminal Statement (memoised as the fetch handle + close target); - the lifecycle handle is a composite: cancel() routes to the detached canceller (lock-free, interrupts a running result() mid-COMPUTE and is a no-op once terminal); close() routes to the resolved statement; - a cancel-induced result() rejection maps to OperationStateError( Canceled) so the DBSQLOperation facade mirrors its cancelled flag, matching the Thrift path. Public API, result shape, schema (TTableSchema), and error classes are identical across both modes and to Thrift — the only observable difference is lifecycle timing (when executeStatement resolves). Built against databricks-sql-kernel napi 639e19ef97decc1c5aa2365c0b3a229c1ccd5b58 (executeStatementCancellable / CancellableExecution); KERNEL_REV bumped to match. Refreshed the committed binding surface (native/sea/index.{js,d.ts}). Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * [SEA-NodeJS] surface server statement_id as op.id on the sync path On the sync (cancellable) execute path the operation id was a client UUID, because the server statement_id isn't known at construction — the kernel publishes it mid-`result()` once the initial execute round-trip returns. That left a cancelled/closed sync op untraceable against server/kernel logs (the async path already had the id from `submit`). `id` now prefers the server statement_id once known (captured from the resolved `Statement`, then the live canceller slot), falling back to the construction-time UUID until then. Updated the fake to model the real null-until-resolved `statementId` and assert op.id flips from UUID → server id after the execute completes. Validated live: SELECT 1 op.id is a UUID before fetch and the real `01f1…` statement id after. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * [SEA-NodeJS] Address #416 review: stable op.id (telemetry), runAsync contract, doc + signal/coverage gaps Review findings (databricks-sql-nodejs#416): - F2 (HIGH): revert the sync-path op.id mutation. `op.id` flipped from a client UUID to the server statement_id after `result()` resolved, but the facade keys telemetry start/complete on it (DBSQLOperation → MetricsAggregator), so the flip split the records across two keys and dropped the summary. `id` is now stable for the operation's lifetime; the resolved server statement_id is surfaced via a debug log for server/kernel correlation instead. Test updated: asserts id is stable AND the server id is logged. - F1 (HIGH): `runAsync` is the SEA sync/async toggle but was JSDoc-@deprecated, and a comment falsely claimed it "mirrors Thrift's runAsync distinction" (Thrift hardcodes runAsync:true and never reads the option). Replaced the @deprecated tag with the cross-backend contract (Thrift: no-op; kernel: selects sync-default vs async) and corrected the in-code comment. - Doc: SeaSessionBackend class comment still said metadata "defers to M1 — throws"; metadata is fully implemented. Rewritten to list the implemented surface. - F3 (MED): ThriftSessionBackend now debug-logs when rowLimit / statementConf (kernel-only options) are set on the Thrift path, instead of dropping silently. - F4 (MED): added the missing coverage using the previously-dead fakes — sync-path Failed/SQL-error envelope (`resultError`), submit-time error mapping on both paths (`throwOnExecute`), and queryTags-vs-statementConf.query_tags collision precedence. - F5 (MED): the query-timeout best-effort cancel now warn-logs a failed cancel (mirrors the fetch-error cleanup) so a still-running server statement is diagnosable. - F10 (LOW): hoisted `Object.values(OperationState)` to a module const off the 100ms async poll loop. Validated: tsc/eslint/prettier clean; 243 SEA / 1162 full unit tests pass; live smoke confirms op.id is stable across fetch on both paths and both return correct data. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * [SEA-NodeJS] Address #416 P1 review round 2: cancel-intent, statement-leak, PEM order, TLS warn, Thrift signal databricks-sql-nodejs#416 (gopalldb P1): - P1.1: `seaCancel` no longer rolls `isCancelled` back to false when the kernel cancel RPC fails. The caller asked to cancel, so the op stays cancelled (subsequent fetches fail fast, poll-loop observers stay consistent); the RPC failure is still surfaced via the rethrow. Rolling back silently resurrected a cancelled op while the server statement might still run. Test asserts the flag stays set on failure. - P1.5: the async poll loop now best-effort `close()`s the kernel statement on every server-driven terminal error (Failed / Cancelled / Closed / Unknown / Timeout) before throwing — previously it leaked the statement handle until session close (only `fetchChunk` cleaned up). Warn-logs a close failure. - P1.3: `customCaCert` PEM check is now an ordered regex (`BEGIN…END` block) instead of two independent substring checks, so a blob containing both markers out of order (e.g. a proxy-intercept page) is rejected. - P1.4: warn when `customCaCert` is set together with `checkServerCertificate: false` — verification is fully off so the custom CA is unused; the combo is still honoured but no longer silently masks it. - P1.6: the Thrift `rowLimit`/`statementConf` ignored-option signal is now a WARN (was debug) — these materially change results (e.g. `rowLimit` not capping), so a caller on the Thrift path gets a visible warning. P1.2 (race in-flight `status()` against a cancel signal) deferred: bounded by the HTTP transport timeout today; the proper fix needs a cancel-signal promise + napi `AbortSignal`, which the binding doesn't yet expose — tracked as a follow-up. Validated: tsc/eslint/prettier clean; 243 SEA / 1162 full unit tests pass; live smoke confirms both modes execute correctly. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * [SEA-NodeJS] tests: ordered PEM markers + poll-loop terminal close (P1.3/P1.5 coverage) Locks the round-2 fixes the review flagged as untested: - PEM customCaCert rejects out-of-order / BEGIN-only / END-only blobs (ordered regex, not two independent substring checks). - The async poll loop best-effort close()s the kernel statement on every server-driven terminal error (Cancelled/Closed/Unknown), proving no handle leak. All round-2 behaviours additionally verified end-to-end against a live warehouse (insecure-combo warn, async-Failed SQL error + close, Thrift rowLimit warn). Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * [SEA-NodeJS] Address #416 round-3 review: KERNEL_REV bump, binding-shape check, sync close cancels databricks-sql-nodejs#416 (gopalldb round 3): - P0 (KERNEL_REV): bumped from 639e19e → 9a50904 (current kernel PR #122 head) and rebuilt the committed binding. This pulls in fix(reader) #115 (which the old pin was missing) and the napi cancel-status normalisation. The pin still references the (unmerged) #122 branch — that's inherent to the stacked PR; #416 must merge AFTER #122, with a final KERNEL_REV bump to the merged SHA. - P1 (binding shape): assertBindingShape now also validates AsyncStatement, AsyncResultHandle, and CancellableExecution. A stale cached .node missing these now fails fast at load instead of crashing mid-query at `submitStatement` / `executeStatementCancellable`. Added a negative test (and the loader stub now includes the new classes). - P2 (sync close leak): close() on a still-running sync op now proactively cancels the cancellable execution (server stops computing immediately) instead of no-oping and relying on the kernel drop-guard firing at JS GC. Added a test. (Stale-review note: round-1/2 findings — PEM ordering, TLS insecure-combo warn, seaCancel rollback, poll-loop close, Thrift signal — were already addressed in 0422f27 + 9d13f11 and are on HEAD; the round-3 comment reviewed an earlier SHA.) Validated: tsc/eslint/prettier clean; 247 SEA / 1166 full unit tests pass; live smoke on the 9a50904 binding confirms both modes execute correctly. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * [SEA-NodeJS] bump KERNEL_REV to dcbbf57 (kernel #122 cancel-normalisation fix) Picks up the kernel-side fix for the first-window cancel mis-attribution (genuine InvalidArgument no longer mislabelled Cancelled) + the terminal-flag gating, so this PR's pinned binding doesn't ship the pre-fix behaviour. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * [SEA-NodeJS] bump KERNEL_REV to 754f162 (race-free cancel-dispatch signal) Picks up the kernel fix that makes StatementCanceller::cancel() report whether it dispatched, closing the TOCTOU in the cancel→Cancelled normalisation that the prior pinned binding (dcbbf57) still carried. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * [SEA-NodeJS] bump KERNEL_REV to merged #122 (8bedaab) Pin to the squashed merge commit of databricks-sql-kernel#122 (feat(napi): expose sync StatementCanceller via executeStatementCancellable), now on kernel main. Supersedes the pre-merge branch-tip pin (754f162). Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> --------- Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
1 parent 94d63bd commit 4b9e16e

22 files changed

Lines changed: 1616 additions & 122 deletions

KERNEL_REV

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
8bedaabf69f5bce5a957a8775f29dbb8dbdd2e71

lib/DBSQLParameter.ts

Lines changed: 20 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,15 @@ export enum DBSQLParameterType {
88
STRING = 'STRING',
99
DATE = 'DATE',
1010
TIMESTAMP = 'TIMESTAMP',
11+
// `TIMESTAMP_NTZ` binds a timezone-free (wall-clock) timestamp. It is a real
12+
// Spark type, bound natively on both the Thrift and kernel backends (requires
13+
// a server that supports TIMESTAMP_NTZ; Spark 3.4+ / recent DBR).
14+
TIMESTAMP_NTZ = 'TIMESTAMP_NTZ',
15+
// `TIMESTAMP_LTZ` is an alias for `TIMESTAMP`: Spark has no distinct
16+
// TIMESTAMP_LTZ type — `TIMESTAMP` already carries local/instant (LTZ)
17+
// semantics. `toSparkParameter` therefore binds it as `TIMESTAMP` on the wire
18+
// (valid on both backends); it exists only as a self-documenting alias.
19+
TIMESTAMP_LTZ = 'TIMESTAMP_LTZ',
1120
FLOAT = 'FLOAT',
1221
DECIMAL = 'DECIMAL',
1322
DOUBLE = 'DOUBLE',
@@ -50,10 +59,16 @@ export class DBSQLParameter {
5059
return new TSparkParameter({ name }); // for NULL neither `type` nor `value` should be set
5160
}
5261

62+
// Map timezone-explicit timestamp aliases to their Spark wire type. Spark
63+
// has no distinct TIMESTAMP_LTZ type (TIMESTAMP carries LTZ semantics), so
64+
// bind it as TIMESTAMP — valid on both the Thrift and kernel backends.
65+
// TIMESTAMP_NTZ is a real Spark type and is bound natively.
66+
const wireType = this.type === DBSQLParameterType.TIMESTAMP_LTZ ? DBSQLParameterType.TIMESTAMP : this.type;
67+
5368
if (typeof this.value === 'boolean') {
5469
return new TSparkParameter({
5570
name,
56-
type: this.type ?? DBSQLParameterType.BOOLEAN,
71+
type: wireType ?? DBSQLParameterType.BOOLEAN,
5772
value: new TSparkParameterValue({
5873
stringValue: this.value ? 'TRUE' : 'FALSE',
5974
}),
@@ -63,7 +78,7 @@ export class DBSQLParameter {
6378
if (typeof this.value === 'number') {
6479
return new TSparkParameter({
6580
name,
66-
type: this.type ?? (Number.isInteger(this.value) ? DBSQLParameterType.INTEGER : DBSQLParameterType.DOUBLE),
81+
type: wireType ?? (Number.isInteger(this.value) ? DBSQLParameterType.INTEGER : DBSQLParameterType.DOUBLE),
6782
value: new TSparkParameterValue({
6883
stringValue: Number(this.value).toString(),
6984
}),
@@ -73,7 +88,7 @@ export class DBSQLParameter {
7388
if (this.value instanceof Int64 || typeof this.value === 'bigint') {
7489
return new TSparkParameter({
7590
name,
76-
type: this.type ?? DBSQLParameterType.BIGINT,
91+
type: wireType ?? DBSQLParameterType.BIGINT,
7792
value: new TSparkParameterValue({
7893
stringValue: this.value.toString(),
7994
}),
@@ -83,7 +98,7 @@ export class DBSQLParameter {
8398
if (this.value instanceof Date) {
8499
return new TSparkParameter({
85100
name,
86-
type: this.type ?? DBSQLParameterType.TIMESTAMP,
101+
type: wireType ?? DBSQLParameterType.TIMESTAMP,
87102
value: new TSparkParameterValue({
88103
stringValue: this.value.toISOString(),
89104
}),
@@ -92,7 +107,7 @@ export class DBSQLParameter {
92107

93108
return new TSparkParameter({
94109
name,
95-
type: this.type ?? DBSQLParameterType.STRING,
110+
type: wireType ?? DBSQLParameterType.STRING,
96111
value: new TSparkParameterValue({
97112
stringValue: this.value,
98113
}),

lib/contracts/IDBSQLSession.ts

Lines changed: 24 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,18 @@ export type ExecuteStatementOptions = {
1212
*/
1313
queryTimeout?: number | bigint | Int64;
1414
/**
15-
* @deprecated This option is no longer supported and will be removed in future releases
15+
* Selects the execution lifecycle. The only observable effect is WHEN
16+
* `executeStatement` resolves; the result data, schema, and error classes are
17+
* identical regardless.
18+
*
19+
* - **Thrift backend:** no-op. The Thrift path always submits asynchronously
20+
* (`runAsync: true` on the wire) and polls during fetch; this option is not
21+
* read.
22+
* - **Kernel backend (`useSEA`):** selects the kernel execution path —
23+
* `false`/unset (default) runs the blocking direct-results path (faster,
24+
* cancellable mid-compute); `true` submits and polls (returns a pending
25+
* handle before completion). Default is sync, matching the python
26+
* connector's `cursor.execute()`.
1627
*/
1728
runAsync?: boolean;
1829
maxRows?: number | bigint | Int64 | null;
@@ -27,6 +38,18 @@ export type ExecuteStatementOptions = {
2738
* These tags apply only to this statement and do not persist across queries.
2839
*/
2940
queryTags?: Record<string, string | null | undefined>;
41+
/**
42+
* SEA-only: server-side row cap for this statement (kernel `row_limit`). The
43+
* Thrift backend has no execute-time server cap, so this is a no-op there;
44+
* use `maxRows` for the cross-backend client-side fetch limit.
45+
*/
46+
rowLimit?: number;
47+
/**
48+
* SEA-only: per-statement Spark conf overlay (kernel `statement_conf`).
49+
* Merged with the serialized `queryTags` (which land under the reserved
50+
* `query_tags` key). Ignored by the Thrift backend.
51+
*/
52+
statementConf?: Record<string, string>;
3053
};
3154

3255
export type TypeInfoRequest = {

lib/contracts/InternalConnectionOptions.ts

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -18,4 +18,27 @@ export interface InternalConnectionOptions {
1818
* @internal Not stable; M0 stub only.
1919
*/
2020
useSEA?: boolean;
21+
22+
/**
23+
* SEA-only: kernel connection-pool size (`ConnectionOptions.max_connections`).
24+
* Validated as a positive integer within the napi `u32` range.
25+
* @internal SEA path only.
26+
*/
27+
maxConnections?: number;
28+
29+
/**
30+
* SEA-only: verify the server's TLS certificate. Secure-by-default — omit
31+
* to keep full chain + hostname verification; set `false` only to opt into
32+
* the insecure accept-anything mode.
33+
* @internal SEA path only.
34+
*/
35+
checkServerCertificate?: boolean;
36+
37+
/**
38+
* SEA-only: PEM-encoded CA certificate (string or `Buffer`) added to the
39+
* trust store on top of the system roots — for TLS-inspecting proxies or
40+
* on-prem internal CAs. Honoured regardless of `checkServerCertificate`.
41+
* @internal SEA path only.
42+
*/
43+
customCaCert?: Buffer | string;
2144
}

lib/sea/SeaAuth.ts

Lines changed: 140 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,7 @@
1313
// limitations under the License.
1414

1515
import { ConnectionOptions } from '../contracts/IDBSQLClient';
16+
import { InternalConnectionOptions } from '../contracts/InternalConnectionOptions';
1617
import AuthenticationError from '../errors/AuthenticationError';
1718
import HiveDriverError from '../errors/HiveDriverError';
1819

@@ -66,9 +67,58 @@ export interface SeaSessionDefaults {
6667
catalog?: string;
6768
schema?: string;
6869
sessionConf?: Record<string, string>;
70+
/**
71+
* Render `INTERVAL` / `DURATION` result columns as strings
72+
* (kernel `ResultConfig.intervals_as_string`). The kernel default is
73+
* native Arrow `month_interval` / `duration[us]`, but the NodeJS
74+
* Thrift driver surfaces intervals as strings — so the SEA path sets
75+
* this `true` so its result shape is a byte-compatible drop-in for the
76+
* Thrift backend. Omitting it falls back to the kernel's native types.
77+
*/
78+
intervalsAsString?: boolean;
79+
/**
80+
* Render complex (`ARRAY` / `MAP` / `STRUCT` / `VARIANT`) result
81+
* columns as JSON strings (kernel `ResultConfig.complex_types_as_json`).
82+
* Left unset on the SEA path: native Arrow nested types already decode
83+
* identically to the Thrift backend through the shared Arrow converter,
84+
* so forcing JSON here would *introduce* a divergence rather than
85+
* remove one.
86+
*/
87+
complexTypesAsJson?: boolean;
88+
/**
89+
* Per-session kernel connection-pool size
90+
* (kernel `ConnectionOptions.max_connections`). Validated as a positive
91+
* integer within the napi `u32` range by `buildSeaConnectionOptions`.
92+
*/
93+
maxConnections?: number;
94+
}
95+
96+
/**
97+
* TLS options shared across all auth-mode variants. Mirror the napi
98+
* binding's `ConnectionOptions.checkServerCertificate` / `.customCaCert`
99+
* (kernel `Session::builder().tls(TlsConfig)`).
100+
*
101+
* The napi shape takes `customCaCert` as a `Buffer` only; the public
102+
* `ConnectionOptions` additionally accepts a PEM string, which
103+
* `buildSeaConnectionOptions` normalises to a `Buffer` before crossing
104+
* the FFI boundary.
105+
*/
106+
export interface SeaTlsOptions {
107+
/**
108+
* Verify the server's TLS certificate. The SEA backend is
109+
* **secure-by-default**: omitting this leaves the kernel default of
110+
* `true` (full chain + hostname verification). Set `false` only to opt
111+
* into the insecure, accept-anything mode (analogous to Thrift's
112+
* `rejectUnauthorized: false`); prefer pairing strict checking with
113+
* `customCaCert` over disabling verification entirely.
114+
*/
115+
checkServerCertificate?: boolean;
116+
/** PEM-encoded CA bytes to add to the trust store. */
117+
customCaCert?: Buffer;
69118
}
70119

71120
export type SeaNativeConnectionOptions = SeaSessionDefaults &
121+
SeaTlsOptions &
72122
(
73123
| {
74124
hostName: string;
@@ -114,6 +164,63 @@ export function isBlankOrReserved(s: string): boolean {
114164
return normalized.length === 0 || normalized === 'undefined' || normalized === 'null';
115165
}
116166

167+
/** napi-rs marshals `maxConnections` as a `u32`; reject values it can't hold. */
168+
const MAX_U32 = 0xffffffff;
169+
170+
/**
171+
* Normalise the public TLS options (`checkServerCertificate` /
172+
* `customCaCert`) into the napi shape.
173+
*
174+
* - `checkServerCertificate` passes through verbatim (only when set; an
175+
* absent value leaves the kernel default, which is secure — verify on).
176+
* - `customCaCert` accepts a PEM string or `Buffer` on the public
177+
* surface; we convert a string to a `Buffer` here and do a light PEM
178+
* sanity check. The bytes are NOT parsed in JS — the kernel returns a
179+
* meaningful error if the PEM is malformed.
180+
*
181+
* Throws `HiveDriverError` when `customCaCert` is supplied but empty or
182+
* (for strings) lacks a PEM certificate header.
183+
*/
184+
export function buildSeaTlsOptions(options: ConnectionOptions): SeaTlsOptions {
185+
// Read the SEA-only fields through the purpose-built internal options type
186+
// rather than an ad-hoc inline cast, so the shape can't silently drift from
187+
// its declaration and a typo'd key fails to compile.
188+
const { checkServerCertificate, customCaCert } = options as ConnectionOptions & InternalConnectionOptions;
189+
190+
const tls: SeaTlsOptions = {};
191+
192+
if (checkServerCertificate !== undefined) {
193+
tls.checkServerCertificate = checkServerCertificate;
194+
}
195+
196+
if (customCaCert !== undefined) {
197+
if (typeof customCaCert === 'string') {
198+
// Light PEM sanity check — require a well-ordered BEGIN…END block so a
199+
// truncated/headerless cert (or a stray page that merely contains both
200+
// literals out of order, e.g. a proxy-intercept page) is rejected here
201+
// rather than surfacing as an opaque kernel TLS error. Ordered match, not
202+
// two independent substring checks. Full parsing is deferred to the kernel.
203+
if (!/-----BEGIN CERTIFICATE-----[\s\S]+?-----END CERTIFICATE-----/.test(customCaCert)) {
204+
throw new HiveDriverError(
205+
'SEA backend: `customCaCert` string does not look like a PEM certificate ' +
206+
"(expected a '-----BEGIN CERTIFICATE-----' … '-----END CERTIFICATE-----' block). " +
207+
'Pass PEM text or a Buffer of PEM bytes.',
208+
);
209+
}
210+
tls.customCaCert = Buffer.from(customCaCert, 'utf8');
211+
} else if (Buffer.isBuffer(customCaCert)) {
212+
if (customCaCert.length === 0) {
213+
throw new HiveDriverError('SEA backend: `customCaCert` Buffer is empty.');
214+
}
215+
tls.customCaCert = customCaCert;
216+
} else {
217+
throw new HiveDriverError('SEA backend: `customCaCert` must be a PEM string or a Buffer.');
218+
}
219+
}
220+
221+
return tls;
222+
}
223+
117224
/**
118225
* Validate the user-supplied `ConnectionOptions` and build the
119226
* napi-binding's connection-options shape.
@@ -170,11 +277,43 @@ export function isBlankOrReserved(s: string): boolean {
170277
export function buildSeaConnectionOptions(options: ConnectionOptions): SeaNativeConnectionOptions {
171278
const { authType } = options as { authType?: string };
172279

173-
const base = {
280+
const base: {
281+
hostName: string;
282+
httpPath: string;
283+
intervalsAsString: boolean;
284+
maxConnections?: number;
285+
} & SeaTlsOptions = {
174286
hostName: options.host,
175287
httpPath: prependSlash(options.path),
288+
// Match the NodeJS Thrift driver, which surfaces INTERVAL columns as
289+
// strings. The kernel defaults to native Arrow interval/duration types;
290+
// forcing the string rendering here keeps the SEA path a byte-compatible
291+
// drop-in. Complex types are intentionally left at the kernel default
292+
// (native Arrow) — they already decode identically to Thrift via the
293+
// shared Arrow converter, so `complexTypesAsJson` is not forced on.
294+
intervalsAsString: true,
295+
// TLS knobs (server-cert verification toggle + custom CA). Validated and
296+
// normalised (string PEM → Buffer) here so the napi shape only sees a Buffer.
297+
...buildSeaTlsOptions(options),
176298
};
177299

300+
// SEA-only pool sizing; read via cast to match how this function reads the
301+
// other SEA-specific options (TLS) — they live on the internal options
302+
// surface, not the published public `ConnectionOptions` `.d.ts`.
303+
const { maxConnections } = options as ConnectionOptions & InternalConnectionOptions;
304+
if (maxConnections !== undefined) {
305+
if (!Number.isInteger(maxConnections) || maxConnections < 1) {
306+
throw new HiveDriverError(`SEA backend: \`maxConnections\` must be a positive integer; got ${maxConnections}.`);
307+
}
308+
if (maxConnections > MAX_U32) {
309+
throw new HiveDriverError(
310+
`SEA backend: \`maxConnections\` exceeds the napi u32 limit (${MAX_U32}); got ${maxConnections}. ` +
311+
'Typical pool sizes are 10-500.',
312+
);
313+
}
314+
base.maxConnections = maxConnections;
315+
}
316+
178317
const oauth = options as {
179318
oauthClientId?: string;
180319
oauthClientSecret?: string;

lib/sea/SeaBackend.ts

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,8 @@ import IBackend from '../contracts/IBackend';
1616
import ISessionBackend from '../contracts/ISessionBackend';
1717
import IClientContext from '../contracts/IClientContext';
1818
import { ConnectionOptions, OpenSessionRequest } from '../contracts/IDBSQLClient';
19+
import { InternalConnectionOptions } from '../contracts/InternalConnectionOptions';
20+
import { LogLevel } from '../contracts/IDBSQLLogger';
1921
import HiveDriverError from '../errors/HiveDriverError';
2022
import { getSeaNative, SeaNativeBinding, SeaConnection } from './SeaNativeLoader';
2123
import { decodeNapiKernelError } from './SeaErrorMapping';
@@ -78,6 +80,22 @@ export default class SeaBackend implements IBackend {
7880
// Any non-PAT mode (or a missing/empty token) throws here, before
7981
// we ever touch the native binding.
8082
this.nativeOptions = buildSeaConnectionOptions(options);
83+
84+
// Warn on the insecure combo: a `customCaCert` paired with
85+
// `checkServerCertificate: false` is almost always a mistake — verification
86+
// is fully off, so the custom trust anchor is never used. The combo is
87+
// still honoured (kernel contract), but a secure-looking `customCaCert`
88+
// shouldn't silently mask disabled verification.
89+
const tlsOpts = options as ConnectionOptions & InternalConnectionOptions;
90+
if (tlsOpts.checkServerCertificate === false && tlsOpts.customCaCert !== undefined) {
91+
this.context
92+
.getLogger()
93+
.log(
94+
LogLevel.warn,
95+
'SEA: `customCaCert` is set but `checkServerCertificate: false` disables certificate ' +
96+
'verification entirely — the custom CA is not used. Set `checkServerCertificate: true` to use it.',
97+
);
98+
}
8199
}
82100

83101
public async openSession(request: OpenSessionRequest): Promise<ISessionBackend> {

lib/sea/SeaNativeLoader.ts

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -36,6 +36,9 @@ import type {
3636
ExecuteOptions as NativeExecuteOptions,
3737
TypedValueInput as NativeTypedValueInput,
3838
NamedTypedValueInput as NativeNamedTypedValueInput,
39+
AsyncStatement as NativeAsyncStatement,
40+
AsyncResultHandle as NativeAsyncResultHandle,
41+
CancellableExecution as NativeCancellableExecution,
3942
} from '../../native/sea';
4043

4144
// SEA-prefixed re-exports. The kernel-generated `.d.ts` keeps the
@@ -59,6 +62,22 @@ export type SeaNativeExecuteOptions = NativeExecuteOptions;
5962
export type SeaNativeTypedValueInput = NativeTypedValueInput;
6063
export type SeaNativeNamedTypedValueInput = NativeNamedTypedValueInput;
6164

65+
// Async-submit surface: `Connection.submitStatement` returns an
66+
// `AsyncStatement` (status / awaitResult / cancel / close); `awaitResult`
67+
// yields an `AsyncResultHandle` whose `fetchNextBatch` / `schema` match the
68+
// blocking `Statement`'s fetch surface, so the results pipeline consumes
69+
// either interchangeably.
70+
export type SeaNativeAsyncStatement = NativeAsyncStatement;
71+
export type SeaNativeAsyncResultHandle = NativeAsyncResultHandle;
72+
73+
// Cancellable sync-execute surface: `Connection.executeStatementCancellable`
74+
// returns a `CancellableExecution` that captures a detached StatementCanceller
75+
// BEFORE dispatching the blocking `execute()`, so a concurrent `cancel()`
76+
// interrupts a still-running query mid-compute. `result()` drives the blocking
77+
// execute and resolves to the same terminal `Statement` `executeStatement`
78+
// returns.
79+
export type SeaNativeCancellableExecution = NativeCancellableExecution;
80+
6281
/**
6382
* The full native binding surface, derived from the generated module
6483
* so it can never drift from the `.d.ts` contract: when the kernel
@@ -124,6 +143,13 @@ function assertBindingShape(binding: SeaNativeBinding): void {
124143
if (typeof binding.openSession !== 'function') missing.push('openSession');
125144
if (typeof binding.Connection !== 'function') missing.push('Connection');
126145
if (typeof binding.Statement !== 'function') missing.push('Statement');
146+
// Classes the async (submit/poll) and cancellable-sync execution paths depend
147+
// on. Checking them here turns a stale/older cached `.node` into a clear
148+
// load-time error instead of an `X is not a function` mid-query (e.g.
149+
// `Connection.submitStatement` / `executeStatementCancellable`).
150+
if (typeof binding.AsyncStatement !== 'function') missing.push('AsyncStatement');
151+
if (typeof binding.AsyncResultHandle !== 'function') missing.push('AsyncResultHandle');
152+
if (typeof binding.CancellableExecution !== 'function') missing.push('CancellableExecution');
127153
if (missing.length > 0) {
128154
throw new Error(
129155
`SEA native binding loaded but is missing expected export(s): ${missing.join(', ')}. ` +

0 commit comments

Comments
 (0)