Skip to content

Commit 94d63bd

Browse files
authored
[SEA-NodeJS] SEA query parameters, metadata & getInfo (#412)
* [SEA-NodeJS] SEA query params, metadata & getInfo Wire three SEA session-backend capabilities onto the merged-kernel napi surface, all as thin adapters over the kernel (which owns the heavy lifting): - Bound parameters: forward `ordinalParameters`/`namedParameters` to the kernel's `ExecuteOptions.positionalParams`/`namedParams`, reusing the existing `DBSQLParameter.toSparkParameter` codec (DECIMAL→DECIMAL(p,s), INTERVAL→INTERVAL, NULL→VOID). Positional and named are mutually exclusive. - Metadata: `getCatalogs`/`getSchemas`/`getTables`/`getColumns`/`getFunctions`/ `getTableTypes`/`getTypeInfo`/`getPrimaryKeys`/`getCrossReference` forward to the kernel `Connection.list*`/`get*` calls and wrap the returned Statement. The kernel owns SQL synthesis, column projection, and the client-side `TABLE_TYPE` filter — the driver only maps request fields to arguments. - getInfo: synthesized client-side (no SEA/kernel endpoint), mirroring the three TGetInfoType values the Databricks Thrift server answers and rejecting the rest, byte-identical to the Thrift path. `SeaInputValidation` is slimmed to the one bind-time gate the driver needs (`assertBindableValue`); marker counting is the kernel's job. Re-exports the kernel's `ExecuteOptions`/`TypedValueInput` types from `SeaNativeLoader` rather than re-declaring them, so the driver param shapes can't drift from the kernel. Regenerates the committed napi `.d.ts` snapshot against the merged kernel. Validated against a live warehouse: positional + named params, mutual-exclusion rejection, all metadata calls (incl. kernel-side tableTypes filter), and getInfo. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * [SEA-NodeJS] Address code-review findings on params/metadata/getInfo Code-review-squad #412 review (79/100). Validated each finding against the kernel source + a live warehouse before fixing: - F1 (HIGH): getPrimaryKeys turned an omitted catalog into `?? ''`, which the kernel's `Identifier::new` rejects with an opaque "identifier must not be empty" — a regression vs Thrift (which forwards undefined). The napi `getPrimaryKeys(catalog: string, …)` requires a non-empty catalog, so reject up front with a clear, actionable message and document the divergence. Added a test (omitted + empty-string catalog) asserting the kernel call is never reached. (getCrossReference is unaffected — its parent args are nullable and passed through without `?? ''`.) - F3 (MED): mutual-exclusivity of ordinal/named params threw HiveDriverError with a different message than the Thrift path, which throws ParameterError('Driver does not support both ordinal and named parameters.') (ParameterError extends Error, not HiveDriverError — so a caller catching it worked on Thrift and silently failed on SEA). Now throws the identical ParameterError + message. Test updated. - F2 (MED): empirically re-verified against the live warehouse over Thrift — the server returns the exact same three values we synthesize ("Spark SQL" / "Spark SQL" / "3.1.1") and errors on every other TGetInfoType (probed CLI_MAX_DRIVER_CONNECTIONS, CLI_DATA_SOURCE_NAME, …). So throwing on an unsupported type matches Thrift's effective behaviour rather than narrowing it; softened the "byte-for-byte" assertion into the validated fact in both the SeaServerInfo and getInfo doc comments. - F4 (LOW): getInfo's rejection now names the accepted enum identifiers/values (CLI_SERVER_NAME (13), CLI_DBMS_NAME (17), CLI_DBMS_VER (18)) so an agent/SDK consumer can self-correct. - F5 (LOW): metadata + bound-param execute failures now emit a debug-level breadcrumb (mapped error + session id) before throwing, via a shared `logAndMapError` helper, matching the rest of the SEA backend's logging. - F7 (LOW): dropped the vestigial `nativeStatement!` non-null assertion in executeStatement (typed the local, mirroring runMetadata). - F8 (test): added coverage for the getPrimaryKeys omitted-catalog path (F1), the INTERVAL *→INTERVAL collapse, and the DECIMAL precision clamp-to-38. F6 (kernel diagnostic getters unconsumed) is tracked as a follow-up — wiring them in must land with the errorDetailsJson size-guard, out of scope here. Validated live: getPrimaryKeys(with/without catalog), the ParameterError parity, and the named-identifier getInfo error all behave as asserted. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * [SEA-NodeJS] prettier-format SEA test files (CI code-style) The CI "Check code style" step runs `prettier . --check` (whole repo), not just eslint; these SEA test files were committed without prettier formatting and failed it. Formatting-only; no behavioural change. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * [SEA-NodeJS] Fix DECIMAL (p,s) derivation — reject overflow/non-numeric, drop leading-zero miscount Code-review #412 (gopalldb P1). `decimalPrecisionScale` derived precision by counting digit characters, which (a) counted insignificant leading zeros ("0.00001" → DECIMAL(6,5)) and (b) clamped precision to 38 while still sending the full value — so a 40-digit value produced DECIMAL(38,0) alongside an unrepresentable 40-digit value, which the kernel rejects or silently truncates. The prior test blessed that clamp as correct (wrong contract). Now: - Precision = significant integer digits (insignificant leading zeros stripped, matching Spark's decimal-literal rule) + fractional scale. "0.00001" → DECIMAL(5,5); "007.50" → DECIMAL(3,2); "0" → DECIMAL(1,0). - A value needing precision > 38 throws ParameterError at bind time instead of clamp-and-send (P1.1). - A non-numeric / exponential value ("1e+21", "abc", "") throws ParameterError with a clear message rather than mis-deriving (p,s) and surfacing an opaque kernel error (P1.2 / P2). Validated live: 0.00001 binds as DECIMAL(5,5) and round-trips; a 40-digit value is rejected client-side with ParameterError (no server round-trip). Unit tests updated to assert the new reject contract + leading-zero handling. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> --------- Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
1 parent de04e19 commit 94d63bd

11 files changed

Lines changed: 1202 additions & 78 deletions

lib/sea/SeaInputValidation.ts

Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,46 @@
1+
// Copyright (c) 2026 Databricks, Inc.
2+
//
3+
// Licensed under the Apache License, Version 2.0 (the "License");
4+
// you may not use this file except in compliance with the License.
5+
// You may obtain a copy of the License at
6+
//
7+
// http://www.apache.org/licenses/LICENSE-2.0
8+
//
9+
// Unless required by applicable law or agreed to in writing, software
10+
// distributed under the License is distributed on an "AS IS" BASIS,
11+
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12+
// See the License for the specific language governing permissions and
13+
// limitations under the License.
14+
15+
import Int64 from 'node-int64';
16+
import { DBSQLParameter, DBSQLParameterValue } from '../DBSQLParameter';
17+
import ParameterError from '../errors/ParameterError';
18+
19+
/**
20+
* Reject a parameter value that cannot be bound as a scalar. Arrays and plain
21+
* objects stringify to garbage (e.g. `[1,2,3]` → `"1,2,3"`) that the server
22+
* fails to coerce — on the Thrift path the operation never returns to
23+
* FINISHED (a DoS hazard), and on SEA it surfaces an opaque server error. We
24+
* fail fast at bind time instead, mirroring the kernel's compound-type
25+
* rejection. `DBSQLParameter`, `Int64`, `Date`, and JS primitives are allowed.
26+
*
27+
* Parameter-marker counting and arity validation are intentionally NOT done
28+
* here: the kernel's `statement::params` codec owns that check and binds
29+
* exactly one placeholder style per statement, so duplicating the SQL-walk
30+
* JS-side would only risk drift. The driver's sole bind-time job is this
31+
* cheap, type-shape gate before the value crosses the napi boundary.
32+
*/
33+
export default function assertBindableValue(value: DBSQLParameter | DBSQLParameterValue, label: string): void {
34+
if (value instanceof DBSQLParameter) return;
35+
if (value === null || value === undefined) return;
36+
if (Array.isArray(value)) {
37+
throw new ParameterError(
38+
`${label} is an array; compound types (ARRAY/MAP/STRUCT) are not bindable as a parameter value`,
39+
);
40+
}
41+
if (typeof value === 'object' && !(value instanceof Date) && !(value instanceof Int64)) {
42+
throw new ParameterError(
43+
`${label} is an object; only scalar values (string/number/bigint/boolean), Date, and Int64 are bindable`,
44+
);
45+
}
46+
}

lib/sea/SeaNativeLoader.ts

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -33,6 +33,9 @@ import type {
3333
ConnectionOptions as NativeConnectionOptions,
3434
ArrowBatch as NativeArrowBatch,
3535
ArrowSchema as NativeArrowSchema,
36+
ExecuteOptions as NativeExecuteOptions,
37+
TypedValueInput as NativeTypedValueInput,
38+
NamedTypedValueInput as NativeNamedTypedValueInput,
3639
} from '../../native/sea';
3740

3841
// SEA-prefixed re-exports. The kernel-generated `.d.ts` keeps the
@@ -46,6 +49,16 @@ export type SeaArrowSchema = NativeArrowSchema;
4649
export type SeaConnection = NativeConnection;
4750
export type SeaStatement = NativeStatement;
4851

52+
// Per-statement execution options and bound-parameter inputs are kernel
53+
// concerns: the napi binding generates the canonical shapes (`positionalParams`
54+
// / `namedParams` as `TypedValueInput` / `NamedTypedValueInput`, plus
55+
// `rowLimit`, `queryTimeoutSecs`, `statementConf`, `queryTags`). We re-export
56+
// rather than re-declare so the driver-side param codec can never drift from
57+
// the kernel contract.
58+
export type SeaNativeExecuteOptions = NativeExecuteOptions;
59+
export type SeaNativeTypedValueInput = NativeTypedValueInput;
60+
export type SeaNativeNamedTypedValueInput = NativeNamedTypedValueInput;
61+
4962
/**
5063
* The full native binding surface, derived from the generated module
5164
* so it can never drift from the `.d.ts` contract: when the kernel

lib/sea/SeaPositionalParams.ts

Lines changed: 124 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,124 @@
1+
// Copyright (c) 2026 Databricks, Inc.
2+
//
3+
// Licensed under the Apache License, Version 2.0 (the "License");
4+
// you may not use this file except in compliance with the License.
5+
// You may obtain a copy of the License at
6+
//
7+
// http://www.apache.org/licenses/LICENSE-2.0
8+
//
9+
// Unless required by applicable law or agreed to in writing, software
10+
// distributed under the License is distributed on an "AS IS" BASIS,
11+
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12+
// See the License for the specific language governing permissions and
13+
// limitations under the License.
14+
15+
import { DBSQLParameter, DBSQLParameterValue } from '../DBSQLParameter';
16+
import ParameterError from '../errors/ParameterError';
17+
import { SeaNativeTypedValueInput, SeaNativeNamedTypedValueInput } from './SeaNativeLoader';
18+
import assertBindableValue from './SeaInputValidation';
19+
20+
/**
21+
* Derive `(precision,scale)` from a decimal value string for the SEA
22+
* `DECIMAL(p,s)` type name — the kernel param codec requires the parenthesised
23+
* form (plain `"DECIMAL"` is rejected) so it preserves the caller's fractional
24+
* digits. `"99.99"` ⇒ `"4,2"`; `"-123"` ⇒ `"3,0"`; `"0.00001"` ⇒ `"5,5"`.
25+
*
26+
* Precision is significant integer digits (insignificant leading zeros
27+
* stripped, matching Spark's decimal-literal rule) plus the fractional scale.
28+
* The value is NOT clamped: a non-numeric value, or one that needs precision >
29+
* the Databricks maximum of 38, throws `ParameterError` at bind time rather
30+
* than emitting an inconsistent `DECIMAL(38,…)` type alongside an
31+
* unrepresentable value (which the kernel would reject or silently truncate).
32+
*/
33+
function decimalPrecisionScale(v: string): string {
34+
// Accept an optional sign + plain decimal numeral only. Exponential form
35+
// ("1e+21"), empty, and non-numeric ("abc") are rejected with a clear error
36+
// instead of mis-deriving (p,s) and surfacing an opaque kernel failure.
37+
const m = /^[+-]?(\d*)(?:\.(\d+))?$/.exec(v.trim());
38+
const intPart = m?.[1] ?? '';
39+
const fracPart = m?.[2] ?? '';
40+
if (!m || (intPart === '' && fracPart === '')) {
41+
throw new ParameterError(
42+
`DECIMAL parameter value "${v}" is not a plain decimal numeral (expected [+-]?digits[.digits]).`,
43+
);
44+
}
45+
const scale = fracPart.length;
46+
// Significant integer digits: drop insignificant leading zeros ("007" → 1,
47+
// "0" / "" → 0) so e.g. 0.00001 is DECIMAL(5,5), not DECIMAL(6,5).
48+
const significantIntDigits = intPart.replace(/^0+/, '').length;
49+
const precision = Math.max(significantIntDigits + scale, 1);
50+
if (precision > 38) {
51+
throw new ParameterError(
52+
`DECIMAL parameter value "${v}" needs precision ${precision}, exceeding the Databricks maximum of 38. ` +
53+
'Round/scale the value or bind it as a STRING.',
54+
);
55+
}
56+
return `${precision},${scale}`;
57+
}
58+
59+
/**
60+
* Reduce a `DBSQLParameter | DBSQLParameterValue` to the napi
61+
* `TypedValueInput` (`{ sqlType, value? }`) the kernel's positional-param
62+
* codec (`parse_typed_value`) accepts. Reuses `DBSQLParameter.toSparkParameter`
63+
* — the same type-inference + value-stringification the Thrift backend uses —
64+
* then adapts the type name to the codec's expectations:
65+
* - DECIMAL → `DECIMAL(p,s)` (parenthesised form required)
66+
* - INTERVAL * → `INTERVAL` (the codec's single interval type name)
67+
* - a missing value ⇒ SQL NULL (`parse_typed_value` maps `value: None` to NULL).
68+
*/
69+
function toTypedValueInput(value: DBSQLParameter | DBSQLParameterValue): SeaNativeTypedValueInput {
70+
const param = value instanceof DBSQLParameter ? value : new DBSQLParameter({ value });
71+
const spark = param.toSparkParameter();
72+
const stringValue = spark.value?.stringValue ?? undefined;
73+
74+
// NULL: no value (and `VOID` ignores any type), matching toSparkParameter's
75+
// type/value-less shape for null/undefined.
76+
if (stringValue === undefined || stringValue === null) {
77+
return { sqlType: 'VOID' };
78+
}
79+
80+
let sqlType = spark.type ?? 'STRING';
81+
const upper = sqlType.toUpperCase();
82+
if (upper === 'DECIMAL') {
83+
sqlType = `DECIMAL(${decimalPrecisionScale(stringValue)})`;
84+
} else if (upper.startsWith('INTERVAL')) {
85+
sqlType = 'INTERVAL';
86+
}
87+
return { sqlType, value: stringValue };
88+
}
89+
90+
/**
91+
* Convert the public `ordinalParameters` option into the napi
92+
* `positionalParams` array (1-based `?` placeholders). Returns `undefined`
93+
* when none were supplied, so the caller can keep the minimal no-options
94+
* call shape.
95+
*/
96+
export function buildSeaPositionalParams(
97+
ordinalParameters?: Array<DBSQLParameter | DBSQLParameterValue>,
98+
): Array<SeaNativeTypedValueInput> | undefined {
99+
if (ordinalParameters === undefined || ordinalParameters.length === 0) {
100+
return undefined;
101+
}
102+
return ordinalParameters.map((value, i) => {
103+
assertBindableValue(value, `ordinalParameters[${i}]`);
104+
return toTypedValueInput(value);
105+
});
106+
}
107+
108+
/**
109+
* Convert the public `namedParameters` option (`Record<name, value>`) into
110+
* the napi `namedParams` array (`:name` placeholders). Each value reuses the
111+
* same `toTypedValueInput` mapping (DECIMAL → DECIMAL(p,s), NULL → VOID, …),
112+
* then carries its name. Returns `undefined` when none were supplied.
113+
*/
114+
export function buildSeaNamedParams(
115+
namedParameters?: Record<string, DBSQLParameter | DBSQLParameterValue>,
116+
): Array<SeaNativeNamedTypedValueInput> | undefined {
117+
if (namedParameters === undefined || Object.keys(namedParameters).length === 0) {
118+
return undefined;
119+
}
120+
return Object.keys(namedParameters).map((name) => {
121+
assertBindableValue(namedParameters[name], `namedParameters[${name}]`);
122+
return { name, ...toTypedValueInput(namedParameters[name]) };
123+
});
124+
}

lib/sea/SeaServerInfo.ts

Lines changed: 74 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,74 @@
1+
// Copyright (c) 2026 Databricks, Inc.
2+
//
3+
// Licensed under the Apache License, Version 2.0 (the "License");
4+
// you may not use this file except in compliance with the License.
5+
// You may obtain a copy of the License at
6+
//
7+
// http://www.apache.org/licenses/LICENSE-2.0
8+
//
9+
// Unless required by applicable law or agreed to in writing, software
10+
// distributed under the License is distributed on an "AS IS" BASIS,
11+
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12+
// See the License for the specific language governing permissions and
13+
// limitations under the License.
14+
15+
import { TGetInfoType, TGetInfoValue } from '../../thrift/TCLIService_types';
16+
17+
/**
18+
* `getInfo` (JDBC `DatabaseMetaData` / ODBC `SQLGetInfo`) is a Thrift-protocol
19+
* concept: the Thrift backend forwards `TGetInfoReq` to the server's `getInfo`
20+
* RPC. The SEA REST protocol and the Rust kernel have **no** equivalent
21+
* endpoint, so — exactly as JDBC does for `DatabaseMetaData` — we synthesize
22+
* the values client-side.
23+
*
24+
* The Databricks Thrift server itself answers only three `TGetInfoType`s and
25+
* rejects every other value; we mirror that surface so the SEA path is a
26+
* drop-in equivalent:
27+
*
28+
* | TGetInfoType | Thrift server | SEA (here) |
29+
* |---------------------|---------------|-------------------|
30+
* | CLI_SERVER_NAME (13)| "Spark SQL" | "Spark SQL" |
31+
* | CLI_DBMS_NAME (17)| "Spark SQL" | "Spark SQL" |
32+
* | CLI_DBMS_VER (18)| "3.1.1" | "3.1.1" |
33+
* | (any other) | error | undefined → error |
34+
*
35+
* Empirically re-verified against a live warehouse over the Thrift path: the
36+
* three values above are byte-identical to the server's, and the server
37+
* returns an error for every other `TGetInfoType` (probed CLI_MAX_DRIVER_-
38+
* CONNECTIONS, CLI_DATA_SOURCE_NAME, CLI_FETCH_DIRECTION, … — all errored). So
39+
* the SEA "undefined → throw" path matches Thrift's effective behaviour rather
40+
* than narrowing it.
41+
*/
42+
43+
/** Canonical DBMS product name — identical to the Thrift server's value. */
44+
export const SEA_DBMS_NAME = 'Spark SQL';
45+
46+
/** Server-name answer — identical to the Thrift server's value. */
47+
export const SEA_SERVER_NAME = 'Spark SQL';
48+
49+
/**
50+
* DBMS version string. Mirrors the constant the Databricks Thrift server
51+
* reports for `CLI_DBMS_VER` (the HiveServer2-compat Spark SQL version, not
52+
* the DBR release). Kept in lock-step with Thrift for parity; if the server
53+
* ever changes it the comparator's GET_INFO suite flags the drift.
54+
*/
55+
export const SEA_DBMS_VERSION = '3.1.1';
56+
57+
/**
58+
* Synthesize the `TGetInfoValue` for a `getInfo` request on the SEA path.
59+
* Returns `undefined` for any `TGetInfoType` the (Thrift) server does not
60+
* answer — the caller surfaces that as an error, matching Thrift's
61+
* reject-unsupported-info-type behaviour.
62+
*/
63+
export function seaServerInfoValue(infoType: number): TGetInfoValue | undefined {
64+
switch (infoType) {
65+
case TGetInfoType.CLI_SERVER_NAME:
66+
return new TGetInfoValue({ stringValue: SEA_SERVER_NAME });
67+
case TGetInfoType.CLI_DBMS_NAME:
68+
return new TGetInfoValue({ stringValue: SEA_DBMS_NAME });
69+
case TGetInfoType.CLI_DBMS_VER:
70+
return new TGetInfoValue({ stringValue: SEA_DBMS_VERSION });
71+
default:
72+
return undefined;
73+
}
74+
}

0 commit comments

Comments
 (0)