Skip to content

Commit 3358a61

Browse files
authored
feat(kernel): session-level query tags + Thrift-parity OAuth scopes (configurable) (#430)
* feat(kernel): wire session-level query tags into KernelBackend.openSession Ports the session-level query-tags wiring onto the post-#428 lib/kernel path (originally lib/sea/SeaBackend, before the SEA→kernel rename). openSession serializes request.queryTags into the reserved QUERY_TAGS session conf, which the kernel allowlists (SESSION_CONF_ALLOWLIST) and forwards onto the SEA CreateSession session_confs — mirroring ThriftBackend.openSession. queryTags takes precedence over an explicit configuration.QUERY_TAGS. Verified end-to-end against a live warehouse: the tag lands in system.query.history.query_tags. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * fix(kernel): request Thrift-parity OAuth scopes, configurable via oauthScopes The kernel U2M flow passed no scopes, so it fell through to the kernel's bare default (all-apis offline_access). The databricks-sql-connector OAuth app is registered for `sql`, so U2M auth used the wrong scope set. Pass scopes explicitly from the driver: - U2M defaults to ['sql', 'offline_access'] (matches the Thrift driver's defaultOAuthScopes), overriding the kernel's all-apis default. - M2M defaults to ['all-apis'] (matches Thrift + the kernel's M2M default). - Both overridable via a new `oauthScopes` connect option — closing the configurability gap with pyo3, which already forwards `scopes` on M2M. Driver-only change: the napi binding already forwards oauth_scopes and the kernel's u2m.rs/m2m.rs feed them into the authorize/token request. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * fix(kernel): no-op unsupported per-statement options instead of throwing Per-statement options the kernel backend doesn't honour are now NO-OPs (logged at warn), not HiveDriverErrors, so call sites written for the Thrift backend are drop-in on the kernel path: - useCloudFetch / useLZ4Compression — kernel-governed perf/format hints - stagingAllowedLocalPath — staging not yet exposed on the kernel Ignoring these can't change query results. Parameter binding (compound/BINARY) is deliberately NOT no-op'd — a dropped param would silently change results, so it still throws. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * feat(kernel): route OAuth identically to Thrift (strict parity) + custom U2M client id Make the kernel auth flow selector + client-id resolution byte-for-byte identical to the Thrift driver (`DBSQLClient.createAuthProvider`): - flow = `oauthClientSecret === undefined ? U2M : M2M` (strict undefined) - clientId = `oauthClientId ?? defaultClientId` (`??` guards null/undefined only) No blank/reserved normalization on the OAuth fields — a present-but-degenerate value (`""` / `"undefined"` / whitespace) is forwarded verbatim, exactly as Thrift forwards it. This removes every divergence from the Thrift backend across the full (clientId × clientSecret) input matrix (verified). Consequences vs the prior id-presence routing: - `oauthClientId` + no secret now runs U2M (browser) and forwards the id as a custom U2M client (Thrift does the same). - M2M with no/blank id no longer throws "id required" — it uses the default client (`?? defaultClientId`), matching Thrift. - A present-but-empty/`"undefined"` secret routes to M2M (not U2M), matching Thrift's strict `=== undefined` check. Trade-off (accepted for parity): this re-imports Thrift's env-stringification behaviour — a secret/id env var that resolved to `""`/`"undefined"` is taken literally rather than treated as unset. Updated the auth unit tests (u2m/m2m/edge-cases) to assert the strict-parity matrix. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * chore(kernel): bump KERNEL_REV to kernel main (statement-level query tags + proxy + UA) Bump the pinned kernel from b676275 to 34b4c20 (current kernel main), which brings: - #150 — per-statement query tags on the SEA wire (native query_tags array). This is what makes `executeStatement(sql, { queryTags })` actually reach the server on the kernel path — bringing statement-level query tags to parity with the Thrift backend (the driver already serialized them into statement_conf["query_tags"]; the kernel previously dropped that before the wire). Verified e2e: the tag now lands in system.query.history.query_tags. - #129 — programmatic HTTP/HTTPS proxy + per-connection socket timeout (additive optional napi ConnectionOptions fields; refreshed in index.d.ts). - #151 — caller User-Agent overwrites the kernel base UA for accurate query-source attribution. Refreshes the committed napi types (native/kernel/index.d.ts) accordingly; the new fields are optional, so no driver-side change is required. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> --------- Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
1 parent 26be3be commit 3358a61

10 files changed

Lines changed: 349 additions & 271 deletions

File tree

KERNEL_REV

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
b676275f234b11a68cb4c8a2955d69cb3b786d97
1+
34b4c202cc127021c923903d59c841daa2b44fef

lib/contracts/IDBSQLClient.ts

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,10 @@ type AuthOptions =
2222
oauthClientId?: string;
2323
oauthClientSecret?: string;
2424
useDatabricksOAuthInAzure?: boolean;
25+
// OAuth scopes to request. When omitted, the kernel backend defaults the
26+
// U2M flow to `['sql', 'offline_access']` (parity with the Thrift driver's
27+
// `defaultOAuthScopes`), overriding the kernel's bare `all-apis offline_access`.
28+
oauthScopes?: Array<string>;
2529
}
2630
| {
2731
authType: 'custom';

lib/kernel/KernelAuth.ts

Lines changed: 50 additions & 48 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,25 @@ import { buildUserAgentString } from '../utils';
2828
*/
2929
const U2M_DEFAULT_REDIRECT_PORT = 8030;
3030

31+
// U2M OAuth scopes default. Matches the standalone Thrift driver's
32+
// `defaultOAuthScopes` (lib/connection/auth/DatabricksOAuth/OAuthScope.ts):
33+
// `['sql', 'offline_access']`. The kernel's bare default is
34+
// `['all-apis', 'offline_access']`; the `databricks-sql-connector` OAuth app is
35+
// registered for the `sql` scope, so we pass the Thrift-parity scopes explicitly
36+
// unless the caller overrides via `oauthScopes`.
37+
const U2M_DEFAULT_SCOPES = ['sql', 'offline_access'];
38+
39+
// M2M OAuth scopes default. Matches the standalone Thrift driver (`getScopes`
40+
// forces `['all-apis']` for the client-credentials flow) and the kernel's own
41+
// M2M default (`m2m.rs` → `['all-apis']`). Overridable via `oauthScopes`
42+
// (parity with pyo3, which forwards `scopes` on M2M).
43+
const M2M_DEFAULT_SCOPES = ['all-apis'];
44+
45+
// Default OAuth client id — identical to the Thrift driver's
46+
// `DatabricksOAuthManager.defaultClientId` and the kernel napi's own U2M default.
47+
// Used for `oauthClientId ?? default`, mirroring Thrift's `getClientId()`.
48+
const DEFAULT_OAUTH_CLIENT_ID = 'databricks-sql-connector';
49+
3150
/**
3251
* Shape consumed by the napi-binding's `openSession()` (see
3352
* `native/kernel/index.d.ts`). Mirrors `ConnectionOptions` in the binding's
@@ -189,12 +208,15 @@ export type KernelNativeConnectionOptions = KernelSessionDefaults &
189208
authMode: 'OAuthM2m';
190209
oauthClientId: string;
191210
oauthClientSecret: string;
211+
oauthScopes?: Array<string>;
192212
}
193213
| {
194214
hostName: string;
195215
httpPath: string;
196216
authMode: 'OAuthU2m';
197217
oauthRedirectPort: number;
218+
oauthScopes?: Array<string>;
219+
oauthClientId?: string;
198220
}
199221
);
200222

@@ -541,6 +563,7 @@ export function buildKernelConnectionOptions(options: ConnectionOptions): Kernel
541563
const oauth = options as {
542564
oauthClientId?: string;
543565
oauthClientSecret?: string;
566+
oauthScopes?: Array<string>;
544567
azureTenantId?: string;
545568
useDatabricksOAuthInAzure?: boolean;
546569
persistence?: unknown;
@@ -579,40 +602,18 @@ export function buildKernelConnectionOptions(options: ConnectionOptions): Kernel
579602
);
580603
}
581604

582-
// Flow selector — DELIBERATELY DIFFERENT from thrift's
583-
// `DBSQLClient.createAuthProvider` (`DBSQLClient.ts:216`), which keys off
584-
// the secret (`oauthClientSecret === undefined ? U2M : M2M`). kernel keys off
585-
// `oauthClientId` *presence* (the "do I have an id?" signal) instead, so a
586-
// user who set an id but typoed/forgot the secret gets the actionable M2M
587-
// "secret is required" error rather than being silently routed to U2M
588-
// (which would hide their intent). Cost: `id + no secret` throws here
589-
// where thrift would run U2M, and kernel U2M has no custom-client-id support
590-
// (see buildKernelConnectionOptions header). The U2M arm still defends against an id
591-
// sneaking through: fires only when `oauthClientId` is provided as
592-
// a blank-reserved literal (e.g., whitespace, `"null"`, `"undefined"`)
593-
// alongside an absent/blank secret — both `idIsBlank` and
594-
// `secretIsBlank` are true so U2M wins routing, but the caller's
595-
// intent to use U2M with a partially-set id is ambiguous and
596-
// rejected explicitly.
597-
const idIsBlank =
598-
oauth.oauthClientId === undefined ||
599-
(typeof oauth.oauthClientId === 'string' && isBlankOrReserved(oauth.oauthClientId));
600-
const secretIsBlank =
601-
oauth.oauthClientSecret === undefined ||
602-
(typeof oauth.oauthClientSecret === 'string' && isBlankOrReserved(oauth.oauthClientSecret));
603-
604-
if (idIsBlank && secretIsBlank) {
605-
// U2M — neither id nor secret supplied.
606-
if (oauth.oauthClientId !== undefined) {
607-
// Defense-in-depth: id was set but blank/reserved literal.
608-
// The kernel hardcodes `client_id = "databricks-cli"` for U2M;
609-
// there's no JS-side override knob.
610-
throw new HiveDriverError(
611-
'kernel backend: `oauthClientId` is not supported on the OAuth U2M flow; ' +
612-
"the kernel uses the built-in 'databricks-cli' client. " +
613-
'Omit `oauthClientId` for U2M, or supply `oauthClientSecret` for the M2M flow.',
614-
);
615-
}
605+
// Flow selector + client-id resolution mirror the Thrift driver EXACTLY
606+
// (`DBSQLClient.createAuthProvider`, DBSQLClient.ts:220):
607+
// flow = oauthClientSecret === undefined ? U2M : M2M (strict undefined)
608+
// clientId = oauthClientId ?? defaultClientId (`??` guards null/undefined only)
609+
// No blank/reserved normalization on the OAuth fields — a present-but-
610+
// degenerate value (`""`, `"undefined"`, whitespace) is forwarded verbatim,
611+
// exactly as Thrift forwards it, so the kernel path does not diverge from the
612+
// Thrift backend. (This intentionally re-imports Thrift's env-stringification
613+
// behaviour: a secret that resolved to `""`/`"undefined"` counts as a real
614+
// secret ⇒ M2M, just like Thrift.)
615+
if (oauth.oauthClientSecret === undefined) {
616+
// U2M (browser) — no secret, exactly like Thrift.
616617
if (oauth.persistence !== undefined) {
617618
throw new HiveDriverError(
618619
'kernel backend: `persistence` (custom OAuth token store) is not yet wired through ' +
@@ -623,24 +624,21 @@ export function buildKernelConnectionOptions(options: ConnectionOptions): Kernel
623624
'when the kernel exposes it.',
624625
);
625626
}
626-
return {
627+
const u2m = {
627628
...base,
628-
authMode: 'OAuthU2m',
629+
authMode: 'OAuthU2m' as const,
629630
oauthRedirectPort: U2M_DEFAULT_REDIRECT_PORT,
631+
// Scopes default to Thrift parity (`sql offline_access`); overridable.
632+
oauthScopes:
633+
Array.isArray(oauth.oauthScopes) && oauth.oauthScopes.length > 0 ? oauth.oauthScopes : U2M_DEFAULT_SCOPES,
630634
};
635+
// clientId: Thrift uses `oauthClientId ?? default`. Forward it verbatim
636+
// when set; when absent the napi applies the same default
637+
// (`databricks-sql-connector`), so omitting it is identical to Thrift.
638+
return oauth.oauthClientId !== undefined ? { ...u2m, oauthClientId: oauth.oauthClientId } : u2m;
631639
}
632640

633-
// M2M.
634-
if (typeof oauth.oauthClientId !== 'string' || isBlankOrReserved(oauth.oauthClientId)) {
635-
throw new AuthenticationError(
636-
'kernel backend: `oauthClientId` is required (non-empty, non-whitespace) for OAuth M2M.',
637-
);
638-
}
639-
if (typeof oauth.oauthClientSecret !== 'string' || isBlankOrReserved(oauth.oauthClientSecret)) {
640-
throw new AuthenticationError(
641-
'kernel backend: `oauthClientSecret` must be a non-empty non-whitespace string for OAuth M2M.',
642-
);
643-
}
641+
// M2M (client credentials) — a secret is present, exactly like Thrift.
644642
if (oauth.persistence !== undefined) {
645643
throw new HiveDriverError(
646644
'kernel backend: `persistence` is not supported on OAuth M2M ' +
@@ -650,8 +648,12 @@ export function buildKernelConnectionOptions(options: ConnectionOptions): Kernel
650648
return {
651649
...base,
652650
authMode: 'OAuthM2m',
653-
oauthClientId: oauth.oauthClientId,
651+
// Thrift: `getClientId()` = `oauthClientId ?? defaultClientId`.
652+
oauthClientId: oauth.oauthClientId ?? DEFAULT_OAUTH_CLIENT_ID,
654653
oauthClientSecret: oauth.oauthClientSecret,
654+
// Configurable (parity with pyo3); defaults to `['all-apis']`.
655+
oauthScopes:
656+
Array.isArray(oauth.oauthScopes) && oauth.oauthScopes.length > 0 ? oauth.oauthScopes : M2M_DEFAULT_SCOPES,
655657
};
656658
}
657659

lib/kernel/KernelBackend.ts

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,7 @@ import { ConnectionOptions, OpenSessionRequest } from '../contracts/IDBSQLClient
1919
import { InternalConnectionOptions } from '../contracts/InternalConnectionOptions';
2020
import { LogLevel } from '../contracts/IDBSQLLogger';
2121
import HiveDriverError from '../errors/HiveDriverError';
22+
import { serializeQueryTags } from '../utils';
2223
import { getKernelNative, KernelNativeBinding, KernelConnection } from './KernelNativeLoader';
2324
import { decodeNapiKernelError } from './KernelErrorMapping';
2425
import { buildKernelConnectionOptions, buildKernelRetryOptions, KernelNativeConnectionOptions } from './KernelAuth';
@@ -145,6 +146,20 @@ export default class KernelBackend implements IBackend {
145146
if (request.configuration !== undefined) {
146147
sessionOptions.sessionConf = { ...request.configuration };
147148
}
149+
// Session-level query tags: serialize into the reserved `QUERY_TAGS`
150+
// session conf. The kernel allowlists `QUERY_TAGS` (SESSION_CONF_ALLOWLIST)
151+
// and forwards it onto the SEA `CreateSession` `session_confs`, mirroring
152+
// the Thrift backend's `ThriftBackend.openSession`. Runs after the
153+
// `configuration` merge so `queryTags` takes precedence over an explicit
154+
// `configuration.QUERY_TAGS`, matching the documented contract.
155+
if (request.queryTags !== undefined) {
156+
const serialized = serializeQueryTags(request.queryTags);
157+
if (serialized) {
158+
sessionOptions.sessionConf = { ...(sessionOptions.sessionConf ?? {}), QUERY_TAGS: serialized };
159+
} else if (sessionOptions.sessionConf) {
160+
delete sessionOptions.sessionConf.QUERY_TAGS;
161+
}
162+
}
148163

149164
let nativeConnection: KernelConnection;
150165
try {

lib/kernel/KernelSessionBackend.ts

Lines changed: 25 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -156,20 +156,36 @@ export default class KernelSessionBackend implements ISessionBackend {
156156
public async executeStatement(statement: string, options: ExecuteStatementOptions): Promise<IOperationBackend> {
157157
this.failIfClosed();
158158

159+
// Per-statement options the kernel backend doesn't honour are treated as
160+
// NO-OPs (logged at warn), not errors — so call sites written for the Thrift
161+
// backend are drop-in on the kernel path. These are perf/format hints the
162+
// kernel governs internally (useCloudFetch / useLZ4Compression) or a feature
163+
// it doesn't expose yet (staging); ignoring them cannot change query results.
164+
// NB: parameter binding (compound/BINARY) is deliberately NOT no-op'd — a
165+
// dropped param would silently change results, so it still throws.
159166
if (options.useCloudFetch !== undefined) {
160-
throw new HiveDriverError(
161-
'kernel executeStatement: useCloudFetch is controlled by the kernel result configuration and is not a per-statement option on kernel',
162-
);
167+
this.context
168+
.getLogger()
169+
.log(
170+
LogLevel.warn,
171+
'kernel executeStatement: ignoring per-statement `useCloudFetch` — result fetching is governed by the kernel result configuration (no-op on kernel).',
172+
);
163173
}
164174
if (options.useLZ4Compression !== undefined) {
165-
throw new HiveDriverError(
166-
'kernel executeStatement: useLZ4Compression is not supported on kernel (result compression is governed by the kernel)',
167-
);
175+
this.context
176+
.getLogger()
177+
.log(
178+
LogLevel.warn,
179+
'kernel executeStatement: ignoring per-statement `useLZ4Compression` — result compression is governed by the kernel (no-op on kernel).',
180+
);
168181
}
169182
if (options.stagingAllowedLocalPath !== undefined) {
170-
throw new HiveDriverError(
171-
'kernel executeStatement: stagingAllowedLocalPath (volume operations) is not supported on kernel',
172-
);
183+
this.context
184+
.getLogger()
185+
.log(
186+
LogLevel.warn,
187+
'kernel executeStatement: ignoring `stagingAllowedLocalPath` — volume/staging operations are not yet supported on the kernel backend (no-op).',
188+
);
173189
}
174190

175191
// `runAsync` selects the kernel execution path. NOTE: this is a kernel-

native/kernel/index.d.ts

Lines changed: 48 additions & 2 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

0 commit comments

Comments
 (0)