Release History

1.16.0

  • New: optional kernel backend (useKernel: true). Adds an alternative connection path backed by the native @databricks/databricks-sql-kernel client (a Rust core exposed via napi-rs), shipped as prebuilt per-platform packages (linux x64/arm64 gnu+musl, macOS x64/arm64, Windows x64/arm64) pulled in automatically as optional dependencies. The kernel talks to Databricks over the SEA (Statement Execution API) HTTP transport — not Thrift — with CloudFetch and inline-Arrow result fetching, through the same DBSQLClient surface. Supports PAT and OAuth (M2M/U2M) auth. Requires Node >= 18; on older Node the binding is not loaded and useKernel: true raises a clear error directing you to the Thrift backend. The default backend remains Thrift — opt in per connection. (databricks#378, #380, #409, #410, #411, #412, #416, #428, #434 by @msrathore-db)
  • Kernel backend behavior is aligned with Thrift so application code works the same either way: named/positional query parameters, metadata calls, TLS/mTLS with a custom CA, custom headers and user-agent, HTTP/SOCKS proxy and socket timeout, configurable retry/backoff, session query tags, async submit + cancel(), operation id/schema, and INTERVAL/type parity. Kernel logs surface through the same DBSQLLogger sink. (databricks#417, #420, #421, #426, #430, #431 by @msrathore-db)
  • Make retry-policy knobs (max attempts, min/max backoff, overall timeout) configurable via connect() for both backends (databricks#433 by @msrathore-db)
  • Retry transient network errors and fix a CloudFetch prefetch promise-rejection leak (databricks#424 by @msrathore-db)
  • Telemetry: emit sql_operation, auth_type, and driver_connection_params (databricks#396 by @samikshya-db)
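Because useKernel: true needs Node 18 or later, an application that may run on older Node versions might decide at startup whether to opt in. The sketch below assumes you pass in the Node version string yourself (for example, process.versions.node). The helpers nodeMajorVersion and backendOptions are hypothetical; only the useKernel option name comes from these notes.

```typescript
// Hypothetical helpers: decide whether to opt into the kernel backend.
// Only the `useKernel` option name comes from the release notes.

/** Parses the major version out of a Node version string such as "18.19.0" or "v20.11.1". */
function nodeMajorVersion(version: string): number {
  const match = /^v?(\d+)/.exec(version);
  if (!match) {
    throw new Error(`Unrecognized Node version string: ${version}`);
  }
  return Number(match[1]);
}

/** Returns connect() options that enable the kernel backend only on Node >= 18. */
function backendOptions(nodeVersion: string): { useKernel: boolean } {
  return { useKernel: nodeMajorVersion(nodeVersion) >= 18 };
}

// Usage (not executed here; assumes an existing DBSQLClient instance and credentials):
// await client.connect({ host, path, token, ...backendOptions(process.versions.node) });
```

On older Node versions this keeps the default Thrift backend instead of triggering the useKernel error.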

1.15.0

  • Add SPOG routing support: parse ?o=<workspaceId> from httpPath and inject x-databricks-org-id on Thrift, telemetry, and feature-flag requests. Expose customHeaders on ConnectionOptions for caller-supplied headers (databricks#391 by @samikshya-db)
  • Telemetry: enable by default with feature-flag-controlled priority, and fix final-flush dropping on client.close() due to a close-ordering bug (databricks#327, #391 by @samikshya-db)
  • Fix Azure AD OAuth: tenant-aware discovery URL and correct scope resource (databricks#363 by @msrathore-db)
  • Fix: use a valid SPDX license identifier in package.json (databricks#389 by @sreekanth-db)
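To illustrate the SPOG routing rule, here is a minimal sketch that extracts the o query parameter from an httpPath and turns it into the x-databricks-org-id header. The function orgIdHeader and the example path are hypothetical; this is not the driver's own implementation.

```typescript
// Hypothetical illustration of SPOG routing: read `?o=<workspaceId>` from httpPath
// and convert it into the x-databricks-org-id request header.

function orgIdHeader(httpPath: string): Record<string, string> {
  const queryStart = httpPath.indexOf('?');
  if (queryStart === -1) {
    return {}; // no query string, so there is no workspace ID to forward
  }
  const params = new URLSearchParams(httpPath.slice(queryStart + 1));
  const orgId = params.get('o');
  return orgId ? { 'x-databricks-org-id': orgId } : {};
}

const headers = orgIdHeader('/sql/1.0/warehouses/abc123?o=1234567890');
// headers → { 'x-databricks-org-id': '1234567890' }
```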

1.14.0

  • Add statement-level query tag support (databricks#366 by @sreekanth-db)
  • Add AI coding agent detection to User-Agent header (databricks#333 by @vikrantpuppala)
  • Internal: telemetry infrastructure improvements — circuit breaker, feature flag cache, telemetry client management (off by default) (databricks#325, #326, #362)

1.13.0

  • Add token federation support with custom token providers (databricks#318, databricks#319, databricks#320 by @madhav-db)
  • Add metric view metadata support (databricks#312 by @shivam2680)
  • Fix: Avoid calling require('lz4') if it's really not required (databricks#316 by @ikkala)
  • Add telemetry foundation (off by default) (databricks#324 by @samikshya-db)
  • Telemetry event emission and per-host aggregation (databricks#327 by @samikshya-db). Default change: telemetryEnabled now defaults to true (gated by a remote feature flag). To opt out programmatically, pass telemetryEnabled: false to connect(). To disable globally without code changes, set the environment variable DATABRICKS_TELEMETRY_DISABLED to one of 1, true, yes, or on (case-insensitive). Other values (empty, 0, false, etc.) are ignored — the runtime config takes precedence.
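The DATABRICKS_TELEMETRY_DISABLED rule can be expressed as a small predicate. This is a sketch of the documented behavior only: telemetryDisabledByEnv is a hypothetical name, not part of the driver's API, and the sketch does not trim whitespace because the notes don't say whether the driver does.

```typescript
// Sketch of the documented rule: only 1, true, yes, or on (case-insensitive)
// disable telemetry; every other value is ignored.
const DISABLING_VALUES = new Set(['1', 'true', 'yes', 'on']);

function telemetryDisabledByEnv(value: string | undefined): boolean {
  // Anything outside the set (including '', '0', 'false') leaves the decision to runtime config.
  return value !== undefined && DISABLING_VALUES.has(value.toLowerCase());
}

// Usage (not executed here):
// const disabled = telemetryDisabledByEnv(process.env.DATABRICKS_TELEMETRY_DISABLED);
// To opt out in code instead: await client.connect({ host, path, token, telemetryEnabled: false });
```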

1.12.0

1.11.0

1.10.0

  • Rename clientId parameter to userAgentEntry in connect call to standardize across sql drivers (databricks#281)

1.9.0

1.8.4

  • Fix: proxy agent unintentionally overwrites protocol in URL (databricks#241)
  • Improve Array.at/TypedArray.at polyfill (databricks#242 by @barelyhuman)
  • UC Volume ingestion: stream files instead of loading them into memory (databricks#247)
  • UC Volume ingestion: improve behavior on SQL REMOVE (databricks#249)
  • Expose session and query ID (databricks#250)
  • Make lz4 module optional so the package manager can skip it when it cannot be installed (databricks#246)

1.8.3

1.8.2

Improved results handling when running queries against older DBR versions (databricks#232)

1.8.1

Security fixes:

An issue in all published versions of the NPM package ip allows an attacker to execute arbitrary code and obtain sensitive information via the isPublic() function. This can lead to potential Server-Side Request Forgery (SSRF) attacks. The core issue is the function's failure to accurately distinguish between public and private IP addresses.

1.8.0

Highlights

OAuth on Azure

Some Azure instances now support the Databricks native OAuth flow (in addition to AAD OAuth). For backward compatibility, the library will continue to use the AAD OAuth flow by default. To use Databricks native OAuth, pass useDatabricksOAuthInAzure: true to client.connect():

await client.connect({
  // other options - host, port, etc.
  authType: 'databricks-oauth',
  useDatabricksOAuthInAzure: true,
  // other OAuth options if needed
});

We also fixed an issue with AAD OAuth where wrong scopes were passed for the M2M flow.

OAuth on GCP

We enabled OAuth support on GCP instances. Since it uses Databricks native OAuth, all the options are the same as for OAuth on AWS instances.

CloudFetch improvements

The library now automatically retries failed CloudFetch requests. The retry strategy is currently basic and will be improved in the future.
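These notes don't spell out the CloudFetch retry strategy. As a generic illustration of the idea, a basic retry wrapper with exponential backoff might look like the sketch below. retryWithBackoff, maxAttempts, and baseDelayMs are hypothetical names, and the library's actual policy may differ.

```typescript
// Generic retry-with-exponential-backoff sketch; not the library's implementation.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function retryWithBackoff<T>(
  task: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Wait baseDelayMs, 2 * baseDelayMs, 4 * baseDelayMs, ... between attempts.
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}
```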

We also implemented support for LZ4-compressed results (both Arrow- and CloudFetch-based). It is enabled by default, and compression is used whenever the server supports it.

1.7.1

  • Fix "Premature close" error that occurred due to the socket limit when using the library intensively (databricks#217)

1.7.0

  • Fixed behavior of the maxRows option of IOperation.fetchChunk(): it now returns chunks of the requested size (databricks#200)
  • Improved CloudFetch memory usage and overall performance (databricks#204, databricks#207, databricks#209)
  • Remove protocol version check when using query parameters (databricks#213)
  • Fix IOperation.hasMoreRows() behavior to avoid fetching data beyond the end of the dataset. It now also works properly before the first chunk is fetched (databricks#205)

1.6.1

1.6.0

Highlights

Proxy support

This feature allows all requests made by the library to be routed through a proxy. Proxy is disabled by default. To enable it, pass a configuration object to DBSQLClient.connect:

await client.connect({
  // pass host, path, auth options as usual
  proxy: {
    protocol: 'http',  // supported protocols: 'http', 'https', 'socks', 'socks4', 'socks4a', 'socks5', 'socks5h'
    host: 'localhost', // proxy host (string)
    port: 8070,        // proxy port (number)
    auth: {            // optional proxy basic auth config
      username: '...',
      password: '...',
    },
  },
});

Note: reading proxy settings from environment variables is currently not supported.
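Until environment variables are supported, one possible workaround is to read a proxy URL yourself (for example, from HTTPS_PROXY) and convert it into the proxy option object shown above. proxyOptionsFromUrl is a hypothetical helper, and the default ports it falls back to (80, 443, 1080) are assumptions.

```typescript
// Hypothetical workaround helper: converts a proxy URL read by your own code into
// the `proxy` option object accepted by DBSQLClient.connect().

type ProxyProtocol = 'http' | 'https' | 'socks' | 'socks4' | 'socks4a' | 'socks5' | 'socks5h';

interface ProxyOptions {
  protocol: ProxyProtocol;
  host: string;
  port: number;
  auth?: { username: string; password: string };
}

const SUPPORTED_PROTOCOLS: readonly string[] = ['http', 'https', 'socks', 'socks4', 'socks4a', 'socks5', 'socks5h'];

function defaultPort(protocol: ProxyProtocol): number {
  if (protocol === 'http') return 80;
  if (protocol === 'https') return 443;
  return 1080; // conventional SOCKS port (assumption)
}

function proxyOptionsFromUrl(proxyUrl: string): ProxyOptions {
  const url = new URL(proxyUrl);
  const scheme = url.protocol.replace(/:$/, ''); // URL reports "http:", the option expects "http"
  if (!SUPPORTED_PROTOCOLS.includes(scheme)) {
    throw new Error(`Unsupported proxy protocol: ${scheme}`);
  }
  const protocol = scheme as ProxyProtocol;
  const options: ProxyOptions = {
    protocol,
    host: url.hostname,
    // URL drops the port when it equals the scheme's default (80 for http, 443 for https).
    port: url.port ? Number(url.port) : defaultPort(protocol),
  };
  if (url.username) {
    // Credentials stay percent-encoded in the URL, so decode them.
    options.auth = {
      username: decodeURIComponent(url.username),
      password: decodeURIComponent(url.password),
    };
  }
  return options;
}

// Usage (not executed here):
// await client.connect({ host, path, token, proxy: proxyOptionsFromUrl(proxyUrl) });
```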

1.5.0

Highlights

Databricks OAuth support

Databricks OAuth support added in v1.4.0 is now extended with the M2M flow. To use OAuth instead of PAT, pass the corresponding auth provider type and options to DBSQLClient.connect:

// instantiate DBSQLClient as usual

await client.connect({
  // provide other mandatory options as usual - e.g. host, path, etc.
  authType: 'databricks-oauth',
  oauthClientId: '...', // optional - override the default OAuth client ID
  azureTenantId: '...', // optional - provide a custom Azure tenant ID
  persistence: ...,     // optional - user-provided storage for OAuth tokens, should implement the OAuthPersistence interface
});

The U2M flow involves user interaction: the library opens a browser tab asking the user to log in. This flow requires no options other than authType.

The M2M flow does not require any user interaction, which makes it a good option for scripting, for example. It requires two extra options for DBSQLClient.connect: oauthClientId and oauthClientSecret.
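A script using the M2M flow typically keeps the client ID and secret out of source code. The following sketch builds the connect() options from a settings map such as environment variables. m2mConnectOptions and the DATABRICKS_* variable names are assumptions for illustration; authType, oauthClientId, oauthClientSecret, host, and path are the option names used in these notes.

```typescript
// Hypothetical helper: builds M2M OAuth options for client.connect() from a settings map.

interface M2MConnectOptions {
  host: string;
  path: string;
  authType: 'databricks-oauth';
  oauthClientId: string;
  oauthClientSecret: string;
}

const REQUIRED_SETTINGS = [
  'DATABRICKS_HOST',
  'DATABRICKS_HTTP_PATH',
  'DATABRICKS_CLIENT_ID',
  'DATABRICKS_CLIENT_SECRET',
];

function m2mConnectOptions(settings: Record<string, string | undefined>): M2MConnectOptions {
  // Report every missing setting at once instead of failing on the first one.
  const missing = REQUIRED_SETTINGS.filter((name) => !settings[name]);
  if (missing.length > 0) {
    throw new Error(`Missing settings: ${missing.join(', ')}`);
  }
  return {
    host: settings['DATABRICKS_HOST'] as string,
    path: settings['DATABRICKS_HTTP_PATH'] as string,
    authType: 'databricks-oauth',
    oauthClientId: settings['DATABRICKS_CLIENT_ID'] as string,
    oauthClientSecret: settings['DATABRICKS_CLIENT_SECRET'] as string,
  };
}

// Usage (not executed here; assumes an existing DBSQLClient instance):
// await client.connect(m2mConnectOptions(process.env));
```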

See also the Databricks documentation for more details about Databricks OAuth.

Named query parameters

v1.5.0 adds support for query parameters. Currently, only named parameters are supported.

Basic usage example:

// obtain session object as usual

const operation = await session.executeStatement('SELECT :p1 AS "str_param", :p2 AS "number_param"', {
  namedParameters: {
    p1: 'Hello, World',
    p2: 3.14,
  },
});

The library infers parameter types from the passed primitive values. Supported data types include booleans, various numeric types (including native BigInt and Int64 from node-int64), the native Date type, and strings.
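The inference rules above can be pictured as a simple mapping from JavaScript values to SQL types. This is a hypothetical sketch: inferParameterType is not a library function, Int64 from node-int64 is left out to keep it dependency-free, and treating integral numbers as INTEGER and other numbers as DOUBLE is an assumption about how the driver picks a numeric type.

```typescript
// Hypothetical sketch of primitive-to-SQL type inference for named parameters.
type InferredType = 'BOOLEAN' | 'INTEGER' | 'DOUBLE' | 'BIGINT' | 'TIMESTAMP' | 'STRING';

function inferParameterType(value: boolean | number | bigint | Date | string): InferredType {
  if (typeof value === 'boolean') return 'BOOLEAN';
  if (typeof value === 'bigint') return 'BIGINT';
  // Assumption: integral numbers map to INTEGER, everything else to DOUBLE.
  if (typeof value === 'number') return Number.isInteger(value) ? 'INTEGER' : 'DOUBLE';
  // Date objects default to TIMESTAMP; use DBSQLParameter with an explicit type to get DATE.
  if (value instanceof Date) return 'TIMESTAMP';
  return 'STRING';
}
```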

It's also possible to explicitly specify the parameter type by passing DBSQLParameter instances instead of primitive values. It also allows one to use values that don't have a corresponding primitive representation:

import { DBSQLParameter, DBSQLParameterType } from '@databricks/sql';

// obtain session object as usual

const operation = await session.executeStatement('SELECT :p1 AS "date_param", :p2 AS "interval_type"', {
  namedParameters: {
    p1: new DBSQLParameter({
      value: new Date('2023-09-06T03:14:27.843Z'),
      type: DBSQLParameterType.DATE, // Date objects are inferred as TIMESTAMP by default; this overrides the type
    }),
    p2: new DBSQLParameter({
      value: 5, // INTERVAL '5' DAY
      type: DBSQLParameterType.INTERVALDAY
    }),
  },
});

Of course, you can mix primitive values and DBSQLParameter instances.

runAsync deprecation

Starting with this release, the library executes all queries asynchronously, so the runAsync option is deprecated and will be removed completely in v2. Stop using it going forward, and remove all its usages from your code before v2 is released. From the user's perspective, the library's behavior does not change.

Data ingestion support

This feature allows you to upload, retrieve, and remove Unity Catalog volume files using the SQL PUT, GET, and REMOVE commands.

1.4.0

  • Added Cloud Fetch support (databricks#158)
  • Improved handling of closed sessions and operations (databricks#129). Now, when a session is closed, all operations associated with it are immediately closed. Similarly, when a client is closed, all associated sessions (and their operations) are closed as well.

Notes:

Cloud Fetch is disabled by default. To use it, pass useCloudFetch: true to IDBSQLSession.executeStatement(). For example:

// obtain session object as usual
const operation = await session.executeStatement(query, {
  runAsync: true,
  useCloudFetch: true,
});

Note that Cloud Fetch is effectively used only for very large datasets: if a query returns only a few thousand records, Cloud Fetch won't be used regardless of the useCloudFetch setting. Also, as a reminder, for large datasets it's better to use fetchChunk instead of fetchAll to avoid out-of-memory (OOM) errors:

do {
  const chunk = await operation.fetchChunk({ maxRows: 100000 });
  // process chunk here
} while (await operation.hasMoreRows());
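The fetchChunk/hasMoreRows loop can be exercised without a Databricks connection. The sketch below uses a hypothetical in-memory FakeOperation that implements only the two methods the loop needs (the real IOperation has more). countRows holds at most one chunk at a time, which is the point of preferring fetchChunk over fetchAll.

```typescript
// Minimal stand-in for the two IOperation methods used by the loop above.
interface ChunkedOperation {
  fetchChunk(options: { maxRows: number }): Promise<object[]>;
  hasMoreRows(): Promise<boolean>;
}

// Hypothetical fake that serves rows from memory instead of a warehouse.
class FakeOperation implements ChunkedOperation {
  private offset = 0;
  private readonly rows: object[];

  constructor(rows: object[]) {
    this.rows = rows;
  }

  async fetchChunk({ maxRows }: { maxRows: number }): Promise<object[]> {
    const chunk = this.rows.slice(this.offset, this.offset + maxRows);
    this.offset += chunk.length;
    return chunk;
  }

  async hasMoreRows(): Promise<boolean> {
    return this.offset < this.rows.length;
  }
}

// Processes rows chunk by chunk, keeping only the current chunk in memory.
async function countRows(operation: ChunkedOperation, maxRows: number): Promise<number> {
  let total = 0;
  do {
    const chunk = await operation.fetchChunk({ maxRows });
    total += chunk.length;
  } while (await operation.hasMoreRows());
  return total;
}
```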

1.3.0

1.2.1

1.2.0

1.1.1

  • Fix: the patch needed for improved error handling wasn't applied when installing 1.1.0

1.1.0

  • Fix: the library no longer attempts to parse column names and uses the ones provided by the server (databricks#84)
  • Better error handling: more errors can now be handled in specific .catch() handlers instead of being emitted as a generic error event (databricks#99)
  • Fixed error logging bug (attempt to serialize circular structures) (databricks#89)
  • Fixed some minor bugs and regressions

1.0.0

  • DBSQLClient.openSession now takes a limited set of options (OpenSessionRequest instead of Thrift's TOpenSessionReq)
  • DBSQLClient.openSession now uses the latest protocol version by default
  • Direct results feature is now available for all IOperation methods which support it. To enable the direct results feature, use the maxRows option
  • Direct results are now enabled by default. If maxRows is omitted, it defaults to 100000. To disable direct results, set maxRows to null
  • FunctionNameRequest type renamed to FunctionsRequest
  • IDBSQLConnectionOptions type renamed to ConnectionOptions
  • IFetchOptions renamed to FetchOptions
  • DBSQLOperation.getSchema will wait for operation completion, like DBSQLOperation.fetchChunk/DBSQLOperation.fetchAll. It also supports the same progress reporting options
  • runAsync option is now available for all operations that support it
  • Added client-side logging functionality and a new optional logger parameter for the DBSQLClient constructor
  • Turned on Direct results feature by default
  • Removed legacy Kerberos auth APIs
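The direct-results defaults from the list above reduce to a small rule. resolveDirectResults is a hypothetical helper that only restates it: an omitted maxRows enables direct results with 100000 rows, an explicit number is used as-is, and null disables the feature.

```typescript
// Hypothetical restatement of the 1.0.0 direct-results defaults; not library code.
const DEFAULT_MAX_ROWS = 100000;

function resolveDirectResults(maxRows?: number | null): { enabled: boolean; maxRows?: number } {
  if (maxRows === null) {
    return { enabled: false }; // explicit null turns direct results off
  }
  // Omitted (undefined) falls back to the default; any number is kept as given.
  return { enabled: true, maxRows: maxRows ?? DEFAULT_MAX_ROWS };
}
```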

0.1.8-beta.2 (2022-09-08)

  • Operations will wait for cluster to start instead of failing
  • Added support for DirectResults, which speeds up data fetches by reducing the number of server roundtrips when possible
  • DBSQLOperation interface simplified: HiveUtils were removed and replaced with new methods DBSQLOperation.fetchChunk/DBSQLOperation.fetchAll. New API implements all necessary waiting and data conversion routines internally
  • Better TypeScript support
  • Thrift definitions updated to support additional Databricks features
  • User-agent string updated; part of the user-agent string is configurable through DBSQLClient's clientId option
  • Connection now uses keep-alive (not configurable at this moment)
  • DBSQLClient now prepends a slash to the path when needed
  • DBSQLOperation: default chunk size for data fetching increased from 100 to 100,000

Upgrading

DBSQLClient.utils was permanently removed. Code that used utils.waitUntilReady, utils.fetchAll, and utils.getResult to get data should now use the single DBSQLOperation.fetchAll method instead. Progress reporting, previously supported by utils.waitUntilReady, is now configurable via DBSQLOperation.fetchChunk/DBSQLOperation.fetchAll options. DBSQLOperation.setMaxRows also became an option of the methods mentioned above.

0.1.8-beta.1 (2022-06-24)

  • Initial release