Implement getSchemas() for SEA by jayantsing-db · Pull Request #802 · databricks/databricks-jdbc

jayantsing-db · 2025-04-21T06:54:06Z

Description

Retrieve all catalogs.
For each catalog, fetch schemas in parallel and merge the results.
On the e2-dogfood environment, which has around 9,000 schemas, this approach took approximately 12 seconds—compared to 9 seconds with the existing driver—an acceptable difference.
The runtime update for SHOW SCHEMAS IN ALL CATALOGS is currently under BEHAVE/compatibility review. Meanwhile, this client-side implementation serves as a temporary solution.
Introduced a JdbcThreadUtils class to simplify multi-threaded operations in JDBC. To ensure correctness, the thread context of each worker must carry the appropriate connection-related details.
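
The fan-out described above (list the catalogs, fetch schemas per catalog on a bounded thread pool, merge the results under a timeout) can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class name, the `parallelFlatMap` signature, the in-memory `fakeServer` map, and the use of `IllegalStateException` in place of the driver's own exception type are all assumptions made for the example, and it does not show the thread-context propagation the description mentions.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

public class ParallelSchemaFetchSketch {

  /**
   * Hypothetical helper: runs the task for every item on a bounded pool and
   * merges the per-item results in input order, under one overall deadline.
   */
  static <T, R> List<R> parallelFlatMap(
      Collection<T> items, int maxThreads, long timeoutSeconds,
      Function<T, Collection<R>> task) {
    if (items.isEmpty()) {
      return new ArrayList<>();
    }
    ExecutorService pool =
        Executors.newFixedThreadPool(Math.min(maxThreads, items.size()));
    try {
      List<Future<Collection<R>>> futures = new ArrayList<>();
      for (T item : items) {
        Callable<Collection<R>> call = () -> task.apply(item);
        futures.add(pool.submit(call));
      }
      long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(timeoutSeconds);
      List<R> merged = new ArrayList<>();
      for (Future<Collection<R>> future : futures) {
        // Each wait only gets whatever is left of the shared time budget.
        long remaining = Math.max(deadline - System.nanoTime(), 0L);
        merged.addAll(future.get(remaining, TimeUnit.NANOSECONDS));
      }
      return merged;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt flag for callers
      throw new IllegalStateException("Parallel execution interrupted", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Error in parallel execution", e.getCause());
    } catch (TimeoutException e) {
      throw new IllegalStateException("Parallel execution timed out", e);
    } finally {
      pool.shutdownNow(); // cancel anything still running after a failure or timeout
    }
  }

  public static void main(String[] args) {
    // Stand-in for the server: catalog name -> schema names (made-up data).
    Map<String, List<String>> fakeServer = Map.of(
        "main", List.of("default", "sales"),
        "samples", List.of("nyctaxi", "tpch"));
    List<String> catalogs = List.of("main", "samples");

    List<String> rows = parallelFlatMap(catalogs, 4, 30, catalog -> {
      List<String> out = new ArrayList<>();
      for (String schema : fakeServer.get(catalog)) {
        out.add(catalog + "." + schema);
      }
      return out;
    });
    System.out.println(rows);
  }
}
```

Collecting the futures in submission order keeps the merged rows grouped by catalog in the order the catalogs were listed, regardless of which fetch finishes first.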

Testing

End to end testing
Unit tests

Additional Notes to the Reviewer

The BEHAVE committee reviewing runtime PR #139818 is expected to require the introduction of a SQL configuration. For JDBC clients to take advantage of these changes, users would need to manually set this SQL config in their Spark session. ~~Currently afaik, it's not possible to set this configuration directly through the JDBC connection.~~ We need to add a DBSQL config entry at https://github.com/databricks-eng/universe/blob/68a3f9bf4aa09b4d28f85229cbbb4c6e7bcef7e2/common/dbsql-config/src/SqlConfig.scala#L35 for this configuration; that work is in progress.

Copilot

Pull Request Overview

This PR implements the getSchemas() method for the SEA client by retrieving catalogs and fetching their schemas in parallel, using a newly introduced JdbcThreadUtils for multi-threaded operations. Key changes include:

Introducing JdbcThreadUtils with parallelMap and parallelFlatMap methods for concurrent task execution.
Modifying DatabricksDatabaseMetaData#getSchemas() to fetch catalogs and schemas concurrently.
Adding comprehensive tests for JdbcThreadUtils in JdbcThreadUtilsTest.java.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File	Description
src/test/java/com/databricks/jdbc/common/util/JdbcThreadUtilsTest.java	New tests covering multiple execution paths for the JdbcThreadUtils methods.
src/main/java/com/databricks/jdbc/model/telemetry/enums/DatabricksDriverErrorCode.java	Updated error code ordering with addition of OPERATION_TIMEOUT_ERROR.
src/main/java/com/databricks/jdbc/common/util/JdbcThreadUtils.java	Introduced parallelMap and parallelFlatMap; contains error handling for interruptions and execution errors.
src/main/java/com/databricks/jdbc/api/impl/DatabricksDatabaseMetaData.java	Updated getSchemas() to process catalog schemas in parallel using JdbcThreadUtils.

Comments suppressed due to low confidence (2)

src/main/java/com/databricks/jdbc/common/util/JdbcThreadUtils.java:71

The error code THREAD_INTERRUPTED_ERROR is referenced but not defined in DatabricksDriverErrorCode. Please add an appropriate definition or update to an existing error code.

throw new DatabricksSQLException("Parallel execution interrupted", e, DatabricksDriverErrorCode.THREAD_INTERRUPTED_ERROR);

src/main/java/com/databricks/jdbc/common/util/JdbcThreadUtils.java:80

The error code INVALID_STATE is used but not defined in DatabricksDriverErrorCode. Please add it to the enum or replace it with an existing, appropriate error code.

throw new DatabricksSQLException("Error in parallel execution", e, DatabricksDriverErrorCode.INVALID_STATE);

Copilot

Pull Request Overview

This PR implements the getSchemas() method for the SEA client by fetching catalogs and concurrently retrieving schema information, while introducing the JdbcThreadUtils for parallel operations. Key changes include:

Adding JdbcThreadUtils and associated unit tests for parallel processing.
Enhancing error handling and telemetry error codes.
Updating the getSchemas() method in DatabricksDatabaseMetaData to use parallel processing.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File	Description
src/main/java/com/databricks/jdbc/common/util/JdbcThreadUtils.java	Introduces parallelMap and parallelFlatMap methods with proper thread-context handling.
src/test/java/com/databricks/jdbc/common/util/JdbcThreadUtilsTest.java	Adds comprehensive unit tests for parallel execution, exception, and timeout scenarios.
src/main/java/com/databricks/jdbc/model/telemetry/enums/DatabricksDriverErrorCode.java	Inserts a new error code (OPERATION_TIMEOUT_ERROR) and reorders error codes.
src/main/java/com/databricks/jdbc/api/impl/DatabricksDatabaseMetaData.java	Updates getSchemas() to fetch catalogs and execute schema retrieval in parallel.

Comments suppressed due to low confidence (1)

src/main/java/com/databricks/jdbc/common/util/JdbcThreadUtils.java:73

The error code THREAD_INTERRUPTED_ERROR is referenced here but is not defined in the DatabricksDriverErrorCode enum. Consider adding the THREAD_INTERRUPTED_ERROR to the enum or using an existing error code that reflects an interruption.

throw new DatabricksSQLException("Parallel execution interrupted", e, DatabricksDriverErrorCode.THREAD_INTERRUPTED_ERROR);

madhav-db · 2025-05-12T06:26:09Z

Is there a basis for selecting this particular value for the timeout? I had the same concern about the max threads variable, but that is configurable.

jayantsing-db · May 19, 2025

I picked the value by checking that the timeout is long enough. Below is one data point:

On the e2-dogfood environment, which has around 9,000 schemas, this approach took approximately 12 seconds—compared to 9 seconds with the existing driver—an acceptable difference.

gopalldb · 2025-05-15T12:53:19Z

How are we testing parallel execution here?

jayantsing-db · May 19, 2025

This particular test method, testParallelExecuteWithSingleItem, only tests parallel execution with a single item/task. Other test methods test parallel execution with multiple items.

Testing methodology:

A parallel map is executed over a set of items.
For each item, the task creates a new upper-case string via String::toUpperCase.
We assert that each result is upper-cased.
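
The methodology above can be illustrated with a small self-contained check. This is only a sketch: `parallelMap` here is a hypothetical stand-in written for the example (the real JdbcThreadUtils signature may differ), and plain assertions replace whatever test framework the actual JdbcThreadUtilsTest uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelMapTestSketch {

  // Hypothetical stand-in for a parallelMap utility; not the PR's implementation.
  static <T, R> List<R> parallelMap(List<T> items, int maxThreads, Function<T, R> task) {
    ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
    try {
      List<Callable<R>> calls = new ArrayList<>();
      for (T item : items) {
        calls.add(() -> task.apply(item));
      }
      List<R> results = new ArrayList<>();
      // invokeAll returns the futures in the same order as the submitted tasks.
      for (Future<R> future : pool.invokeAll(calls)) {
        results.add(future.get());
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Parallel execution interrupted", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Error in parallel execution", e.getCause());
    } finally {
      pool.shutdown();
    }
  }

  static void check(boolean condition, String message) {
    if (!condition) {
      throw new AssertionError(message);
    }
  }

  public static void main(String[] args) {
    // Single item/task, mirroring the case discussed in this thread.
    List<String> single = parallelMap(List.of("alpha"), 4, String::toUpperCase);
    check(single.equals(List.of("ALPHA")), "single item should be upper-cased");

    // Multiple items: every element is upper-cased and input order is kept.
    List<String> many = parallelMap(List.of("a", "b", "c"), 4, String::toUpperCase);
    check(many.equals(List.of("A", "B", "C")), "all items should be upper-cased in order");

    System.out.println("all checks passed");
  }
}
```

A pure function such as String::toUpperCase makes the expected output independent of scheduling, so the assertions stay deterministic even though the tasks run on several threads.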

jayantsing-db added 2 commits April 19, 2025 02:50

Implement getSchemas() for SEA

c33b19b

Merge remote-tracking branch 'databricks/main' into jayantsing-db/sho…

3f8f349

…w-schemas-client

jayantsing-db temporarily deployed to azure-prod April 21, 2025 06:55 — with GitHub Actions Inactive

jayantsing-db requested a review from Copilot April 21, 2025 11:18

jayantsing-db marked this pull request as ready for review April 21, 2025 11:19

Copilot AI reviewed Apr 21, 2025

Comment thread src/main/java/com/databricks/jdbc/api/impl/DatabricksDatabaseMetaData.java Outdated

Close result set after consuming

8c722fd

jayantsing-db temporarily deployed to azure-prod April 21, 2025 11:26 — with GitHub Actions Inactive

jayantsing-db requested review from Copilot, gopalldb and madhav-db April 21, 2025 11:26

Copilot AI reviewed Apr 21, 2025

jayantsing-db requested review from samikshya-db and vikrantpuppala April 21, 2025 11:26

gopalldb reviewed Apr 23, 2025

Comment thread src/main/java/com/databricks/jdbc/api/impl/DatabricksDatabaseMetaData.java Outdated

gopalldb reviewed Apr 23, 2025

Comment thread src/main/java/com/databricks/jdbc/common/util/JdbcThreadUtils.java Outdated

vikrantpuppala reviewed Apr 25, 2025

Comment thread src/main/java/com/databricks/jdbc/api/impl/DatabricksDatabaseMetaData.java Outdated

jayantsing-db added 5 commits May 12, 2025 09:04

Merge remote-tracking branch 'databricks/main' into jayantsing-db/sho…

69afe1b

…w-schemas-client

Address review comments

fd0ed2f

Merge remote-tracking branch 'databricks/main' into jayantsing-db/sho…

2cc82ef

…w-schemas-client

Update changelog

095360b

fmt

2c14d4a

jayantsing-db temporarily deployed to azure-prod May 12, 2025 06:07 — with GitHub Actions Inactive

nit

0277ce0

jayantsing-db requested a review from gopalldb May 12, 2025 06:09

jayantsing-db temporarily deployed to azure-prod May 12, 2025 06:09 — with GitHub Actions Inactive

madhav-db reviewed May 12, 2025

gopalldb reviewed May 15, 2025

Comment thread src/main/java/com/databricks/jdbc/api/impl/DatabricksDatabaseMetaData.java Outdated

gopalldb approved these changes May 21, 2025

Merge branch 'main' into jayantsing-db/show-schemas-client

61fdab5

jayantsing-db temporarily deployed to azure-prod May 22, 2025 06:55 — with GitHub Actions Inactive

jayantsing-db merged commit aa55100 into databricks:main May 22, 2025
15 of 16 checks passed

jayantsing-db deleted the jayantsing-db/show-schemas-client branch May 22, 2025 07:24

jayantsing-db merged 10 commits into main from jayantsing-db/show-schemas-client
