[BUG] Performance regression of SHOW-based metadata operations in 3.4.1 vs 3.3.3 #1491

Description

@NathanEckert

Describe the bug

After upgrading from 3.3.3 to 3.4.1, DatabaseMetaData.getColumns() calls against a SQL warehouse went from sub-second to 20–165 seconds per call. Per the 3.4.1 changelog, getColumns/getTables/getSchemas now execute SQL SHOW commands instead of Thrift metadata RPCs.
Each metadata call is now a full statement execution on the warehouse (queued and polled like any query), so under concurrent load metadata discovery becomes roughly two orders of magnitude slower.

Measurements

Same test suite, same warehouse, same CI executor; only the driver version changed:

| Driver version | 3.3.3 | 3.4.1 |
|---|---|---|
| Full test module (17 tests, ~36 getColumns calls) | 2 min 38 s | > 20 min (killed by CI timeout) |
| Individual getColumns latency | sub-second | 20–165 s observed |

Our code calls getColumns once per table during schema discovery (to introspect a Databricks catalog). With two test modules running in parallel against the warehouse, each module completed only ~36 metadata calls in 19 minutes, an average of one every ~32 s.
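For anyone trying to reproduce the numbers, the per-table access pattern can be timed with something like the sketch below. It is not our actual discovery code: the class name `GetColumnsTimer` and both method names are made up for illustration, and it uses only the standard `java.sql.DatabaseMetaData.getColumns` call. The second method just redoes the "one call every ~32 s" arithmetic (36 calls in 19 minutes).

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the driver or of our code base.
final class GetColumnsTimer {

    private GetColumnsTimer() {}

    /**
     * Issues one getColumns call per table and returns the elapsed time of each call
     * in milliseconds. The result set is fully drained so the measurement covers
     * fetching the rows, not just starting the call.
     * Note: JDBC treats the table argument as a pattern ('_' and '%' are wildcards).
     */
    public static List<Long> timeGetColumnsMillis(
            Connection conn, String catalog, String schema, List<String> tables)
            throws SQLException {
        DatabaseMetaData meta = conn.getMetaData();
        List<Long> elapsedMillis = new ArrayList<>();
        for (String table : tables) {
            long start = System.nanoTime();
            try (ResultSet rs = meta.getColumns(catalog, schema, table, null)) {
                while (rs.next()) {
                    rs.getString("COLUMN_NAME"); // standard JDBC column label
                }
            }
            elapsedMillis.add((System.nanoTime() - start) / 1_000_000);
        }
        return elapsedMillis;
    }

    /** Average wall-clock seconds per call, given a call count and the total duration. */
    public static double averageSecondsPerCall(int calls, long totalSeconds) {
        if (calls <= 0) {
            throw new IllegalArgumentException("calls must be positive");
        }
        return (double) totalSeconds / calls;
    }
}
```

Running `timeGetColumnsMillis` with the same table list against 3.3.3 and 3.4.1 should show the per-call difference directly, independent of the rest of a test suite.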

Expected behavior

Metadata discovery latency comparable to 3.3.3

Client Environment (please complete the following information):

  • OS: Linux
  • Java version: Java 25
  • Java vendor: Azul
  • Driver Version: 3.4.1 (using jdbc-thin)

Additional context

While investigating, we also found that each SHOW-based getColumns call floods the DriverManager log writer with caught "Invalid column index" stack traces. That is reported separately in #1490.
