
Migrate Databricks from sqlalchemy-databricks to databricks-sqlalchemy #26896

Open
ulixius9 wants to merge 8 commits into main from databricks_sqa_update

Conversation


@ulixius9 ulixius9 commented Mar 31, 2026

Summary

  • Migrate Databricks connectors (Databricks, Unity Catalog, Databricks Pipeline) from unmaintained sqlalchemy-databricks==0.2.0 (pyhive-based) to official databricks-sqlalchemy~=2.0.9 with native SQLAlchemy 2.0 support
  • Update connection URL scheme from databricks+connector to databricks across JSON schemas, generated models, frontend types, and Flyway migrations
  • Replace pyhive HiveCompiler references with SQLCompiler/DatabricksStatementCompiler in profiler interface
  • Pass catalog as URL query parameter (?catalog=) so the new dialect's internal methods (get_pk_constraint, _describe_table_extended) resolve the catalog correctly
  • Fix Row.values() → tuple(result) for SQLAlchemy 2.0 Row compatibility in table/schema comment extraction
  • Fix Column._set_parent() to pass required all_names and allow_replacements kwargs for SQLAlchemy 2.0
  • Fix USE CATALOG :catalog parameterized DDL → literal USE CATALOG for NATIVE paramstyle compatibility
  • Suppress upstream _user_agent_entry deprecation warning from databricks-sqlalchemy
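The `Row.values()` change above can be illustrated with a stand-in class (plain Python, not the real `sqlalchemy.engine.Row`): a SQLAlchemy 2.0 `Row` no longer exposes `.values()`, but it still behaves as a sequence, so `tuple(row)` works in both 1.4 and 2.0.

```python
class FakeRow(tuple):
    """Sequence-like stand-in for a SQLAlchemy 2.0 Row (no .values() method)."""

def extract_comment(row):
    # Old code called row.values(); tuple(row) works on any sequence-like
    # Row object, in both SQLAlchemy 1.4 and 2.0.
    values = tuple(row)
    return values[-1]  # e.g. the comment column of a DESCRIBE row

row = FakeRow(("col_name", "string", "my comment"))
print(extract_comment(row))  # my comment
```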

Test plan

  • Unit tests for connection URL generation (Databricks, Unity Catalog, Pipeline) with and without catalog
  • Unit tests for _type_map completeness and complex type registration
  • Unit tests for DatabricksBaseTableParameter default scheme
  • Unit tests for profiler visit_column/visit_table with new compiler class
  • E2E metadata ingestion validated against live Databricks workspace (all table types including struct/array/map)
  • E2E profiler validated against live Databricks workspace
  • Verify Hive connector unaffected (pyhive remains in hive extras)
  • Verify Unity Catalog metadata ingestion
  • Verify Databricks Pipeline connector ingests jobs

🤖 Generated with Claude Code


Summary by Gitar

  • Profiler hardening:
    • Refactored DatabricksProfilerInterface to dynamically patch the DatabricksDialect compiler at runtime instead of importing private modules.
    • Added error handling to the compiler patching process to ensure profiler startup succeeds even if patching fails.
  • Connection safety:
    • Updated SQAMixin to use the dialect's identifier_preparer for quoting the catalog name during USE CATALOG execution, preventing injection and syntax errors.
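The quoting fix can be sketched without SQLAlchemy: a Databricks/Hive-style identifier_preparer wraps identifiers in backticks and doubles any embedded backtick. A minimal equivalent (function names are illustrative, not the OpenMetadata API):

```python
def quote_identifier(name: str) -> str:
    # Backtick-quote the way a Databricks/Hive-style identifier_preparer
    # does: wrap in backticks and double any embedded backtick.
    return "`" + name.replace("`", "``") + "`"

def use_catalog_stmt(catalog: str) -> str:
    # Quoting defuses both syntax errors (spaces, dashes) and attempts
    # to break out of the identifier.
    return f"USE CATALOG {quote_identifier(catalog)}"

print(use_catalog_stmt("my-catalog"))  # USE CATALOG `my-catalog`
```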

This will update automatically on new commits.

Copilot AI review requested due to automatic review settings March 31, 2026 18:35
@ulixius9 ulixius9 requested review from a team, akash-jain-10, harshach and tutte as code owners March 31, 2026 18:35
@github-actions github-actions Bot added labels: Ingestion, safe to test (Add this label to run secure Github workflows on PRs) Mar 31, 2026
Comment thread ingestion/src/metadata/ingestion/source/database/databricks/connection.py Outdated

Copilot AI left a comment


Pull request overview

Migrates OpenMetadata’s Databricks-related connectors (Databricks, Unity Catalog, Databricks Pipeline) from the unmaintained sqlalchemy-databricks dialect to the official databricks-sqlalchemy dialect, updating connection URL scheme semantics and adapting profiler/ingestion logic for SQLAlchemy 2.0 compatibility.

Changes:

  • Updated Databricks/Unity Catalog connection scheme from databricks+connector to databricks across JSON schemas, ingestion code, unit tests, and DB migrations.
  • Adjusted connection URL generation to pass catalog via URL query parameter, and updated profiler compiler integration away from PyHive.
  • Updated ingestion internals for SQLAlchemy 2.0 compatibility (Row handling, Column parenting) and removed legacy dialect preinstalls from CI images/actions.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
openmetadata-spec/src/main/resources/json/schema/entity/services/connections/database/unityCatalogConnection.json Updates Unity Catalog scheme enum/default to databricks.
openmetadata-spec/src/main/resources/json/schema/entity/services/connections/database/databricksConnection.json Updates Databricks scheme enum/default to databricks.
openmetadata-spec/src/main/resources/json/schema/entity/applications/configuration/external/metadataExporterConnectors/databricksConnection.json Aligns exporter connector schema scheme enum/default to databricks.
ingestion/tests/unit/topology/database/test_databricks.py Updates expected scheme/URL in Databricks unit tests.
ingestion/tests/unit/topology/database/test_databricks_migration.py Adds unit coverage for scheme enum, _type_map, default scheme, and pipeline URL scheme.
ingestion/tests/unit/test_source_connection.py Updates Databricks URL expectations; adds Unity Catalog and pipeline URL coverage including catalog query param.
ingestion/tests/unit/observability/profiler/sqlalchemy/databricks/test_visit_column.py Switches compiler mocking from PyHive HiveCompiler to SQLAlchemy SQLCompiler.
ingestion/src/metadata/profiler/interface/sqlalchemy/databricks/profiler_interface.py Updates compiler integration to work with databricks-sqlalchemy statement compiler and SQLAlchemy 2.0.
ingestion/src/metadata/mixins/sqalchemy/sqa_mixin.py Changes Databricks/Unity Catalog catalog selection DDL execution.
ingestion/src/metadata/ingestion/source/pipeline/databrickspipeline/connection.py Updates pipeline connection URL scheme to databricks and adds log suppression.
ingestion/src/metadata/ingestion/source/database/unitycatalog/connection.py Appends catalog as URL query param and adds log suppression.
ingestion/src/metadata/ingestion/source/database/databricks/metadata.py Replaces PyHive _type_map; updates dialect import; fixes SQLAlchemy 2.0 Row iteration for comments/descriptions.
ingestion/src/metadata/ingestion/source/database/databricks/connection.py Appends catalog as URL query param and adds log suppression.
ingestion/src/metadata/ingestion/source/database/common/data_diff/databricks_base.py Updates default scheme fallback from databricks+connector to databricks.
ingestion/setup.py Adds databricks-sqlalchemy dependency; updates connector versions; removes PyHive from databricks extra.
ingestion/operators/docker/Dockerfile.ci Removes preinstall of legacy sqlalchemy-databricks dialect.
ingestion/Dockerfile.ci Removes preinstall of legacy sqlalchemy-databricks dialect.
bootstrap/sql/migrations/native/1.13.0/postgres/schemaChanges.sql Migrates stored Databricks/UnityCatalog scheme values to databricks in Postgres.
bootstrap/sql/migrations/native/1.13.0/mysql/schemaChanges.sql Migrates stored Databricks/UnityCatalog scheme values to databricks in MySQL.
.github/actions/setup-openmetadata-test-environment/action.yml Removes preinstall of legacy sqlalchemy-databricks in test environment setup.

Comment thread ingestion/src/metadata/mixins/sqalchemy/sqa_mixin.py
Comment on lines 71 to 75
def get_connection_url(connection: UnityCatalogConnection) -> str:
    url = f"{connection.scheme.value}://{connection.hostPort}"
    if connection.catalog:
        url = f"{url}?catalog={connection.catalog}"
    return url

Copilot AI Mar 31, 2026


Building the connection URL with ?catalog={connection.catalog} does not URL-encode the catalog value. Catalog names containing spaces or reserved URL characters will produce an invalid URL and can break SQLAlchemy parsing. Consider using urllib.parse.urlencode/quote when appending query parameters.

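The encoding fix the review asks for is a one-liner with `urllib.parse.quote_plus`; a simplified sketch (signature condensed from the actual connection builder):

```python
from urllib.parse import quote_plus

def get_connection_url(scheme, host_port, catalog=None):
    """Build the connection URL, percent-encoding the catalog query value."""
    url = f"{scheme}://{host_port}"
    if catalog:
        url = f"{url}?catalog={quote_plus(catalog)}"
    return url

print(get_connection_url("databricks", "adb-123.azuredatabricks.net:443", "my catalog"))
# databricks://adb-123.azuredatabricks.net:443?catalog=my+catalog
```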
Comment thread ingestion/src/metadata/ingestion/source/database/databricks/connection.py Outdated
Comment on lines +66 to +68
# Suppress noisy deprecation warning from databricks-sqlalchemy using
# the deprecated '_user_agent_entry' parameter internally
logging.getLogger("databricks.sql.session").setLevel(logging.ERROR)

Copilot AI Mar 31, 2026


Setting databricks.sql.session logger level at import time is a global side effect (affects all callers) and may hide useful INFO/WARN logs for debugging Databricks connectivity issues. If the goal is to suppress a specific deprecation warning, prefer filtering the specific warning/message (e.g., warnings.filterwarnings) or applying a targeted log filter closer to connection initialization rather than changing the logger’s level module-wide.

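The targeted alternative the review suggests looks roughly like this (the message pattern is an assumption about the upstream warning text, not verified against the driver):

```python
import warnings

def suppress_user_agent_deprecation():
    # Ignore only DeprecationWarnings mentioning '_user_agent_entry';
    # other warnings from the driver still surface.
    warnings.filterwarnings(
        "ignore",
        message=r".*_user_agent_entry.*",
        category=DeprecationWarning,
    )

# Demo: the targeted warning is dropped, an unrelated one is kept.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    suppress_user_agent_deprecation()
    warnings.warn("parameter '_user_agent_entry' is deprecated", DeprecationWarning)
    warnings.warn("unrelated warning", UserWarning)
print(len(caught))  # 1
```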
Comment thread ingestion/src/metadata/ingestion/source/pipeline/databrickspipeline/connection.py Outdated
Comment on lines 103 to 111
from databricks.sqlalchemy._ddl import DatabricksStatementCompiler

DatabricksStatementCompiler.visit_column = (
    DatabricksProfilerInterface.visit_column
)
DatabricksStatementCompiler.visit_table = (
    DatabricksProfilerInterface.visit_table
)


Copilot AI Mar 31, 2026


Importing DatabricksStatementCompiler from databricks.sqlalchemy._ddl relies on a private module (_ddl). This is fragile across databricks-sqlalchemy upgrades and can break at runtime if internals move. Consider guarding the import (try/except) and/or retrieving the statement compiler class from the dialect/engine in a supported way if available.

Suggested change: replace

    from databricks.sqlalchemy._ddl import DatabricksStatementCompiler

    DatabricksStatementCompiler.visit_column = (
        DatabricksProfilerInterface.visit_column
    )
    DatabricksStatementCompiler.visit_table = (
        DatabricksProfilerInterface.visit_table
    )

with

    # Override the Databricks statement compiler's visit methods to handle
    # struct columns and table names more robustly. Instead of importing
    # DatabricksStatementCompiler from a private module, retrieve the
    # active statement compiler class from the dialect in a supported way.
    try:
        bind = getattr(self.session, "bind", None)
        dialect = getattr(bind, "dialect", None)
        compiler_cls = getattr(dialect, "statement_compiler", None)
        if compiler_cls is not None:
            compiler_cls.visit_column = DatabricksProfilerInterface.visit_column
            compiler_cls.visit_table = DatabricksProfilerInterface.visit_table
        else:
            logger.debug(
                "DatabricksProfilerInterface: dialect has no statement_compiler; "
                "skipping compiler monkey-patching."
            )
    except Exception as exc:  # Defensive: do not break initialization
        logger.debug(
            "DatabricksProfilerInterface: failed to patch statement compiler: %r",
            exc,
        )


harshach previously approved these changes Mar 31, 2026
@github-actions

✅ TypeScript Types Auto-Updated

The generated TypeScript types have been automatically updated based on JSON schema changes in this PR.

@github-actions github-actions Bot requested a review from a team as a code owner March 31, 2026 18:41
@github-actions

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@github-actions

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion-base-slim:trivy (debian 12.13)

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (37)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.airlift:aircompressor CVE-2025-67721 🚨 HIGH 0.27 2.0.3
io.netty:netty-codec-http CVE-2026-33870 🚨 HIGH 4.1.96.Final 4.1.132.Final, 4.2.10.Final
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 CVE-2026-33871 🚨 HIGH 4.1.96.Final 4.1.132.Final, 4.2.11.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.spark:spark-core_2.12 CVE-2025-54920 🚨 HIGH 3.5.6 3.5.7
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (13)

Package Vulnerability ID Severity Installed Version Fixed Version
apache-airflow CVE-2026-26929 🚨 HIGH 3.1.7 3.1.8
apache-airflow CVE-2026-28779 🚨 HIGH 3.1.7 3.1.8
apache-airflow CVE-2026-30911 🚨 HIGH 3.1.7 3.1.8
cryptography CVE-2026-26007 🚨 HIGH 42.0.8 46.0.5
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
jaraco.context CVE-2026-23949 🚨 HIGH 6.0.1 6.1.0
pyOpenSSL CVE-2026-27459 🚨 HIGH 24.1.0 26.0.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/extended_sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/lineage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data_aut.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage_aut.yaml

No Vulnerabilities Found

@github-actions

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion:trivy (debian 12.13)

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (37)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.airlift:aircompressor CVE-2025-67721 🚨 HIGH 0.27 2.0.3
io.netty:netty-codec-http CVE-2026-33870 🚨 HIGH 4.1.96.Final 4.1.132.Final, 4.2.10.Final
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 CVE-2026-33871 🚨 HIGH 4.1.96.Final 4.1.132.Final, 4.2.11.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.spark:spark-core_2.12 CVE-2025-54920 🚨 HIGH 3.5.6 3.5.7
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (24)

Package Vulnerability ID Severity Installed Version Fixed Version
Authlib CVE-2026-27962 🔥 CRITICAL 1.6.6 1.6.9
Authlib CVE-2026-28490 🚨 HIGH 1.6.6 1.6.9
Authlib CVE-2026-28498 🚨 HIGH 1.6.6 1.6.9
Authlib CVE-2026-28802 🚨 HIGH 1.6.6 1.6.7
PyJWT CVE-2026-32597 🚨 HIGH 2.11.0 2.12.0
Werkzeug CVE-2024-34069 🚨 HIGH 2.2.3 3.0.3
aiohttp CVE-2025-69223 🚨 HIGH 3.12.12 3.13.3
apache-airflow CVE-2026-26929 🚨 HIGH 3.1.7 3.1.8
apache-airflow CVE-2026-28779 🚨 HIGH 3.1.7 3.1.8
apache-airflow CVE-2026-30911 🚨 HIGH 3.1.7 3.1.8
apache-airflow-providers-http CVE-2025-69219 🚨 HIGH 5.6.4 6.0.0
cryptography CVE-2026-26007 🚨 HIGH 42.0.8 46.0.5
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
jaraco.context CVE-2026-23949 🚨 HIGH 6.0.1 6.1.0
protobuf CVE-2026-0994 🚨 HIGH 4.25.8 6.33.5, 5.29.6
pyOpenSSL CVE-2026-27459 🚨 HIGH 24.1.0 26.0.0
pyasn1 CVE-2026-30922 🚨 HIGH 0.6.2 0.6.3
ray CVE-2025-62593 🔥 CRITICAL 2.47.1 2.52.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
tornado CVE-2026-31958 🚨 HIGH 6.5.4 6.5.5
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: usr/bin/docker

Vulnerabilities (2)

Package Vulnerability ID Severity Installed Version Fixed Version
stdlib CVE-2025-68121 🔥 CRITICAL v1.25.6 1.24.13, 1.25.7, 1.26.0-rc.3
stdlib CVE-2026-25679 🚨 HIGH v1.25.6 1.25.8, 1.26.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO

No Vulnerabilities Found

@github-actions

github-actions Bot commented Mar 31, 2026

Jest test Coverage

UI tests summary

Lines: 61%
Statements: 62.04% (61977/99885)
Branches: 42.14% (33134/78616)
Functions: 45.24% (9825/21713)

@github-actions

github-actions Bot commented Mar 31, 2026

🟡 Playwright Results — all passed (12 flaky)

✅ 3962 passed · ❌ 0 failed · 🟡 12 flaky · ⏭️ 86 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 298 0 1 4
🟡 Shard 2 740 0 4 8
🟡 Shard 3 746 0 2 7
🟡 Shard 4 755 0 4 18
✅ Shard 5 687 0 0 41
🟡 Shard 6 736 0 1 8
🟡 12 flaky test(s) (passed on retry)
  • Pages/UserCreationWithPersona.spec.ts › Create user with persona and verify on profile (shard 1, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event is created when description is updated (shard 2, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event shows the actor who made the change (shard 2, 1 retry)
  • Features/DomainFilterQueryFilter.spec.ts › Quick filters should persist when domain filter is applied and cleared (shard 2, 1 retry)
  • Features/Glossary/GlossaryWorkflow.spec.ts › should start term as Draft when glossary has reviewers (shard 2, 2 retries)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Flow/NotificationAlerts.spec.ts › Conversation source alert (shard 3, 1 retry)
  • Pages/CustomProperties.spec.ts › Should clear search and show all properties for apiCollection in right panel (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for Topic (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for DashboardDataModel (shard 4, 1 retry)
  • Pages/DataContractsSemanticRules.spec.ts › Validate Description Rule Is_Set (shard 4, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

@ulixius9 ulixius9 added this to Shipping Apr 6, 2026
@ulixius9 ulixius9 moved this to In Review / QA 👀 in Shipping Apr 6, 2026
Copilot AI review requested due to automatic review settings April 6, 2026 10:25
Copilot AI review requested due to automatic review settings April 27, 2026 16:02

Copilot AI left a comment


Pull request overview

Copilot reviewed 22 out of 36 changed files in this pull request and generated 3 comments.

{"catalog": self.service_connection_config.catalog},
).first()
catalog = self.service_connection_config.catalog
session.execute(text(f"USE CATALOG `{catalog}`"))

UPDATE dbservice_entity
SET json = JSON_SET(json, '$.connection.config.scheme', 'databricks')
WHERE serviceType IN ('Databricks', 'UnityCatalog')
AND JSON_EXTRACT(json, '$.connection.config.scheme') = 'databricks+connector';
Comment on lines +99 to +103
self.set_catalog(self.session)
HiveCompiler.visit_column = DatabricksProfilerInterface.visit_column
HiveCompiler.visit_table = DatabricksProfilerInterface.visit_table
from databricks.sqlalchemy._ddl import DatabricksStatementCompiler

DatabricksStatementCompiler.visit_column = DatabricksProfilerInterface.visit_column
DatabricksStatementCompiler.visit_table = DatabricksProfilerInterface.visit_table


Comment thread ingestion/src/metadata/mixins/sqalchemy/sqa_mixin.py
result = super( # pylint: disable=bad-super-call
HiveCompiler, self
).visit_table(*args, **kwargs)
result = SQLCompiler.visit_table(self, *args, **kwargs)

we should double check how DBX reading from a hive metastore will behave

Member Author


tested with legacy hive databricks, worked fine

@gitar-bot

gitar-bot Bot commented Apr 27, 2026

Code Review ✅ Approved 3 resolved / 3 findings

Migration to databricks-sqlalchemy correctly handles catalog URL encoding and limits log suppression scope, resolving all previous issues. The monkey-patching concern has been dismissed.

✅ 3 resolved
Edge Case: Catalog value not URL-encoded in connection URL

📄 ingestion/src/metadata/ingestion/source/database/databricks/connection.py:140-143 📄 ingestion/src/metadata/ingestion/source/database/unitycatalog/connection.py:71-75
The get_connection_url functions in both Databricks and Unity Catalog connection modules directly interpolate connection.catalog into the URL query string without URL-encoding. If a catalog name contains special characters (&, =, #, %, spaces), this will produce a malformed URL or cause SQLAlchemy to misparse the query parameters.

The codebase already uses quote_plus for URL parameters elsewhere (e.g., get_connection_url_common in builders.py).

Quality: Module-level log suppression is too broad

📄 ingestion/src/metadata/ingestion/source/database/databricks/connection.py:61-63
Setting logging.getLogger('databricks.sql.session').setLevel(logging.ERROR) at module level suppresses all WARNING and INFO messages from the Databricks SQL session logger globally and permanently, not just the _user_agent_entry deprecation warning. This could hide legitimate warnings about connection issues, timeouts, or other important diagnostic information from the driver.

A more targeted approach would use a warnings.filterwarnings call to suppress only the specific deprecation warning.

Bug: Monkey-patching DatabricksStatementCompiler mutates global state

📄 ingestion/src/metadata/profiler/interface/sqlalchemy/databricks/profiler_interface.py:103-110
In __init__, DatabricksStatementCompiler.visit_column and visit_table are replaced with DatabricksProfilerInterface methods. Since this patches a class (not an instance), every DatabricksStatementCompiler instance in the process—including those used by non-profiler code paths—will now call the backtick-wrapping logic. This is the same pattern as the old HiveCompiler patching, but worth noting: if any non-profiler Databricks query compiles after a DatabricksProfilerInterface is instantiated, it will get unexpected backtick transformations.

Consider patching at module level (outside __init__) to make it clear this is a one-time global side effect, or better yet, subclass DatabricksStatementCompiler and register a custom dialect that uses the subclass.
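The difference between patching the class and subclassing can be shown with a small stand-in; BaseCompiler here stands in for DatabricksStatementCompiler (no SQLAlchemy involved):

```python
class BaseCompiler:
    """Stand-in for DatabricksStatementCompiler."""
    def visit_column(self, name):
        return name

class ProfilerCompiler(BaseCompiler):
    """Subclass: backtick-wrapping is scoped to instances of this class."""
    def visit_column(self, name):
        return f"`{name}`"

# Subclassing leaves the base class untouched...
assert BaseCompiler().visit_column("c") == "c"
assert ProfilerCompiler().visit_column("c") == "`c`"

# ...whereas monkey-patching the class changes every caller globally:
BaseCompiler.visit_column = lambda self, name: f"`{name}`"
assert BaseCompiler().visit_column("c") == "`c`"  # non-profiler code affected too
```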



Labels

Ingestion, safe to test (Add this label to run secure Github workflows on PRs)

Projects

Status: In Review / QA 👀

Development

Successfully merging this pull request may close these issues.

4 participants