
feat(hive): add MSSQL and Oracle backends for Hive metastore #26977

Open
SaaiAravindhRaja wants to merge 13 commits into open-metadata:main from SaaiAravindhRaja:feat/issue-12787-hive-metastore-mssql-oracle

Conversation

@SaaiAravindhRaja
Contributor

@SaaiAravindhRaja SaaiAravindhRaja commented Apr 2, 2026

Fixes #12787

Adds MSSQL and Oracle as supported backends for Hive metastore, alongside the existing MySQL and PostgreSQL support.

Changes

  • New metastore_dialects/mssql/ and metastore_dialects/oracle/ packages with SQL dialect implementations
  • hiveConnection.json: add mssqlConnection and oracleConnection to metastoreConnection.oneOf
  • connection.py: register @get_metastore_connection singledispatch handlers for MssqlConnection and OracleConnection
  • metadata.py: extend _get_validated_metastore_connection to handle both new connection types
  • Unit tests: dialect-level SQL tests and metastore connection validation tests for both backends

MSSQL notes: uses CTEs (which the MySQL 5.7 dialect avoids) with unquoted identifiers
Oracle notes: uses CTEs with double-quoted identifiers (mirroring the Postgres pattern)
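The handler registration in connection.py can be sketched with functools.singledispatch. The connection classes below are simplified stand-ins for the generated Pydantic models, and the return values are placeholders for the real SQLAlchemy engines:

```python
from functools import singledispatch

# Simplified stand-ins for the generated connection models (assumptions,
# not the real classes from metadata.generated.schema)
class MssqlConnection: ...
class OracleConnection: ...

@singledispatch
def get_metastore_connection(connection):
    # Fallback for unsupported backends
    raise ValueError(f"Unsupported metastore backend: {type(connection).__name__}")

@get_metastore_connection.register
def _(connection: MssqlConnection):
    # the real handler imports the hive.mssql dialect plugin and builds an engine
    return "hive+mssql engine"

@get_metastore_connection.register
def _(connection: OracleConnection):
    # the real handler imports the hive.oracle dialect plugin and builds an engine
    return "hive+oracle engine"

print(get_metastore_connection(MssqlConnection()))  # hive+mssql engine
```

Each register call dispatches on the parameter annotation, so adding a backend is one new handler rather than a change to a central if/else chain.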


Summary by Gitar

  • SQL security:
    • Parameterized all metastore dialect queries to prevent SQL injection vulnerabilities and removed unsafe string interpolation.
  • Stability improvements:
    • Added error handling in SamplerProcessor to gracefully tolerate transient failures when fetching profiler configuration.


Copilot AI review requested due to automatic review settings April 2, 2026 11:11
@SaaiAravindhRaja SaaiAravindhRaja requested a review from a team as a code owner April 2, 2026 11:11
@github-actions
Contributor

github-actions Bot commented Apr 2, 2026

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!


Comment thread on ingestion/tests/unit/topology/database/test_hive_metastore_mssql_dialect.py (Outdated)
Contributor

Copilot AI left a comment


Pull request overview

Adds MSSQL and Oracle as additional supported Hive metastore backends (alongside MySQL/Postgres) by extending the Hive connection schema, registering new metastore connection handlers, and introducing SQLAlchemy metastore dialect plugins with unit tests.

Changes:

  • Extend hiveConnection.json metastore connection oneOf to include MSSQL and Oracle connection schemas.
  • Register Hive metastore connection creation for MssqlConnection and OracleConnection, and expand metastore connection validation to include both.
  • Add new metastore SQLAlchemy dialect packages (hive.mssql, hive.oracle) and corresponding unit tests.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
openmetadata-spec/src/main/resources/json/schema/entity/services/connections/database/hiveConnection.json Adds MSSQL/Oracle connection schema refs to Hive metastore backend options.
ingestion/src/metadata/ingestion/source/database/hive/connection.py Registers new metastore connection handlers for MSSQL and Oracle and updates validation in test_connection.
ingestion/src/metadata/ingestion/source/database/hive/metadata.py Extends metastore connection validation/parsing to include MSSQL and Oracle.
ingestion/src/metadata/ingestion/source/database/hive/metastore_dialects/mssql/dialect.py Introduces MSSQL Hive metastore SQL dialect implementation.
ingestion/src/metadata/ingestion/source/database/hive/metastore_dialects/mssql/__init__.py Registers hive.mssql SQLAlchemy dialect plugin.
ingestion/src/metadata/ingestion/source/database/hive/metastore_dialects/oracle/dialect.py Introduces Oracle Hive metastore SQL dialect implementation.
ingestion/src/metadata/ingestion/source/database/hive/metastore_dialects/oracle/__init__.py Registers hive.oracle SQLAlchemy dialect plugin.
ingestion/tests/unit/topology/database/test_hive.py Adds unit tests for metastore connection validation for MSSQL/Oracle objects and dicts.
ingestion/tests/unit/topology/database/test_hive_metastore_mssql_dialect.py Adds unit tests validating MSSQL dialect SQL shape and behaviors.
ingestion/tests/unit/topology/database/test_hive_metastore_oracle_dialect.py Adds unit tests validating Oracle dialect SQL shape and behaviors.

Comment on lines +273 to +277
    return create_generic_db_connection(
        connection=custom_connection,
        get_connection_url_fn=get_connection_url_common,
        get_connection_args_fn=get_connection_args_common,
    )

Copilot AI Apr 2, 2026


Oracle metastore handler builds the SQLAlchemy URL via get_connection_url_common(), but OracleConnection doesn’t have a database/databaseSchema field and requires oracleConnectionType (service_name / schema / TNS). As a result the generated hive+oracle://... URL will omit the service name/schema and is likely invalid. Consider reusing the existing Oracle URL builder (e.g., OracleConnection.get_connection_url / a shared helper) so oracleConnectionType is encoded into the URL correctly for Hive metastore connections.

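A minimal sketch of what encoding the service name could look like. The function name and URL shape here are illustrative assumptions, not the actual OpenMetadata helper (the real fix would reuse OracleConnection's own URL builder):

```python
# Hypothetical URL builder that carries the Oracle service name through to
# the hive+oracle URL instead of dropping it as get_connection_url_common does
def build_hive_oracle_url(username: str, password: str, host_port: str,
                          service_name: str) -> str:
    return (
        f"hive+oracle://{username}:{password}@{host_port}"
        f"/?service_name={service_name}"
    )

print(build_hive_oracle_url("hive", "secret", "oracle-host:1521", "ORCLPDB1"))
# hive+oracle://hive:secret@oracle-host:1521/?service_name=ORCLPDB1
```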
Comment on lines +254 to +261
@get_metastore_connection.register
def _(connection: OracleConnection):
    # import required to load sqlalchemy plugin
    # pylint: disable=import-outside-toplevel,unused-import
    from metadata.ingestion.source.database.hive.metastore_dialects.oracle import (  # nopycln: import
        HiveOracleMetaStoreDialect,
    )


Copilot AI Apr 2, 2026


Oracle metastore connection path doesn’t apply the cx_Oracle→oracledb compatibility shim used by the main Oracle connector (sys.modules["cx_Oracle"] = oracledb, version pinning, etc.). Since HiveOracleMetaStoreDialect inherits OracleDialect_cx_oracle, engine creation will fail unless cx_Oracle is installed. Reuse the Oracle connector’s URL builder / initialization logic (or switch the dialect to the oracledb driver) so the required DBAPI is consistently available.

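The compatibility shim referenced here is the sys.modules aliasing trick. A sketch using a stand-in module object, since python-oracledb may not be installed in this environment (the version string is a placeholder):

```python
import sys
import types

# Stand-in for the python-oracledb package; the real connector does
# `import oracledb` and pins a version attribute instead
oracledb = types.ModuleType("oracledb")
oracledb.version = "8.3.0"

# The shim: any later `import cx_Oracle` now resolves to oracledb, so
# dialects that expect the cx_Oracle DBAPI keep working
sys.modules["cx_Oracle"] = oracledb

import cx_Oracle

print(cx_Oracle is oracledb)  # True
```

Without this shim (or a switch to the oracledb-native dialect), engine creation fails on hosts where only python-oracledb is installed.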
Comment on lines +14 to +16
from sqlalchemy import text
from sqlalchemy.dialects.mssql.pyodbc import MSDialect_pyodbc
from sqlalchemy.engine import reflection

Copilot AI Apr 2, 2026


HiveMssqlMetaStoreDialect inherits from MSDialect_pyodbc, which forces a pyodbc DBAPI dependency and expects pyodbc-style URL parameters (e.g., driver=). However the Hive metastore connection code overrides the scheme to hive+mssql and uses get_connection_url_common(), which does not include the ODBC driver and will commonly be used with the default MSSQL stack (sqlalchemy-pytds) where pyodbc isn’t installed. This combination will likely break MSSQL metastore connections unless users install the separate mssql-odbc extras and configure driver options. Prefer inheriting from sqlalchemy_pytds.dialect.MSDialect_pytds (matching the default mssql+pytds support), or otherwise align the dialect base class and URL builder with the intended MSSQL driver requirements.

class HiveMssqlMetaStoreDialect(HiveMetaStoreDialectMixin, MSDialect_pyodbc):
    """
    MSSQL metastore dialect class for Hive metastore backed by SQL Server.
    Uses square-bracket quoting compatible with MSSQL and supports CTEs.

Copilot AI Apr 2, 2026


The MSSQL dialect docstring says it “Uses square-bracket quoting”, but the implementation deliberately uses unquoted identifiers (and the unit tests assert that double-quoted identifiers are not used). Please update the docstring to match the actual quoting strategy to avoid misleading future maintainers.

Suggested change
Uses square-bracket quoting compatible with MSSQL and supports CTEs.
Uses unquoted identifiers (no automatic identifier quoting) and supports CTEs.

Comment on lines +58 to +62
schema_join = (
    f"""
    JOIN DBS db ON tbsl.DB_ID = db.DB_ID
    AND db.NAME = '{schema}'
    """

Copilot AI Apr 2, 2026


This query interpolates schema and table_name directly into SQL via f-strings. Besides being vulnerable to SQL injection if these values are ever user-controlled, it will also break for legitimate names containing quotes/special characters. Prefer using bound parameters with text() (e.g., ... = :schema, ... = :table_name) and passing values via .bindparams(...), or use SQLAlchemy constructs for safe quoting.

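The suggested fix binds values instead of interpolating them. The same principle is shown below with the stdlib sqlite3 driver (no extra dependencies); in SQLAlchemy the equivalent is text("... WHERE NAME = :schema").bindparams(schema=schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DBS (DB_ID INTEGER, NAME TEXT)")
conn.execute("INSERT INTO DBS VALUES (?, ?)", (1, "default"))
conn.execute("INSERT INTO DBS VALUES (?, ?)", (2, "o'brien"))

# Unsafe (the pattern flagged above): f"... WHERE NAME = '{schema}'"
# breaks on embedded quotes and invites injection.

# Safe: a named bind parameter, escaped by the driver
def db_id_for(schema):
    row = conn.execute(
        "SELECT DB_ID FROM DBS WHERE NAME = :schema", {"schema": schema}
    ).fetchone()
    return row[0] if row else None

print(db_id_for("o'brien"))  # 2 -- a name containing a quote works fine
```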
Comment on lines +58 to +62
schema_join = (
    f"""
    JOIN "DBS" db ON tbsl."DB_ID" = db."DB_ID"
    AND db."NAME" = '{schema}'
    """

Copilot AI Apr 2, 2026


This query interpolates schema and table_name directly into SQL via f-strings. Besides being vulnerable to SQL injection if these values are ever user-controlled, it will also break for legitimate names containing quotes/special characters. Prefer using bound parameters with text() (e.g., ... = :schema, ... = :table_name) and passing values via .bindparams(...), or use SQLAlchemy constructs for safe quoting.

@harshach added the "safe to test" label (runs secure GitHub workflows on PRs) on Apr 2, 2026
@github-actions
Contributor

github-actions Bot commented Apr 2, 2026

⚠️ TypeScript Types Need Update

The generated TypeScript types are out of sync with the JSON schema changes.

Since this is a pull request from a forked repository, the types cannot be automatically committed.
Please generate and commit the types manually:

cd openmetadata-ui/src/main/resources/ui
./json2ts-generate-all.sh -l true
git add src/generated/
git commit -m "Update generated TypeScript types"
git push

After pushing the changes, this check will pass automatically.

@github-actions
Contributor

github-actions Bot commented Apr 2, 2026

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@github-actions
Contributor

github-actions Bot commented Apr 2, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion:trivy (debian 12.13)

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (37)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.airlift:aircompressor CVE-2025-67721 🚨 HIGH 0.27 2.0.3
io.netty:netty-codec-http CVE-2026-33870 🚨 HIGH 4.1.96.Final 4.1.132.Final, 4.2.10.Final
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 CVE-2026-33871 🚨 HIGH 4.1.96.Final 4.1.132.Final, 4.2.11.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.spark:spark-core_2.12 CVE-2025-54920 🚨 HIGH 3.5.6 3.5.7
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (26)

Package Vulnerability ID Severity Installed Version Fixed Version
Authlib CVE-2026-27962 🔥 CRITICAL 1.6.6 1.6.9
Authlib CVE-2026-28490 🚨 HIGH 1.6.6 1.6.9
Authlib CVE-2026-28498 🚨 HIGH 1.6.6 1.6.9
Authlib CVE-2026-28802 🚨 HIGH 1.6.6 1.6.7
PyJWT CVE-2026-32597 🚨 HIGH 2.11.0 2.12.0
Werkzeug CVE-2024-34069 🚨 HIGH 2.2.3 3.0.3
aiohttp CVE-2025-69223 🚨 HIGH 3.12.12 3.13.3
apache-airflow CVE-2026-26929 🚨 HIGH 3.1.7 3.1.8
apache-airflow CVE-2026-28779 🚨 HIGH 3.1.7 3.1.8
apache-airflow CVE-2026-30911 🚨 HIGH 3.1.7 3.1.8
apache-airflow-providers-http CVE-2025-69219 🚨 HIGH 5.6.4 6.0.0
cryptography CVE-2026-26007 🚨 HIGH 42.0.8 46.0.5
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
jaraco.context CVE-2026-23949 🚨 HIGH 6.0.1 6.1.0
litellm CVE-2026-35030 🔥 CRITICAL 1.81.6 1.83.0
litellm CVE-2026-35029 🚨 HIGH 1.81.6 1.83.0
protobuf CVE-2026-0994 🚨 HIGH 4.25.8 6.33.5, 5.29.6
pyOpenSSL CVE-2026-27459 🚨 HIGH 24.1.0 26.0.0
pyasn1 CVE-2026-30922 🚨 HIGH 0.6.2 0.6.3
ray CVE-2025-62593 🔥 CRITICAL 2.47.1 2.52.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
tornado CVE-2026-31958 🚨 HIGH 6.5.4 6.5.5
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: usr/bin/docker

Vulnerabilities (2)

Package Vulnerability ID Severity Installed Version Fixed Version
stdlib CVE-2025-68121 🔥 CRITICAL v1.25.6 1.24.13, 1.25.7, 1.26.0-rc.3
stdlib CVE-2026-25679 🚨 HIGH v1.25.6 1.25.8, 1.26.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO

No Vulnerabilities Found

@github-actions
Contributor

github-actions Bot commented Apr 2, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion-base-slim:trivy (debian 12.13)

Vulnerabilities (4)

Package Vulnerability ID Severity Installed Version Fixed Version
libpng-dev CVE-2026-33416 🚨 HIGH 1.6.39-2+deb12u3 1.6.39-2+deb12u4
libpng-dev CVE-2026-33636 🚨 HIGH 1.6.39-2+deb12u3 1.6.39-2+deb12u4
libpng16-16 CVE-2026-33416 🚨 HIGH 1.6.39-2+deb12u3 1.6.39-2+deb12u4
libpng16-16 CVE-2026-33636 🚨 HIGH 1.6.39-2+deb12u3 1.6.39-2+deb12u4

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (37)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.airlift:aircompressor CVE-2025-67721 🚨 HIGH 0.27 2.0.3
io.netty:netty-codec-http CVE-2026-33870 🚨 HIGH 4.1.96.Final 4.1.132.Final, 4.2.10.Final
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 CVE-2026-33871 🚨 HIGH 4.1.96.Final 4.1.132.Final, 4.2.11.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.spark:spark-core_2.12 CVE-2025-54920 🚨 HIGH 3.5.6 3.5.7
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (13)

Package Vulnerability ID Severity Installed Version Fixed Version
apache-airflow CVE-2026-26929 🚨 HIGH 3.1.7 3.1.8
apache-airflow CVE-2026-28779 🚨 HIGH 3.1.7 3.1.8
apache-airflow CVE-2026-30911 🚨 HIGH 3.1.7 3.1.8
cryptography CVE-2026-26007 🚨 HIGH 42.0.8 46.0.5
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
jaraco.context CVE-2026-23949 🚨 HIGH 6.0.1 6.1.0
pyOpenSSL CVE-2026-27459 🚨 HIGH 24.1.0 26.0.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/extended_sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/lineage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data_aut.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage_aut.yaml

No Vulnerabilities Found

@github-actions
Contributor

github-actions Bot commented Apr 2, 2026

🟡 Playwright Results — all passed (19 flaky)

✅ 3954 passed · ❌ 0 failed · 🟡 19 flaky · ⏭️ 86 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 294 0 5 4
🟡 Shard 2 751 0 8 8
🟡 Shard 3 729 0 3 7
🟡 Shard 4 758 0 1 18
✅ Shard 5 687 0 0 41
🟡 Shard 6 735 0 2 8
🟡 19 flaky test(s) (passed on retry)
  • Features/TagsSuggestion.spec.ts › should decline suggested tags for a container column (shard 1, 1 retry)
  • Flow/Tour.spec.ts › Tour should work from help section (shard 1, 1 retry)
  • Flow/Tour.spec.ts › Tour should work from welcome screen (shard 1, 1 retry)
  • Pages/AuditLogs.spec.ts › should apply both User and EntityType filters simultaneously (shard 1, 1 retry)
  • Pages/UserCreationWithPersona.spec.ts › Create user with persona and verify on profile (shard 1, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event is created when description is updated (shard 2, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event is created when owner is added (shard 2, 1 retry)
  • Features/Container.spec.ts › Copy column link button should copy the column URL to clipboard (shard 2, 1 retry)
  • Features/DomainFilterQueryFilter.spec.ts › Subdomain assets should be visible when parent domain is selected (shard 2, 1 retry)
  • Features/DomainFilterQueryFilter.spec.ts › Search suggestions should be filtered by selected domain (shard 2, 1 retry)
  • Features/Glossary/GlossaryHierarchy.spec.ts › should cancel move operation (shard 2, 1 retry)
  • Features/Glossary/GlossaryWorkflow.spec.ts › should display correct status badge color and icon (shard 2, 1 retry)
  • Features/IncidentManager.spec.ts › Next, Previous and page indicator (shard 2, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Flow/AddRoleAndAssignToUser.spec.ts › Verify assigned role to new user (shard 3, 1 retry)
  • Flow/PersonaFlow.spec.ts › Set default persona for team should work properly (shard 3, 1 retry)
  • Pages/DataContractsSemanticRules.spec.ts › Validate Description Rule Is_Set (shard 4, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

@github-actions
Contributor

github-actions Bot commented Apr 3, 2026

⚠️ TypeScript Types Need Update

The generated TypeScript types are out of sync with the JSON schema changes.

Since this is a pull request from a forked repository, the types cannot be automatically committed.
Please generate and commit the types manually:

cd openmetadata-ui/src/main/resources/ui
./json2ts-generate-all.sh -l true
git add src/generated/
git commit -m "Update generated TypeScript types"
git push

After pushing the changes, this check will pass automatically.

@github-actions
Contributor

github-actions Bot commented Apr 3, 2026

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Copilot AI review requested due to automatic review settings April 17, 2026 16:56

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 19 out of 20 changed files in this pull request and generated 2 comments.

Comment on lines +58 to +90
def _get_table_columns(self, connection, table_name, schema):
    schema_join = (
        f"""
        JOIN DBS db ON tbsl.DB_ID = db.DB_ID
        AND db.NAME = '{schema}'
        """
        if schema
        else ""
    )

    query = f"""
        WITH regular_columns AS (
            SELECT
                col.COLUMN_NAME,
                col.TYPE_NAME,
                col.COMMENT
            FROM COLUMNS_V2 col
            JOIN CDS cds ON col.CD_ID = cds.CD_ID
            JOIN SDS sds ON sds.CD_ID = cds.CD_ID
            JOIN TBLS tbsl ON sds.SD_ID = tbsl.SD_ID
            AND tbsl.TBL_NAME = '{table_name}'
            {schema_join}
        ),
        partition_columns AS (
            SELECT
                pk.PKEY_NAME AS COLUMN_NAME,
                pk.PKEY_TYPE AS TYPE_NAME,
                pk.PKEY_COMMENT AS COMMENT
            FROM PARTITION_KEYS pk
            JOIN TBLS tbsl ON pk.TBL_ID = tbsl.TBL_ID
            AND tbsl.TBL_NAME = '{table_name}'
            {schema_join}
        )

Copilot AI Apr 17, 2026


These queries interpolate schema/table_name directly into SQL strings. Since schema can ultimately come from user-provided config (and table_name may include quotes), this creates an injection risk and can also break on names containing '. Use SQLAlchemy bind parameters for values in predicates (e.g., ... db.NAME = :schema, ... TBL_NAME = :table_name) instead of f-string substitution.

Comment on lines +58 to +90
def _get_table_columns(self, connection, table_name, schema):
    schema_join = (
        f"""
        JOIN "DBS" db ON tbsl."DB_ID" = db."DB_ID"
        AND db."NAME" = '{schema}'
        """
        if schema
        else ""
    )

    query = f"""
        WITH regular_columns AS (
            SELECT
                col."COLUMN_NAME",
                col."TYPE_NAME",
                col."COMMENT"
            FROM "COLUMNS_V2" col
            JOIN "CDS" cds ON col."CD_ID" = cds."CD_ID"
            JOIN "SDS" sds ON sds."CD_ID" = cds."CD_ID"
            JOIN "TBLS" tbsl ON sds."SD_ID" = tbsl."SD_ID"
            AND tbsl."TBL_NAME" = '{table_name}'
            {schema_join}
        ),
        partition_columns AS (
            SELECT
                pk."PKEY_NAME" AS "COLUMN_NAME",
                pk."PKEY_TYPE" AS "TYPE_NAME",
                pk."PKEY_COMMENT" AS "COMMENT"
            FROM "PARTITION_KEYS" pk
            JOIN "TBLS" tbsl ON pk."TBL_ID" = tbsl."TBL_ID"
            AND tbsl."TBL_NAME" = '{table_name}'
            {schema_join}
        )

Copilot AI Apr 17, 2026


These queries interpolate schema/table_name directly into SQL strings. Since schema can come from user-provided config (and names may include '), this creates an injection risk and can break on identifiers containing quotes. Prefer SQLAlchemy bind parameters for predicate values (e.g., db."NAME" = :schema, tbsl."TBL_NAME" = :table_name) rather than f-string substitution.


Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 19 out of 31 changed files in this pull request and generated no new comments.

@gitar-bot

gitar-bot Bot commented Apr 26, 2026

Code Review 👍 Approved with suggestions: 4 resolved / 5 findings

Integrates MSSQL and Oracle backends for the Hive metastore, resolving previous issues with test assertions, query syntax, and redundant connection logic. Simplification of the ternary conditional on line 155 is recommended for conciseness.

💡 Quality: Redundant ternary sample_data_config if sample_data_config else None

On line 155, sample_data_config if sample_data_config else None is equivalent to just sample_data_config, since the variable is already None when the condition is false. This adds unnecessary verbosity.

Suggested fix
data=sampler_interface.generate_sample_data(
    sample_data_config
),
✅ 4 resolved
Bug: Test asserts SELECT count == 2, but CTE query contains 4 SELECTs

📄 ingestion/tests/unit/topology/database/test_hive_metastore_mssql_dialect.py:90
In test_get_table_columns_query_structure, the assertion executed_query.upper().count("SELECT") == 2 is incorrect. The CTE-based query has 4 SELECT keywords: one each inside the regular_columns and partition_columns CTEs, plus SELECT * FROM regular_columns and SELECT * FROM partition_columns in the UNION ALL. This test should fail when run.
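The count is easy to confirm on a skeletal version of the query shape (not the real SQL from the dialect):

```python
# Skeletal CTE query mirroring the shape under test
query = """
WITH regular_columns AS (
    SELECT col.COLUMN_NAME FROM COLUMNS_V2 col
),
partition_columns AS (
    SELECT pk.PKEY_NAME FROM PARTITION_KEYS pk
)
SELECT * FROM regular_columns
UNION ALL
SELECT * FROM partition_columns
"""

# One SELECT inside each CTE plus two around the UNION ALL: four in total
print(query.upper().count("SELECT"))  # 4
```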

Quality: Trailing semicolon in MSSQL view definition query

📄 ingestion/src/metadata/ingestion/source/database/hive/metastore_dialects/mssql/dialect.py:44 📄 ingestion/src/metadata/ingestion/source/database/hive/metastore_dialects/mssql/dialect.py:118
The get_view_definition query in the MSSQL dialect has a trailing semicolon (line 118: WHERE tbls.VIEW_ORIGINAL_TEXT IS NOT NULL;), while get_table_comment in the same file (line 139) and the Oracle dialect's equivalent do not. While sqlalchemy.text() generally tolerates trailing semicolons, this inconsistency could cause subtle issues with certain MSSQL ODBC driver configurations. Similarly, get_schema_names (line 44) has a semicolon while other queries don't.

Quality: Duplicated metastore connection validation logic in two files

📄 ingestion/src/metadata/ingestion/source/database/hive/metadata.py:110-119 📄 ingestion/src/metadata/ingestion/source/database/hive/connection.py:301-313 📄 ingestion/src/metadata/ingestion/source/database/hive/metadata.py:110-121 📄 ingestion/src/metadata/ingestion/source/database/hive/connection.py:301-315
The dict-to-connection-object validation loop (iterating over PostgresConnection, MysqlConnection, MssqlConnection, OracleConnection) is duplicated in both metadata.py::_get_validated_metastore_connection and connection.py::test_connection. If a new backend is added, both must be updated in lockstep. Consider extracting this into a shared helper (e.g., resolve_metastore_connection(raw_dict) -> Connection) to keep the connection type list in one place.
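A sketch of the extraction Gitar suggests. The connection classes here are trivial stand-ins for the generated Pydantic models, which raise validation errors on mismatched input:

```python
# Stand-ins for the generated connection models (assumptions, not the
# real Pydantic classes)
class PostgresConnection:
    def __init__(self, **raw):
        if raw.get("type") != "Postgres":
            raise ValueError("not a Postgres connection")

class MssqlConnection:
    def __init__(self, **raw):
        if raw.get("type") != "Mssql":
            raise ValueError("not an Mssql connection")

# Single source of truth for supported metastore backends
METASTORE_CONNECTION_TYPES = (PostgresConnection, MssqlConnection)

def resolve_metastore_connection(raw: dict):
    """Shared helper so metadata.py and connection.py stay in lockstep."""
    for connection_type in METASTORE_CONNECTION_TYPES:
        try:
            return connection_type(**raw)
        except ValueError:
            continue
    raise ValueError(f"Unsupported metastore connection: {raw!r}")

print(type(resolve_metastore_connection({"type": "Mssql"})).__name__)  # MssqlConnection
```

Adding a new backend then means appending one class to the tuple rather than editing two validation loops.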

Edge Case: Missing local error handling for global config fetch in processor

In processor.py, the call to self.metadata.get_profiler_config_settings() (line 142) has no local try/except. While the outer _run try/except catches the exception, it causes the entire table's sample data collection to fail if a transient API error occurs when fetching the global profiler config. This is inconsistent with messaging_service.py (line 167-183), which catches the exception locally and defaults to False (allowing normal operation).

A transient network error or API blip when fetching the global config should not prevent sample data collection for that table — the safe default is to proceed without the global override.
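The local-fallback pattern described above can be sketched as follows. The function and client names are hypothetical stand-ins for the processor's metadata client; the point is only that the fetch is wrapped locally and defaults to "no global override" on failure, mirroring messaging_service.py.

```python
# Hedged sketch of the local-fallback pattern; get_global_profiler_config and
# FlakyClient are hypothetical names, not the real processor API.
import logging

logger = logging.getLogger(__name__)


def get_global_profiler_config(metadata_client):
    """Return the global profiler settings, or None on a transient failure."""
    try:
        return metadata_client.get_profiler_config_settings()
    except Exception as exc:  # pylint: disable=broad-except
        logger.debug(
            "Could not fetch global profiler config: %s", exc, exc_info=True
        )
        return None


class FlakyClient:
    """Simulates a transient API blip when fetching the global config."""

    def get_profiler_config_settings(self):
        raise ConnectionError("transient API blip")


settings = get_global_profiler_config(FlakyClient())
print(settings)  # prints None — sample data collection can still proceed
```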

🤖 Prompt for agents
Code Review: Integrates MSSQL and Oracle backends for the Hive metastore, resolving previous issues with test assertions, query syntax, and redundant connection logic. Simplification of the ternary conditional on line 155 is recommended for conciseness.

1. 💡 Quality: Redundant ternary `sample_data_config if sample_data_config else None`

   On line 155, `sample_data_config if sample_data_config else None` is equivalent to just `sample_data_config`, since the variable is already `None` when the condition is false. This adds unnecessary verbosity.

   Suggested fix:
   data=sampler_interface.generate_sample_data(
       sample_data_config
   ),


Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 20 out of 32 changed files in this pull request and generated 3 comments.

Comment on lines +229 to +233
connection_copy = deepcopy(connection.__dict__)
connection_copy["scheme"] = CustomMssqlScheme.HIVE_MSSQL

custom_connection = CustomMssqlConnection(**connection_copy)


Copilot AI Apr 26, 2026


For MSSQL metastore connections this overwrites the user-selected MssqlConnection.scheme (mssql+pyodbc/pytds/pymssql) with a single hive+mssql scheme. Since mssqlConnection.json defines multiple schemes (defaulting to mssql+pytds), this can change the DBAPI/driver used for the metastore and break connections in environments that rely on the configured/default scheme. Consider preserving the original scheme and applying the Hive metastore reflection behavior another way, or mapping each supported MSSQL scheme to a distinct hive-metastore scheme/dialect so driver selection is retained.
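The second option from this comment can be sketched with a per-scheme mapping. The hive-side scheme strings below are hypothetical placeholders; only the three MSSQL schemes come from the review.

```python
# Illustrative sketch: the "hive+..." scheme names are hypothetical; the three
# MSSQL schemes are the ones named in the review (default is mssql+pytds).
MSSQL_TO_HIVE_METASTORE_SCHEME = {
    "mssql+pytds": "hive+mssql-pytds",
    "mssql+pyodbc": "hive+mssql-pyodbc",
    "mssql+pymssql": "hive+mssql-pymssql",
}


def map_metastore_scheme(original_scheme: str) -> str:
    """Map each supported MSSQL scheme to a driver-specific hive scheme,
    so the configured DBAPI driver is retained instead of being collapsed
    into a single hive+mssql dialect."""
    try:
        return MSSQL_TO_HIVE_METASTORE_SCHEME[original_scheme]
    except KeyError as exc:
        raise ValueError(
            f"Unsupported MSSQL scheme for Hive metastore: {original_scheme}"
        ) from exc


print(map_metastore_scheme("mssql+pytds"))  # prints hive+mssql-pytds
```

Each hive-side scheme would then be registered against a dialect subclassing the matching driver-specific MSSQL dialect.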

Copilot uses AI. Check for mistakes.
Comment on lines +15 to +40
from sqlalchemy import text
from sqlalchemy.dialects.mssql.base import MSDialect
from sqlalchemy.engine import reflection

from metadata.ingestion.source.database.hive.metastore_dialects.mixin import (
    HiveMetaStoreDialectMixin,
)
from metadata.utils.logger import ingestion_logger
from metadata.utils.sqlalchemy_utils import (
    get_table_comment_wrapper,
    get_view_definition_wrapper,
)

logger = ingestion_logger()


# pylint: disable=abstract-method
class HiveMssqlMetaStoreDialect(HiveMetaStoreDialectMixin, MSDialect):
    """
    MSSQL metastore dialect class for Hive metastore backed by SQL Server.
    Uses unquoted identifiers and supports CTEs.
    """

    name = "hive"
    driver = "mssql"
    supports_statement_cache = False

Copilot AI Apr 26, 2026


This dialect subclasses sqlalchemy.dialects.mssql.base.MSDialect, but MSSQL connectivity in this codebase is scheme/driver-specific (mssql+pyodbc, mssql+pytds, mssql+pymssql). If the metastore dialect always uses the base dialect, it can’t reflect the configured driver’s behavior and can diverge from the scheme used elsewhere (notably default mssql+pytds). Consider providing driver-specific Hive metastore dialects (e.g. based on the selected scheme) or explicitly constraining/validating which MSSQL driver is supported for Hive metastore connections.

try:
    settings = self.metadata.get_profiler_config_settings()
except Exception as exc:
    logger.debug(f"Could not fetch global profiler config: {exc}")

Copilot AI Apr 26, 2026


Catching all exceptions here is fine for resiliency, but logging at debug without a stack trace makes diagnosing recurring failures difficult (and f-strings eagerly format even when debug is disabled). Consider logging with exc_info=True (or logger.exception at debug/warn as appropriate) and using lazy formatting so the root cause isn’t lost while still allowing the processor to continue.

Suggested change
-    logger.debug(f"Could not fetch global profiler config: {exc}")
+    logger.debug(
+        "Could not fetch global profiler config: %s", exc, exc_info=True
+    )


Labels

Ingestion safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add MSSQL & Oracle backends to Hive metastore

4 participants