feat(hive): add MSSQL and Oracle backends for Hive metastore #26977
SaaiAravindhRaja wants to merge 13 commits into open-metadata:main
Conversation
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly. Let us know if you need any help!
Pull request overview
Adds MSSQL and Oracle as additional supported Hive metastore backends (alongside MySQL/Postgres) by extending the Hive connection schema, registering new metastore connection handlers, and introducing SQLAlchemy metastore dialect plugins with unit tests.
Changes:
- Extend `hiveConnection.json`'s metastore connection `oneOf` to include MSSQL and Oracle connection schemas.
- Register Hive metastore connection creation for `MssqlConnection` and `OracleConnection`, and expand metastore connection validation to include both.
- Add new metastore SQLAlchemy dialect packages (`hive.mssql`, `hive.oracle`) and corresponding unit tests.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| openmetadata-spec/src/main/resources/json/schema/entity/services/connections/database/hiveConnection.json | Adds MSSQL/Oracle connection schema refs to Hive metastore backend options. |
| ingestion/src/metadata/ingestion/source/database/hive/connection.py | Registers new metastore connection handlers for MSSQL and Oracle and updates validation in test_connection. |
| ingestion/src/metadata/ingestion/source/database/hive/metadata.py | Extends metastore connection validation/parsing to include MSSQL and Oracle. |
| ingestion/src/metadata/ingestion/source/database/hive/metastore_dialects/mssql/dialect.py | Introduces MSSQL Hive metastore SQL dialect implementation. |
| ingestion/src/metadata/ingestion/source/database/hive/metastore_dialects/mssql/\_\_init\_\_.py | Registers hive.mssql SQLAlchemy dialect plugin. |
| ingestion/src/metadata/ingestion/source/database/hive/metastore_dialects/oracle/dialect.py | Introduces Oracle Hive metastore SQL dialect implementation. |
| ingestion/src/metadata/ingestion/source/database/hive/metastore_dialects/oracle/\_\_init\_\_.py | Registers hive.oracle SQLAlchemy dialect plugin. |
| ingestion/tests/unit/topology/database/test_hive.py | Adds unit tests for metastore connection validation for MSSQL/Oracle objects and dicts. |
| ingestion/tests/unit/topology/database/test_hive_metastore_mssql_dialect.py | Adds unit tests validating MSSQL dialect SQL shape and behaviors. |
| ingestion/tests/unit/topology/database/test_hive_metastore_oracle_dialect.py | Adds unit tests validating Oracle dialect SQL shape and behaviors. |
```python
    return create_generic_db_connection(
        connection=custom_connection,
        get_connection_url_fn=get_connection_url_common,
        get_connection_args_fn=get_connection_args_common,
    )
```
Oracle metastore handler builds the SQLAlchemy URL via get_connection_url_common(), but OracleConnection doesn’t have a database/databaseSchema field and requires oracleConnectionType (service_name / schema / TNS). As a result the generated hive+oracle://... URL will omit the service name/schema and is likely invalid. Consider reusing the existing Oracle URL builder (e.g., OracleConnection.get_connection_url / a shared helper) so oracleConnectionType is encoded into the URL correctly for Hive metastore connections.
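To make the gap concrete, here is a minimal, hypothetical sketch of the difference the reviewer is pointing at. The function names are illustrative only (they are not the repo's actual helpers); the point is that a generic host/port builder drops Oracle's `service_name`, while an Oracle-aware builder encodes it:

```python
# Illustrative only: contrasts a generic URL builder with an
# Oracle-aware one. Neither function is from the OpenMetadata codebase.
def generic_url(scheme: str, user: str, pwd: str, host_port: str) -> str:
    # get_connection_url_common-style: no database/service path component
    return f"{scheme}://{user}:{pwd}@{host_port}"

def oracle_url(scheme: str, user: str, pwd: str, host_port: str, service_name: str) -> str:
    # Oracle-style builder: encodes the service name into the URL query
    return f"{scheme}://{user}:{pwd}@{host_port}/?service_name={service_name}"

print(generic_url("hive+oracle", "hive", "pwd", "oracle:1521"))
# hive+oracle://hive:pwd@oracle:1521  <- no service name, likely invalid
print(oracle_url("hive+oracle", "hive", "pwd", "oracle:1521", "XEPDB1"))
# hive+oracle://hive:pwd@oracle:1521/?service_name=XEPDB1
```

Reusing the existing Oracle URL builder would produce the second shape for all three `oracleConnectionType` variants (service name, schema, TNS).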
```python
@get_metastore_connection.register
def _(connection: OracleConnection):
    # import required to load sqlalchemy plugin
    # pylint: disable=import-outside-toplevel,unused-import
    from metadata.ingestion.source.database.hive.metastore_dialects.oracle import (  # nopycln: import
        HiveOracleMetaStoreDialect,
    )
```
Oracle metastore connection path doesn’t apply the cx_Oracle→oracledb compatibility shim used by the main Oracle connector (sys.modules["cx_Oracle"] = oracledb, version pinning, etc.). Since HiveOracleMetaStoreDialect inherits OracleDialect_cx_oracle, engine creation will fail unless cx_Oracle is installed. Reuse the Oracle connector’s URL builder / initialization logic (or switch the dialect to the oracledb driver) so the required DBAPI is consistently available.
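The shim the reviewer references is a known pattern for running cx_Oracle-based SQLAlchemy dialects on the maintained python-oracledb driver. A hedged sketch follows; the helper name, the version string, and where it gets called are assumptions, not the repo's actual code:

```python
import sys

def install_cx_oracle_shim() -> bool:
    """Alias python-oracledb under the legacy cx_Oracle module name so
    dialects that `import cx_Oracle` (e.g. OracleDialect_cx_oracle) can
    load without the old driver being installed."""
    try:
        import oracledb  # thin-mode driver; no Oracle client libraries needed
    except ImportError:
        return False  # neither driver available; engine creation will still fail
    # Some dialects inspect a cx_Oracle-style version attribute
    oracledb.version = "8.3.0"
    sys.modules["cx_Oracle"] = oracledb
    return True
```

Calling something like this before engine creation (as the main Oracle connector does) would let `HiveOracleMetaStoreDialect` work with only `oracledb` installed.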
```python
from sqlalchemy import text
from sqlalchemy.dialects.mssql.pyodbc import MSDialect_pyodbc
from sqlalchemy.engine import reflection
```
HiveMssqlMetaStoreDialect inherits from MSDialect_pyodbc, which forces a pyodbc DBAPI dependency and expects pyodbc-style URL parameters (e.g., driver=). However the Hive metastore connection code overrides the scheme to hive+mssql and uses get_connection_url_common(), which does not include the ODBC driver and will commonly be used with the default MSSQL stack (sqlalchemy-pytds) where pyodbc isn’t installed. This combination will likely break MSSQL metastore connections unless users install the separate mssql-odbc extras and configure driver options. Prefer inheriting from sqlalchemy_pytds.dialect.MSDialect_pytds (matching the default mssql+pytds support), or otherwise align the dialect base class and URL builder with the intended MSSQL driver requirements.
```python
class HiveMssqlMetaStoreDialect(HiveMetaStoreDialectMixin, MSDialect_pyodbc):
    """
    MSSQL metastore dialect class for Hive metastore backed by SQL Server.
    Uses square-bracket quoting compatible with MSSQL and supports CTEs.
```
The MSSQL dialect docstring says it “Uses square-bracket quoting”, but the implementation deliberately uses unquoted identifiers (and the unit tests assert that double-quoted identifiers are not used). Please update the docstring to match the actual quoting strategy to avoid misleading future maintainers.
```diff
- Uses square-bracket quoting compatible with MSSQL and supports CTEs.
+ Uses unquoted identifiers (no automatic identifier quoting) and supports CTEs.
```
```python
        schema_join = (
            f"""
            JOIN DBS db ON tbsl.DB_ID = db.DB_ID
            AND db.NAME = '{schema}'
            """
```
This query interpolates schema and table_name directly into SQL via f-strings. Besides being vulnerable to SQL injection if these values are ever user-controlled, it will also break for legitimate names containing quotes/special characters. Prefer using bound parameters with text() (e.g., ... = :schema, ... = :table_name) and passing values via .bindparams(...), or use SQLAlchemy constructs for safe quoting.
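The safe pattern is to pass values as bound parameters rather than interpolating them into the SQL string. Sketched here with the stdlib `sqlite3` driver standing in for the metastore backend (the review suggests the equivalent SQLAlchemy `text(...).bindparams(...)` form); note how a name containing a quote works only with binding:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DBS (DB_ID INTEGER, NAME TEXT)")
# A schema name with an embedded quote, legal but hostile to f-strings
conn.execute("INSERT INTO DBS VALUES (1, ?)", ("o'brien_db",))

schema = "o'brien_db"
# Unsafe: f"... WHERE NAME = '{schema}'" breaks on the quote and is injectable.
# Safe: the driver quotes/escapes the bound value itself.
row = conn.execute(
    "SELECT DB_ID FROM DBS WHERE NAME = :schema", {"schema": schema}
).fetchone()
print(row)  # (1,)
```

With SQLAlchemy the same predicate would read `db.NAME = :schema` inside `text(...)`, with the value supplied via `.bindparams(schema=schema)`.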
```python
        schema_join = (
            f"""
            JOIN "DBS" db ON tbsl."DB_ID" = db."DB_ID"
            AND db."NAME" = '{schema}'
            """
```
This query interpolates schema and table_name directly into SQL via f-strings. Besides being vulnerable to SQL injection if these values are ever user-controlled, it will also break for legitimate names containing quotes/special characters. Prefer using bound parameters with text() (e.g., ... = :schema, ... = :table_name) and passing values via .bindparams(...), or use SQLAlchemy constructs for safe quoting.
The Python checkstyle failed. Please run the checkstyle formatter locally; you can install the pre-commit hooks so it runs before each commit.
🛡️ TRIVY SCAN RESULT 🛡️
Target:

| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| com.fasterxml.jackson.core:jackson-core | CVE-2025-52999 | 🚨 HIGH | 2.12.7 | 2.15.0 |
| com.fasterxml.jackson.core:jackson-core | CVE-2025-52999 | 🚨 HIGH | 2.13.4 | 2.15.0 |
| com.fasterxml.jackson.core:jackson-databind | CVE-2022-42003 | 🚨 HIGH | 2.12.7 | 2.12.7.1, 2.13.4.2 |
| com.fasterxml.jackson.core:jackson-databind | CVE-2022-42004 | 🚨 HIGH | 2.12.7 | 2.12.7.1, 2.13.4 |
| com.google.code.gson:gson | CVE-2022-25647 | 🚨 HIGH | 2.2.4 | 2.8.9 |
| com.google.protobuf:protobuf-java | CVE-2021-22569 | 🚨 HIGH | 3.3.0 | 3.16.1, 3.18.2, 3.19.2 |
| com.google.protobuf:protobuf-java | CVE-2022-3509 | 🚨 HIGH | 3.3.0 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2022-3510 | 🚨 HIGH | 3.3.0 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2024-7254 | 🚨 HIGH | 3.3.0 | 3.25.5, 4.27.5, 4.28.2 |
| com.google.protobuf:protobuf-java | CVE-2021-22569 | 🚨 HIGH | 3.7.1 | 3.16.1, 3.18.2, 3.19.2 |
| com.google.protobuf:protobuf-java | CVE-2022-3509 | 🚨 HIGH | 3.7.1 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2022-3510 | 🚨 HIGH | 3.7.1 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2024-7254 | 🚨 HIGH | 3.7.1 | 3.25.5, 4.27.5, 4.28.2 |
| com.nimbusds:nimbus-jose-jwt | CVE-2023-52428 | 🚨 HIGH | 9.8.1 | 9.37.2 |
| com.squareup.okhttp3:okhttp | CVE-2021-0341 | 🚨 HIGH | 3.12.12 | 4.9.2 |
| commons-beanutils:commons-beanutils | CVE-2025-48734 | 🚨 HIGH | 1.9.4 | 1.11.0 |
| commons-io:commons-io | CVE-2024-47554 | 🚨 HIGH | 2.8.0 | 2.14.0 |
| dnsjava:dnsjava | CVE-2024-25638 | 🚨 HIGH | 2.1.7 | 3.6.0 |
| io.airlift:aircompressor | CVE-2025-67721 | 🚨 HIGH | 0.27 | 2.0.3 |
| io.netty:netty-codec-http | CVE-2026-33870 | 🚨 HIGH | 4.1.96.Final | 4.1.132.Final, 4.2.10.Final |
| io.netty:netty-codec-http2 | CVE-2025-55163 | 🚨 HIGH | 4.1.96.Final | 4.2.4.Final, 4.1.124.Final |
| io.netty:netty-codec-http2 | CVE-2026-33871 | 🚨 HIGH | 4.1.96.Final | 4.1.132.Final, 4.2.11.Final |
| io.netty:netty-codec-http2 | GHSA-xpw8-rcwv-8f8p | 🚨 HIGH | 4.1.96.Final | 4.1.100.Final |
| io.netty:netty-handler | CVE-2025-24970 | 🚨 HIGH | 4.1.96.Final | 4.1.118.Final |
| net.minidev:json-smart | CVE-2021-31684 | 🚨 HIGH | 1.3.2 | 1.3.3, 2.4.4 |
| net.minidev:json-smart | CVE-2023-1370 | 🚨 HIGH | 1.3.2 | 2.4.9 |
| org.apache.avro:avro | CVE-2024-47561 | 🔥 CRITICAL | 1.7.7 | 1.11.4 |
| org.apache.avro:avro | CVE-2023-39410 | 🚨 HIGH | 1.7.7 | 1.11.3 |
| org.apache.derby:derby | CVE-2022-46337 | 🔥 CRITICAL | 10.14.2.0 | 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0 |
| org.apache.ivy:ivy | CVE-2022-46751 | 🚨 HIGH | 2.5.1 | 2.5.2 |
| org.apache.mesos:mesos | CVE-2018-1330 | 🚨 HIGH | 1.4.3 | 1.6.0 |
| org.apache.spark:spark-core_2.12 | CVE-2025-54920 | 🚨 HIGH | 3.5.6 | 3.5.7 |
| org.apache.thrift:libthrift | CVE-2019-0205 | 🚨 HIGH | 0.12.0 | 0.13.0 |
| org.apache.thrift:libthrift | CVE-2020-13949 | 🚨 HIGH | 0.12.0 | 0.14.0 |
| org.apache.zookeeper:zookeeper | CVE-2023-44981 | 🔥 CRITICAL | 3.6.3 | 3.7.2, 3.8.3, 3.9.1 |
| org.eclipse.jetty:jetty-server | CVE-2024-13009 | 🚨 HIGH | 9.4.56.v20240826 | 9.4.57.v20241219 |
| org.lz4:lz4-java | CVE-2025-12183 | 🚨 HIGH | 1.8.0 | 1.8.1 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: Node.js
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: Python
Vulnerabilities (26)
| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| Authlib | CVE-2026-27962 | 🔥 CRITICAL | 1.6.6 | 1.6.9 |
| Authlib | CVE-2026-28490 | 🚨 HIGH | 1.6.6 | 1.6.9 |
| Authlib | CVE-2026-28498 | 🚨 HIGH | 1.6.6 | 1.6.9 |
| Authlib | CVE-2026-28802 | 🚨 HIGH | 1.6.6 | 1.6.7 |
| PyJWT | CVE-2026-32597 | 🚨 HIGH | 2.11.0 | 2.12.0 |
| Werkzeug | CVE-2024-34069 | 🚨 HIGH | 2.2.3 | 3.0.3 |
| aiohttp | CVE-2025-69223 | 🚨 HIGH | 3.12.12 | 3.13.3 |
| apache-airflow | CVE-2026-26929 | 🚨 HIGH | 3.1.7 | 3.1.8 |
| apache-airflow | CVE-2026-28779 | 🚨 HIGH | 3.1.7 | 3.1.8 |
| apache-airflow | CVE-2026-30911 | 🚨 HIGH | 3.1.7 | 3.1.8 |
| apache-airflow-providers-http | CVE-2025-69219 | 🚨 HIGH | 5.6.4 | 6.0.0 |
| cryptography | CVE-2026-26007 | 🚨 HIGH | 42.0.8 | 46.0.5 |
| jaraco.context | CVE-2026-23949 | 🚨 HIGH | 5.3.0 | 6.1.0 |
| jaraco.context | CVE-2026-23949 | 🚨 HIGH | 6.0.1 | 6.1.0 |
| litellm | CVE-2026-35030 | 🔥 CRITICAL | 1.81.6 | 1.83.0 |
| litellm | CVE-2026-35029 | 🚨 HIGH | 1.81.6 | 1.83.0 |
| protobuf | CVE-2026-0994 | 🚨 HIGH | 4.25.8 | 6.33.5, 5.29.6 |
| pyOpenSSL | CVE-2026-27459 | 🚨 HIGH | 24.1.0 | 26.0.0 |
| pyasn1 | CVE-2026-30922 | 🚨 HIGH | 0.6.2 | 0.6.3 |
| ray | CVE-2025-62593 | 🔥 CRITICAL | 2.47.1 | 2.52.0 |
| starlette | CVE-2025-62727 | 🚨 HIGH | 0.48.0 | 0.49.1 |
| tornado | CVE-2026-31958 | 🚨 HIGH | 6.5.4 | 6.5.5 |
| urllib3 | CVE-2025-66418 | 🚨 HIGH | 1.26.20 | 2.6.0 |
| urllib3 | CVE-2025-66471 | 🚨 HIGH | 1.26.20 | 2.6.0 |
| urllib3 | CVE-2026-21441 | 🚨 HIGH | 1.26.20 | 2.6.3 |
| wheel | CVE-2026-24049 | 🚨 HIGH | 0.45.1 | 0.46.2 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: usr/bin/docker
Vulnerabilities (2)
| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| stdlib | CVE-2025-68121 | 🔥 CRITICAL | v1.25.6 | 1.24.13, 1.25.7, 1.26.0-rc.3 |
| stdlib | CVE-2026-25679 | 🚨 HIGH | v1.25.6 | 1.25.8, 1.26.1 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: /etc/ssl/private/ssl-cert-snakeoil.key
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target:

| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| libpng-dev | CVE-2026-33416 | 🚨 HIGH | 1.6.39-2+deb12u3 | 1.6.39-2+deb12u4 |
| libpng-dev | CVE-2026-33636 | 🚨 HIGH | 1.6.39-2+deb12u3 | 1.6.39-2+deb12u4 |
| libpng16-16 | CVE-2026-33416 | 🚨 HIGH | 1.6.39-2+deb12u3 | 1.6.39-2+deb12u4 |
| libpng16-16 | CVE-2026-33636 | 🚨 HIGH | 1.6.39-2+deb12u3 | 1.6.39-2+deb12u4 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: Java
Vulnerabilities (37)
| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| com.fasterxml.jackson.core:jackson-core | CVE-2025-52999 | 🚨 HIGH | 2.12.7 | 2.15.0 |
| com.fasterxml.jackson.core:jackson-core | CVE-2025-52999 | 🚨 HIGH | 2.13.4 | 2.15.0 |
| com.fasterxml.jackson.core:jackson-databind | CVE-2022-42003 | 🚨 HIGH | 2.12.7 | 2.12.7.1, 2.13.4.2 |
| com.fasterxml.jackson.core:jackson-databind | CVE-2022-42004 | 🚨 HIGH | 2.12.7 | 2.12.7.1, 2.13.4 |
| com.google.code.gson:gson | CVE-2022-25647 | 🚨 HIGH | 2.2.4 | 2.8.9 |
| com.google.protobuf:protobuf-java | CVE-2021-22569 | 🚨 HIGH | 3.3.0 | 3.16.1, 3.18.2, 3.19.2 |
| com.google.protobuf:protobuf-java | CVE-2022-3509 | 🚨 HIGH | 3.3.0 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2022-3510 | 🚨 HIGH | 3.3.0 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2024-7254 | 🚨 HIGH | 3.3.0 | 3.25.5, 4.27.5, 4.28.2 |
| com.google.protobuf:protobuf-java | CVE-2021-22569 | 🚨 HIGH | 3.7.1 | 3.16.1, 3.18.2, 3.19.2 |
| com.google.protobuf:protobuf-java | CVE-2022-3509 | 🚨 HIGH | 3.7.1 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2022-3510 | 🚨 HIGH | 3.7.1 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2024-7254 | 🚨 HIGH | 3.7.1 | 3.25.5, 4.27.5, 4.28.2 |
| com.nimbusds:nimbus-jose-jwt | CVE-2023-52428 | 🚨 HIGH | 9.8.1 | 9.37.2 |
| com.squareup.okhttp3:okhttp | CVE-2021-0341 | 🚨 HIGH | 3.12.12 | 4.9.2 |
| commons-beanutils:commons-beanutils | CVE-2025-48734 | 🚨 HIGH | 1.9.4 | 1.11.0 |
| commons-io:commons-io | CVE-2024-47554 | 🚨 HIGH | 2.8.0 | 2.14.0 |
| dnsjava:dnsjava | CVE-2024-25638 | 🚨 HIGH | 2.1.7 | 3.6.0 |
| io.airlift:aircompressor | CVE-2025-67721 | 🚨 HIGH | 0.27 | 2.0.3 |
| io.netty:netty-codec-http | CVE-2026-33870 | 🚨 HIGH | 4.1.96.Final | 4.1.132.Final, 4.2.10.Final |
| io.netty:netty-codec-http2 | CVE-2025-55163 | 🚨 HIGH | 4.1.96.Final | 4.2.4.Final, 4.1.124.Final |
| io.netty:netty-codec-http2 | CVE-2026-33871 | 🚨 HIGH | 4.1.96.Final | 4.1.132.Final, 4.2.11.Final |
| io.netty:netty-codec-http2 | GHSA-xpw8-rcwv-8f8p | 🚨 HIGH | 4.1.96.Final | 4.1.100.Final |
| io.netty:netty-handler | CVE-2025-24970 | 🚨 HIGH | 4.1.96.Final | 4.1.118.Final |
| net.minidev:json-smart | CVE-2021-31684 | 🚨 HIGH | 1.3.2 | 1.3.3, 2.4.4 |
| net.minidev:json-smart | CVE-2023-1370 | 🚨 HIGH | 1.3.2 | 2.4.9 |
| org.apache.avro:avro | CVE-2024-47561 | 🔥 CRITICAL | 1.7.7 | 1.11.4 |
| org.apache.avro:avro | CVE-2023-39410 | 🚨 HIGH | 1.7.7 | 1.11.3 |
| org.apache.derby:derby | CVE-2022-46337 | 🔥 CRITICAL | 10.14.2.0 | 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0 |
| org.apache.ivy:ivy | CVE-2022-46751 | 🚨 HIGH | 2.5.1 | 2.5.2 |
| org.apache.mesos:mesos | CVE-2018-1330 | 🚨 HIGH | 1.4.3 | 1.6.0 |
| org.apache.spark:spark-core_2.12 | CVE-2025-54920 | 🚨 HIGH | 3.5.6 | 3.5.7 |
| org.apache.thrift:libthrift | CVE-2019-0205 | 🚨 HIGH | 0.12.0 | 0.13.0 |
| org.apache.thrift:libthrift | CVE-2020-13949 | 🚨 HIGH | 0.12.0 | 0.14.0 |
| org.apache.zookeeper:zookeeper | CVE-2023-44981 | 🔥 CRITICAL | 3.6.3 | 3.7.2, 3.8.3, 3.9.1 |
| org.eclipse.jetty:jetty-server | CVE-2024-13009 | 🚨 HIGH | 9.4.56.v20240826 | 9.4.57.v20241219 |
| org.lz4:lz4-java | CVE-2025-12183 | 🚨 HIGH | 1.8.0 | 1.8.1 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: Node.js
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: Python
Vulnerabilities (13)
| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| apache-airflow | CVE-2026-26929 | 🚨 HIGH | 3.1.7 | 3.1.8 |
| apache-airflow | CVE-2026-28779 | 🚨 HIGH | 3.1.7 | 3.1.8 |
| apache-airflow | CVE-2026-30911 | 🚨 HIGH | 3.1.7 | 3.1.8 |
| cryptography | CVE-2026-26007 | 🚨 HIGH | 42.0.8 | 46.0.5 |
| jaraco.context | CVE-2026-23949 | 🚨 HIGH | 5.3.0 | 6.1.0 |
| jaraco.context | CVE-2026-23949 | 🚨 HIGH | 6.0.1 | 6.1.0 |
| pyOpenSSL | CVE-2026-27459 | 🚨 HIGH | 24.1.0 | 26.0.0 |
| starlette | CVE-2025-62727 | 🚨 HIGH | 0.48.0 | 0.49.1 |
| urllib3 | CVE-2025-66418 | 🚨 HIGH | 1.26.20 | 2.6.0 |
| urllib3 | CVE-2025-66471 | 🚨 HIGH | 1.26.20 | 2.6.0 |
| urllib3 | CVE-2026-21441 | 🚨 HIGH | 1.26.20 | 2.6.3 |
| wheel | CVE-2026-24049 | 🚨 HIGH | 0.45.1 | 0.46.2 |
| wheel | CVE-2026-24049 | 🚨 HIGH | 0.45.1 | 0.46.2 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: /etc/ssl/private/ssl-cert-snakeoil.key
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/extended_sample_data.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/lineage.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_data.json
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_data.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_data_aut.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_usage.json
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_usage.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_usage_aut.yaml
No Vulnerabilities Found
🟡 Playwright Results: all passed (19 flaky)
✅ 3954 passed · ❌ 0 failed · 🟡 19 flaky · ⏭️ 86 skipped
🟡 19 flaky test(s) (passed on retry)

How to debug locally:

```shell
# Download the playwright-test-results-<shard> artifact and unzip it
npx playwright show-trace path/to/trace.zip  # view the trace
```
The Python checkstyle failed. Please run the checkstyle formatter locally; you can install the pre-commit hooks so it runs before each commit.
Force-pushed from ff767a3 to 7b8ed7d (Compare)
```python
    def _get_table_columns(self, connection, table_name, schema):
        schema_join = (
            f"""
            JOIN DBS db ON tbsl.DB_ID = db.DB_ID
            AND db.NAME = '{schema}'
            """
            if schema
            else ""
        )

        query = f"""
            WITH regular_columns AS (
                SELECT
                    col.COLUMN_NAME,
                    col.TYPE_NAME,
                    col.COMMENT
                FROM COLUMNS_V2 col
                JOIN CDS cds ON col.CD_ID = cds.CD_ID
                JOIN SDS sds ON sds.CD_ID = cds.CD_ID
                JOIN TBLS tbsl ON sds.SD_ID = tbsl.SD_ID
                AND tbsl.TBL_NAME = '{table_name}'
                {schema_join}
            ),
            partition_columns AS (
                SELECT
                    pk.PKEY_NAME AS COLUMN_NAME,
                    pk.PKEY_TYPE AS TYPE_NAME,
                    pk.PKEY_COMMENT AS COMMENT
                FROM PARTITION_KEYS pk
                JOIN TBLS tbsl ON pk.TBL_ID = tbsl.TBL_ID
                AND tbsl.TBL_NAME = '{table_name}'
                {schema_join}
            )
```
These queries interpolate schema/table_name directly into SQL strings. Since schema can ultimately come from user-provided config (and table_name may include quotes), this creates an injection risk and can also break on names containing '. Use SQLAlchemy bind parameters for values in predicates (e.g., ... db.NAME = :schema, ... TBL_NAME = :table_name) instead of f-string substitution.
```python
    def _get_table_columns(self, connection, table_name, schema):
        schema_join = (
            f"""
            JOIN "DBS" db ON tbsl."DB_ID" = db."DB_ID"
            AND db."NAME" = '{schema}'
            """
            if schema
            else ""
        )

        query = f"""
            WITH regular_columns AS (
                SELECT
                    col."COLUMN_NAME",
                    col."TYPE_NAME",
                    col."COMMENT"
                FROM "COLUMNS_V2" col
                JOIN "CDS" cds ON col."CD_ID" = cds."CD_ID"
                JOIN "SDS" sds ON sds."CD_ID" = cds."CD_ID"
                JOIN "TBLS" tbsl ON sds."SD_ID" = tbsl."SD_ID"
                AND tbsl."TBL_NAME" = '{table_name}'
                {schema_join}
            ),
            partition_columns AS (
                SELECT
                    pk."PKEY_NAME" AS "COLUMN_NAME",
                    pk."PKEY_TYPE" AS "TYPE_NAME",
                    pk."PKEY_COMMENT" AS "COMMENT"
                FROM "PARTITION_KEYS" pk
                JOIN "TBLS" tbsl ON pk."TBL_ID" = tbsl."TBL_ID"
                AND tbsl."TBL_NAME" = '{table_name}'
                {schema_join}
            )
```
These queries interpolate schema/table_name directly into SQL strings. Since schema can come from user-provided config (and names may include '), this creates an injection risk and can break on identifiers containing quotes. Prefer SQLAlchemy bind parameters for predicate values (e.g., db."NAME" = :schema, tbsl."TBL_NAME" = :table_name) rather than f-string substitution.
The Python checkstyle failed. Please run the checkstyle formatter locally; you can install the pre-commit hooks so it runs before each commit.
Code Review 👍 Approved with suggestions (4 resolved / 5 findings)
Integrates MSSQL and Oracle backends for the Hive metastore, resolving previous issues with test assertions, query syntax, and redundant connection logic. Simplification of the ternary conditional on line 155 is recommended for conciseness.
💡 Quality: Redundant ternary
```python
    connection_copy = deepcopy(connection.__dict__)
    connection_copy["scheme"] = CustomMssqlScheme.HIVE_MSSQL

    custom_connection = CustomMssqlConnection(**connection_copy)
```
For MSSQL metastore connections this overwrites the user-selected MssqlConnection.scheme (mssql+pyodbc/pytds/pymssql) with a single hive+mssql scheme. Since mssqlConnection.json defines multiple schemes (defaulting to mssql+pytds), this can change the DBAPI/driver used for the metastore and break connections in environments that rely on the configured/default scheme. Consider preserving the original scheme and applying the Hive metastore reflection behavior another way, or mapping each supported MSSQL scheme to a distinct hive-metastore scheme/dialect so driver selection is retained.
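One way to retain driver selection is an explicit mapping from each configured MSSQL scheme to a driver-specific Hive-metastore scheme. This is a sketch of the reviewer's suggestion, not the PR's code; only `hive+mssql` exists in the PR, and the per-driver scheme names below are hypothetical:

```python
# Hypothetical mapping: preserve the user's configured DBAPI/driver
# instead of collapsing every MSSQL scheme to a single hive+mssql.
MSSQL_TO_HIVE_METASTORE_SCHEME = {
    "mssql+pytds": "hive+mssql_pytds",      # default MSSQL stack
    "mssql+pyodbc": "hive+mssql_pyodbc",
    "mssql+pymssql": "hive+mssql_pymssql",
}

def metastore_scheme(configured_scheme: str) -> str:
    """Resolve the Hive-metastore scheme for a configured MSSQL scheme,
    failing loudly rather than silently switching drivers."""
    try:
        return MSSQL_TO_HIVE_METASTORE_SCHEME[configured_scheme]
    except KeyError:
        raise ValueError(
            f"Unsupported MSSQL scheme for Hive metastore: {configured_scheme}"
        ) from None

print(metastore_scheme("mssql+pytds"))  # hive+mssql_pytds
```

Each mapped scheme would then be registered with its own dialect class whose base matches the driver in question.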
```python
from sqlalchemy import text
from sqlalchemy.dialects.mssql.base import MSDialect
from sqlalchemy.engine import reflection

from metadata.ingestion.source.database.hive.metastore_dialects.mixin import (
    HiveMetaStoreDialectMixin,
)
from metadata.utils.logger import ingestion_logger
from metadata.utils.sqlalchemy_utils import (
    get_table_comment_wrapper,
    get_view_definition_wrapper,
)

logger = ingestion_logger()


# pylint: disable=abstract-method
class HiveMssqlMetaStoreDialect(HiveMetaStoreDialectMixin, MSDialect):
    """
    MSSQL metastore dialect class for Hive metastore backed by SQL Server.
    Uses unquoted identifiers and supports CTEs.
    """

    name = "hive"
    driver = "mssql"
    supports_statement_cache = False
```
This dialect subclasses sqlalchemy.dialects.mssql.base.MSDialect, but MSSQL connectivity in this codebase is scheme/driver-specific (mssql+pyodbc, mssql+pytds, mssql+pymssql). If the metastore dialect always uses the base dialect, it can’t reflect the configured driver’s behavior and can diverge from the scheme used elsewhere (notably default mssql+pytds). Consider providing driver-specific Hive metastore dialects (e.g. based on the selected scheme) or explicitly constraining/validating which MSSQL driver is supported for Hive metastore connections.
```python
        try:
            settings = self.metadata.get_profiler_config_settings()
        except Exception as exc:
            logger.debug(f"Could not fetch global profiler config: {exc}")
```
Catching all exceptions here is fine for resiliency, but logging at debug without a stack trace makes diagnosing recurring failures difficult (and f-strings eagerly format even when debug is disabled). Consider logging with exc_info=True (or logger.exception at debug/warn as appropriate) and using lazy formatting so the root cause isn’t lost while still allowing the processor to continue.
```diff
-            logger.debug(f"Could not fetch global profiler config: {exc}")
+            logger.debug(
+                "Could not fetch global profiler config: %s", exc, exc_info=True
+            )
```
Fixes #12787
Adds MSSQL and Oracle as supported backends for Hive metastore, alongside the existing MySQL and PostgreSQL support.
Changes
- New `metastore_dialects/mssql/` and `metastore_dialects/oracle/` packages with SQL dialect implementations
- `hiveConnection.json`: add `mssqlConnection` and `oracleConnection` to `metastoreConnection.oneOf`
- `connection.py`: register `@get_metastore_connection` singledispatch handlers for `MssqlConnection` and `OracleConnection`
- `metadata.py`: extend `_get_validated_metastore_connection` to handle both new connection types

MSSQL notes: uses CTEs (unlike MySQL 5.7) with unquoted identifiers
Oracle notes: uses CTEs with double-quoted identifiers (mirrors the Postgres pattern)
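The `connection.py` registration described above follows the stdlib `functools.singledispatch` pattern. A self-contained sketch with stand-in classes (the real ones are the generated pydantic connection models, and the return values here are placeholders rather than actual engines):

```python
from functools import singledispatch

# Stand-in connection types for illustration only
class MysqlConnection: ...
class MssqlConnection: ...
class OracleConnection: ...

@singledispatch
def get_metastore_connection(connection):
    # Fallback for any metastore backend without a registered handler
    raise ValueError(f"Unsupported metastore backend: {type(connection).__name__}")

@get_metastore_connection.register
def _(connection: MysqlConnection):
    return "mysql-engine"

# The PR extends dispatch with the two new backends:
@get_metastore_connection.register
def _(connection: MssqlConnection):
    return "mssql-engine"

@get_metastore_connection.register
def _(connection: OracleConnection):
    return "oracle-engine"

print(get_metastore_connection(OracleConnection()))  # → oracle-engine
```

Dispatch happens on the runtime type of the first argument, so adding a backend is just another `@get_metastore_connection.register` function.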
Summary by Gitar
Updates `SamplerProcessor` to gracefully tolerate transient failures when fetching profiler configuration. This will update automatically on new commits.