[Fix][Connector-V2][JDBC] Detect Hive partition keys in metadata by zhangqs0205 · Pull Request #10706 · apache/seatunnel

zhangqs0205 · 2026-04-03T09:13:24Z

Purpose of this pull request

This PR fixes missing Hive partitionKeys in JDBC catalog metadata.

It adds a JDBC dialect extension point for partition key detection, detects Hive partition keys from DESCRIBE <table>, and propagates them into the relevant CatalogTable metadata paths.

Does this PR introduce any user-facing change?

It fixes JDBC metadata for Hive tables so partition keys can be auto-detected instead of always being empty.

How was this patch tested?

Added HiveDialectTest and extended CatalogUtilsTest.

Check list

If any new Jar binary package adding in your PR, please add License Notice according
New License Guide
If necessary, please update the documentation to describe the new feature. https://github.com/apache/seatunnel/tree/dev/docs
If necessary, please update incompatible-changes.md to describe the incompatibility caused by this PR.
If you are contributing the connector code, please check that the following files are updated:
1. Update plugin-mapping.properties and add new connector information in it
2. Update the pom file of seatunnel-dist
3. Add ci label in label-scope-conf
4. Add e2e testcase in seatunnel-e2e
5. Update connector plugin_config

DanielLeens

I found that ran CatalogUtilsTest + HiveDialectTest, and walked the JDBC source path from JdbcCatalogUtils.getTables().

Blocking issue: the new Hive partition-key extraction does not hit the normal catalog path.

when a catalog is available, JdbcCatalogUtils.getTables() goes through getCatalogTable(tableConfig, jdbcCatalog, jdbcDialect) and then jdbcCatalog.getTable(tablePath) (seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/utils/JdbcCatalogUtils.java:78-94, 174-216)
AbstractJdbcCatalog.getTable() calls the protected getPartitionKeys(connection, tablePath) hook (seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/catalog/AbstractJdbcCatalog.java:230-236)
that hook still defaults to Collections.emptyList() in AbstractJdbcCatalog.java:257-259
the new SQL logic lives in HiveDialect.getPartitionKeys(...) (seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/hive/HiveDialect.java:77-90), but the catalog path never calls it

So the fix only works in the fallback "direct JDBC without catalog" branch, not in the normal catalog-backed path. The new tests pass because they cover HiveDialect and helper merging in isolation, not the real JdbcCatalogUtils -> AbstractJdbcCatalog.getTable() chain.

Please wire the Hive partition-key lookup into the catalog implementation that the normal source path actually uses.

davidzollo

+1
Good job, thanks for the fix.

DanielLeens · 2026-04-25T09:16:32Z

Hi @zhangqs0205, I rechecked the current head locally after the latest review activity.

The blocker from my earlier review is addressed on the current head:

the Hive partition keys are no longer only parsed in isolation
they are now carried into the real CatalogTable construction path
the current Build is green

Runtime chain I rechecked

Hive JDBC metadata
  -> HiveDialect.getPartitionKeys(connection, tablePath)
      -> parse DESCRIBE <table> partition section
  -> JdbcCatalogUtils.getCatalogTable(...)
      -> CatalogUtils.getCatalogTable(..., partitionKeys)
      -> CatalogTable.partitionKeys

The important difference from the older blocked version is that the partition keys are now passed through the reachable metadata path instead of only being tested at the parser/helper level.

Conclusion: can merge

Blocking items:

None from my side.

Non-blocking follow-up:

If you want even stronger confidence later, a more integration-style check against a real Hive-compatible driver output would still be nice hardening, but I would not block this PR on that.

Thanks for the fix. The current head looks good to me now.

outdated

DanielLeens

Hi @zhangqs0205, I re-reviewed the current head locally from seatunnel-review-10706 against upstream/dev. I did not run Maven locally in this pass; this is a source-level review plus current GitHub check metadata.

What This PR Fixes

User pain: Hive partition columns are often missing from JDBC catalog metadata, so table-path-based reads lose partition-key information and downstream schema checks become incomplete.
Fix approach: teach HiveDialect to parse partition keys from DESCRIBE <table> output, thread those keys into CatalogTable, and add unit + E2E coverage.
One-line value: this makes the Hive JDBC catalog path preserve the real partition-key contract instead of dropping it.

Runtime Chain I Checked

table_path source / catalog loading
  -> AbstractJdbcCatalog.getTable(...)
      -> getPartitionKeys(conn, tablePath)
          -> HiveDialect.getPartitionKeys()
              -> DESCRIBE db.table
              -> parse "# Partition Information" section
      -> CatalogTable.of(..., partitionKeys, ...)
  -> JdbcCatalogUtils.getTables(...)
      -> JdbcSourceTable carries partition keys into source metadata

Findings

HiveDialect.getPartitionKeys() is on the real metadata-loading path used by AbstractJdbcCatalog.getTable(), so the new parsing logic is not dead code.
The parsed partition-key list is now threaded into CatalogTable, which is the right contract boundary for later source/schema logic.
The unit tests cover both the DESCRIBE parsing path and the catalog-table propagation path, and the Hive E2E now asserts the partition table plus dt/hr metadata path. That is good coverage for the user-facing behavior.
I did not find a reopened code blocker in the latest implementation.

Merge Decision

Conclusion: can merge after fixes

Blocking items

No code blocker from my side.
The current Build check is still running, so merge should wait for CI to finish green.

Suggested non-blocking follow-up

None from me in the current code path.

Overall, this is a practical metadata fix and the added tests now cover the exact Hive partition-key path that users depend on.

DanielLeens

Thanks for working on this. I reviewed the full current head locally on seatunnel-review-10706 against upstream/dev. This was a source-level review only; I did not run local Maven/tests in this batch.

This is a practical metadata-path fix, and the important chain now looks right on the current head:

table_path source / catalog loading
  -> AbstractJdbcCatalog.getTable(...)
      -> getPartitionKeys(conn, tablePath)
          -> HiveDialect.getPartitionKeys()
              -> DESCRIBE db.table
              -> parse the "# Partition Information" section
      -> CatalogTable.of(..., partitionKeys, ...)
  -> JdbcCatalogUtils.getTables(...)
      -> JdbcSourceTable carries partition keys forward

Key points from the current head:

HiveDialect.getPartitionKeys() is on the real metadata-loading path used by AbstractJdbcCatalog.getTable(), so the new parsing logic is not dead code.
The parsed partition keys are now threaded into CatalogTable, which is the correct contract boundary for downstream schema/source logic.
The unit tests plus the Hive E2E cover the exact partition-key metadata path that users depend on.
I did not find a reopened code blocker in the latest implementation.

The only remaining gate from my side is CI: the current GitHub Build check is CANCELLED, so there is no green signal for the latest head yet.

Conclusion: can merge after fixes

Blocking items:

No code blocker from my side.
Re-run Build and make sure the latest head is green before merge.

Non-blocking follow-up:

None from me on the current code path.

This is a useful metadata fix and the runtime path now looks correctly wired.

DanielLeens

Thanks for the update. I re-reviewed the latest head locally against upstream/dev, focusing on the Hive metadata-loading path that feeds JDBC catalog/table-path reads.

Runtime path I checked:

table_path source / catalog loading
  -> AbstractJdbcCatalog.getTable(...)
      -> getPartitionKeys(conn, tablePath)
          -> HiveDialect.getPartitionKeys()
              -> DESCRIBE db.table
              -> parse "# Partition Information" section
      -> CatalogTable.of(..., partitionKeys, ...)
  -> JdbcCatalogUtils.getTables(...)
      -> JdbcSourceTable carries partition keys into source metadata

Key findings:

HiveDialect.getPartitionKeys() is on the real metadata-loading path used by AbstractJdbcCatalog.getTable(), so the new parsing logic is not dead code.
The parsed partition-key list is now threaded into CatalogTable, which is the right contract boundary for later source/schema logic.
The unit tests cover both the DESCRIBE parsing path and the catalog-table propagation path, and the Hive E2E now asserts the partition-table plus dt/hr metadata path.
I did not find a reopened code blocker in the latest implementation.

CI / verification:

I did not run Maven locally in this pass; this is a source-level review.
The current top-level Build status on the latest head is CANCELLED, so the branch still needs a clean rerun before merge.

Conclusion: merge after fixes

Blocking items

No code blocker from my side.
Please rerun the latest Build and get a clean green result before merge.

Suggested non-blocking follow-ups

None in the current code path.

Overall, this is a practical metadata fix, and the added tests cover the exact Hive partition-key path that users depend on.

davidzollo

Thanks for working on this. I re-reviewed the full current head locally on seatunnel-review-10706 against upstream/dev. This was a source-level review only; I did not run local Maven in this round.

What This PR Fixes

User pain: Hive/Inceptor metadata discovery is inconsistent through standard JDBC APIs, especially around partition keys and sometimes primary-key metadata.
Fix approach: the PR parses Hive partition keys from DESCRIBE output and also adds a fallback around DatabaseMetaData.getPrimaryKeys().
One-line summary: the Hive-specific direction is reasonable, but the shared primary-key fallback is currently too broad and changes behavior for every JDBC dialect, not just Hive.

1. Code Review

1.1 Core Logic Analysis

Exact change scope:
- seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/catalog/utils/CatalogUtils.java:130-163
- seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/hive/HiveDialect.java:78-112
Before / after:

// before
try (ResultSet rs = metaData.getPrimaryKeys(...)) {
    while (rs.next()) {
        primaryKeyColumns.add(...);
    }
}

// after
try (ResultSet rs = metaData.getPrimaryKeys(...)) {
    while (rs.next()) {
        primaryKeyColumns.add(...);
    }
} catch (SQLException e) {
    log.warn("Failed to get primary key info for table {}, returning empty primary key. Error: {}", tablePath, e.getMessage());
    return Optional.empty();
}

Key findings:
1. The Hive DESCRIBE parsing path is on the real runtime path and looks directionally correct.
2. The shared CatalogUtils.getPrimaryKey() helper is not Hive-only. Other JDBC catalogs also go through it.
3. Catching every SQLException here means a real metadata failure now becomes "this table has no primary key".
4. That is not just a logging choice. It changes downstream schema / constraint semantics.
Runtime chain I checked:

JDBC catalog introspection
  -> CatalogUtils.getPrimaryKey(DatabaseMetaData, TablePath)
      -> DatabaseMetaData.getPrimaryKeys(...)
      -> Optional<PrimaryKey>
  -> CatalogTable / TableSchema assembly
      -> downstream save-mode / upsert / constraint decisions use that PK signal

Hive-specific partition path
  -> HiveDialect.getPartitionKeys(Connection, TablePath)
      -> DESCRIBE table
      -> parse "# Partition Information" section

1.2 Compatibility Impact

Verdict: partially incompatible.
API/config/defaults/protocol/serialization are unchanged.
The incompatible part is behavior: a shared JDBC metadata failure is now downgraded to "no primary key", which is broader than the Hive problem this PR is trying to fix.

1.3 Performance / Side Effects

No meaningful CPU or memory regression from the Hive parser itself.
The main side effect is semantic, not performance-related: real metadata failures are now silently flattened into a different business meaning.

1.4 Error Handling and Logs

Issue 1: The shared primary-key fallback is too broad

Location: CatalogUtils.java:144
Why this is a problem:
this code sits on the normal JDBC metadata path. Right now any SQLException from getPrimaryKeys() becomes Optional.empty(), even when the driver should have returned real PK metadata and something actually went wrong.
Risk:
downstream logic can now treat a metadata failure as "this table genuinely has no primary key", which can mislead save-mode, upsert, or catalog-sync behavior.
Suggested fix:
narrow the fallback to dialects or exception types that are explicitly known to mean "primary key metadata is unsupported" instead of swallowing every SQLException.
Severity: High

Issue 2: The current Build is still red

Location: GitHub Build
Why this matters:
the latest Apache-side Build is still failing, and the linked fork run currently shows failures including unit-test (8, ubuntu-latest) and updated-modules-integration-test-part-4 (11, ubuntu-latest).
Severity: Medium

2. Code Quality

The new Hive partition parsing is readable and scoped correctly.
Test coverage is moving in the right direction, but I would still add one regression case proving that a non-Hive JDBC metadata failure is not silently downgraded to "empty PK".

3. Architecture

The Hive dialect part is a precise fix.
The shared fallback in CatalogUtils is not. It broadens the behavior boundary beyond the original problem.

4. Issue Summary

No.	Issue	Location	Severity
1	Shared PK metadata failures are downgraded too broadly	`CatalogUtils.java:144`	High
2	Current Build is still red	GitHub `Build`	Medium

5. Merge Conclusion

Conclusion: merge after fixes

Blocking items

Issue 1 must be fixed. The fallback needs to be narrowed so it does not change behavior for every JDBC dialect.
Issue 2 also needs to be cleared by getting the latest Build green.

Suggested non-blocking follow-up

Add one non-Hive regression test around shared PK metadata failures so this boundary does not widen again later.

The Hive-specific direction still looks good to me. The blocker is that the current shared fallback changes the semantics for all JDBC catalogs, not just Hive.

Posted from the wrong account by mistake; superseded by the DanielLeens review.

DanielLeens

Thanks for working on this. I re-reviewed the full current head locally on seatunnel-review-10706 against upstream/dev. This was a source-level review only; I did not run local Maven in this round.

What This PR Fixes

User pain: Hive/Inceptor metadata discovery is inconsistent through standard JDBC APIs, especially around partition keys and sometimes primary-key metadata.
Fix approach: the PR parses Hive partition keys from DESCRIBE output and also adds a fallback around DatabaseMetaData.getPrimaryKeys().
One-line summary: the Hive-specific direction is reasonable, but the shared primary-key fallback is currently too broad and changes behavior for every JDBC dialect, not just Hive.

1. Code Review

1.1 Core Logic Analysis

Exact change scope:
- seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/catalog/utils/CatalogUtils.java:130-163
- seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/hive/HiveDialect.java:78-112
Before / after:

// before
try (ResultSet rs = metaData.getPrimaryKeys(...)) {
    while (rs.next()) {
        primaryKeyColumns.add(...);
    }
}

// after
try (ResultSet rs = metaData.getPrimaryKeys(...)) {
    while (rs.next()) {
        primaryKeyColumns.add(...);
    }
} catch (SQLException e) {
    log.warn("Failed to get primary key info for table {}, returning empty primary key. Error: {}", tablePath, e.getMessage());
    return Optional.empty();
}

Key findings:
1. The Hive DESCRIBE parsing path is on the real runtime path and looks directionally correct.
2. The shared CatalogUtils.getPrimaryKey() helper is not Hive-only. Other JDBC catalogs also go through it.
3. Catching every SQLException here means a real metadata failure now becomes "this table has no primary key".
4. That is not just a logging choice. It changes downstream schema / constraint semantics.
Runtime chain I checked:

JDBC catalog introspection
  -> CatalogUtils.getPrimaryKey(DatabaseMetaData, TablePath)
      -> DatabaseMetaData.getPrimaryKeys(...)
      -> Optional<PrimaryKey>
  -> CatalogTable / TableSchema assembly
      -> downstream save-mode / upsert / constraint decisions use that PK signal

Hive-specific partition path
  -> HiveDialect.getPartitionKeys(Connection, TablePath)
      -> DESCRIBE table
      -> parse "# Partition Information" section

1.2 Compatibility Impact

Verdict: partially incompatible.
API/config/defaults/protocol/serialization are unchanged.
The incompatible part is behavior: a shared JDBC metadata failure is now downgraded to "no primary key", which is broader than the Hive problem this PR is trying to fix.

1.3 Performance / Side Effects

No meaningful CPU or memory regression from the Hive parser itself.
The main side effect is semantic, not performance-related: real metadata failures are now silently flattened into a different business meaning.

1.4 Error Handling and Logs

Issue 1: The shared primary-key fallback is too broad

Location: CatalogUtils.java:144
Why this is a problem:
this code sits on the normal JDBC metadata path. Right now any SQLException from getPrimaryKeys() becomes Optional.empty(), even when the driver should have returned real PK metadata and something actually went wrong.
Risk:
downstream logic can now treat a metadata failure as "this table genuinely has no primary key", which can mislead save-mode, upsert, or catalog-sync behavior.
Suggested fix:
narrow the fallback to dialects or exception types that are explicitly known to mean "primary key metadata is unsupported" instead of swallowing every SQLException.
Severity: High

Issue 2: The current Build is still red

Location: GitHub Build
Why this matters:
the latest Apache-side Build is still failing, and the linked fork run currently shows failures including unit-test (8, ubuntu-latest) and updated-modules-integration-test-part-4 (11, ubuntu-latest).
Severity: Medium

2. Code Quality

The new Hive partition parsing is readable and scoped correctly.
Test coverage is moving in the right direction, but I would still add one regression case proving that a non-Hive JDBC metadata failure is not silently downgraded to "empty PK".

3. Architecture

The Hive dialect part is a precise fix.
The shared fallback in CatalogUtils is not. It broadens the behavior boundary beyond the original problem.

4. Issue Summary

No.	Issue	Location	Severity
1	Shared PK metadata failures are downgraded too broadly	`CatalogUtils.java:144`	High
2	Current Build is still red	GitHub `Build`	Medium

5. Merge Conclusion

Conclusion: merge after fixes

Blocking items

Issue 1 must be fixed. The fallback needs to be narrowed so it does not change behavior for every JDBC dialect.
Issue 2 also needs to be cleared by getting the latest Build green.

Suggested non-blocking follow-up

Add one non-Hive regression test around shared PK metadata failures so this boundary does not widen again later.

The Hive-specific direction still looks good to me. The blocker is that the current shared fallback changes the semantics for all JDBC catalogs, not just Hive.

zhangqs0205 · 2026-05-13T07:01:18Z

Thanks for working on this. I re-reviewed the full current head locally on seatunnel-review-10706 against upstream/dev. This was a source-level review only; I did not run local Maven in this round.

What This PR Fixes

User pain: Hive/Inceptor metadata discovery is inconsistent through standard JDBC APIs, especially around partition keys and sometimes primary-key metadata.

Fix approach: the PR parses Hive partition keys from DESCRIBE output and also adds a fallback around DatabaseMetaData.getPrimaryKeys().

One-line summary: the Hive-specific direction is reasonable, but the shared primary-key fallback is currently too broad and changes behavior for every JDBC dialect, not just Hive.

1. Code Review

1.1 Core Logic Analysis

Exact change scope:

seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/catalog/utils/CatalogUtils.java:130-163

seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/hive/HiveDialect.java:78-112

Before / after:
// before
try (ResultSet rs = metaData.getPrimaryKeys(...)) {
    while (rs.next()) {
        primaryKeyColumns.add(...);
    }
}
// after
try (ResultSet rs = metaData.getPrimaryKeys(...)) {
   while (rs.next()) {
       primaryKeyColumns.add(...);
   }
} catch (SQLException e) {
   log.warn("Failed to get primary key info for table {}, returning empty primary key. Error: {}", tablePath, e.getMessage());
   return Optional.empty();
}
Key findings:

The Hive DESCRIBE parsing path is on the real runtime path and looks directionally correct.

The shared CatalogUtils.getPrimaryKey() helper is not Hive-only. Other JDBC catalogs also go through it.

Catching every SQLException here means a real metadata failure now becomes "this table has no primary key".

That is not just a logging choice. It changes downstream schema / constraint semantics.

Runtime chain I checked:
JDBC catalog introspection
  -> CatalogUtils.getPrimaryKey(DatabaseMetaData, TablePath)
      -> DatabaseMetaData.getPrimaryKeys(...)
      -> Optional<PrimaryKey>
  -> CatalogTable / TableSchema assembly
      -> downstream save-mode / upsert / constraint decisions use that PK signal

Hive-specific partition path
  -> HiveDialect.getPartitionKeys(Connection, TablePath)
      -> DESCRIBE table
      -> parse "# Partition Information" section
1.2 Compatibility Impact

Verdict: partially incompatible.

API/config/defaults/protocol/serialization are unchanged.

The incompatible part is behavior: a shared JDBC metadata failure is now downgraded to "no primary key", which is broader than the Hive problem this PR is trying to fix.

1.3 Performance / Side Effects

No meaningful CPU or memory regression from the Hive parser itself.

The main side effect is semantic, not performance-related: real metadata failures are now silently flattened into a different business meaning.

1.4 Error Handling and Logs

Issue 1: The shared primary-key fallback is too broad

Location: CatalogUtils.java:144

Why this is a problem:
this code sits on the normal JDBC metadata path. Right now any SQLException from getPrimaryKeys() becomes Optional.empty(), even when the driver should have returned real PK metadata and something actually went wrong.

Risk:
downstream logic can now treat a metadata failure as "this table genuinely has no primary key", which can mislead save-mode, upsert, or catalog-sync behavior.

Suggested fix:
narrow the fallback to dialects or exception types that are explicitly known to mean "primary key metadata is unsupported" instead of swallowing every SQLException.

Severity: High

Issue 2: The current Build is still red

Location: GitHub Build

Why this matters:
the latest Apache-side Build is still failing, and the linked fork run currently shows failures including unit-test (8, ubuntu-latest) and updated-modules-integration-test-part-4 (11, ubuntu-latest).

Severity: Medium

2. Code Quality

The new Hive partition parsing is readable and scoped correctly.

Test coverage is moving in the right direction, but I would still add one regression case proving that a non-Hive JDBC metadata failure is not silently downgraded to "empty PK".

3. Architecture

The Hive dialect part is a precise fix.

The shared fallback in CatalogUtils is not. It broadens the behavior boundary beyond the original problem.

4. Issue Summary

No. Issue Location Severity
1 Shared PK metadata failures are downgraded too broadly CatalogUtils.java:144 High
2 Current Build is still red GitHub Build Medium

5. Merge Conclusion

Conclusion: merge after fixes

Blocking items

Issue 1 must be fixed. The fallback needs to be narrowed so it does not change behavior for every JDBC dialect.

Issue 2 also needs to be cleared by getting the latest Build green.

Suggested non-blocking follow-up

Add one non-Hive regression test around shared PK metadata failures so this boundary does not widen again later.

The Hive-specific direction still looks good to me. The blocker is that the current shared fallback changes the semantics for all JDBC catalogs, not just Hive.

@DanielLeens
Thanks for the review. I agree that catching SQLException in the shared CatalogUtils.getPrimaryKey() is too broad and may change behavior for non-Hive JDBC catalogs. The original scope of this PR is Hive partition key detection, so I will remove this fallback from CatalogUtils and keep the change scoped to the Hive partition-key metadata path.

DanielLeens

Thanks for the update. I re-reviewed the latest head from scratch against upstream/dev, and I rechecked the real Hive metadata path rather than only the helper diff.

What This PR Fixes

User pain: Hive / Inceptor metadata discovery is inconsistent through standard JDBC APIs, especially for partition keys on table_path reads.
Fix approach: the current head adds a Hive-specific DESCRIBE parser for partition keys and threads those keys into CatalogTable without broadening the old shared primary-key fallback.
One-line summary: the Hive-specific fix is now scoped to the actual Hive metadata path instead of changing shared JDBC semantics for every dialect.

Runtime Chain Rechecked

user config with table_path
  -> JdbcCatalogUtils.getTables(...) [JdbcCatalogUtils.java:384-416]
      -> jdbcDialect.getPartitionKeys(connection, tablePath)
      -> CatalogUtils.getCatalogTable(connection, tablePath, typeMapper, partitionKeys)
          -> getTableSchema(...) still keeps the normal metadata / PK behavior
          -> CatalogTable(..., partitionKeys)

catalog path
  -> AbstractJdbcCatalog.getTable(...) [AbstractJdbcCatalog.java:232-267]
      -> getPartitionKeys(conn, tablePath)
          -> HiveDialect.getPartitionKeys(...) [HiveDialect.java:76-114]
              -> DESCRIBE db.table
              -> parse "# Partition Information"
      -> CatalogTable.of(..., partitionKeys, ...)

Findings

The previous Daniel blocker around the overly broad shared primary-key fallback is fixed on the current head. CatalogUtils.getTableSchema(...) still propagates SQLException, and the new regression test explicitly locks that behavior.
The Hive partition-key extraction is on the real metadata path now. Both JdbcCatalogUtils.getTables(...) and AbstractJdbcCatalog.getTable(...) thread the dialect-specific partition key list into CatalogTable.
The new tests cover the right contract boundaries for this patch: Hive DESCRIBE parsing, CatalogTable partition-key propagation, and the Hive E2E table-path path with dt/hr partitions.
I do not see a reopened source-level blocker on the current head.

Compatibility / Side Effects

No user-facing config or protocol changed.
The new behavior is Hive-specific; other JDBC dialects do not inherit the old broad SQLException -> no PK fallback anymore.
The only user-visible change is that Hive partition keys now appear correctly in CatalogTable metadata for the table-path flow.

Tests / CI

The focused unit tests and Hive E2E additions are aligned with the real code path.
The current top-level Build is still red on the reviewed head. The visible failures span a wide set of jobs, including unit-test (11, ubuntu-latest), updated-modules-integration-test-part-1/2/4/8, and many follow-on cancelled jobs, so there is not yet a clean green signal for the latest revision.

Conclusion: can merge after fixes

Blocking items:

No code blocker from my side on the current head.
Please get the latest Build back to green before merge.

Suggested non-blocking follow-up:

None for the current code path.

The latest revision fixes the scope problem Daniel previously blocked, and the Hive metadata path now looks correctly wired. The remaining gate is CI, not a reopened source-level issue.

DanielLeens

Thanks for working on this. I went through the latest head from the full JDBC catalog path instead of only the Hive-specific diff, and the Hive partition-key fix itself makes sense. I did find one blocking compatibility issue before this can be merged.

What problem this PR solves

User pain point
In Hive JDBC mode, DatabaseMetaData#getPrimaryKeys() is not reliable enough, so SeaTunnel can lose partition-key metadata while building CatalogTable.
Fix approach
The PR adds a dialect-level supportsPrimaryKeyMetadata() switch and lets HiveDialect parse partition keys from DESCRIBE instead of depending on JDBC primary-key metadata.
One-line summary
This makes Hive table discovery carry partition keys more reliably.

1. Code change review

1.1 Core logic analysis

The main runtime path is:

JdbcCatalogUtils.getTables() [JdbcCatalogUtils.java:143-147]
  -> CatalogUtils.getCatalogTable(connection, tablePath, jdbcDialect) [CatalogUtils.java:283-288]
    -> CatalogUtils.getTableSchema(metadata, tablePath, dialect) [CatalogUtils.java:230-237]
      -> dialect.supportsPrimaryKeyMetadata()
         -> HiveDialect returns false [HiveDialect.java:56-60]
         -> InceptorDialect extends HiveDialect [InceptorDialect.java:24-34]

So the PR is not only changing Hive behavior. It is also changing the generic JDBC fallback path for every dialect that inherits HiveDialect.

Before:

getTableSchema(metadata, tablePath, typeMapper, getPrimaryKey(metadata, tablePath))

After:

Optional<PrimaryKey> primaryKey =
    dialect.supportsPrimaryKeyMetadata()
        ? getPrimaryKey(metadata, tablePath)
        : Optional.empty();

1.2 Compatibility impact

Partially incompatible.

This changes how CatalogTable.getTableSchema().getPrimaryKey() is populated in the generic JDBC path. That is a compatibility-sensitive contract for downstream source/sink logic.

1.3 Performance / side effects

The Hive DESCRIBE path itself is fine, but the side effect is that Inceptor now loses primary-key metadata in the same fallback path unless it overrides the new behavior explicitly.

1.4 Error handling and logging

Issue 1: the new Hive-specific metadata switch also changes Inceptor primary-key behavior

Location:
seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/catalog/utils/CatalogUtils.java:230-237
seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/hive/HiveDialect.java:56-60
seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/inceptor/InceptorDialect.java:24-34
Problem:
The PR intends to special-case Hive, but InceptorDialect inherits HiveDialect, so it also inherits supportsPrimaryKeyMetadata() == false. That means the generic JDBC fallback path stops reading Inceptor primary-key metadata as well.
Risk:
This is a compatibility regression. Even if the Hive fix is correct, Inceptor users can silently lose CatalogTable primary-key information and affect downstream logic that depends on it.
Best fix:
Option A, recommended: scope the “skip PK metadata” behavior to real Hive only, and do not let Inceptor inherit it implicitly.
Option B: if Inceptor really must behave the same way, add explicit source evidence and regression coverage to show that this is intentional and safe.
Severity: High

2. Code quality evaluation

2.1 Code style

The code is readable, but the dialect inheritance boundary is too broad right now.

2.2 Test coverage and stability

Hive unit coverage was added for DESCRIBE parsing.
JdbcHiveIT now checks that partition keys are carried into CatalogTable.
I did not find a flaky-test pattern in the new tests.
However, there is still no Inceptor regression coverage for the inherited behavior change.

Test stability rating: Stable.

2.3 Documentation

No documentation blocker here; the main problem is the dialect compatibility boundary.

3. Architecture

3.1 Solution quality

This is a precise fix for Hive itself, but not yet a safe fix for the wider Hive-compatible dialect family.

3.2 Maintainability

Leaving this in the parent dialect will make future Hive-compatible dialects harder to reason about.

3.3 Extensibility

Future dialect exceptions should stay explicit at the concrete dialect level.

3.4 Historical compatibility

Not fully compatible in the current form because of the Inceptor regression risk.

4. Issue summary

Issue 1: the new Hive-specific metadata switch also changes Inceptor primary-key behavior
Location: CatalogUtils.java:230-237 / HiveDialect.java:56-60 / InceptorDialect.java:24-34
Severity: High

5. Merge conclusion

Conclusion: can merge after fixes

Blocking items
Issue 1: please narrow the new supportsPrimaryKeyMetadata() behavior so Inceptor does not lose primary-key metadata implicitly.
Suggested follow-ups
None.

One more note on CI: the current Build check is red on updated-modules-integration-test-part-4 (11, ubuntu-latest), which is consistent with this PR touching the JDBC/Hive integration-test bucket, so that failure should also be checked again after the dialect boundary is fixed.

DanielLeens

Sorry for missing this in the last review — that's on me. I re-reviewed the latest head from scratch and found one blocker that I should have caught earlier. This is a carryover from the previous version, not something introduced by the latest commit.

Thanks for the update. I walked the full current JDBC/Hive metadata path again instead of only looking at the helper diff.

What this PR fixes

User pain point: Hive table_path reads can lose partition-key metadata, so CatalogTable.partitionKeys stays empty even though the underlying table is partitioned.
Fix approach: the latest head routes the direct JDBC table_path path through CatalogUtils.getCatalogTable(connection, tablePath, jdbcDialect), lets HiveDialect.getPartitionKeys(...) parse the DESCRIBE <table> partition section, and keeps Inceptor on the normal primary-key metadata path via an explicit override.
One-line summary: this is the right direction for wiring Hive partition-key detection into the real JDBC metadata path that users actually hit.

Full runtime chain I rechecked

Jdbc source initialization
  -> JdbcSource.<init>() [JdbcSource.java:64-68]
      -> JdbcCatalogUtils.getTables(...) [JdbcCatalogUtils.java:78-146]
          -> findCatalog(...)
              -> no dedicated Hive / Inceptor catalog factory here, so the normal flow falls back to direct JDBC loading
          -> getCatalogTable(tableConfig, connection, jdbcDialect) [JdbcCatalogUtils.java:407-412]
              -> CatalogUtils.getCatalogTable(connection, tablePath, jdbcDialect) [CatalogUtils.java:283-288]
                  -> dialect.getPartitionKeys(connection, tablePath)
                      -> HiveDialect.getPartitionKeys(...) [HiveDialect.java:83-119]
                          -> DESCRIBE db.table
                          -> parse the "# Partition Information" section
                  -> getTableSchema(metadata, tablePath, dialect) [CatalogUtils.java:230-237]
                      -> HiveDialect.supportsPrimaryKeyMetadata() == false [HiveDialect.java:56-60]
                      -> InceptorDialect.supportsPrimaryKeyMetadata() == true [InceptorDialect.java:31-35]
                  -> CatalogTable.of(..., partitionKeys, ...)

Findings

The main runtime path does hit this fix. This is not dead helper code.
The Hive-specific metadata boundary now looks correct: Hive skips unreliable PK metadata, while Inceptor explicitly keeps it.
I do not see a reopened source-level blocker in the metadata-loading logic itself.
I did find one blocking regression in the newly added Hive E2E assertion.

Issue 1: the new Hive E2E assertion is internally inconsistent

Location: seatunnel-e2e/seatunnel-connector-v2-e2e/connector-jdbc-e2e/connector-jdbc-e2e-part-3/src/test/java/org/apache/seatunnel/connectors/seatunnel/jdbc/JdbcHiveIT.java:160-177, seatunnel-e2e/seatunnel-connector-v2-e2e/connector-jdbc-e2e/connector-jdbc-e2e-part-3/src/test/resources/jdbc_hive_partition_source_and_assert.conf:54-56
Problem:
the new test inserts 6 partitioned rows and increments id from 1 to 6, but the assert-sink config still requires the id field to satisfy [{min = 1}, {max = 3}].
With the real assert execution path, this is a deterministic failure rather than a theoretical edge case:
AssertExecutor.fail(...) checks field rules row by row, so rows with id=4/5/6 will violate the configured max = 3.
Risk:
the new E2E cannot serve as a reliable regression guard for this fix, and CI can fail even when the actual Hive metadata path is correct.
Recommended fix:
Option A, recommended: keep the current 6-row partition coverage and update the asserted id max to 6.
Option B: reduce the inserted sample size to 3 rows, but that weakens the partition-coverage value of the new E2E.
Severity: High

Compatibility / side effects

No API, config, default-value, protocol, or serialization-format change.
The runtime behavior change is the intended Hive metadata repair, while Inceptor keeps its previous PK-metadata behavior through the explicit override.

Tests

The new unit coverage is directionally good:
- HiveDialectTest covers DESCRIBE partition parsing.
- CatalogUtilsTest covers the Hive PK-metadata boundary and the identifier-case fallback boundary.
I do not see a new flaky-test pattern in the added tests.
The blocker is test correctness, not test stability.

Conclusion: can merge after fixes

Blocking items:

Fix the new Hive E2E assertion so it matches the 6 inserted partitioned rows.

Non-blocking follow-up:

The latest GitHub Build is still pending on the reviewed head. I do not see a new failure signature to attribute yet, but please wait for the updated Build result after fixing the E2E assertion.

Overall, the metadata-path repair itself looks right on the current head. The remaining blocker is that the newly added regression test is currently asserting an impossible contract.

DanielLeens

Thanks for the follow-up. I re-reviewed the latest head from scratch, with extra attention to the blocker from the previous Daniel review.

What problem this PR solves

User pain point
Hive table_path reads can lose partition-key metadata, so CatalogTable.partitionKeys stays empty even though the table is partitioned.
Fix approach
The PR routes the direct JDBC table_path path through the real CatalogTable construction flow and lets HiveDialect parse partition keys from DESCRIBE, while InceptorDialect keeps its own primary-key metadata contract.
One sentence
The metadata-path fix still looks right, and the blocking E2E assertion mismatch from the previous round is fixed on the latest head.

Simple example: the new Hive E2E inserts 6 rows across 3 partition dates and 2 partition hours. The assert config now checks id <= 6 instead of the old impossible id <= 3, so the regression guard matches the data it actually inserts.

1. Code change review

1.1 Core logic analysis

Runtime chain I rechecked:

Jdbc source initialization
  -> JdbcCatalogUtils.getCatalogTable(...)
      -> CatalogUtils.getCatalogTable(connection, tablePath, jdbcDialect)
          -> HiveDialect.getPartitionKeys(...)
              -> DESCRIBE db.table
              -> parse the partition section
          -> getTableSchema(...)
              -> HiveDialect.supportsPrimaryKeyMetadata() == false
              -> InceptorDialect.supportsPrimaryKeyMetadata() == true
          -> CatalogTable.of(..., partitionKeys, ...)

The previous blocker was in the new E2E assertion path. On the latest head:

JdbcHiveIT still inserts 6 partitioned rows (id = 1..6) in the same way.
jdbc_hive_partition_source_and_assert.conf now raises the id max from 3 to 6.
InceptorDialect still keeps the explicit supportsPrimaryKeyMetadata() == true override.

Key findings:

The real JDBC metadata path still hits this fix; it is not dead helper logic.
The previous Daniel blocker is fixed on the latest head.
I do not see a reopened source-level blocker in the Hive / Inceptor metadata boundary.
The remaining gate is CI rather than code-path correctness.

1.2 Compatibility impact

Fully compatible from the current source-review perspective. The intended Hive metadata repair remains scoped correctly, while Inceptor keeps its prior primary-key metadata behavior.

1.3 Performance / side effects

No meaningful CPU, memory, GC, concurrency, retry, or resource-release regression surfaced in the current head.

1.4 Error handling and logging

No new source-level blocking issue found on the latest head.

2. Code quality evaluation

2.1 Code style

The current dialect boundary is much cleaner than the earlier versions.

2.2 Test coverage and stability

HiveDialectTest covers partition parsing.
CatalogUtilsTest covers the Hive/Inceptor PK-metadata boundary.
The Hive E2E assertion is now internally consistent with the inserted data.

Test stability rating: Stable.

2.3 Documentation

No new documentation blocker from this round.

3. Architecture

3.1 Solution quality

This is now a precise fix for the real Hive metadata path.

3.2 Maintainability

Good. The Hive-specific exception stays explicit, and Inceptor no longer inherits it accidentally.

3.3 Extensibility

The pattern is reusable for other JDBC dialects that need concrete metadata overrides.

3.4 Historical compatibility

No historical-compatibility blocker found on the latest head.

4. Issue summary

No new source-level blocking issue found.

5. Merge conclusion

Conclusion: can merge

Blocking items

No code blocker from my side on the latest head.

Suggested but non-blocking follow-ups

The latest GitHub Build check is cancelled, so please rerun / get a green Build before the actual merge.

From the source-review side, the current head looks good to me now. Thanks for closing the E2E mismatch cleanly.

zhangqs0205 · 2026-05-18T14:17:45Z

@DanielLeens

Thanks for the review and the follow-up.

While adding the JdbcHiveIT coverage for the JdbcCatalogUtils.getTables() path, I found a few Hive JDBC metadata gaps that were hidden before, so I fixed them together with the regression coverage:

Hive partition keys are not exposed reliably through standard JDBC table/column metadata.
I added a Hive-specific DESCRIBE <table> partition parser and wired dialect.getPartitionKeys(...) into the real getTables() / CatalogTable metadata path, so CatalogTable.partitionKeys can now carry dt/hr correctly.
Hive primary-key metadata is not reliable through DatabaseMetaData#getPrimaryKeys().
I added the dialect capability supportsPrimaryKeyMetadata() and made Hive skip PK metadata lookup explicitly in getTableSchema(). This avoids treating Hive’s unsupported PK metadata path as a real table failure, while preserving the normal SQLException behavior for dialects that do support PK metadata.
Some Hive JDBC metadata capability methods for identifier case are unsupported.
I narrowed the fallback in JdbcIdentifierUtils.identifierCaseStrategy(...) to only handle SQLFeatureNotSupportedException and fall back to case-insensitive matching. Other real SQLExceptions are still propagated.

I also fixed the new Hive E2E assert config to match the 6 inserted partition rows and kept explicit field_value = [] entries for type-only checks, because the assert executor expects a non-null rule list when field_rules are configured.

davidzollo

+1 if CI passes.
LGTM

zhangqs0205 · 2026-05-18T14:23:00Z

By the way, CI in this project usually needs to be rerun several times before all checks pass.

davidzollo · 2026-05-18T14:32:37Z

By the way, CI in this project usually needs to be rerun several times before all checks pass.

Yes, you can only retry the failed CI tasks

DanielLeens · 2026-05-19T00:40:42Z

Hi @zhangqs0205, thanks for the detailed write-up — that context helps. The current Daniel approval already covers head d848475555ff1976ba906561c46a1f05941781a6, and the points you listed match the paths I traced on this same head through JdbcCatalogUtils.getTables(), JdbcCatalogUtils.getTableSchema(), and JdbcIdentifierUtils.identifierCaseStrategy(...). Since there is no new commit after that approval, I am treating this as a reply-only follow-up rather than starting another full review on unchanged code. On CI, the current Build result looks like a cancelled run rather than a source-level regression signal, so rerunning that check on the same head is the right next step.

zhangqs0205 · 2026-05-20T13:39:14Z

Hi @davidzollo ，CI is passed

DanielLeens · 2026-05-21T01:00:39Z

Thanks for the update, and good to hear CI is green now.

I checked the current PR head d848475555ff1976ba906561c46a1f05941781a6, and it is still the same revision I already reviewed earlier, so there is no new code delta for me to re-review in this round.

If you push another commit later, feel free to ping me again and I will take another pass.

github-actions Bot added connectors-v2 jdbc labels Apr 3, 2026

zhangqs0205 force-pushed the fix/jdbc-hive-partition-keys branch from c3656ac to eca75ed Compare April 3, 2026 09:40

This comment was marked as duplicate.

Sign in to view

DanielLeens previously requested changes Apr 6, 2026

View reviewed changes

DanielLeens mentioned this pull request Apr 10, 2026

[Bug] [connector-jdbc] Hive JDBC CatalogTable.partitionKeys is empty in JdbcSource #10597

Closed

3 tasks

davidzollo removed the connectors-v2 label Apr 14, 2026

davidzollo previously approved these changes Apr 25, 2026

View reviewed changes

github-actions Bot added the approved label Apr 25, 2026

github-actions Bot added the reviewed label Apr 25, 2026

zhangqs0205 dismissed davidzollo’s stale review via 1e8b9eb April 28, 2026 11:09

github-actions Bot added e2e connectors-v2 and removed approved reviewed labels Apr 28, 2026

[Fix][Connector-V2][JDBC] Detect Hive partition keys in metadata

4abb40b

zhangqs0205 force-pushed the fix/jdbc-hive-partition-keys branch from 1e8b9eb to 4abb40b Compare April 28, 2026 11:10

DanielLeens reviewed Apr 28, 2026

View reviewed changes

DanielLeens reviewed May 4, 2026

View reviewed changes

[Fix][Connector-V2] Harden Hive JDBC metadata loading

30cecd8

davidzollo previously requested changes May 7, 2026

View reviewed changes

DanielLeens suggested changes May 7, 2026

View reviewed changes

[Fix][Connector-V2] Preserve JDBC primary key metadata failures

1047697

DanielLeens reviewed May 14, 2026

View reviewed changes

zhangqingsong added 2 commits May 16, 2026 17:53

[Fix][Connector-V2] Scope Hive primary key metadata lookup

9e68dac

[Fix][Connector-V2] Handle unsupported JDBC identifier metadata

d907e88

DanielLeens reviewed May 16, 2026

View reviewed changes

[Fix][Connector-V2] Narrow Hive JDBC metadata fallback

f506c74

DanielLeens suggested changes May 17, 2026

View reviewed changes

[Fix][Connector-V2] Fix Hive partition assert config

d848475

DanielLeens approved these changes May 18, 2026

View reviewed changes

github-actions Bot added the reviewed label May 18, 2026

davidzollo approved these changes May 18, 2026

View reviewed changes

github-actions Bot added the approved label May 18, 2026

davidzollo merged commit 7c7701d into apache:dev May 26, 2026
4 checks passed

Conversation

zhangqs0205 commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose of this pull request

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

Uh oh!

This comment was marked as duplicate.

Uh oh!

This comment was marked as duplicate.

Uh oh!

DanielLeens left a comment • edited by davidzollo Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

davidzollo left a comment

Choose a reason for hiding this comment

Uh oh!

DanielLeens commented Apr 25, 2026

Conclusion: can merge

Uh oh!

DanielLeens left a comment

Choose a reason for hiding this comment

What This PR Fixes

Runtime Chain I Checked

Findings

Merge Decision

Conclusion: can merge after fixes

Uh oh!

DanielLeens left a comment

Choose a reason for hiding this comment

Conclusion: can merge after fixes

Uh oh!

DanielLeens left a comment

Choose a reason for hiding this comment

Conclusion: merge after fixes

Uh oh!

davidzollo left a comment

Choose a reason for hiding this comment

What This PR Fixes

1. Code Review

1.1 Core Logic Analysis

1.2 Compatibility Impact

1.3 Performance / Side Effects

1.4 Error Handling and Logs

2. Code Quality

3. Architecture

4. Issue Summary

5. Merge Conclusion

Conclusion: merge after fixes

Uh oh!

DanielLeens left a comment

Choose a reason for hiding this comment

What This PR Fixes

1. Code Review

1.1 Core Logic Analysis

1.2 Compatibility Impact

1.3 Performance / Side Effects

1.4 Error Handling and Logs

2. Code Quality

3. Architecture

4. Issue Summary

5. Merge Conclusion

Conclusion: merge after fixes

Uh oh!

zhangqs0205 commented May 13, 2026

What This PR Fixes

1. Code Review

1.1 Core Logic Analysis

1.2 Compatibility Impact

1.3 Performance / Side Effects

1.4 Error Handling and Logs

2. Code Quality

3. Architecture

4. Issue Summary

5. Merge Conclusion

Conclusion: merge after fixes

Uh oh!

DanielLeens left a comment

zhangqs0205 commented Apr 3, 2026 •

edited

Loading

DanielLeens left a comment •

edited by davidzollo

Loading