[#11287] feat(spark-connector): Add view support #11288

Open
diqiu50 wants to merge 8 commits into apache:main from diqiu50:spark-view

@diqiu50
Contributor

@diqiu50 diqiu50 commented May 28, 2026

What changes were proposed in this pull request?

Add read-only view support to the Gravitino Spark Hive connector:

  • HiveViewCatalogOperations: accept Spark dialect in view creation and loading.
  • BaseCatalog: fall through to view lookup in loadTable when identifier is
    not a table; check view existence in tableExists.
  • SparkHiveView: new V2 Table that executes view SQL via SparkSession on the
    driver and returns rows as a single in-memory partition.
  • GravitinoHiveCatalog: override createSparkView to return SparkHiveView.
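
The table-then-view fall-through described for BaseCatalog can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: the class `ViewFallbackSketch`, its map-backed stores, and the string return values are hypothetical stand-ins, not the PR's code, and `NoSuchElementException` takes the place of whatever exception the connector actually raises.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

/** Simplified, hypothetical model of table-then-view resolution; not the PR's BaseCatalog code. */
public class ViewFallbackSketch {
    // Stand-ins for Gravitino's table and view metadata (assumed shapes).
    private final Map<String, String> tableLocations = new HashMap<>();
    private final Map<String, String> viewDefinitions = new HashMap<>();

    public void addTable(String name, String location) {
        tableLocations.put(name, location);
    }

    public void addView(String name, String sql) {
        viewDefinitions.put(name, sql);
    }

    /** Tries the table first and falls through to the view only when no table matches. */
    public String loadTable(String name) {
        String location = tableLocations.get(name);
        if (location != null) {
            return "table:" + location;
        }
        String sql = viewDefinitions.get(name);
        if (sql != null) {
            return "view:" + sql;
        }
        throw new NoSuchElementException("No table or view named " + name);
    }

    /** Reports true for either kind of object, mirroring the view-aware existence check. */
    public boolean tableExists(String name) {
        return tableLocations.containsKey(name) || viewDefinitions.containsKey(name);
    }
}
```

Checking the table store first keeps the existing table path unchanged; the view lookup only runs for identifiers that would otherwise have failed.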

Why are the changes needed?

Fix: #11287

Does this PR introduce any user-facing change?

Yes. Users can now SELECT from Gravitino views via Spark SQL on Hive catalogs.
Note: CREATE VIEW via Spark SQL is not supported due to a Spark V2 named catalog
limitation; views must be created through the Gravitino API.

How was this patch tested?

  • Unit test: testCreateViewAcceptsSparkDialect in TestHiveCatalogOperations
  • Integration tests: 4 new cases in SparkHiveCatalogIT:
    testSelectHiveView, testTableExistsForView, testShowTablesExcludesViews,
    testCreateViewViaSql

Copilot AI review requested due to automatic review settings May 28, 2026 12:40
Contributor

Copilot AI left a comment

Pull request overview

Adds read-only Gravitino view support to the Spark Hive V2 connector by falling back from table resolution to view resolution and introducing a Spark-side Table implementation to execute view SQL.

Changes:

  • Extend Spark connector table resolution (loadTable, tableExists) to recognize Gravitino views.
  • Add SparkHiveView to execute view SQL (Spark dialect preferred; Hive dialect fallback) and expose results via a V2 scan.
  • Update Hive view catalog operations to accept Spark dialect; add/extend unit + integration coverage for view behavior.
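
The "Spark dialect preferred; Hive dialect fallback" rule can be sketched as an ordered lookup. The class name `ViewDialectSketch`, the dialect keys `"spark"` and `"hive"`, and the map-of-SQL-by-dialect shape are assumptions made for illustration; the PR's SparkHiveView may represent view definitions differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of dialect preference when picking a view's SQL text. */
public class ViewDialectSketch {
    // Preference order; the dialect keys are assumed names for this sketch.
    static final List<String> PREFERENCE = Arrays.asList("spark", "hive");

    /** Returns the first non-blank SQL text in preference order, or empty if none exists. */
    public static Optional<String> selectSql(Map<String, String> sqlByDialect) {
        for (String dialect : PREFERENCE) {
            String sql = sqlByDialect.get(dialect);
            if (sql != null && !sql.trim().isEmpty()) {
                return Optional.of(sql);
            }
        }
        return Optional.empty();
    }
}
```

Returning an empty `Optional` leaves it to the caller to decide how to report a view that has no definition in a supported dialect.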

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| spark-connector/spark-common/src/main/java/org/apache/gravitino/spark/connector/catalog/BaseCatalog.java | Adds view fallback in loadTable and view-aware existence checks; introduces loadViewAsTable/createSparkView hooks. |
| spark-connector/spark-common/src/main/java/org/apache/gravitino/spark/connector/hive/SparkHiveView.java | New Spark V2 Table wrapper that executes stored view SQL and returns rows via a single partition scan. |
| spark-connector/spark-common/src/main/java/org/apache/gravitino/spark/connector/hive/GravitinoHiveCatalog.java | Overrides createSparkView to wrap Hive views with SparkHiveView. |
| catalogs/catalog-hive/src/main/java/org/apache/gravitino/catalog/hive/HiveViewCatalogOperations.java | Allows Spark dialect when validating/loading/serializing HMS views. |
| spark-connector/spark-common/src/test/java/org/apache/gravitino/spark/connector/integration/test/hive/SparkHiveCatalogIT.java | Adds integration tests for selecting from views, tableExists behavior, SHOW TABLES behavior, and CREATE VIEW limitation. |
| catalogs/catalog-hive/src/test/java/org/apache/gravitino/catalog/hive/TestHiveCatalogOperations.java | Updates unit test to assert Spark dialect is accepted for view creation. |
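
The "single partition scan" noted for SparkHiveView implies that the whole view result is materialized on the driver before any row is read. A minimal plain-Java sketch of that shape follows; `SinglePartitionScanSketch`, `InMemoryPartition`, and `planPartitions` are hypothetical names that stand in for the Spark V2 scan classes and are not taken from the PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical model of a scan that exposes an eagerly collected result as one partition. */
public class SinglePartitionScanSketch {

    /** One partition holding every row the view query produced. */
    public static final class InMemoryPartition {
        private final List<Object[]> rows;

        public InMemoryPartition(List<Object[]> rows) {
            // Defensive copy so later changes to the caller's list do not leak in.
            this.rows = Collections.unmodifiableList(new ArrayList<>(rows));
        }

        public List<Object[]> rows() {
            return rows;
        }
    }

    /**
     * Runs the view query on the calling side (the "driver" in this model) and wraps
     * the full result in exactly one partition, regardless of how many rows came back.
     */
    public static List<InMemoryPartition> planPartitions(Supplier<List<Object[]>> runViewSql) {
        List<Object[]> collected = runViewSql.get();
        return Collections.singletonList(new InMemoryPartition(collected));
    }
}
```

In this model the memory cost is borne entirely by the caller and reads are not parallelized, so the approach fits small results better than large ones.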

@github-actions

Code Coverage Report

Overall Project 66.53% -0.24% 🟢
Files changed 32.27% 🔴

Module Coverage
aliyun 1.72% 🔴
api 46.82% 🟢
authorization-common 85.96% 🟢
aws 3.66% 🔴
azure 2.47% 🔴
catalog-common 10.04% 🔴
catalog-fileset 80.33% 🟢
catalog-glue 66.08% 🟢
catalog-hive 79.8% -1.96% 🟢
catalog-jdbc-clickhouse 80.02% 🟢
catalog-jdbc-common 45.31% 🟢
catalog-jdbc-doris 80.28% 🟢
catalog-jdbc-hologres 54.03% 🟢
catalog-jdbc-mysql 79.23% 🟢
catalog-jdbc-oceanbase 78.38% 🟢
catalog-jdbc-postgresql 82.29% 🟢
catalog-jdbc-starrocks 78.51% 🟢
catalog-kafka 77.01% 🟢
catalog-lakehouse-generic 44.89% 🟢
catalog-lakehouse-hudi 79.1% 🟢
catalog-lakehouse-iceberg 85.65% 🟢
catalog-lakehouse-paimon 79.29% 🟢
catalog-model 77.72% 🟢
cli 44.51% 🟢
client-java 77.91% 🟢
common 49.99% 🟢
core 82.37% 🟢
filesystem-hadoop3 76.97% 🟢
flink 0.0% 🔴
flink-common 41.2% 🟢
flink-runtime 0.0% 🔴
gcp 14.12% 🔴
hadoop-common 10.39% 🔴
hive-metastore-common 53.26% 🟢
iceberg-common 54.98% 🟢
iceberg-rest-server 70.89% 🟢
idp-basic 85.99% 🟢
integration-test-common 0.0% 🔴
jobs 66.17% 🟢
lance-common 20.83% 🔴
lance-rest-server 60.27% 🟢
lineage 53.02% 🟢
optimizer 82.95% 🟢
optimizer-api 21.95% 🔴
server 85.73% 🟢
server-common 72.88% 🟢
spark 32.79% 🔴
spark-common 37.39% -9.55% 🔴
trino-connector 39.44% 🔴
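Files

| Module | File | Coverage |
| --- | --- | --- |
| catalog-hive | HiveViewCatalogOperations.java | 75.44% 🟢 |
| catalog-hive | HiveView.java | 71.43% 🟢 |
| spark-common | BaseCatalog.java | 0.0% 🔴 |
| spark-common | GravitinoHiveCatalog.java | 0.0% 🔴 |
| spark-common | SparkHiveView.java | 0.0% 🔴 |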

Development

Successfully merging this pull request may close these issues.

[FEATURE] Add view support to Spark connector

2 participants