Replace spark plugins regtests with JUnit by MonkeyCanCode · Pull Request #4588 · apache/polaris

MonkeyCanCode · 2026-05-31T03:12:11Z

ML: https://lists.apache.org/thread/4bx31cfbcqfxzgpsddvc9kcfbn9l093y

Sample PR to replace the Docker-based regtests for the Spark plugins with a JUnit IT that spawns a fresh JVM on a pruned class path (only the Polaris Spark bundle jar and Spark dependencies). The rest of the SQL tests are already covered by integration tests, and this closes the gap for JAR loading.

Checklist

🛡️ Don't disclose security issues! (contact security@apache.org)
🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
🧪 Added/updated tests with good coverage, or manually tested (and explained how)
💡 Added comments for complex logic
🧾 Updated CHANGELOG.md (if needed)
📚 Updated documentation in site/content/in-dev/unreleased (if needed)

dimas-b

Thanks for working on this PR, @MonkeyCanCode ! I like the general direction of this refactoring. Minor comments below.

dimas-b · 2026-06-02T01:19:47Z

+    try (SparkSession spark = SparkSession.builder().getOrCreate()) {
+      spark.sql("USE polaris");
+      spark.sql("CREATE NAMESPACE bundle_ns");
+      spark.sql("CREATE TABLE bundle_ns.t (id INT, value STRING) USING ICEBERG");


What happens if/when this SQL fails?

MonkeyCanCode · Jun 2, 2026

If any of the SQL statements fails, the catch block catches it. For example, changing USE polaris to USE polaris_known produces the following:

./gradlew :polaris-spark-integration-3.5_2.12:intTest
...
BundleJarSanityIT > testBundleJarLoading(Path, PolarisApiEndpoints, ClientCredentials) FAILED
    java.lang.AssertionError at BundleJarSanityIT.java:142
...
2026-06-01 21:59:43,992 INFO [io.qua.htt.access-log] [ea6f1dc2-296f-4477-ae6d-52f0c49c7a4c_0000000000000000002,POLARIS] [,,,] (executor-thread-1) 127.0.0.1 - root [01/Jun/2026:21:59:43 -0500] "POST /api/management/v1/catalogs HTTP/1.1" 201 425
2026-06-01 21:59:44,008 INFO [io.qua.htt.access-log] [ea6f1dc2-296f-4477-ae6d-52f0c49c7a4c_0000000000000000003,POLARIS] [,,,] (executor-thread-1) 127.0.0.1 - - [01/Jun/2026:21:59:44 -0500] "POST /api/catalog/v1/oauth/tokens HTTP/1.1" 200 765
[Isolated Spark] SLF4J(W): Class path contains multiple SLF4J providers.
[Isolated Spark] SLF4J(W): Found provider [org.slf4j.impl.JBossSlf4jServiceProvider@57bc27f5]
[Isolated Spark] SLF4J(W): Found provider [ch.qos.logback.classic.spi.LogbackServiceProvider@5fb759d6]
[Isolated Spark] SLF4J(W): See https://www.slf4j.org/codes.html#multiple_bindings for an explanation.
[Isolated Spark] SLF4J(I): Actual provider is of type [org.slf4j.impl.JBossSlf4jServiceProvider@57bc27f5]
[Isolated Spark] Jun 01, 2026 9:59:50 PM org.jboss.logmanager.JBossLoggerFinder getLogger
[Isolated Spark] ERROR: The LogManager accessed before the "java.util.logging.manager" system property was set to "org.jboss.logmanager.LogManager". Results may be unexpected.
[Isolated Spark] org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: [SCHEMA_NOT_FOUND] The schema `polaris_known` cannot be found. Verify the spelling and correctness of the schema and catalog.
[Isolated Spark] If you did not qualify the name with a catalog, verify the current_schema() output, or qualify the name with the correct catalog.
[Isolated Spark] To tolerate the error on drop use DROP SCHEMA IF EXISTS.
[Isolated Spark] at org.apache.spark.sql.connector.catalog.CatalogManager.$anonfun$setCurrentNamespace$1(CatalogManager.scala:122)
[Isolated Spark] at org.apache.spark.sql.connector.catalog.CatalogManager.$anonfun$setCurrentNamespace$1$adapted(CatalogManager.scala:119)
[Isolated Spark] at org.apache.spark.sql.catalyst.catalog.SessionCatalog.setCurrentDatabaseWithNameCheck(SessionCatalog.scala:344)
[Isolated Spark] at org.apache.spark.sql.connector.catalog.CatalogManager.setCurrentNamespace(CatalogManager.scala:119)
[Isolated Spark] at org.apache.spark.sql.execution.datasources.v2.SetCatalogAndNamespaceExec.$anonfun$run$2(SetCatalogAndNamespaceExec.scala:36)
[Isolated Spark] at org.apache.spark.sql.execution.datasources.v2.SetCatalogAndNamespaceExec.$anonfun$run$2$adapted(SetCatalogAndNamespaceExec.scala:36)
[Isolated Spark] at scala.Option.foreach(Option.scala:407)
[Isolated Spark] at org.apache.spark.sql.execution.datasources.v2.SetCatalogAndNamespaceExec.run(SetCatalogAndNamespaceExec.scala:36)
[Isolated Spark] at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
[Isolated Spark] at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
[Isolated Spark] at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
[Isolated Spark] at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:107)
[Isolated Spark] at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
[Isolated Spark] at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
[Isolated Spark] at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
[Isolated Spark] at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
[Isolated Spark] at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
[Isolated Spark] at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:107)
[Isolated Spark] at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
[Isolated Spark] at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
[Isolated Spark] at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
[Isolated Spark] at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
[Isolated Spark] at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
[Isolated Spark] at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
[Isolated Spark] at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
[Isolated Spark] at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
[Isolated Spark] at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
[Isolated Spark] at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:437)
[Isolated Spark] at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:98)
[Isolated Spark] at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:85)
[Isolated Spark] at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:83)
[Isolated Spark] at org.apache.spark.sql.Dataset.<init>(Dataset.scala:220)
[Isolated Spark] at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
[Isolated Spark] at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
[Isolated Spark] at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
[Isolated Spark] at org.apache.spark.sql.SparkSession.$anonfun$sql$4(SparkSession.scala:691)
[Isolated Spark] at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
[Isolated Spark] at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:682)
[Isolated Spark] at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:713)
[Isolated Spark] at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:744)
[Isolated Spark] at org.apache.polaris.spark.quarkus.it.BundleSanityChecker.main(BundleSanityChecker.java:26)
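
For readers following the failure path in the log above, here is a minimal sketch of how a parent test might launch the isolated JVM and surface its output. It is not the code from this PR: the class IsolatedJvmLauncher, its methods, and the Result type are hypothetical names chosen for illustration. Only the "[Isolated Spark]" prefix is taken from the log. The child is assumed to exit with a non-zero status when a SQL statement fails (for example by calling System.exit(1) from its catch block).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical launcher; names are illustrative and not taken from the PR. */
final class IsolatedJvmLauncher {

  /** Exit code and tagged output lines of the child JVM. */
  static final class Result {
    final int exitCode;
    final List<String> output;

    Result(int exitCode, List<String> output) {
      this.exitCode = exitCode;
      this.output = output;
    }
  }

  /** Builds the "java -cp <classPath> <mainClass> <args...>" command line. */
  static List<String> command(
      String javaHome, String classPath, String mainClass, List<String> args) {
    List<String> cmd = new ArrayList<>();
    cmd.add(Paths.get(javaHome, "bin", "java").toString());
    cmd.add("-cp");
    cmd.add(classPath);
    cmd.add(mainClass);
    cmd.addAll(args);
    return cmd;
  }

  /** Prefixes a child output line so it stands out in the parent test log. */
  static String tag(String line) {
    return "[Isolated Spark] " + line;
  }

  /** Runs the command, merging stderr into stdout, and waits for the child to exit. */
  static Result run(List<String> command) throws IOException, InterruptedException {
    Process process = new ProcessBuilder(command).redirectErrorStream(true).start();
    List<String> output = new ArrayList<>();
    try (BufferedReader reader =
        new BufferedReader(
            new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        output.add(tag(line));
      }
    }
    return new Result(process.waitFor(), output);
  }
}
```

A caller would then assert that the exit code is 0 and include the collected output in the assertion message, so a failed statement in the child shows up as a test failure with the child's stack trace attached.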

dimas-b · 2026-06-02T01:25:12Z

+      throws Exception {
+    // Filter the current classpath: drop polaris-spark / polaris-core so the bundle jar
+    // is the sole source of those classes; keep external jars (spark-sql, iceberg, etc.).
+    String[] parts = System.getProperty("java.class.path").split(File.pathSeparator);


This is a neat idea... yet, I was thinking about using Gradle to build the class path (from test dependencies, without polaris-* artifacts) then run *IT tests via JUnit.

If other classes inside intTest need Polaris code, we can create a new test dir (e.g. sparkTest) for these new test cases (similar to cloudTest).

Then we could make a SparkSession directly here.

I hope the presence of JUnit on the class path is not a concern.

WDYT?

MonkeyCanCode · Jun 2, 2026

I was thinking that the real gap between the current regtests (the ones under plugin/spark, not the ones in the project root directory) and the integration tests is the "spark-shell --jar xxxx" step. This approach is closer to a real-world simulation: a new process mimics how a user actually deploys the jar. With the proposed route, I am worried we may run into classpath hell or hide some packaging bugs.

If we think the proposed approach shouldn't be a concern and is preferred, I am fine with making the requested changes.
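
To make the class-path pruning discussed in this thread concrete, here is a minimal sketch under stated assumptions. ClassPathPruner and prune are hypothetical names, not the PR's code. The excluded prefixes polaris-spark and polaris-core come from the code comment quoted above. The bundle jar is assumed to be appended to the result by the caller.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper; not the code from this PR. */
final class ClassPathPruner {

  /**
   * Drops every class-path entry whose file name starts with one of the given prefixes
   * and joins the remaining entries with the same separator, preserving their order.
   */
  static String prune(String classPath, String separator, List<String> excludedPrefixes) {
    return Arrays.stream(classPath.split(Pattern.quote(separator)))
        .filter(entry -> !entry.isEmpty())
        .filter(
            entry -> {
              // Match on the last path segment only, e.g. "polaris-core-1.0.jar".
              String fileName = new File(entry).getName();
              return excludedPrefixes.stream().noneMatch(fileName::startsWith);
            })
        .collect(Collectors.joining(separator));
  }

  public static void main(String[] args) {
    // Prune the class path of the current JVM and print what would be handed to the child.
    String pruned =
        prune(
            System.getProperty("java.class.path"),
            File.pathSeparator,
            List.of("polaris-spark", "polaris-core"));
    System.out.println(pruned);
  }
}
```

Because the sketch matches on the last path segment only, a directory entry such as a module's compiled-classes output would not be filtered by these prefixes. A real filter may need to inspect the full path.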

Replace spark plugins regtests with JUnit

0bb33a6

github-project-automation Bot added this to Basic Kanban Board May 31, 2026

github-project-automation Bot moved this to PRs In Progress in Basic Kanban Board May 31, 2026

MonkeyCanCode requested review from dimas-b, flyrain, gh-yzou and snazy May 31, 2026 03:35

Merge branch 'main' into spark_plugin_remove_regtests

45ae7e6

dimas-b reviewed Jun 2, 2026


Use AssertJ

5ed94a3

MonkeyCanCode wants to merge 3 commits into apache:main from MonkeyCanCode:spark_plugin_remove_regtests
