Replace spark plugins regtests with JUnit #4588
dimas-b
left a comment
Thanks for working on this PR, @MonkeyCanCode! I like the general direction of this refactoring. Minor comments below.
try (SparkSession spark = SparkSession.builder().getOrCreate()) {
  spark.sql("USE polaris");
  spark.sql("CREATE NAMESPACE bundle_ns");
  spark.sql("CREATE TABLE bundle_ns.t (id INT, value STRING) USING ICEBERG");
What happens if/when this SQL fails?
If any of the SQL statements fails, the catch block catches it. For example, changing USE polaris to USE polaris_known produces the following:
./gradlew :polaris-spark-integration-3.5_2.12:intTest
...
BundleJarSanityIT > testBundleJarLoading(Path, PolarisApiEndpoints, ClientCredentials) FAILED
java.lang.AssertionError at BundleJarSanityIT.java:142
...
2026-06-01 21:59:43,992 INFO [io.qua.htt.access-log] [ea6f1dc2-296f-4477-ae6d-52f0c49c7a4c_0000000000000000002,POLARIS] [,,,] (executor-thread-1) 127.0.0.1 - root [01/Jun/2026:21:59:43 -0500] "POST /api/management/v1/catalogs HTTP/1.1" 201 425
2026-06-01 21:59:44,008 INFO [io.qua.htt.access-log] [ea6f1dc2-296f-4477-ae6d-52f0c49c7a4c_0000000000000000003,POLARIS] [,,,] (executor-thread-1) 127.0.0.1 - - [01/Jun/2026:21:59:44 -0500] "POST /api/catalog/v1/oauth/tokens HTTP/1.1" 200 765
[Isolated Spark] SLF4J(W): Class path contains multiple SLF4J providers.
[Isolated Spark] SLF4J(W): Found provider [org.slf4j.impl.JBossSlf4jServiceProvider@57bc27f5]
[Isolated Spark] SLF4J(W): Found provider [ch.qos.logback.classic.spi.LogbackServiceProvider@5fb759d6]
[Isolated Spark] SLF4J(W): See https://www.slf4j.org/codes.html#multiple_bindings for an explanation.
[Isolated Spark] SLF4J(I): Actual provider is of type [org.slf4j.impl.JBossSlf4jServiceProvider@57bc27f5]
[Isolated Spark] Jun 01, 2026 9:59:50 PM org.jboss.logmanager.JBossLoggerFinder getLogger
[Isolated Spark] ERROR: The LogManager accessed before the "java.util.logging.manager" system property was set to "org.jboss.logmanager.LogManager". Results may be unexpected.
[Isolated Spark] org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: [SCHEMA_NOT_FOUND] The schema `polaris_known` cannot be found. Verify the spelling and correctness of the schema and catalog.
[Isolated Spark] If you did not qualify the name with a catalog, verify the current_schema() output, or qualify the name with the correct catalog.
[Isolated Spark] To tolerate the error on drop use DROP SCHEMA IF EXISTS.
[Isolated Spark] at org.apache.spark.sql.connector.catalog.CatalogManager.$anonfun$setCurrentNamespace$1(CatalogManager.scala:122)
[Isolated Spark] at org.apache.spark.sql.connector.catalog.CatalogManager.$anonfun$setCurrentNamespace$1$adapted(CatalogManager.scala:119)
[Isolated Spark] at org.apache.spark.sql.catalyst.catalog.SessionCatalog.setCurrentDatabaseWithNameCheck(SessionCatalog.scala:344)
[Isolated Spark] at org.apache.spark.sql.connector.catalog.CatalogManager.setCurrentNamespace(CatalogManager.scala:119)
[Isolated Spark] at org.apache.spark.sql.execution.datasources.v2.SetCatalogAndNamespaceExec.$anonfun$run$2(SetCatalogAndNamespaceExec.scala:36)
[Isolated Spark] at org.apache.spark.sql.execution.datasources.v2.SetCatalogAndNamespaceExec.$anonfun$run$2$adapted(SetCatalogAndNamespaceExec.scala:36)
[Isolated Spark] at scala.Option.foreach(Option.scala:407)
[Isolated Spark] at org.apache.spark.sql.execution.datasources.v2.SetCatalogAndNamespaceExec.run(SetCatalogAndNamespaceExec.scala:36)
[Isolated Spark] at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
[Isolated Spark] at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
[Isolated Spark] at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
[Isolated Spark] at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:107)
[Isolated Spark] at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
[Isolated Spark] at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
[Isolated Spark] at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
[Isolated Spark] at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
[Isolated Spark] at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
[Isolated Spark] at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:107)
[Isolated Spark] at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
[Isolated Spark] at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
[Isolated Spark] at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
[Isolated Spark] at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
[Isolated Spark] at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
[Isolated Spark] at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
[Isolated Spark] at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
[Isolated Spark] at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
[Isolated Spark] at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
[Isolated Spark] at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:437)
[Isolated Spark] at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:98)
[Isolated Spark] at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:85)
[Isolated Spark] at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:83)
[Isolated Spark] at org.apache.spark.sql.Dataset.<init>(Dataset.scala:220)
[Isolated Spark] at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
[Isolated Spark] at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
[Isolated Spark] at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
[Isolated Spark] at org.apache.spark.sql.SparkSession.$anonfun$sql$4(SparkSession.scala:691)
[Isolated Spark] at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
[Isolated Spark] at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:682)
[Isolated Spark] at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:713)
[Isolated Spark] at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:744)
[Isolated Spark] at org.apache.polaris.spark.quarkus.it.BundleSanityChecker.main(BundleSanityChecker.java:26)
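The failure path shown in this output (a failing statement in the child process surfacing as a failed assertion in the test) can be sketched roughly as below. This is an illustration only, not the code from the PR: the class `ChildExitCode` and the method `runSteps` are hypothetical, plain `Runnable` steps stand in for the `spark.sql(...)` calls, and it assumes the parent test asserts on the child's exit code.

```java
import java.util.List;

// Hypothetical sketch: a child-process entry point that turns any failure
// into a stack trace on stderr plus a non-zero exit code.
public class ChildExitCode {

  /**
   * Runs the steps in order. On the first failure, prints the stack trace
   * and returns 1 without running the remaining steps; returns 0 otherwise.
   */
  static int runSteps(List<Runnable> steps) {
    try {
      for (Runnable step : steps) {
        step.run();
      }
      return 0;
    } catch (Throwable t) {
      // The parent process can forward this output to the test log.
      t.printStackTrace();
      return 1;
    }
  }

  public static void main(String[] args) {
    // Assumption: in a real checker each step would be a spark.sql(...) call.
    List<Runnable> steps = List.of(
        () -> System.out.println("step 1 ok"),
        () -> System.out.println("step 2 ok"));
    System.exit(runSteps(steps));
  }
}
```

Under this assumption, the parent test only needs to compare the exit code with 0, and the stack trace printed by the child explains the failure.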
throws Exception {
  // Filter the current classpath: drop polaris-spark / polaris-core so the bundle jar
  // is the sole source of those classes; keep external jars (spark-sql, iceberg, etc.).
  String[] parts = System.getProperty("java.class.path").split(File.pathSeparator);
This is a neat idea... yet, I was thinking about using Gradle to build the class path (from test dependencies, without polaris-* artifacts) and then running the *IT tests via JUnit.
If other classes inside intTest need Polaris code, we can create a new test dir (e.g. sparkTest) for these new test cases (similar to cloudTest).
Then we could make a SparkSession directly here.
I hope the presence of JUnit on the class path is not a concern.
WDYT?
I think the real gap between the current regtest (the one under plugin/spark, not the one in the project root directory) and the integration tests is the "spark-shell --jar xxxx" step. This approach is closer to a real-world simulation: a new process mimics how a user actually deploys the jar. With the proposed route, I am worried we may run into classpath hell or hide some packaging bugs.
If we think the proposed approach is not a concern and is preferred, I am fine with making the requested changes.
ML: https://lists.apache.org/thread/4bx31cfbcqfxzgpsddvc9kcfbn9l093y
Sample PR to replace the Docker-based regtests for the Spark plugins with a JUnit IT that spawns a fresh JVM on a pruned class path (only the Polaris Spark bundle jar and the Spark dependencies). The rest of the SQL tests are already covered by integration tests, and this closes the gap for JAR loading.
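A rough sketch of the pruned-class-path launch described above, for illustration only. The class `PrunedClasspath`, its methods, and the substring-based exclusion rule are assumptions, not the PR's actual implementation; only `ProcessBuilder` and the `java.class.path` / `java.home` system properties are standard JDK facilities.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: build a class path without selected entries, then
// run a main class in a fresh JVM that only sees the pruned class path.
public class PrunedClasspath {

  /**
   * Keeps every entry that contains none of the excluded tokens, then appends
   * the bundle jar so it is the only source of the excluded classes.
   * Assumption: matching by substring is good enough for this illustration.
   */
  static String prune(
      String classpath, String separator, List<String> excludedTokens, String bundleJar) {
    List<String> kept = new ArrayList<>();
    for (String entry : classpath.split(Pattern.quote(separator))) {
      if (entry.isEmpty()) {
        continue;
      }
      boolean excluded = excludedTokens.stream().anyMatch(entry::contains);
      if (!excluded) {
        kept.add(entry);
      }
    }
    kept.add(bundleJar);
    return String.join(separator, kept);
  }

  /** Starts a child JVM, echoes its output with a prefix, and returns its exit code. */
  static int launch(String classpath, String mainClass)
      throws IOException, InterruptedException {
    String javaBin = Path.of(System.getProperty("java.home"), "bin", "java").toString();
    Process process =
        new ProcessBuilder(javaBin, "-cp", classpath, mainClass)
            .redirectErrorStream(true) // merge stderr into stdout
            .start();
    try (BufferedReader reader =
        new BufferedReader(
            new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println("[Isolated Spark] " + line);
      }
    }
    return process.waitFor();
  }
}
```

A test would typically call `prune` with `System.getProperty("java.class.path")` and `File.pathSeparator`, pass the result to `launch`, and assert that the returned exit code is 0.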
Checklist
CHANGELOG.md (if needed)
site/content/in-dev/unreleased (if needed)