Replace spark plugins regtests with JUnit #4588

Open
MonkeyCanCode wants to merge 3 commits into apache:main from MonkeyCanCode:spark_plugin_remove_regtests

@MonkeyCanCode MonkeyCanCode commented May 31, 2026

ML: https://lists.apache.org/thread/4bx31cfbcqfxzgpsddvc9kcfbn9l093y

Sample PR that replaces the Docker-based regtests for the Spark plugins with a JUnit IT that spawns a fresh JVM on a pruned class path (only the Polaris Spark bundle jar and the Spark dependencies). The rest of the SQL tests are already covered by integration tests, and this closes the gap for JAR loading.

Checklist

  • 🛡️ Don't disclose security issues! (contact security@apache.org)
  • 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • 📚 Updated documentation in site/content/in-dev/unreleased (if needed)

@dimas-b dimas-b left a comment

Thanks for working on this PR, @MonkeyCanCode! I like the general direction of this refactoring. Minor comments below.

try (SparkSession spark = SparkSession.builder().getOrCreate()) {
  spark.sql("USE polaris");
  spark.sql("CREATE NAMESPACE bundle_ns");
  spark.sql("CREATE TABLE bundle_ns.t (id INT, value STRING) USING ICEBERG");
Contributor

What happens if/when this SQL fails?

@MonkeyCanCode MonkeyCanCode Jun 2, 2026

If any of the SQL statements fails, the catch block catches it. For example, changing USE polaris to USE polaris_known produces the following:

./gradlew :polaris-spark-integration-3.5_2.12:intTest
...
BundleJarSanityIT > testBundleJarLoading(Path, PolarisApiEndpoints, ClientCredentials) FAILED
    java.lang.AssertionError at BundleJarSanityIT.java:142
...
2026-06-01 21:59:43,992 INFO  [io.qua.htt.access-log] [ea6f1dc2-296f-4477-ae6d-52f0c49c7a4c_0000000000000000002,POLARIS] [,,,] (executor-thread-1) 127.0.0.1 - root [01/Jun/2026:21:59:43 -0500] "POST /api/management/v1/catalogs HTTP/1.1" 201 425
2026-06-01 21:59:44,008 INFO  [io.qua.htt.access-log] [ea6f1dc2-296f-4477-ae6d-52f0c49c7a4c_0000000000000000003,POLARIS] [,,,] (executor-thread-1) 127.0.0.1 - - [01/Jun/2026:21:59:44 -0500] "POST /api/catalog/v1/oauth/tokens HTTP/1.1" 200 765
[Isolated Spark] SLF4J(W): Class path contains multiple SLF4J providers.
[Isolated Spark] SLF4J(W): Found provider [org.slf4j.impl.JBossSlf4jServiceProvider@57bc27f5]
[Isolated Spark] SLF4J(W): Found provider [ch.qos.logback.classic.spi.LogbackServiceProvider@5fb759d6]
[Isolated Spark] SLF4J(W): See https://www.slf4j.org/codes.html#multiple_bindings for an explanation.
[Isolated Spark] SLF4J(I): Actual provider is of type [org.slf4j.impl.JBossSlf4jServiceProvider@57bc27f5]
[Isolated Spark] Jun 01, 2026 9:59:50 PM org.jboss.logmanager.JBossLoggerFinder getLogger
[Isolated Spark] ERROR: The LogManager accessed before the "java.util.logging.manager" system property was set to "org.jboss.logmanager.LogManager". Results may be unexpected.
[Isolated Spark] org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: [SCHEMA_NOT_FOUND] The schema `polaris_known` cannot be found. Verify the spelling and correctness of the schema and catalog.
[Isolated Spark] If you did not qualify the name with a catalog, verify the current_schema() output, or qualify the name with the correct catalog.
[Isolated Spark] To tolerate the error on drop use DROP SCHEMA IF EXISTS.
[Isolated Spark] 	at org.apache.spark.sql.connector.catalog.CatalogManager.$anonfun$setCurrentNamespace$1(CatalogManager.scala:122)
[Isolated Spark] 	at org.apache.spark.sql.connector.catalog.CatalogManager.$anonfun$setCurrentNamespace$1$adapted(CatalogManager.scala:119)
[Isolated Spark] 	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.setCurrentDatabaseWithNameCheck(SessionCatalog.scala:344)
[Isolated Spark] 	at org.apache.spark.sql.connector.catalog.CatalogManager.setCurrentNamespace(CatalogManager.scala:119)
[Isolated Spark] 	at org.apache.spark.sql.execution.datasources.v2.SetCatalogAndNamespaceExec.$anonfun$run$2(SetCatalogAndNamespaceExec.scala:36)
[Isolated Spark] 	at org.apache.spark.sql.execution.datasources.v2.SetCatalogAndNamespaceExec.$anonfun$run$2$adapted(SetCatalogAndNamespaceExec.scala:36)
[Isolated Spark] 	at scala.Option.foreach(Option.scala:407)
[Isolated Spark] 	at org.apache.spark.sql.execution.datasources.v2.SetCatalogAndNamespaceExec.run(SetCatalogAndNamespaceExec.scala:36)
[Isolated Spark] 	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
[Isolated Spark] 	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
[Isolated Spark] 	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
[Isolated Spark] 	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:107)
[Isolated Spark] 	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
[Isolated Spark] 	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
[Isolated Spark] 	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
[Isolated Spark] 	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
[Isolated Spark] 	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
[Isolated Spark] 	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:107)
[Isolated Spark] 	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
[Isolated Spark] 	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
[Isolated Spark] 	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
[Isolated Spark] 	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
[Isolated Spark] 	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
[Isolated Spark] 	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
[Isolated Spark] 	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
[Isolated Spark] 	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
[Isolated Spark] 	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
[Isolated Spark] 	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:437)
[Isolated Spark] 	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:98)
[Isolated Spark] 	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:85)
[Isolated Spark] 	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:83)
[Isolated Spark] 	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:220)
[Isolated Spark] 	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
[Isolated Spark] 	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
[Isolated Spark] 	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
[Isolated Spark] 	at org.apache.spark.sql.SparkSession.$anonfun$sql$4(SparkSession.scala:691)
[Isolated Spark] 	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
[Isolated Spark] 	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:682)
[Isolated Spark] 	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:713)
[Isolated Spark] 	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:744)
[Isolated Spark] 	at org.apache.polaris.spark.quarkus.it.BundleSanityChecker.main(BundleSanityChecker.java:26)
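
For illustration, here is a minimal sketch of how a failure in the child JVM could surface as an assertion failure in the parent test. It assumes the child's main method catches the exception, prints the stack trace, and exits with a non-zero code, and that the parent relays the child's output under a prefix and then checks the exit code. The class `IsolatedProcessRunner` and its methods are hypothetical names, not the code in this PR.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not the PR's code): runs a command and relays its output with a prefix. */
final class IsolatedProcessRunner {

  /** Exit code plus the prefixed output lines of a finished child process. */
  static final class Result {
    final int exitCode;
    final List<String> lines;

    Result(int exitCode, List<String> lines) {
      this.exitCode = exitCode;
      this.lines = lines;
    }
  }

  private IsolatedProcessRunner() {}

  static Result run(List<String> command, String prefix) {
    ProcessBuilder builder = new ProcessBuilder(command);
    // Merge stderr into stdout so stack traces stay in order with regular output.
    builder.redirectErrorStream(true);
    List<String> lines = new ArrayList<>();
    try {
      Process process = builder.start();
      try (BufferedReader reader =
          new BufferedReader(
              new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
        String line;
        while ((line = reader.readLine()) != null) {
          String prefixed = prefix + " " + line;
          System.out.println(prefixed);
          lines.add(prefixed);
        }
      }
      // The output is fully drained before waiting, so the child cannot block on a full pipe.
      return new Result(process.waitFor(), lines);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for the child process", e);
    }
  }

  /** Path of the java launcher that belongs to the currently running JVM. */
  static String currentJavaBinary() {
    return System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
  }
}
```

Under these assumptions, the parent test would call `run(...)` with the java launcher, `-cp`, the pruned class path, and the child's main class, then assert that `exitCode` is 0. A failing SQL statement in the child would then show up as a prefixed stack trace in the log followed by an assertion error in the test.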

    throws Exception {
  // Filter the current classpath: drop polaris-spark / polaris-core so the bundle jar
  // is the sole source of those classes; keep external jars (spark-sql, iceberg, etc.).
  String[] parts = System.getProperty("java.class.path").split(File.pathSeparator);
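
The filtering step described in the comment above can be sketched as follows. This is only an illustration under stated assumptions: entries are matched by file-name prefix, directory entries (for example compiled class folders) are not treated specially, and `ClasspathPruner`, `prune`, and `childClassPath` are hypothetical names rather than the PR's actual implementation.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not the PR's code): prunes a class path before launching a child JVM. */
final class ClasspathPruner {

  private ClasspathPruner() {}

  /** Returns the class path entries whose file name starts with none of the excluded prefixes. */
  static List<String> prune(String classPath, List<String> excludedPrefixes) {
    List<String> kept = new ArrayList<>();
    for (String entry : classPath.split(File.pathSeparator)) {
      if (entry.isEmpty()) {
        continue;
      }
      // Assumption: matching on the jar file name is enough to identify Polaris artifacts.
      String fileName = new File(entry).getName();
      boolean excluded = excludedPrefixes.stream().anyMatch(fileName::startsWith);
      if (!excluded) {
        kept.add(entry);
      }
    }
    return kept;
  }

  /** Builds the child class path: the bundle jar first, then the entries that were kept. */
  static String childClassPath(String bundleJar, List<String> keptEntries) {
    List<String> all = new ArrayList<>();
    all.add(bundleJar);
    all.addAll(keptEntries);
    return String.join(File.pathSeparator, all);
  }
}
```

A caller would pass `System.getProperty("java.class.path")` together with prefixes such as `polaris-spark` and `polaris-core`, then hand the result of `childClassPath` to the child JVM via `-cp`. Putting the bundle jar first is a design choice of this sketch, so that the bundle wins if a duplicate class slips through the filter.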
Contributor

This is a neat idea... yet, I was thinking about using Gradle to build the class path (from test dependencies, without polaris-* artifacts) then run *IT tests via JUnit.

If other classes inside intTest need Polaris code, we can create a new test dir (e.g. sparkTest) for these new test cases (similar to cloudTest).

Then we could make a SparkSession directly here.

I hope the presence of JUnit on the class path is not a concern.

WDYT?

Contributor Author

I was thinking that the real gap between the current regtest (the one under plugin/spark, not the one in the project root directory) and the integration tests is the "spark-shell --jar xxxx" step. This approach is closer to a real-world simulation: a new process mimics how a user actually deploys the jar. With the proposed route, I am worried we may run into classpath hell or hide some packaging bugs.

If we think the proposed approach is not a concern and is preferred, I am fine with making the requested changes.
