[GLUTEN-11916][VL][TEST] Enable subquery/exists-subquery/exists-orderby-limit.sql with SPARK-57125 workaround#12165
Open
rdtr wants to merge 1 commit into
Open
Conversation
…by-limit.sql with SPARK-57125 workaround GlutenSQLQueryTestSuite excludes ConvertToLocalRelation, ConstantFolding and NullPropagation by default to force queries through Gluten's offload paths. However, EXISTS+OFFSET queries in exists-orderby-limit.sql hit Spark's LimitPushDown rule which rewrites LocalLimit(le, Offset(oe, child)) into Offset(oe, LocalLimit(Add(le, oe), child)) and relies on ConstantFolding to subsequently fold `Add(Literal(N), Literal(M))` to `Literal(N + M)`. Without ConstantFolding the unfolded Add reaches physical planning where BasicOperators only matches LocalLimit(IntegerLiteral, _), producing AssertionError: No plan for LocalLimit (1 + 2) wrapped as [INTERNAL_ERROR] during the planning phase. This patch enables the test and re-enables ConstantFolding for just this SQL file via a per-file `--SET spark.sql.optimizer.excludedRules=...` directive that keeps only ConvertToLocalRelation excluded. The upstream Spark fix is tracked as SPARK-57125 (Apache Spark PR #56180), which makes LimitPushDown produce a literal sum directly so the rule no longer depends on ConstantFolding. Once that lands and Gluten picks up the Spark version, the `--SET` directive in this file can be removed. Note: the test framework's `--SET` parser splits values by comma, so multiple excluded rules cannot be specified in a single directive (recorded separately for a future Spark/Gluten follow-up). NullPropagation getting re-enabled is acceptable for this test.
rdtr
added a commit
to rdtr/spark
that referenced
this pull request
May 28, 2026
…ve commas in config values What changes were proposed in this pull request? `SQLQueryTestHelper.getSparkSettings` splits `--SET` directive values on every comma, which conflicts with Spark configs whose values themselves contain commas (e.g. `spark.sql.optimizer.excludedRules` accepts a comma-separated rule list). The current parser crashes with `StringIndexOutOfBoundsException` when it encounters such a value. Change the split to only occur at commas that are immediately followed by what looks like a new `key=` (word characters or dots ending in `=`). This preserves the documented multi-setting form `--SET k1=v1,k2=v2` while allowing values to contain commas. Adds `SQLQueryTestHelperSuite` with focused unit tests. Why are the changes needed? The parser cannot currently express settings whose values contain commas, forcing users to scope down their SET to a single value. This was hit when trying to specify a multi-rule `excludedRules` value in Apache Gluten's spark41 SQL test workaround (apache/gluten#12165). Does this PR introduce any user-facing change? No. Test-framework-only change. Existing tests that rely on the documented multi-setting form continue to parse as before. How was this patch tested? New `SQLQueryTestHelperSuite` with 6 cases covering: single setting, multi- setting in one `--SET`, multiple `--SET` lines, comma-containing value, mixed, and non-SET comments. All pass.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Enables
subquery/exists-subquery/exists-orderby-limit.sqlin spark41 SQL query tests by re-enablingConstantFoldingfor just this one file via a per-file--SET spark.sql.optimizer.excludedRules=...directive. The test was previously TODO-commented out because it crashed with anINTERNAL_ERRORduring physical planning.Root cause
GlutenSQLQueryTestSuiteexcludesConvertToLocalRelation,ConstantFolding,and
NullPropagationfrom the optimizer by default to force queries throughGluten's offload paths. With
ConstantFoldingexcluded, Spark'sLimitPushDownrule produces an unfolded
Addexpression thatBasicOperatorsinSparkStrategiescannot match, causing physical planning to fail with:wrapped as
[INTERNAL_ERROR] The Spark SQL phase planning failed with an internal error.The trigger is the EXISTS+OFFSET pattern (query # 17 in this SQL file):
LimitPushDownrewritesLocalLimit(le, Offset(oe, child))intoOffset(oe, LocalLimit(Add(le, oe), child))and relies onConstantFoldingto subsequently foldAdd(Literal(1), Literal(2))toLiteral(3)so thatBasicOperators(which only matchesLocalLimit(IntegerLiteral, _)) can produce a physical plan.Fix
Add a per-file
--SETdirective that overrides the default exclusion to only excludeConvertToLocalRelation, re-enablingConstantFolding(andNullPropagation, which gets re-enabled because the test framework's--SETparser splits values by comma — see the note below). This keeps Gluten's offload paths exercised while allowing the test to plan.The upstream Spark fix is tracked as SPARK-57125 (PR
apache/spark#56180), which makes
LimitPushDownproduce a folded literal directly so the rule no longer dependson
ConstantFolding. Once that lands and Gluten picks up the Spark version,the
--SETdirective in this file can be removed.Test framework limitation
The Gluten/Spark SQL test framework's
--SETparser atSQLQueryTestHelper.scala:476-481splits values by comma at the top level,which means multi-rule values like
excludedRules=Rule1,Rule2can't be specified in a single--SET(
StringIndexOutOfBoundsException). For now, acceptingNullPropagationbeing re-enabled is fine for this test. Filed as a separate follow-up.
Verification
AssertionError: No plan for LocalLimit (1 + 2)byenabling the test without the
--SETdirective on the originalGlutenSQLQueryTestSuite(not a diagnostic subclass).--SETdirective applied, the test passes.master and passes with the upstream fix.
Regarding Spark 4.0
The same SQL test file (with the same EXISTS+OFFSET queries) is enabled and
appears to pass in
gluten-ut/spark40. I checked:OFFSET queries)
LimitPushDownrule in Spark 4.0.0 source (verified via the GitHubv4.0.0 tag)
BasicOperatorsIntegerLiteral-only matchers in Spark 4.0.0 sourceConvertToLocalRelation,ConstantFolding,NullPropagation) ingluten-ut/spark40/.../GlutenSQLQueryTestSuite.scalagluten-ut/spark40is enabled invelox_backend_x86.ymland thespark-test-spark40-slowjob runsGlutenSQLQueryTestSuitevia theExtendedSQLTesttag — and passesI couldn't identify what makes Spark 4.0.0 + Gluten avoid hitting this code path while Spark 4.1.1 + Gluten triggers it. The bug ingredients look identical. If a reviewer with more context on the spark40 setup can shed light, that would be appreciated — but it doesn't block this PR. I am happy to build and test locally with 4.0 if necessary.
Test plan
spark-test-spark41-slowrunsGlutenSQLQueryTestSuitewithExtendedSQLTesttag, which exercises this file.GlutenSQLQueryTestSuitefiltered to
subquery/exists-subquery/exists-orderby-limit.sql.Related: GLUTEN-11916, #12146
(first batch of Spark 4.1 TODO test fixes).
Related issue: #11916