[SPARK-57128][SQL][TESTS] SQLQueryTestHelper --SET parser must preserve commas in config values#56184
Open
rdtr wants to merge 1 commit into
Open
[SPARK-57128][SQL][TESTS] SQLQueryTestHelper --SET parser must preserve commas in config values#56184rdtr wants to merge 1 commit into
rdtr wants to merge 1 commit into
Conversation
…ve commas in config values What changes were proposed in this pull request? `SQLQueryTestHelper.getSparkSettings` splits `--SET` directive values on every comma, which conflicts with Spark configs whose values themselves contain commas (e.g. `spark.sql.optimizer.excludedRules` accepts a comma-separated rule list). The current parser crashes with `StringIndexOutOfBoundsException` when it encounters such a value. Change the split to only occur at commas that are immediately followed by what looks like a new `key=` (word characters or dots ending in `=`). This preserves the documented multi-setting form `--SET k1=v1,k2=v2` while allowing values to contain commas. Adds `SQLQueryTestHelperSuite` with focused unit tests. Why are the changes needed? The parser cannot currently express settings whose values contain commas, forcing users to scope down their SET to a single value. This was hit when trying to specify a multi-rule `excludedRules` value in Apache Gluten's spark41 SQL test workaround (apache/gluten#12165). Does this PR introduce any user-facing change? No. Test-framework-only change. Existing tests that rely on the documented multi-setting form continue to parse as before. How was this patch tested? New `SQLQueryTestHelperSuite` with 6 cases covering: single setting, multi- setting in one `--SET`, multiple `--SET` lines, comma-containing value, mixed, and non-SET comments. All pass.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
SQLQueryTestHelper.getSparkSettingssplits--SETdirective values on every comma:This breaks for any setting whose value itself contains a comma (e.g.
spark.sql.optimizer.excludedRules, which is documented as a comma-separated list of rule names). The parser splits inside the value, the next segment has no=, andvalue.substring(1)then throwsStringIndexOutOfBoundsException.This PR changes the split to only occur at commas that are immediately followed by what looks like a new
key=:settingLines.flatMap(_.split(",(?=[\\w.]+=)").map { ... })The positive lookahead
(?=[\w.]+=)peeks at the following characters and only splits when they form a config key followed by=, without consuming those characters. This preserves the documented multi-setting form--SET k1=v1,k2=v2while now also correctly handling values that themselves contain commas.Why are the changes needed?
Several Spark configs take comma-separated lists as values —
spark.sql.optimizer.excludedRulesbeing the most common one. SQL test files currently cannot express such a value in a--SETdirective at all; the parser fails before the test even runs, forcing authors to scope--SETdown to one excluded rule per directive.This was surfaced while writing a per-file workaround in Apache Gluten (apache/incubator-gluten#12165), which reuses this test helper and needs to exclude multiple rules at once.
Does this PR introduce any user-facing change?
No. Test-framework-only change. All existing
--SETdirectives in the SQL test corpus continue to parse identically.How was this patch tested?
Added
SQLQueryTestHelperSuite(new file) with focused unit tests covering:key=valuek1=v1,k2=v2in a single--SET--SETlinesexcludedRules=Rule1,Rule2) — fails onmaster, passes after the fix--SETcomments are ignoredAll 6 tests pass with the fix applied.