Skip to content

[SPARK-57128][SQL][TESTS] SQLQueryTestHelper --SET parser must preserve commas in config values#56184

Open
rdtr wants to merge 1 commit into
apache:masterfrom
rdtr:sqlquerytest-helper-multi-config-set
Open

[SPARK-57128][SQL][TESTS] SQLQueryTestHelper --SET parser must preserve commas in config values#56184
rdtr wants to merge 1 commit into
apache:masterfrom
rdtr:sqlquerytest-helper-multi-config-set

Conversation

@rdtr
Copy link
Copy Markdown

@rdtr rdtr commented May 28, 2026

What changes were proposed in this pull request?

SQLQueryTestHelper.getSparkSettings splits --SET directive values on every comma:

val settingLines = comments.filter(_.startsWith("--SET ")).map(_.substring(6))
settingLines.flatMap(_.split(",").map { kv =>
  val (conf, value) = kv.span(_ != '=')
  conf.trim -> value.substring(1).trim
})

This breaks for any setting whose value itself contains a comma (e.g. spark.sql.optimizer.excludedRules, which is documented as a comma-separated list of rule names). The parser splits inside the value, the next segment has no =, and value.substring(1) then throws StringIndexOutOfBoundsException.

This PR changes the split to only occur at commas that are immediately followed by what looks like a new key=:

settingLines.flatMap(_.split(",(?=[\\w.]+=)").map { ... })

The positive lookahead (?=[\w.]+=) peeks at the following characters and only splits when they form a config key followed by =, without consuming those characters. This preserves the documented multi-setting form --SET k1=v1,k2=v2 while now also correctly handling values that themselves contain commas.

Why are the changes needed?

Several Spark configs take comma-separated lists as values — spark.sql.optimizer.excludedRules being the most common one. SQL test files currently cannot express such a value in a --SET directive at all; the parser fails before the test even runs, forcing authors to scope --SET down to one excluded rule per directive.

This was surfaced while writing a per-file workaround in Apache Gluten (apache/incubator-gluten#12165), which reuses this test helper and needs to exclude multiple rules at once.

Does this PR introduce any user-facing change?

No. Test-framework-only change. All existing --SET directives in the SQL test corpus continue to parse identically.

How was this patch tested?

Added SQLQueryTestHelperSuite (new file) with focused unit tests covering:

  • single key=value
  • documented multi-setting form k1=v1,k2=v2 in a single --SET
  • multiple --SET lines
  • a value containing commas (e.g. excludedRules=Rule1,Rule2) — fails on master, passes after the fix
  • mixed: multi-setting where one of the values itself contains commas
  • non---SET comments are ignored

All 6 tests pass with the fix applied.

…ve commas in config values

What changes were proposed in this pull request?

`SQLQueryTestHelper.getSparkSettings` splits `--SET` directive values on every
comma, which conflicts with Spark configs whose values themselves contain commas
(e.g. `spark.sql.optimizer.excludedRules` accepts a comma-separated rule list).
The current parser crashes with `StringIndexOutOfBoundsException` when it
encounters such a value.

Change the split to only occur at commas that are immediately followed by what
looks like a new `key=` (word characters or dots ending in `=`). This preserves
the documented multi-setting form `--SET k1=v1,k2=v2` while allowing values to
contain commas.

Adds `SQLQueryTestHelperSuite` with focused unit tests.

Why are the changes needed?

The parser cannot currently express settings whose values contain commas,
forcing users to scope down their SET to a single value. This was hit when
trying to specify a multi-rule `excludedRules` value in Apache Gluten's spark41
SQL test workaround (apache/gluten#12165).

Does this PR introduce any user-facing change?

No. Test-framework-only change. Existing tests that rely on the documented
multi-setting form continue to parse as before.

How was this patch tested?

New `SQLQueryTestHelperSuite` with 6 cases covering: single setting, multi-
setting in one `--SET`, multiple `--SET` lines, comma-containing value,
mixed, and non-SET comments. All pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant