
Commit 2b8e9f5

[SPARK-57128][SQL][TESTS] SQLQueryTestHelper --SET parser must preserve commas in config values
What changes were proposed in this pull request?

`SQLQueryTestHelper.getSparkSettings` splits `--SET` directive values on every comma. This conflicts with Spark configs whose values themselves contain commas (e.g. `spark.sql.optimizer.excludedRules` accepts a comma-separated rule list). The current parser crashes with `StringIndexOutOfBoundsException` on such a value: the fragment after the comma contains no `=`, so `value.substring(1)` is called on an empty string.

This change splits only at commas that are immediately followed by what looks like a new `key=` (one or more word characters or dots, followed by `=`). This preserves the documented multi-setting form `--SET k1=v1,k2=v2` while allowing values to contain commas. It also adds `SQLQueryTestHelperSuite` with focused unit tests.

Why are the changes needed?

The parser currently cannot express settings whose values contain commas, which forces users to restrict such a SET to a single value. This came up when trying to specify a multi-rule `excludedRules` value in Apache Gluten's spark41 SQL test workaround (apache/gluten#12165).

Does this PR introduce any user-facing change?

No. This is a test-framework-only change. Existing tests that rely on the documented multi-setting form continue to parse as before.

How was this patch tested?

New `SQLQueryTestHelperSuite` with 6 cases covering: a single setting, multiple settings in one `--SET`, multiple `--SET` lines, a comma-containing value, a mix of both, and non-SET comments. All pass.
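To make the crash concrete, here is a minimal standalone sketch that copies the pre-patch parsing logic out of the trait so it runs without Spark. The object name `OldSetParser`, the `main` harness, and the rule names `Rule1`/`Rule2` are illustrative placeholders, not part of the commit.

```scala
// Hypothetical standalone copy of the pre-patch getSparkSettings logic,
// extracted without Spark dependencies to show why a comma inside a value fails.
object OldSetParser {
  def getSparkSettings(comments: Array[String]): Array[(String, String)] = {
    val settingLines = comments.filter(_.startsWith("--SET ")).map(_.substring(6))
    // Splits on EVERY comma, including commas that belong to a value.
    settingLines.flatMap(_.split(",").map { kv =>
      val (conf, value) = kv.span(_ != '=')
      // For a fragment with no '=', `value` is "" and substring(1) throws.
      conf.trim -> value.substring(1).trim
    })
  }

  def main(args: Array[String]): Unit = {
    // The documented multi-setting form parses fine.
    println(getSparkSettings(Array("--SET k1=v1,k2=v2")).toSeq)

    // A comma-separated value is cut in two; the second fragment ("Rule2") has no '='.
    val line = "--SET spark.sql.optimizer.excludedRules=Rule1,Rule2"
    try getSparkSettings(Array(line))
    catch {
      case e: StringIndexOutOfBoundsException =>
        println(s"crashed: ${e.getClass.getSimpleName}")
    }
  }
}
```

Running `main` parses the first line into two pairs and then reports the `StringIndexOutOfBoundsException` raised for the second line.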
1 parent fc5abd6 commit 2b8e9f5

2 files changed

Lines changed: 70 additions & 1 deletion


sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestHelper.scala

Lines changed: 6 additions & 1 deletion
@@ -474,7 +474,12 @@ trait SQLQueryTestHelper extends SQLConfHelper with Logging {
 
   protected def getSparkSettings(comments: Array[String]): Array[(String, String)] = {
     val settingLines = comments.filter(_.startsWith("--SET ")).map(_.substring(6))
-    settingLines.flatMap(_.split(",").map { kv =>
+    // Split on commas that are followed by what looks like a new `key=`. This preserves
+    // commas inside config values such as
+    // --SET spark.sql.optimizer.excludedRules=Rule1,Rule2
+    // while still supporting the documented multi-setting form
+    // --SET key1=v1,key2=v2
+    settingLines.flatMap(_.split(",(?=[\\w.]+=)").map { kv =>
       val (conf, value) = kv.span(_ != '=')
       conf.trim -> value.substring(1).trim
     })
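The patched split can be exercised the same way. This is again a hypothetical standalone copy of the new logic (`NewSetParser` is not a name from the commit), fed the cases described above plus one the new suite does not cover: a space after the comma.

```scala
// Hypothetical standalone copy of the patched getSparkSettings logic,
// extracted without Spark dependencies so the lookahead regex can be tried in isolation.
object NewSetParser {
  // Split only at a comma immediately followed by `[\w.]+=`, i.e. the start of another
  // `key=`. The lookahead `(?=...)` consumes nothing, so the key stays in the next fragment.
  private val SettingSeparator = ",(?=[\\w.]+=)"

  def getSparkSettings(comments: Array[String]): Array[(String, String)] = {
    val settingLines = comments.filter(_.startsWith("--SET ")).map(_.substring(6))
    settingLines.flatMap(_.split(SettingSeparator).map { kv =>
      val (conf, value) = kv.span(_ != '=')
      conf.trim -> value.substring(1).trim
    })
  }

  def main(args: Array[String]): Unit = {
    val cases = Seq(
      "--SET k1=v1,k2=v2", // documented form: two settings
      "--SET spark.sql.optimizer.excludedRules=Rule1,Rule2", // one setting, comma kept
      "--SET spark.sql.optimizer.excludedRules=Rule1,Rule2,spark.sql.foo=1", // mixed
      "--SET k1=v1, k2=v2" // space after comma: kept as one setting
    )
    cases.foreach(c => println(s"$c -> ${getSparkSettings(Array(c)).toSeq}"))
  }
}
```

Going by the regex alone, the last case parses as the single pair `k1` -> `v1, k2=v2`, because the lookahead requires key characters directly after the comma; the old `split(",")` plus `trim` would have produced two settings. Likewise, a value whose comma-separated item itself contains `=` (for example `a,b=c`) would still be split. Neither case is exercised by `SQLQueryTestHelperSuite`.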
Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql
+
+import org.apache.spark.SparkFunSuite
+
+class SQLQueryTestHelperSuite extends SparkFunSuite with SQLQueryTestHelper {
+
+  test("getSparkSettings: single key=value") {
+    val result = getSparkSettings(Array("--SET spark.sql.foo=1"))
+    assert(result.toSeq === Seq("spark.sql.foo" -> "1"))
+  }
+
+  test("getSparkSettings: multiple key=value pairs in one --SET (documented form)") {
+    val result = getSparkSettings(Array("--SET spark.sql.foo=1,spark.sql.bar=2"))
+    assert(result.toSeq === Seq("spark.sql.foo" -> "1", "spark.sql.bar" -> "2"))
+  }
+
+  test("getSparkSettings: multiple --SET statements") {
+    val result = getSparkSettings(
+      Array("--SET spark.sql.foo=1", "--SET spark.sql.bar=2"))
+    assert(result.toSeq === Seq("spark.sql.foo" -> "1", "spark.sql.bar" -> "2"))
+  }
+
+  test("getSparkSettings: value containing commas (e.g. excludedRules list)") {
+    val excludedRules =
+      "org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation," +
+        "org.apache.spark.sql.catalyst.optimizer.ConstantFolding"
+    val result = getSparkSettings(
+      Array(s"--SET spark.sql.optimizer.excludedRules=$excludedRules"))
+    assert(result.toSeq === Seq("spark.sql.optimizer.excludedRules" -> excludedRules))
+  }
+
+  test("getSparkSettings: mixed -- multiple settings where one value contains commas") {
+    val excludedRules =
+      "org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation," +
+        "org.apache.spark.sql.catalyst.optimizer.ConstantFolding"
+    val result = getSparkSettings(
+      Array(s"--SET spark.sql.optimizer.excludedRules=$excludedRules,spark.sql.foo=1"))
+    assert(result.toSeq === Seq(
+      "spark.sql.optimizer.excludedRules" -> excludedRules,
+      "spark.sql.foo" -> "1"))
+  }
+
+  test("getSparkSettings: ignores non --SET comments") {
+    val result = getSparkSettings(
+      Array("-- a comment", "--SET spark.sql.foo=1", "-- another"))
+    assert(result.toSeq === Seq("spark.sql.foo" -> "1"))
+  }
+}
