Commit efa725e

szehon-ho authored and aokolnychyi committed
[SPARK-56343][SQL][TESTS] Add MERGE INTO test for type mismatch without schema evolution trigger condition
### What changes were proposed in this pull request?

Add two tests to `MergeIntoSchemaEvolutionTypeWideningAndExtraFieldTests` for MERGE INTO schema evolution with cross-column assignments (which do not trigger schema evolution), where the source has a type mismatch on a same-named column:

1. **No evolution for compatible cross-column assignment**: `UPDATE SET salary = s.bonus` where `source.salary` is LONG and `target.salary` is INT. Since the assignment uses `s.bonus` (not `s.salary`), the type mismatch on `salary` should be irrelevant and no schema evolution should occur. Asserts both data and schema remain unchanged.
2. **Error for incompatible cross-column assignment**: `UPDATE SET salary = s.bonus` where `s.bonus` is STRING and `target.salary` is INT. This should fail regardless of schema evolution because the explicit assignment has incompatible types.

### Why are the changes needed?

These tests cover a gap in schema evolution test coverage. The existing tests for cross-column assignments (`salary = s.bonus`) did not include the scenario where the source also has a same-named column (`salary`) with a different type. This is important to verify that schema evolution correctly considers only the actual assignment columns, not unrelated same-named columns with wider types.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests added and run via:

```
build/sbt 'sql/testOnly *GroupBasedMergeIntoSchemaEvolutionSQLSuite -- -z "type mismatch on existing column"'
```

All 4 test cases pass (2 tests x 2 variants: with/without evolution clause).

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude claude-4.6-opus-high-thinking)

Closes #55173 from szehon-ho/SPARK-56343.

Authored-by: Szehon Ho <szehon.apache@gmail.com>
Signed-off-by: Anton Okolnychyi <aokolnychyi@apache.org>
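For orientation, the shape of the statement both tests exercise can be sketched as follows. This is a minimal sketch and not the suite's actual helper: `MergeSqlSketch`, `mergeSql`, the table names, and the `ON t.pk = s.pk` join condition are assumptions for illustration. Only the UPDATE and INSERT clauses are taken from the diff, and the placement of `WITH SCHEMA EVOLUTION` is assumed to follow Spark SQL's MERGE grammar.

```scala
// Hypothetical illustration only: builds the MERGE text for the two variants
// ("with" and "without" the evolution clause). It does not run Spark.
object MergeSqlSketch {

  // `target` and `source` are placeholder names; the join on `pk` is an assumption.
  def mergeSql(target: String, source: String, withEvolution: Boolean): String = {
    val evolution = if (withEvolution) "WITH SCHEMA EVOLUTION " else ""
    s"""MERGE ${evolution}INTO $target t
       |USING $source s
       |ON t.pk = s.pk
       |WHEN MATCHED THEN UPDATE SET salary = s.bonus
       |WHEN NOT MATCHED THEN INSERT (pk, salary, dep) VALUES (s.pk, s.bonus, 'newdep')""".stripMargin
  }

  def main(args: Array[String]): Unit = {
    println(mergeSql("target_table", "source_view", withEvolution = true))
    println()
    println(mergeSql("target_table", "source_view", withEvolution = false))
  }
}
```

In both variants the only source column that is assigned is `s.bonus`, which is why the type of `s.salary` should play no role in deciding whether the target schema evolves.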
1 parent 33c18ee commit efa725e

1 file changed: +88 -0 lines changed
sql/core/src/test/scala/org/apache/spark/sql/connector/MergeIntoSchemaEvolutionTypeWideningAndExtraFieldTests.scala

Lines changed: 88 additions & 0 deletions
@@ -144,6 +144,94 @@ trait MergeIntoSchemaEvolutionTypeWideningAndExtraFieldTests
       (3, 75, "newdep")).toDF("pk", "salary", "dep")
   )

+  // When assigning s.bonus to existing t.salary and source.salary has a wider type (long) than
+  // target.salary (int), no evolution should occur because the assignment uses s.bonus, not
+  // s.salary. The type mismatch on the same-named column should be irrelevant.
+  testEvolution("source has extra column with type mismatch on existing column -" +
+    "should not evolve when assigning from differently named source column")(
+    targetData = {
+      val schema = StructType(Seq(
+        StructField("pk", IntegerType, nullable = false),
+        StructField("salary", IntegerType),
+        StructField("dep", StringType)
+      ))
+      spark.createDataFrame(spark.sparkContext.parallelize(Seq(
+        Row(1, 100, "hr"),
+        Row(2, 200, "software")
+      )), schema)
+    },
+    sourceData = {
+      val schema = StructType(Seq(
+        StructField("pk", IntegerType, nullable = false),
+        StructField("salary", LongType),
+        StructField("dep", StringType),
+        StructField("bonus", LongType)
+      ))
+      spark.createDataFrame(spark.sparkContext.parallelize(Seq(
+        Row(2, 150L, "dummy", 50L),
+        Row(3, 250L, "dummy", 75L)
+      )), schema)
+    },
+    clauses = Seq(
+      update(set = "salary = s.bonus"),
+      insert(values = "(pk, salary, dep) VALUES (s.pk, s.bonus, 'newdep')")
+    ),
+    expected = Seq(
+      (1, 100, "hr"),
+      (2, 50, "software"),
+      (3, 75, "newdep")).toDF("pk", "salary", "dep"),
+    expectedWithoutEvolution = Seq(
+      (1, 100, "hr"),
+      (2, 50, "software"),
+      (3, 75, "newdep")).toDF("pk", "salary", "dep"),
+    expectedSchema = StructType(Seq(
+      StructField("pk", IntegerType, nullable = false),
+      StructField("salary", IntegerType),
+      StructField("dep", StringType)
+    )),
+    expectedSchemaWithoutEvolution = StructType(Seq(
+      StructField("pk", IntegerType, nullable = false),
+      StructField("salary", IntegerType),
+      StructField("dep", StringType)
+    ))
+  )
+
+  // When assigning s.bonus (StringType) to target salary (IntegerType), the types are
+  // incompatible. This should fail both with and without schema evolution because the explicit
+  // assignment has mismatched types regardless of evolution.
+  testEvolution("source has extra column with type mismatch on existing column -" +
+    "should fail when assigning from incompatible source column")(
+    targetData = {
+      val schema = StructType(Seq(
+        StructField("pk", IntegerType, nullable = false),
+        StructField("salary", IntegerType),
+        StructField("dep", StringType)
+      ))
+      spark.createDataFrame(spark.sparkContext.parallelize(Seq(
+        Row(1, 100, "hr"),
+        Row(2, 200, "software")
+      )), schema)
+    },
+    sourceData = {
+      val schema = StructType(Seq(
+        StructField("pk", IntegerType, nullable = false),
+        StructField("salary", LongType),
+        StructField("dep", StringType),
+        StructField("bonus", StringType)
+      ))
+      spark.createDataFrame(spark.sparkContext.parallelize(Seq(
+        Row(2, 150L, "dummy", "fifty"),
+        Row(3, 250L, "dummy", "seventy-five")
+      )), schema)
+    },
+    clauses = Seq(
+      update(set = "salary = s.bonus"),
+      insert(values = "(pk, salary, dep) VALUES (s.pk, s.bonus, 'newdep')")
+    ),
+    expectErrorContains = "Cannot safely cast",
+    expectErrorWithoutEvolutionContains = "Cannot safely cast"
+  )
+
   // No evolution when using named_struct to construct value without referencing new field
   testNestedStructsEvolution("source has extra struct field -" +
     "no evolution when not directly referencing new field - INSERT")(

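The expected rows of the first test can be traced by hand with a small in-memory model. The following is a sketch under stated assumptions, not Spark's implementation: `MergeExpectedRowsSketch`, `TargetRow`, `SourceRow`, and `merge` are hypothetical names, rows are matched on `pk`, and `java.lang.Math.toIntExact` stands in for an overflow-checked LONG-to-INT cast (the commit does not show how Spark performs that cast).

```scala
// Hypothetical in-memory model of the first test's MERGE; it does not use Spark.
object MergeExpectedRowsSketch {

  // The target keeps salary as Int: the model never widens it, mirroring "no evolution".
  final case class TargetRow(pk: Int, salary: Int, dep: String)
  final case class SourceRow(pk: Int, salary: Long, dep: String, bonus: Long)

  def merge(target: Seq[TargetRow], source: Seq[SourceRow]): Seq[TargetRow] = {
    val sourceByPk = source.map(s => s.pk -> s).toMap
    val targetPks = target.map(_.pk).toSet

    // WHEN MATCHED: UPDATE SET salary = s.bonus (s.salary is never read).
    val updated = target.map { t =>
      sourceByPk.get(t.pk) match {
        case Some(s) => t.copy(salary = java.lang.Math.toIntExact(s.bonus))
        case None    => t
      }
    }

    // WHEN NOT MATCHED: INSERT (pk, salary, dep) VALUES (s.pk, s.bonus, 'newdep').
    val inserted = source
      .filterNot(s => targetPks.contains(s.pk))
      .map(s => TargetRow(s.pk, java.lang.Math.toIntExact(s.bonus), "newdep"))

    (updated ++ inserted).sortBy(_.pk)
  }

  def main(args: Array[String]): Unit = {
    val target = Seq(TargetRow(1, 100, "hr"), TargetRow(2, 200, "software"))
    val source = Seq(SourceRow(2, 150L, "dummy", 50L), SourceRow(3, 250L, "dummy", 75L))
    merge(target, source).foreach(println)
  }
}
```

Because the model reads only `bonus` from the source, changing the type or values of `SourceRow.salary` cannot affect the result, which is the property the first test asserts for both data and schema.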