Commit 096afb5
[SPARK-56686][FOLLOWUP][SQL] Mark CDC streaming rewrite via attribute metadata
### What changes were proposed in this pull request?
Follow-up to #55636 addressing post-merge review comments from zikangh:
1. **Deduplicate `isCarryoverPair`.** The carry-over predicate (`_del_cnt = 1 AND _ins_cnt = 1 AND _rv_cnt = 2 AND _min_rv = _max_rv`) was duplicated between the batch path's `addCarryOverPairFilter` and the streaming path's inline filter. Extracted a shared `buildCarryOverPairPredicate` helper and call it from both.
2. **Mark the streaming row-level rewrite via attribute metadata rather than helper column name.** `UnsupportedOperationChecker` previously detected the rewrite by string-matching the `__spark_cdc_events` aggregate alias name. Switched to a metadata marker (`ResolveChangelogTable.streamingPostProcessingMarker`) attached to the alias's output attribute -- mirroring the existing `EventTimeWatermark.delayKey` and `SessionWindow.marker` patterns. The marker travels with the attribute through optimization.
3. **Expand streaming E2E coverage.** New tests in `ChangelogEndToEndSuite`:
   - composite rowId carry-over removal,
   - composite rowId update detection (different tuples kept raw),
   - carry-over + update detection across multiple commits,
   - DELETE-all-rows and UPDATE-all-rows fixtures,
   - append-only workload pass-through,
   - no-op UPDATE labeled as update (rcv differs on pre/post),
   - large carry-over removal (9 carry-over pairs + 1 real delete).
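
As a rough illustration of item 1, the sketch below models the carry-over predicate in plain Scala, with no Spark dependency, and calls it from two filter functions. `GroupStats`, `CarryOver`, `batchFilter` and `streamingFilter` are hypothetical names made up for this example; only the four-way conjunction comes from the description above. The real `buildCarryOverPairPredicate` builds a Catalyst expression, which is not reproduced here.

```scala
// Hypothetical stand-in for the per-group helper columns named above
// (_del_cnt, _ins_cnt, _rv_cnt, _min_rv, _max_rv). Not a Spark class.
final case class GroupStats(
    delCnt: Long,
    insCnt: Long,
    rvCnt: Long,
    minRv: Long,
    maxRv: Long)

object CarryOver {
  // Single definition of the predicate:
  //   _del_cnt = 1 AND _ins_cnt = 1 AND _rv_cnt = 2 AND _min_rv = _max_rv
  def isCarryOverPair(g: GroupStats): Boolean =
    g.delCnt == 1 && g.insCnt == 1 && g.rvCnt == 2 && g.minRv == g.maxRv

  // Both callers reuse the same predicate instead of inlining a copy.
  // Assumption: each path drops the groups that match it.
  def batchFilter(groups: Seq[GroupStats]): Seq[GroupStats] =
    groups.filterNot(isCarryOverPair)

  def streamingFilter(groups: Iterator[GroupStats]): Iterator[GroupStats] =
    groups.filterNot(isCarryOverPair)
}
```

With a single definition, a change to the predicate cannot leave the batch and streaming paths disagreeing about which groups count as carry-over pairs.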
### Why are the changes needed?
zikangh raised these on the merged PR. Bundled together so they can be reviewed and shipped as one follow-up.
### Does this PR introduce _any_ user-facing change?
No. Items 1 and 2 are internal refactors and item 3 adds test coverage. The behavior of streaming CDC reads is unchanged.
### How was this patch tested?
All 157 tests pass across the four CDC suites:
- `ChangelogResolutionSuite`
- `ResolveChangelogTablePostProcessingSuite`
- `ResolveChangelogTableStreamingPostProcessingSuite`
- `ChangelogEndToEndSuite`
Also confirmed:
- `UnsupportedOperationsSuite` (216 tests) still passes after the marker-based detection switch.
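
To illustrate why marker-based detection is less brittle than name matching, here is a toy model in plain Scala. `Attr`, `MarkerDemo` and the `MarkerKey` value are hypothetical and do not use Spark's attribute or metadata classes; the only detail taken from the description above is the `__spark_cdc_events` alias name.

```scala
// Toy attribute: a name plus a metadata map. Not Spark's Attribute class.
final case class Attr(name: String, metadata: Map[String, Boolean] = Map.empty) {
  // Renaming keeps the metadata, which models the marker travelling
  // with the attribute.
  def withName(newName: String): Attr = copy(name = newName)
}

object MarkerDemo {
  // Hypothetical key; the real marker key is not shown in this message.
  val MarkerKey = "example.streamingPostProcessing"

  def mark(a: Attr): Attr = a.copy(metadata = a.metadata + (MarkerKey -> true))

  // Old style: detect the rewrite by string-matching the alias name.
  def detectedByName(a: Attr): Boolean = a.name == "__spark_cdc_events"

  // New style: detect the rewrite by the metadata marker.
  def detectedByMarker(a: Attr): Boolean = a.metadata.getOrElse(MarkerKey, false)
}
```

In this model, a renamed attribute is no longer found by `detectedByName` but is still found by `detectedByMarker`, and an unrelated attribute that happens to reuse the helper name is not mistaken for the rewrite.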
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (claude-opus-4-7)
Closes #55653 from gengliangwang/streamingCDC-followup-zikangh.
Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit 84d9c84)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
1 parent 0eb4fc1 commit 096afb5
3 files changed
Lines changed: 387 additions & 24 deletions
File tree
- sql
- catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis
- core/src/test/scala/org/apache/spark/sql/connector
Lines changed: 38 additions & 18 deletions
Lines changed: 8 additions & 6 deletions