feat(iceberg): skip equality deletes for cdc inserts#934

Merged
hash-data merged 18 commits into staging from feat/skip-eq-deletes-cdc-inserts
May 5, 2026

Conversation

vikaxsh (Collaborator) commented Apr 30, 2026

Description

Equality deletes are only required during the backfill→CDC overlap window. Track this with a new dedup_inserts flag in the Iceberg olake_2pc table property: Java sets it to true on backfill commit; Go clears it to false after the first successful CDC commit. While the flag is true, CDC inserts emit _op_type="i" (equality delete + write); otherwise _op_type="c" (write only). Applies to both the arrow writer and the legacy gRPC writer.
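The flag lifecycle described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the type `tableState`, the functions `cdcInsertOpType`, `onBackfillCommit`, and `onCDCCommit`, and the constant names are all hypothetical, and the flag is held in memory here rather than read from or written to the olake_2pc table property.

```go
package main

import "fmt"

// Values of _op_type for CDC inserts, as named in the description.
// The constant names themselves are hypothetical.
const (
	opInsertWithDelete = "i" // equality delete + write
	opInsertWriteOnly  = "c" // write only
)

// tableState is a hypothetical in-memory stand-in for the dedup_inserts
// flag that the PR stores in the Iceberg olake_2pc table property.
type tableState struct {
	DedupInserts bool
}

// onBackfillCommit mirrors the Java side: the flag is set to true when a
// backfill commits, opening the backfill-to-CDC overlap window.
func (s *tableState) onBackfillCommit() { s.DedupInserts = true }

// onCDCCommit mirrors the Go side: the flag is cleared to false after the
// first successful CDC commit, closing the overlap window.
func (s *tableState) onCDCCommit() { s.DedupInserts = false }

// cdcInsertOpType picks the _op_type for a CDC insert from the flag.
func cdcInsertOpType(s tableState) string {
	if s.DedupInserts {
		return opInsertWithDelete
	}
	return opInsertWriteOnly
}

func main() {
	var s tableState

	s.onBackfillCommit()
	fmt.Println(cdcInsertOpType(s)) // i

	s.onCDCCommit()
	fmt.Println(cdcInsertOpType(s)) // c
}
```

Keeping the decision in one small function is only a design choice for this sketch; it would let both the arrow writer and the legacy gRPC writer consult the same rule.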

Fixes # (issue)

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • Scenario A
  • Scenario B

Screenshots or Recordings

Documentation

  • N/A (bug fix, refactor, or test changes only)

Related PR's (If Any):

vikaxsh changed the title from "feat(iceberg): skip equality deletes for inserts" to "feat(iceberg): skip equality deletes for cdc inserts" on Apr 30, 2026
Comment thread drivers/abstract/abstract.go
Comment thread drivers/abstract/cdc.go Outdated
Comment thread drivers/abstract/abstract.go Outdated
hash-data previously approved these changes May 4, 2026
Comment thread destination/parquet/parquet.go Outdated
Comment thread destination/parquet/parquet.go Outdated
@hash-data hash-data merged commit c47187f into staging May 5, 2026
11 checks passed
@hash-data hash-data deleted the feat/skip-eq-deletes-cdc-inserts branch May 5, 2026 12:13
2 participants