Upsert with 1M rows extremely slow due to create_match_filter and txn.delete() performance #3129

@ayushk7102

Description

Apache Iceberg version

0.11.0

Please describe the bug 🐞

CC @goutamvenkat-anyscale @koenvo @Fokko

Hello! We are implementing distributed writes from Ray Data to Iceberg. As part of upserts, we:

  1. Write data files in parallel across Ray workers (each worker writes its share of Parquet files directly to storage and returns DataFile metadata + the upsert key columns back to the driver)
  2. On the driver, concatenate all upsert keys collected from workers, call create_match_filter to build a delete predicate, then call txn.delete() followed by an append to commit
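
To make step 2 concrete, the driver-side key collection might look roughly like the sketch below. All names here (`worker_results`, `collect_upsert_keys`) are hypothetical illustrations, not our actual code and not the PyIceberg or Ray Data API:

```python
# Hypothetical sketch of the driver-side key collection described above.
# Each Ray worker is assumed to return (data_files, key_batch), where
# key_batch is a list of key tuples for the rows it wrote. Names are
# illustrative only.

def collect_upsert_keys(worker_results):
    """Concatenate key batches from all workers, deduplicating
    while preserving first-seen order."""
    keys = []
    seen = set()
    for _data_files, key_batch in worker_results:
        for key in key_batch:
            if key not in seen:
                seen.add(key)
                keys.append(key)
    return keys

# Two workers with one overlapping key:
worker_results = [
    (["file-a.parquet"], [(1, "a"), (2, "b")]),
    (["file-b.parquet"], [(2, "b"), (3, "c")]),
]
print(collect_upsert_keys(worker_results))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

The concatenated keys are then fed to create_match_filter, which is where the timings below start.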

Upserting 1M rows (383 MiB) into an Iceberg table takes ~17.8 minutes, almost entirely in the delete step:

create_match_filter (1M keys → In filter):   10.26s
txn.delete():                              1054.35s
append + commit:                              1.14s
─────────────────────────────────────────────────────
Total upsert commit:                       1065.75s

PyIceberg version 0.11.0

This matches what's reported in #2159 and #2138.

The bottlenecks are:

  1. create_match_filter — constructs a Python BooleanExpression node per row, which is expensive at 1M+ keys
  2. txn.delete() — evaluates the resulting giant In expression against the table's data files with no partition pruning, effectively doing a full table scan

We have a few questions:

  1. Merge-on-read upserts — is this on the roadmap, and if so, roughly when? MoR would let us avoid the expensive delete + rewrite cycle entirely for large upserts.
  2. Optimizing create_match_filter or txn.delete() — is there a recommended way to speed these up today? For example, batching the In filter, or passing a partition-level hint to constrain the file scan?
  3. Partition-aware deletes — if the upsert key columns overlap with partition columns, is there a supported way to restrict txn.delete() to only the relevant partitions, rather than scanning the full table?
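
For questions 2 and 3, one workaround we are considering is grouping keys by partition value and chunking them before building filters, so each In stays small and each delete could target a single partition. A rough sketch of that pre-processing (pure Python, hypothetical helper names; whether issuing several smaller deletes is actually faster in PyIceberg is exactly what we are asking):

```python
# Hypothetical pre-processing for partition-aware, chunked deletes.
# part_index is the position of the partition column within each key
# tuple; chunk_size caps how many keys would go into one In filter.
# Illustrative helpers only, not PyIceberg API.

def group_keys_by_partition(keys, part_index):
    """Bucket key tuples by their partition column value."""
    groups = {}
    for key in keys:
        groups.setdefault(key[part_index], []).append(key)
    return groups

def chunked(keys, chunk_size):
    """Split one partition's keys into bounded-size chunks."""
    for i in range(0, len(keys), chunk_size):
        yield keys[i : i + chunk_size]

keys = [(1, "2024-01"), (2, "2024-01"),
        (3, "2024-02"), (4, "2024-02"), (5, "2024-02")]
groups = group_keys_by_partition(keys, part_index=1)
plan = {part: list(chunked(ks, chunk_size=2)) for part, ks in groups.items()}
# plan maps each partition to its key chunks, e.g. '2024-02' gets
# two chunks: [(3, ...), (4, ...)] and [(5, ...)]
```

Each (partition, chunk) pair could then be turned into its own match filter and delete call scoped to that partition, if the API supports it, rather than one giant In over the whole table.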

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
