Commit 0774698
Fallback for upsert when arrow cannot compare source rows with target rows (apache#1878)
<!-- Fixes apache#1711 -->
Upsert operations in PyIceberg rely on Arrow joins between source and
target rows. However, Arrow Acero cannot compare certain complex types —
like `struct`, `list`, and `map` — unless they’re part of the join key.
When such types exist in non-join columns, the upsert fails with an
error like:
```
ArrowInvalid: Data type struct<...> is not supported in join non-key field venue_geo
```
This PR introduces a **fallback mechanism**: if Arrow fails to join due to unsupported types, we fall back to comparing only the key columns. Non-key complex fields are ignored in the join condition, but still retained in the final upserted data.
---
```python
txn.upsert(df, join_cols=["match_id"])
```
> ❌ Before this change: ArrowInvalid: Data type struct<...> is not supported in join non-key field venue_geo
---
```python
txn.upsert(df, join_cols=["match_id"])
```
> ✅ After this change: the record is inserted or updated successfully, and complex fields are not compared during the join
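
To make the key-only semantics concrete, here is a small pure-Python model of an upsert that matches rows on the join columns alone and carries the complex field through untouched. It is a sketch of the behaviour described in this message, not PyIceberg's implementation; `upsert_by_key` and the sample rows are hypothetical.

```python
def upsert_by_key(target_rows, source_rows, join_cols):
    """Upsert source rows into target rows, matching on join_cols only."""

    def key_of(row):
        return tuple(row[col] for col in join_cols)

    # Index the existing rows by their join key.
    merged = {key_of(row): row for row in target_rows}
    updated = inserted = 0
    for row in source_rows:
        key = key_of(row)
        if key in merged:
            updated += 1   # key already present: replace the whole row
        else:
            inserted += 1  # new key: append the row
        merged[key] = row  # venue_geo is never compared, only carried along
    return list(merged.values()), updated, inserted


target_rows = [{"match_id": 1, "venue_geo": {"lat": 0.0, "lon": 0.0}}]
source_rows = [
    {"match_id": 1, "venue_geo": {"lat": 52.5, "lon": 13.4}},
    {"match_id": 2, "venue_geo": {"lat": 48.9, "lon": 2.3}},
]
rows, updated, inserted = upsert_by_key(target_rows, source_rows, ["match_id"])
```

In this model every source row whose key already exists counts as an update, because the struct values are never inspected to decide whether anything changed.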
---
Tested: yes.
- A test was added to reproduce the failure scenario with complex non-key fields.
- The new behavior is verified by asserting that the upsert completes successfully using the fallback logic.
---
> ℹ️ **Note**
> This change does not affect users who do not include complex types in their schemas. For those who do, it improves resilience while preserving data correctness.
---
User-facing changes: yes. Upserts involving complex non-key columns (like `struct`, `list`, or `map`) no longer fail. They now succeed by skipping unsupported comparisons during the join phase.

1 parent 2d0acf1 commit 0774698
1 file changed
Lines changed: 2 additions & 2 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 677 | | - |
| | 677 | + |
| 705 | | - |
| | 705 | + |