Commit dc272e6
fix: Use permissive promote in merge() concat for large_string drift (#10)
`Transaction.merge()` calls `pa.concat_tables([kept_rows, df], promote_options="default")` to combine the kept-after-anti-join rows with the user-supplied source df. `ArrowScan.to_table()` can return text and binary columns as the `large_*` PyArrow variants (`large_utf8`, `large_binary`), while user-supplied dfs - in particular ones constructed via pandas->arrow with `dtype_backend="numpy_nullable"` - typically use the non-large variants. Both sides are semantically the same Iceberg type; the difference is purely physical encoding width. `promote_options="default"` refuses to bridge `utf8 <-> large_utf8` / `binary <-> large_binary` and the merge fails with:
pyarrow.lib.ArrowTypeError: Unable to merge: Field <name> has incompatible types: string vs large_string
This is the exact failure observed against real Iceberg parquet tables when consolidating multiple feeds into one table via composite merge keys.
Switch to `promote_options="permissive"` to bridge the gap. This matches what `pyiceberg/io/pyarrow.py:to_table()` already does for the same reason - the comment there reads:
# Note: cannot use pa.Table.from_batches(itertools.chain([first_batch], batches)))
# as different batches can use different schema's (due to large_ types)
Adds three regression tests covering both directions of the string drift and the binary equivalent. The existing test fixtures all share a single `ARROW` schema with `pa.string()` on both sides of every merge, so no test ever exercised the cross-encoding path that triggers the failure.
Co-authored-by: Ayush Patel <Ayush.Patel@imc.com>1 parent d5a5a38 commit dc272e6
2 files changed
Lines changed: 171 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
996 | 996 | | |
997 | 997 | | |
998 | 998 | | |
999 | | - | |
| 999 | + | |
| 1000 | + | |
| 1001 | + | |
| 1002 | + | |
| 1003 | + | |
| 1004 | + | |
| 1005 | + | |
| 1006 | + | |
| 1007 | + | |
| 1008 | + | |
| 1009 | + | |
1000 | 1010 | | |
1001 | 1011 | | |
1002 | 1012 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
696 | 696 | | |
697 | 697 | | |
698 | 698 | | |
| 699 | + | |
| 700 | + | |
| 701 | + | |
| 702 | + | |
| 703 | + | |
| 704 | + | |
| 705 | + | |
| 706 | + | |
| 707 | + | |
| 708 | + | |
| 709 | + | |
| 710 | + | |
| 711 | + | |
| 712 | + | |
| 713 | + | |
| 714 | + | |
| 715 | + | |
| 716 | + | |
| 717 | + | |
| 718 | + | |
| 719 | + | |
| 720 | + | |
| 721 | + | |
| 722 | + | |
| 723 | + | |
| 724 | + | |
| 725 | + | |
| 726 | + | |
| 727 | + | |
| 728 | + | |
| 729 | + | |
| 730 | + | |
| 731 | + | |
| 732 | + | |
| 733 | + | |
| 734 | + | |
| 735 | + | |
| 736 | + | |
| 737 | + | |
| 738 | + | |
| 739 | + | |
| 740 | + | |
| 741 | + | |
| 742 | + | |
| 743 | + | |
| 744 | + | |
| 745 | + | |
| 746 | + | |
| 747 | + | |
| 748 | + | |
| 749 | + | |
| 750 | + | |
| 751 | + | |
| 752 | + | |
| 753 | + | |
| 754 | + | |
| 755 | + | |
| 756 | + | |
| 757 | + | |
| 758 | + | |
| 759 | + | |
| 760 | + | |
| 761 | + | |
| 762 | + | |
| 763 | + | |
| 764 | + | |
| 765 | + | |
| 766 | + | |
| 767 | + | |
| 768 | + | |
| 769 | + | |
| 770 | + | |
| 771 | + | |
| 772 | + | |
| 773 | + | |
| 774 | + | |
| 775 | + | |
| 776 | + | |
| 777 | + | |
| 778 | + | |
| 779 | + | |
| 780 | + | |
| 781 | + | |
| 782 | + | |
| 783 | + | |
| 784 | + | |
| 785 | + | |
| 786 | + | |
| 787 | + | |
| 788 | + | |
| 789 | + | |
| 790 | + | |
| 791 | + | |
| 792 | + | |
| 793 | + | |
| 794 | + | |
| 795 | + | |
| 796 | + | |
| 797 | + | |
| 798 | + | |
| 799 | + | |
| 800 | + | |
| 801 | + | |
| 802 | + | |
| 803 | + | |
| 804 | + | |
| 805 | + | |
| 806 | + | |
| 807 | + | |
| 808 | + | |
| 809 | + | |
| 810 | + | |
| 811 | + | |
| 812 | + | |
| 813 | + | |
| 814 | + | |
| 815 | + | |
| 816 | + | |
| 817 | + | |
| 818 | + | |
| 819 | + | |
| 820 | + | |
| 821 | + | |
| 822 | + | |
| 823 | + | |
| 824 | + | |
| 825 | + | |
| 826 | + | |
| 827 | + | |
| 828 | + | |
| 829 | + | |
| 830 | + | |
| 831 | + | |
| 832 | + | |
| 833 | + | |
| 834 | + | |
| 835 | + | |
| 836 | + | |
| 837 | + | |
| 838 | + | |
| 839 | + | |
| 840 | + | |
| 841 | + | |
| 842 | + | |
| 843 | + | |
| 844 | + | |
| 845 | + | |
| 846 | + | |
| 847 | + | |
| 848 | + | |
| 849 | + | |
| 850 | + | |
| 851 | + | |
| 852 | + | |
| 853 | + | |
| 854 | + | |
| 855 | + | |
| 856 | + | |
| 857 | + | |
| 858 | + | |
699 | 859 | | |
700 | 860 | | |
701 | 861 | | |
| |||
0 commit comments