Commit ef6c286

koenvo authored and gabeiglio committed
Temporary fix for filtering on empty batches (apache#1901)
Potential fix for apache#1804. Might want to write a test, but not sure yet how to reproduce without using Glue. Closes apache#1804.
1 parent f74426f commit ef6c286

1 file changed

Lines changed: 6 additions & 2 deletions

pyiceberg/io/pyarrow.py

@@ -1441,11 +1441,15 @@ def _task_to_record_batches(
 
     # Apply the user filter
     if pyarrow_filter is not None:
-        current_batch = current_batch.filter(pyarrow_filter)
+        # Temporary fix until PyArrow 21 is released ( https://github.com/apache/arrow/pull/46057 )
+        table = pa.Table.from_batches([current_batch])
+        table = table.filter(pyarrow_filter)
         # skip empty batches
-        if current_batch.num_rows == 0:
+        if table.num_rows == 0:
             continue
 
+        current_batch = table.combine_chunks().to_batches()[0]
+
     result_batch = _to_requested_schema(
         projected_schema,
         file_project_schema,
