GH-30894: [C++][Python] Fix Table::Slice returning incorrect length f…#50369

Open
AdvancedUno wants to merge 2 commits into apache:main from AdvancedUno:gh-30894-slice-no-columns

@AdvancedUno AdvancedUno commented Jul 5, 2026

Rationale for this change

Table::Slice on a table with no columns does not adjust the row count. SimpleTable::Slice computes num_rows inside the per-column loop, so with zero columns the loop never runs and num_rows keeps the raw length argument instead of being clamped to the rows actually available.

import pyarrow as pa
table = pa.table({'col': range(3)})

table.slice(1).num_rows               # 2  (correct)
table.select([])[1:].num_rows         # 2  (correct - __getitem__ normalizes)
table.select([]).slice(1).num_rows    # 3  (WRONG, should be 2)
table.select([]).slice(1, 4).num_rows # 4  (WRONG, should be 2)

This violates the contract documented in table.h: "If there are not enough rows in the table, the length will be adjusted accordingly."

What changes are included in this PR?

  • cpp/src/arrow/table.cc: in SimpleTable::Slice, seed num_rows with a clamped value, std::max<int64_t>(0, std::min(length, num_rows() - offset)), so it is correct even when the column vector is empty. When columns exist, the loop still overwrites num_rows with column->length(), so behavior for tables with columns is unchanged. The single-arg Table::Slice(offset) routes through this method, so it is fixed as well.
  • cpp/src/arrow/table_test.cc: new TEST_F(TestTable, SliceZeroColumns).
  • python/pyarrow/tests/test_table.py: new test_table_slice_no_columns.
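
The effect of seeding num_rows with a clamped value can be sketched in plain Python. This is only a model of the logic described above, not the actual C++ code: the function names and the list of column lengths are hypothetical, each column slice is assumed to clamp itself the same way, and the single-arg slice is assumed to pass the table's own row count as length.

```python
def sliced_rows_before_fix(column_lengths, num_rows, offset, length):
    """Model of the old logic: the row count is only set inside the column loop."""
    result = length  # raw argument; stays unclamped if there are no columns
    for col_len in column_lengths:
        # Assumption: a sliced column reports the rows actually available.
        result = max(0, min(length, col_len - offset))
    return result


def sliced_rows_after_fix(column_lengths, num_rows, offset, length):
    """Model of the new logic: seed with a clamped value before the loop."""
    result = max(0, min(length, num_rows - offset))
    for col_len in column_lengths:
        # Unchanged: when columns exist, the loop still overwrites the seed.
        result = max(0, min(length, col_len - offset))
    return result


# Zero-column table with 3 rows, mirroring the cases in the description.
print(sliced_rows_before_fix([], 3, 1, 3))  # slice(1): raw length echoed back
print(sliced_rows_after_fix([], 3, 1, 3))   # slice(1): clamped
print(sliced_rows_after_fix([], 3, 1, 4))   # slice(1, 4): clamped
print(sliced_rows_after_fix([], 3, 5, 3))   # offset past the end
```

With one or more columns both functions agree, which matches the claim that behavior for tables with columns is unchanged.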

Are these changes tested?

Yes. The new C++ and Python regression tests cover slicing a zero-column table with offset only, offset + length, an offset past the end, and consistency with a table whose columns were all removed using SelectColumns({}).

Are there any user-facing changes?

Yes. Table.slice() on a table with no columns now returns the correct num_rows instead of echoing back the raw length argument. This is a bug fix; there are no API changes.

Closes #30894

github-actions Bot added the awaiting review label Jul 5, 2026

github-actions Bot commented Jul 5, 2026

⚠️ GitHub issue #30894 has been automatically assigned in GitHub to PR creator.

@thisisnic

Member

Thanks for the PR @AdvancedUno! The linter is failing - you can find instructions here on running the linter, but let me know if you have any issues!

@AdvancedUno

Author

Hi @thisisnic, I just made the fix!

Development

Successfully merging this pull request may close these issues.

[C++][Python] Slicing a table with no columns returns a table with incorrect length.

2 participants