Skip to content

fix: entry-range arithmetic in iterate/concatenate and basket selection#1656

Open
henryiii wants to merge 3 commits into
mainfrom
fix-entry-range-arithmetic
Open

fix: entry-range arithmetic in iterate/concatenate and basket selection#1656
henryiii wants to merge 3 commits into
mainfrom
fix-entry-range-arithmetic

Conversation

@henryiii

@henryiii henryiii commented Jun 10, 2026

Copy link
Copy Markdown
Member

🤖 AI text below 🤖

This PR fixes three coupled entry-range arithmetic bugs in iterate, concatenate, and basket selection.

1. Off-by-one extra basket on exact entry_stop boundaries

When entry_stop landed exactly on a basket boundary, the basket collector read, decompressed, and interpreted one extra basket contributing zero entries.

Reproducer (3 baskets of 10 entries, offsets [0,10,20,30]):

b.entries_to_ranges_or_baskets(0, 10)  # selected baskets [0, 1], should be [0]

The condition start <= entry_stop in entries_to_ranges_or_baskets and _hasbranches_num_entries_for (src/uproot/behaviors/TBranch.py) selected a basket whose start == entry_stop. The matching per-basket trim loops in the interpretations (numerical.py, jagged.py, strings.py, objects.py) had the same start <= entry_stop <= stop / start <= entry_stop boundary conditions.

Because the interpretation expects exactly the baskets the collector supplied, both sides are fixed together (start < entry_stop), and the degenerate empty-range case (entry_start == entry_stop) is handled explicitly so empty reads still work. custom.py uses a numpy.where-based selection that does not have this off-by-one, so it was left unchanged.

After the fix, ranges are minimal and results are byte-identical for non-boundary ranges.

2. concatenate(..., allow_missing=True) shifted offsets after a skipped file

When hasbranches.arrays raised KeyInFileError (a requested branch missing from one file) and the loop continued, it skipped advancing global_start = global_stop, so every subsequent file was read at offsets shifted by the skipped file's entry count.

Reproducer (3 files x 10 entries, middle file missing branch x, entry_start=5, entry_stop=25):

  • Before: 15 entries returned (values 5-9, 20-29, including out-of-range 25-29)
  • After: 10 entries (values 5-9, 20-24), correct

Fixed by advancing global_start = global_stop in the allow_missing continue path, in both TBranch.concatenate and RNTuple.concatenate.

3. RNTuple.HasFields.iterate did not clamp per-step entry_stop

iterate passed entry_stop=start + step_size per step without clamping to the user's entry_stop (only clamped to num_entries inside arrays()).

Reproducer (50000-entry RNTuple):

nt.iterate(step_size=30000, entry_stop=40000)  # yielded 50000 entries, should be 40000

Fixed with entry_stop=min(start + step_size, entry_stop). With report=True the ranges now come out as [(0, 30000), (30000, 40000)].

Testing

  • prek -a --quiet: clean
  • Existing suites pass: tests/test_0034*, tests/test_0017*, test_0016_interpretations, test_0043_iterate_function, test_0044_concatenate_function, test_0067_common_entry_offsets, test_0194_fix_lost_cuts_in_iterate, test_0335_empty_ttree_division_by_zero, test_1573_out_of_bounds_slicing, test_1630_rntuple_iterate_report, test_1406_improved_rntuple_methods, test_1250_rntuple_improvements, plus a wide keyword sweep (iterate/concatenate/basket/interpret/string/jagged/entry): all pass.

Regression tests will be added in a follow-up commit on this branch.

🤖 Generated with Claude Code

Part of #1646.

henryiii added 2 commits June 10, 2026 15:00
Three coupled off-by-one / accounting bugs in entry-range handling:

1. When entry_stop lands exactly on a basket boundary, the basket
   collector (entries_to_ranges_or_baskets and
   _hasbranches_num_entries_for) and the per-basket trim loops in the
   numerical/jagged/strings/objects interpretations all selected one
   extra basket that contributes zero entries. The collector and
   interpretations are fixed consistently (start < entry_stop instead of
   start <= entry_stop) so the interpretation never looks up a basket the
   collector did not supply; the degenerate empty-range case is handled
   explicitly.

2. concatenate(..., allow_missing=True): when arrays() raises
   KeyInFileError and the loop continues, it skipped advancing
   global_start, shifting all subsequent files' entry offsets. Fixed in
   both TBranch.concatenate and RNTuple.concatenate.

3. RNTuple HasFields.iterate did not clamp the per-step entry_stop to the
   user-requested entry_stop, so iteration could run past it. Now clamped
   with min(start + step_size, entry_stop).

Assisted-by: ClaudeCode:claude-opus-4-8
Covers: no extra basket selected on exact entry_stop boundaries (numerical,
jagged, strings), correct boundary/empty-range reads, concatenate keeping
global offsets aligned when a file is skipped via allow_missing, and RNTuple
iterate clamping the per-step entry_stop.

Assisted-by: ClaudeCode:claude-opus-4-8
@henryiii

Copy link
Copy Markdown
Member Author

🤖 AI text below 🤖

Regression tests added in tests/test_1656_entry_range_arithmetic.py (commit 142b01f). They cover: no extra basket on exact entry_stop boundaries (numerical/jagged/strings interpretations), correct boundary and empty-range reads, concatenate(allow_missing=True) keeping global offsets aligned when a file is skipped, and RNTuple.iterate clamping the per-step entry_stop. Three of the five tests fail on the pre-fix code and pass after.

Note on the RNTuple concatenate change: it is applied for symmetry with the TTree fix and matches the reported issue, but in current code an RNTuple arrays() call with a missing field returns empty records rather than raising KeyInFileError, so the allow_missing continue path there is not reachable via a black-box reproducer and is not covered by a dedicated test.

@codecov

codecov Bot commented Jun 10, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 83.33333% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.80%. Comparing base (1c06db0) to head (2af1b83).
⚠️ Report is 15 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/uproot/behaviors/RNTuple.py 0.00% 1 Missing ⚠️
src/uproot/behaviors/TBranch.py 66.66% 0 Missing and 1 partial ⚠️

❌ Your patch check has failed because the patch coverage (83.33%) is below the target coverage (98.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
Files with missing lines Coverage Δ
src/uproot/interpretation/jagged.py 90.27% <100.00%> (ø)
src/uproot/interpretation/numerical.py 82.21% <100.00%> (ø)
src/uproot/interpretation/objects.py 83.96% <100.00%> (ø)
src/uproot/interpretation/strings.py 71.36% <100.00%> (+0.93%) ⬆️
src/uproot/behaviors/RNTuple.py 70.84% <0.00%> (-0.14%) ⬇️
src/uproot/behaviors/TBranch.py 83.73% <66.66%> (+0.42%) ⬆️

... and 1 file with indirect coverage changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant