fix: entry-range arithmetic in iterate/concatenate and basket selection#1656
fix: entry-range arithmetic in iterate/concatenate and basket selection#1656henryiii wants to merge 3 commits into
Conversation
Three coupled off-by-one / accounting bugs in entry-range handling: 1. When entry_stop lands exactly on a basket boundary, the basket collector (entries_to_ranges_or_baskets and _hasbranches_num_entries_for) and the per-basket trim loops in the numerical/jagged/strings/objects interpretations all selected one extra basket that contributes zero entries. The collector and interpretations are fixed consistently (start < entry_stop instead of start <= entry_stop) so the interpretation never looks up a basket the collector did not supply; the degenerate empty-range case is handled explicitly. 2. concatenate(..., allow_missing=True): when arrays() raises KeyInFileError and the loop continues, it skipped advancing global_start, shifting all subsequent files' entry offsets. Fixed in both TBranch.concatenate and RNTuple.concatenate. 3. RNTuple HasFields.iterate did not clamp the per-step entry_stop to the user-requested entry_stop, so iteration could run past it. Now clamped with min(start + step_size, entry_stop). Assisted-by: ClaudeCode:claude-opus-4-8
Covers: no extra basket selected on exact entry_stop boundaries (numerical, jagged, strings), correct boundary/empty-range reads, concatenate keeping global offsets aligned when a file is skipped via allow_missing, and RNTuple iterate clamping the per-step entry_stop. Assisted-by: ClaudeCode:claude-opus-4-8
|
🤖 AI text below 🤖 Regression tests added in Note on the RNTuple |
Codecov Report❌ Patch coverage is
❌ Your patch check has failed because the patch coverage (83.33%) is below the target coverage (98.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files
|
🤖 AI text below 🤖
This PR fixes three coupled entry-range arithmetic bugs in
iterate,concatenate, and basket selection.1. Off-by-one extra basket on exact
entry_stopboundariesWhen
entry_stoplanded exactly on a basket boundary, the basket collector read, decompressed, and interpreted one extra basket contributing zero entries.Reproducer (3 baskets of 10 entries, offsets
[0,10,20,30]):The condition
start <= entry_stopinentries_to_ranges_or_basketsand_hasbranches_num_entries_for(src/uproot/behaviors/TBranch.py) selected a basket whosestart == entry_stop. The matching per-basket trim loops in the interpretations (numerical.py,jagged.py,strings.py,objects.py) had the samestart <= entry_stop <= stop/start <= entry_stopboundary conditions.Because the interpretation expects exactly the baskets the collector supplied, both sides are fixed together (
start < entry_stop), and the degenerate empty-range case (entry_start == entry_stop) is handled explicitly so empty reads still work.custom.pyuses anumpy.where-based selection that does not have this off-by-one, so it was left unchanged.After the fix, ranges are minimal and results are byte-identical for non-boundary ranges.
2.
concatenate(..., allow_missing=True)shifted offsets after a skipped fileWhen
hasbranches.arraysraisedKeyInFileError(a requested branch missing from one file) and the loopcontinued, it skipped advancingglobal_start = global_stop, so every subsequent file was read at offsets shifted by the skipped file's entry count.Reproducer (3 files x 10 entries, middle file missing branch
x,entry_start=5,entry_stop=25):Fixed by advancing
global_start = global_stopin theallow_missingcontinue path, in bothTBranch.concatenateandRNTuple.concatenate.3.
RNTuple.HasFields.iteratedid not clamp per-stepentry_stopiteratepassedentry_stop=start + step_sizeper step without clamping to the user'sentry_stop(only clamped tonum_entriesinsidearrays()).Reproducer (50000-entry RNTuple):
Fixed with
entry_stop=min(start + step_size, entry_stop). Withreport=Truethe ranges now come out as[(0, 30000), (30000, 40000)].Testing
prek -a --quiet: cleantests/test_0034*,tests/test_0017*,test_0016_interpretations,test_0043_iterate_function,test_0044_concatenate_function,test_0067_common_entry_offsets,test_0194_fix_lost_cuts_in_iterate,test_0335_empty_ttree_division_by_zero,test_1573_out_of_bounds_slicing,test_1630_rntuple_iterate_report,test_1406_improved_rntuple_methods,test_1250_rntuple_improvements, plus a wide keyword sweep (iterate/concatenate/basket/interpret/string/jagged/entry): all pass.Regression tests will be added in a follow-up commit on this branch.
🤖 Generated with Claude Code
Part of #1646.