PERF: speed up to_offset string parsing #65395

Merged: mroeschke merged 5 commits into pandas-dev:main from jbrockmendel:perf-to_offset on May 7, 2026

@jbrockmendel
Member

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Summary

Speeds up pandas.tseries.frequencies.to_offset string parsing. Stacked on top of GH-65390 (remove _offset_map cache); review/merge that first.

The big win is on tick-resolution offsets ("h", "min", "s", "ms", "us", "ns", "D"). The previous implementation built a Timedelta(1, unit=name), called delta_to_tick to wrap it as a Tick, then multiplied by float(stride) — which calls Tick.__mul__(float), which calls np.isclose on every invocation. That last step alone dominated profiles. Now we look the prefix up in a small {name: (TickKlass, factor)} dict and construct the Tick subclass directly for integer strides; fractional strides like "2.5min" still fall through to the old path so unit-promotion semantics are preserved.
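
A minimal sketch of the lookup-table idea described above, using stand-in classes rather than the pandas `Tick` hierarchy. The names `parse_tick` and `_TICK_LOOKUP` are hypothetical, and treating `factor` as a plain integer multiplier on the stride is an assumption about its role, not something the description confirms.

```python
# Hypothetical stand-ins for illustration; these are NOT the pandas classes.
class Tick:
    def __init__(self, n=1):
        self.n = n

    def __eq__(self, other):
        return type(self) is type(other) and self.n == other.n

    def __hash__(self):
        return hash((type(self), self.n))

    def __repr__(self):
        return f"{type(self).__name__}(n={self.n})"


class Hour(Tick):
    pass


class Minute(Tick):
    pass


class Second(Tick):
    pass


# {name: (TickKlass, factor)}; the factor semantics here are assumed.
_TICK_LOOKUP = {"h": (Hour, 1), "min": (Minute, 1), "s": (Second, 1)}


def parse_tick(name, stride):
    """Build a tick directly for integer strides.

    Returns None when the caller should fall back to the slower general
    path (unknown name, or a fractional stride such as "2.5").
    """
    entry = _TICK_LOOKUP.get(name)
    if entry is None:
        return None
    sign = -1 if stride.startswith("-") else 1
    digits = stride.lstrip("+-") or "1"  # an empty stride means 1
    if not digits.isdigit():
        return None  # fractional stride: leave it to the fallback path
    klass, factor = entry
    # Plain Python int arithmetic; no float multiply, no tolerance check.
    return klass(sign * int(digits) * factor)
```

In this sketch `parse_tick("min", "5")` builds `Minute(n=5)` with one dict lookup and one constructor call, while `parse_tick("min", "2.5")` returns `None` so that a caller can route it to the general path.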

A few smaller wins on the non-tick path:

  • Drop int(np.fabs(stride) * stride_sign) (numpy scalar dance) in favor of explicit sign-aware Python int multiplication, mirroring the tick path.
  • Convert _warn_about_deprecated_aliases and _validate_to_offset_alias to cdef. Both are only called from to_offset, so no public-API impact.
  • Cache alias.upper() in _validate_to_offset_alias (was being called up to 3 times).
  • Precompute c_PERIOD_AND_OFFSET_DEPR_FREQSTR.values() as a frozenset (was an O(n) values-view lookup per call).
  • Walk the regex split by index rather than zip(split[0::4], split[1::4], split[2::4]) — drops three list-slice copies and the zip object on every call.

Perf

Microbench, per-call timings (Python 3.13, M-series mac):

| freq | before | after | speedup |
|---|---|---|---|
| "h" | 10.6 µs | 0.84 µs | 12.6x |
| "5min" | 10.6 µs | 0.93 µs | 11.4x |
| "D" | 10.8 µs | 0.91 µs | 11.9x |
| "3s" | 10.5 µs | 0.77 µs | 13.7x |
| "3D" | 10.8 µs | 0.90 µs | 12.0x |
| "-3D" | 10.7 µs | 1.02 µs | 10.5x |
| "1D1h" | 21.5 µs | 1.59 µs | 13.5x |
| "5h30min" | 25.0 µs | 4.55 µs | 5.5x |
| "ME" | 2.04 µs | 1.08 µs | 1.9x |
| "BMS" | 2.30 µs | 1.02 µs | 2.3x |
| "YS-MAR" | 2.50 µs | 1.50 µs | 1.7x |
| "B" | 2.20 µs | 1.23 µs | 1.8x |
| "2.5min" | 17.5 µs | 16.9 µs | 1.0x (intentionally unchanged) |
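
For anyone wanting to collect comparable per-call numbers, here is one possible harness built on the standard-library `timeit` module. It is a sketch only: the helper name `per_call_us` is hypothetical, and this is not necessarily how the timings above were produced.

```python
import timeit


def per_call_us(func, arg, number=10_000, repeat=5):
    """Best-of-`repeat` time for func(arg), in microseconds per call.

    The lambda wrapper adds a small constant overhead to every call,
    so very fast functions will read slightly high.
    """
    timings = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    return min(timings) / number * 1e6


if __name__ == "__main__":
    # str.upper is only a placeholder for the function under test.
    print(f"{per_call_us(str.upper, '5min'):.2f} us per call")
```

With pandas installed, passing `pandas.tseries.frequencies.to_offset` as `func` and a frequency string such as `"5min"` as `arg` gives a number comparable in kind to the table, though absolute values depend on the machine and build.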

BaseOffset passthrough, None, and timedelta paths are unchanged.

Also adds asv benchmarks (ToOffset, ToOffsetPassthrough) since none existed for to_offset.

Test plan

  • pandas/tests/tslibs/test_to_offset.py — passes
  • pandas/tests/tseries/offsets/ — passes
  • pandas/tests/tseries/frequencies/ — passes
  • pandas/tests/tslibs/ — passes
  • pandas/tests/indexes/period/ — passes
  • pandas/tests/indexes/datetimes/test_date_range.py — passes
  • pandas/tests/indexes/timedeltas/ — passes

jbrockmendel and others added 2 commits April 27, 2026 20:02
The dict cache inside _get_offset was added in 2012 to avoid re-running
offset construction. Construction is now fast enough (~0.3 us) that the
cache provides only a sub-microsecond per-call savings, and to_offset
itself never returned cached identity anyway because the trailing
`offset * stride` step always produces a fresh instance.

Construct Tick subclasses directly for integer-stride tick names
(h/min/s/ms/us/ns/D), avoiding the Timedelta + delta_to_tick +
Tick.__mul__(float) chain whose float multiplication invoked np.isclose.
Drop np.fabs in the non-tick branch in favor of explicit sign-aware
multiplication, convert _warn_about_deprecated_aliases and
_validate_to_offset_alias to cdef + cache .upper() on the alias,
precompute c_PERIOD_AND_OFFSET_DEPR_FREQSTR.values() as a frozenset,
and walk the regex split by index instead of via zip + slice triples.

Tick offsets like "h", "5min", "3s" go from ~10us to ~0.8-1.0us;
compound expressions like "1D1h" go from ~21us to ~1.6us; non-tick
names like "ME", "BMS", "YS-MAR" go from ~2us to ~1.0-1.5us.

Also adds asv benchmarks for to_offset itself, which had none.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jbrockmendel jbrockmendel added the Performance Memory or execution speed performance label Apr 28, 2026
@jbrockmendel jbrockmendel marked this pull request as ready for review May 7, 2026 02:23
@mroeschke mroeschke added this to the 3.1 milestone May 7, 2026
Comment thread doc/source/whatsnew/v3.1.0.rst Outdated
- Performance improvement in :func:`read_sas` for SAS7BDAT files with full-precision (8-byte) numeric columns, with up to ~2x speedup on bulk reads (:issue:`47339`)
- Performance improvement in :func:`read_sas` for compressed SAS7BDAT files by reusing the decompression buffer instead of allocating per row (:issue:`47339`)
- Performance improvement in :func:`read_sas` when decoding strings (:issue:`47339`)
- Performance improvement in :func:`tseries.frequencies.to_offset` parsing of frequency strings, especially for tick-resolution offsets (e.g. ``"h"``, ``"5min"``, ``"3s"``) and compound expressions (e.g. ``"1D1h"``) (:issue:`XXXXX`)
Member

Suggested change
- Performance improvement in :func:`tseries.frequencies.to_offset` parsing of frequency strings, especially for tick-resolution offsets (e.g. ``"h"``, ``"5min"``, ``"3s"``) and compound expressions (e.g. ``"1D1h"``) (:issue:`XXXXX`)
- Performance improvement in :func:`tseries.frequencies.to_offset` parsing of frequency strings, especially for tick-resolution offsets (e.g. ``"h"``, ``"5min"``, ``"3s"``) and compound expressions (e.g. ``"1D1h"``) (:issue:`65395`)

Comment thread pandas/_libs/tslibs/offsets.pyx Outdated
Comment on lines +7393 to +7395
# split has 4*N + 1 elements where N is the number of segments;
# walking by index avoids three list-slice copies + zip overhead
# vs ``zip(split[0::4], split[1::4], split[2::4])``.
Member

Suggested change
# split has 4*N + 1 elements where N is the number of segments;
# walking by index avoids three list-slice copies + zip overhead
# vs ``zip(split[0::4], split[1::4], split[2::4])``.
# split has 4*N + 1 elements where N is the number of segments

Member Author

updated + green

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mroeschke mroeschke merged commit cd890dd into pandas-dev:main May 7, 2026
47 checks passed
@mroeschke
Member

Thanks @jbrockmendel

@jbrockmendel jbrockmendel deleted the perf-to_offset branch May 7, 2026 20:57