Use FFF strategy API for sorted-batch evaluators (fixes n ≫ m regression)#529
Conversation
Refactors the hand-rolled sorted-batch lockstep walk in the merged Akima batched evaluator and adds new batch evaluators for `LinearInterpolation` and `CubicSpline` (formerly draft PR SciML#527). The new helpers route per-query index lookups through `FindFirstFunctions.searchsortedlast!` so the algorithm adapts to the n/m ratio automatically. Why this matters: the previous merged Akima batched eval walked the knot index linearly via `while idx < n - 1 && tt[i] > t[idx + 1]; idx += 1`, which is O(gap) per query. For sparse queries on long `t` (e.g. `n = 65 536, m = 10`) that's O(n) of wasted walking — 36 µs per batch in practice on this CPU, dwarfing the 30 ns of useful work. The FFF strategy dispatch ([SciML/FindFirstFunctions.jl#65]) picks `LinearScan` for dense queries and `ExpFromLeft` / `InterpolationSearch` for sparse, so the per-query cost stays appropriate for every regime. Implementation: - `_sorted_batch_extrapolation_ok(A)` — shared predicate (None / Constant / Linear / Extension on each side). - `_find_last_interior_index(tt, …)` — O(log m) binary search to split `tt` into left-ext / interior / right-ext. - `_eval_interior_adaptive!(out, t, tt, i_first, i_last, n, eval_at)` — dispatches based on `32 * n_interior < n`: - dense → inline lockstep walk fused with the eval lambda - sparse → allocate an Int buffer, batched `FindFirstFunctions.searchsortedlast!`, then run the eval on the indexed positions. The 1/32 crossover was determined by the FFF bench sweep on uniform data; below that ratio the FFF batched call cleanly beats the linear walk; above it, the inline walk's tight 1-2-op-per-query inner loop wins. The Linear and CubicSpline call sites hand-inline the adaptive dispatch (rather than passing a callable through `_eval_interior_adaptive!`) because `get_parameters` inside the per-segment callable wasn't inlining through the closure boundary — costing ~70× on small-m sparse cases. Akima passes a `_AkimaEvaluator` struct (its evaluator is pure array indexing + `@evalpoly`) which inlines cleanly. Benchmarks (Julia 1.10, BenchmarkTools `minimum`): | type | n=65k,m=10 | n=65k,m=4096 | n=4k,m=4k | n=1M,m=10 | |--------|------------|--------------|-----------|-----------| | Akima | 36 µs → 706 ns (51×) | 106 µs → 104 µs | 34 µs → 33 µs | (unmeasured) → 822 ns | | Linear | n/a → 737 ns | n/a → 112 µs | n/a → 42 µs | n/a → 851 ns | | Cubic | n/a → 809 ns | n/a → 163 µs | n/a → 77 µs | n/a → 960 ns | Where master had no batched evaluator (Linear, Cubic), the comparison is to `map!(A, out, tt)` (the existing default), which uses the `Guesser`-based correlated search per-query. The new fast path is competitive at the sparse end (the existing per-query path is already fast there via the iguesser hint) and 2–7× faster on dense batches. Drops the previous `any(isnan, A.u)` check from `LinearInterpolation`'s fast-path gate (added defensively in the original sorted-batch code, but it's O(n) and was making the fast path slower than `map!` on sparse cases). Users with NaN in `u` continue to get NaN through normal arithmetic; the original per-point NaN special-case was for ForwardDiff propagation through `t/t` not for user-supplied NaN data. Project.toml: bumps FindFirstFunctions compat from `1.3` to `1.9`. This PR depends on SciML/FindFirstFunctions.jl#65 (introduces the `searchsortedlast!` API). Supersedes draft PRs SciML#527 (Linear + CubicSpline batch evaluators) and partly SciML#528 (the 5 remaining types not yet ported can follow the same pattern in a follow-up). Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
|
The Tests CI failure is expected — every job is failing in 20-30 s with:
because this PR's Project.toml bumps I confirmed the actual test suite passes locally with FFF dev'd: (8 pre-existing broken, same as on master; +835 new assertions over the Linear and CubicSpline sub-testsets.) |
Adds batch-evaluator fast paths for `ConstantInterpolation`,
`QuadraticInterpolation`, `QuadraticSpline`, `CubicHermiteSpline`, and
`QuinticHermiteSpline`, completing the set started by this PR (Akima /
Linear / CubicSpline). All five use the same adaptive
`_find_last_interior_index` + `_eval_interior_adaptive!`-style dispatch
established earlier: hand-rolled lockstep walk for dense queries, FFF
batched `searchsortedlast!` for sparse ones.
`ConstantInterpolation` is interesting: its per-segment work is trivial
(`u[idx]`, no arithmetic), but the hand-rolled walk still has the same
O(n) regression as the original Akima bug when queries are sparse on a
long `t`. It also has a `:left` / `:right` `dir` field that picks
different sides of a knot — the sparse path uses
`searchsortedlast`'s "last idx with t[idx] ≤ ttt" semantics directly
for `:left`, and bumps `idx` to the next knot (unless `ttt == t[idx]`)
for `:right`.
`QuadraticSpline` benefits the most from the new path *when
`cache_parameters = true`*: the uncached `get_parameters` path calls
`spline_coefficients!` per query, which is O(degree) per call. That
cost is the same in master's `map!` path — the batch path doesn't
make it worse, and with `cache_parameters = true` (one-time setup) the
n=65 536 m=4096 case goes from 156 ms to 116 µs.
Final all-types benchmark (Julia 1.10, n=65 536 m=10 — the original
"n ≫ m" bug case; cached parameters where relevant):
| type | new | master | speedup |
|-----------------------|-----------|---------|---------|
| AkimaInterpolation | 693 ns | 36 µs | 52× |
| LinearInterpolation | 750 ns | 655 ns | 0.87×* |
| CubicSpline | 861 ns | 777 ns | 0.90×* |
| ConstantInterpolation | 684 ns | (slow†) | (huge) |
| QuadraticInterpolation| 841 ns | (n/a‡) | (large) |
| QuadraticSpline | 773 ns | (slow†) | (huge) |
| CubicHermiteSpline | 742 ns | (n/a‡) | (large) |
| QuinticHermiteSpline | 962 ns | (n/a‡) | (large) |
* For Linear and CubicSpline the sparse case loses 50-100 ns
absolute to master's `map!` (which already has the `Guesser` hint
path), but the dense regime is 2-7× faster — see the previous
commit's benchmark table.
† Master's `map!` path for ConstantInterpolation and QuadraticSpline
has the same per-query Guesser hint, so sparse n ≫ m at master is
~30 µs / 150 µs respectively at this `n`. The new batch path makes
those sub-microsecond.
‡ No previous batched evaluator existed for QuadraticInterpolation,
CubicHermiteSpline, or QuinticHermiteSpline; the new path is the
first one.
Tests: 26 929 / 26 929 FFF; 2 536 / 2 544 DI Core (8 pre-existing broken).
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
|
Pushed Final n=65 536, m=10 timings (the original n ≫ m bug case):
Every type now resolves sparse A couple of notes that came up while adding the rest:
Tests: 2 536 / 2 544 DI Core passing locally (8 pre-existing broken, same as master). |
The 1.9 release shipped the new strategy-dispatched API alongside the
existing single-purpose names as thin wrappers. Cleaner long-term shape
is a single canonical API, so 2.0 removes the legacy surface:
- **Removed**: `searchsortedfirstcorrelated`, `searchsortedlastcorrelated`
(both `Integer` and `Guesser` overloads).
- **Removed**: `searchsortedfirstvec`, `searchsortedlastvec` (allocating
batched variants — callers own their output buffer and use the in-place
`searchsortedfirst!` / `searchsortedlast!`).
- **Made internal**: `searchsortedfirstexp` and `bracketstrictlymontonic`.
They're still in the module as implementation helpers for `ExpFromLeft`
and `BracketGallop` respectively, but no longer documented or exported.
- **Kept (documented public)**: `searchsortedfirst!` / `searchsortedlast!`,
`SearchStrategy` and all concrete strategies (`LinearScan`,
`BracketGallop`, `ExpFromLeft`, `InterpolationSearch`, `BinaryBracket`,
`GuesserHint`, `Auto`), `Guesser`, `looks_linear`, `findfirstequal`,
`findfirstsortedequal`. The hint provider stays because it's
orthogonal to the dispatch; the equality-search functions stay because
they're a different problem.
Migration for downstream callers:
| old | 2.0 |
|--------------------------------------------------------------------|-----------------------------------------------------------------|
| `searchsortedfirstcorrelated(v, x, guess::Integer)` | `searchsortedfirst(BracketGallop(), v, x, guess)` |
| `searchsortedlastcorrelated(v, x, guess::Integer)` | `searchsortedlast(BracketGallop(), v, x, guess)` |
| `searchsortedfirstcorrelated(v, x, ::Guesser)` | `searchsortedfirst(GuesserHint(g), v, x)` |
| `searchsortedlastcorrelated(v, x, ::Guesser)` | `searchsortedlast(GuesserHint(g), v, x)` |
| `searchsortedfirstvec(v, x)` | `searchsortedfirst!(buf, v, x)` (caller-owned `buf`) |
| `searchsortedlastvec(v, x)` | `searchsortedlast!(buf, v, x)` |
| `searchsortedfirstexp(v, x, lo, hi)` | `searchsortedfirst(ExpFromLeft(), v, x, lo)` |
Tests rewritten to exercise the strategy API directly. Docs index updated to
present the strategy API as canonical, with `Guesser` / `looks_linear` /
equality-search as supporting helpers.
`Project.toml` bumped to `2.0.0`.
Downstream callers in SciML that import the removed names will need
companion PRs — see SciML/DataInterpolations.jl#529 for the first one.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
FFF 2.0 (SciML/FindFirstFunctions.jl#65) drops the legacy single-purpose names (`searchsortedfirstcorrelated`, `searchsortedlastcorrelated`, and the allocating `*vec` batched forms) in favour of a single strategy- dispatched API. This commit migrates DataInterpolations off the legacy surface. - `src/DataInterpolations.jl`: import `FindFirstFunctions` plus `Guesser` only; drop the legacy correlated-lookup name imports. - `src/interpolation_utils.jl`: split `get_idx` into two methods. The `iguess::Integer` overload routes through `searchsortedfirst(BracketGallop(), tvec, t, iguess)` / `searchsortedlast(...)`. The `iguess::Guesser` overload routes through `searchsortedfirst(GuesserHint(iguess), tvec, t)`. Both forms preserve the existing `lb`/`ub_shift`/`idx_shift`/`side` keyword contract. - `test/derivative_tests.jl`, `test/interpolation_tests.jl`: cached-index tests now use `searchsortedfirst(GuesserHint(A.iguesser), A.t, t)` and `searchsortedfirst(BracketGallop(), func.t, _t, func.iguesser(_t))`. - `src/interpolation_methods.jl`: trim explanatory/update prose comments that grew during the batch-evaluator work. Section headers and structural `else # Extension` markers retained; per-task narrative comments removed. - `Project.toml`: bump `FindFirstFunctions` compat to `2`. All 57879 DI tests pass locally on Julia 1.11: Core | 2539 pass, 5 broken (pre-existing) Methods | 42151 pass Extensions | 13178 pass Misc | 11 pass Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
The previous commit migrated the sorted-batch evaluators to FFF 2.0's
strategy-dispatched `searchsortedlast!` API, but called it with the
default `strategy = Auto()`. Each such call has `Auto()` re-probe the
knot vector `A.t` for linearity / NaN / uniformity on every batched
evaluation. Because `A.t` is fixed at interpolation construction, that
probe is wasted work on every call after the first.
This commit precomputes `FindFirstFunctions.SearchProperties(A.t)`
once at cache-construction time and stores it as a new
`t_props::propsType` field on every interpolation cache struct, then
threads `FindFirstFunctions.Auto(A.t_props)` into each batched
`searchsortedlast!` call site so the probe is skipped.
Changes:
- `src/interpolation_caches.jl`: every cache struct
(`LinearInterpolation`, `QuadraticInterpolation`,
`LagrangeInterpolation`, `AkimaInterpolation`,
`ConstantInterpolation`, `SmoothedConstantInterpolation`,
`QuadraticSpline`, `CubicSpline`, `BSplineInterpolation`,
`BSplineApprox`, `CubicHermiteSpline`, `QuinticHermiteSpline`,
`SmoothArcLengthInterpolation`) gains a `propsType` type
parameter and a `t_props` field, populated by
`FindFirstFunctions.SearchProperties(t)` in the inner constructor.
- `src/integral_inverses.jl`: same change for
`LinearInterpolationIntInv` and `ConstantInterpolationIntInv`.
- `src/interpolation_methods.jl`: every
`FindFirstFunctions.searchsortedlast!(...)` call site in the
sorted-batch evaluators now passes
`strategy = FindFirstFunctions.Auto(A.t_props)`.
`_eval_interior_adaptive!` gains a `strategy` parameter so the
Akima call site can pass the cached strategy through.
For `AbstractRange{<:Real}` knot vectors the `SearchProperties`
specialised constructor skips every probe (all properties are known
statically), so the construction cost is zero. For `Vector{Float64}`
the populated constructor runs an O(n) NaN scan once at construction
time; for integer / non-numeric eltypes only the O(1) sampled-
linearity probe runs.
Tests: all five groups pass locally on Julia 1.11:
Core | 2539 pass, 5 broken (pre-existing)
Methods | 42151 pass
Extensions | 13178 pass
Misc | 11 pass
QA | 18 pass
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Summary
Refactors the merged
AkimaInterpolationbatched evaluator (the lockstep-walk version from #526) and adds new batched evaluators forLinearInterpolationandCubicSpline(formerly draft PR #527), routing per-query index lookups throughFindFirstFunctions.searchsortedlast!. The FFF strategy dispatch picks the right algorithm for the n/m ratio, so the per-query cost stays appropriate in every regime.Stacked on SciML/FindFirstFunctions.jl#65 (which exposes the strategy types and the
searchsortedlast!API). Project.toml bumpsFindFirstFunctionscompat from1.3to1.9.Why this matters
The merged Akima batched evaluator walked the knot index linearly:
That's O(gap) per query. For sparse queries on long
tit's O(n) of wasted walking — 36 µs atn = 65 536, m = 10in practice, dwarfing the ~30 ns of useful work. With FFF pickingBracketGallop/ExpFromLeft/InterpolationSearchbased on the actual gap, the same batch is now 706 ns (51× faster).Implementation
Three shared helpers, then per-type plumbing:
_sorted_batch_extrapolation_ok(A)— predicate for whether the sorted-batch fast path applies (None / Constant / Linear / Extension on each side; Periodic / Reflective fall back tomap!)._find_last_interior_index(tt, …)— binary-searches the rightmost sorted-query index that's still in[t[1], t[n]], in O(log m)._eval_interior_adaptive!(out, t, tt, i_first, i_last, n, eval_at)— dispatches between:32 * n_interior ≥ n): inline lockstep walk fused with the eval lambda, identical hot-loop to the master implementation32 * n_interior < n): allocate anIntbuffer, batchedFindFirstFunctions.searchsortedlast!, then run the eval on the indexed positionsThe crossover at 1/32 was determined from the FFF bench sweep — below that ratio the FFF batched call cleanly beats the linear walk; above it, the inline walk's 1-2-op-per-query inner loop wins.
Linear and CubicSpline hand-inline the adaptive dispatch (rather than passing a closure through
_eval_interior_adaptive!) becauseget_parametersinside the per-segment callable wasn't inlining through the closure boundary — costing ~70× on small-m sparse cases. Akima uses an_AkimaEvaluatorstruct (pure array indexing +@evalpoly, inlines fine).Benchmarks
Julia 1.10, BenchmarkTools
minimum:map!: 268 µs)For Linear and CubicSpline, the baseline on
masterismap!(A, out, tt)(the existing default — no batched evaluator was previously merged for those). The new fast path is competitive at the sparse end (the existing per-query path is already fast there via theGuesserhint) and 2–7× faster on dense batches.A small NaN-handling change
The original
LinearInterpolationsorted-batch from draft PR #527 had anany(isnan, A.u)check in the fast-path gate, added defensively. It's O(n) per call and was making the fast path slower thanmap!on sparse cases. I dropped it; users with NaN inustill get NaN through normal arithmetic, and the original per-point NaN special-case in_interpolate(::LinearInterpolation, …)was for ForwardDiff propagation throught/t, not user-supplied NaN data.What's intentionally left for follow-up
The other 5 types in draft PR #528 (
ConstantInterpolation,QuadraticInterpolation,QuadraticSpline,CubicHermiteSpline,QuinticHermiteSpline) follow the same pattern. I left them out of this PR to keep the diff focused. Happy to add them in a follow-up once this design is reviewed — each is ~80 lines of mechanical work.Test plan
Pkg.test()Core group passes locally on Julia 1.10 (2536 / 2544 — +835 new assertions across the Linear and CubicSpline sub-testsets, same 8 pre-existing broken)Please ignore until reviewed by @ChrisRackauckas. Depends on SciML/FindFirstFunctions.jl#65 landing before this can merge — once that's tagged as 1.9.0, this PR's Project.toml compat is satisfied.
🤖 Generated with Claude Code