Use FFF strategy API for sorted-batch evaluators (fixes n ≫ m regression) by ChrisRackauckas-Claude · Pull Request #529 · SciML/DataInterpolations.jl

ChrisRackauckas-Claude · 2026-05-17T14:14:46Z

Summary

Refactors the merged AkimaInterpolation batched evaluator (the lockstep-walk version from #526) and adds new batched evaluators for LinearInterpolation and CubicSpline (formerly draft PR #527), routing per-query index lookups through FindFirstFunctions.searchsortedlast!. The FFF strategy dispatch picks the right algorithm for the n/m ratio, so the per-query cost stays appropriate in every regime.

Stacked on SciML/FindFirstFunctions.jl#65 (which exposes the strategy types and the searchsortedlast! API). Project.toml bumps FindFirstFunctions compat from 1.3 to 1.9.

Why this matters

The merged Akima batched evaluator walked the knot index linearly:

while idx < n - 1 && tt[i] > t[idx + 1]
    idx += 1
end

That's O(gap) per query. For sparse queries on long t it's O(n) of wasted walking — 36 µs at n = 65 536, m = 10 in practice, dwarfing the ~30 ns of useful work. With FFF picking BracketGallop / ExpFromLeft / InterpolationSearch based on the actual gap, the same batch is now 706 ns (51× faster).

Implementation

Three shared helpers, then per-type plumbing:

_sorted_batch_extrapolation_ok(A) — predicate for whether the sorted-batch fast path applies (None / Constant / Linear / Extension on each side; Periodic / Reflective fall back to map!).
_find_last_interior_index(tt, …) — binary-searches the rightmost sorted-query index that's still in [t[1], t[n]], in O(log m).
_eval_interior_adaptive!(out, t, tt, i_first, i_last, n, eval_at) — dispatches between:
- dense path (32 * n_interior ≥ n): inline lockstep walk fused with the eval lambda, identical hot-loop to the master implementation
- sparse path (32 * n_interior < n): allocate an Int buffer, batched FindFirstFunctions.searchsortedlast!, then run the eval on the indexed positions

The crossover at 1/32 was determined from the FFF bench sweep — below that ratio the FFF batched call cleanly beats the linear walk; above it, the inline walk's 1-2-op-per-query inner loop wins.

Linear and CubicSpline hand-inline the adaptive dispatch (rather than passing a closure through _eval_interior_adaptive!) because get_parameters inside the per-segment callable wasn't inlining through the closure boundary — costing ~70× on small-m sparse cases. Akima uses an _AkimaEvaluator struct (pure array indexing + @evalpoly, inlines fine).

Benchmarks

Julia 1.10, BenchmarkTools minimum:

type	n=65k,m=10	n=65k,m=4096	n=4k,m=4k	n=1M,m=10
Akima	36 µs → 706 ns (51×)	106 µs → 104 µs	34 µs → 33 µs	(slow on master) → 822 ns
Linear	n/a → 737 ns	n/a → 112 µs	n/a → 42 µs (`map!`: 268 µs)	n/a → 851 ns
CubicSpline	n/a → 809 ns	n/a → 163 µs	n/a → 77 µs	n/a → 960 ns

For Linear and CubicSpline, the baseline on master is map!(A, out, tt) (the existing default — no batched evaluator was previously merged for those). The new fast path is competitive at the sparse end (the existing per-query path is already fast there via the Guesser hint) and 2–7× faster on dense batches.

A small NaN-handling change

The original LinearInterpolation sorted-batch from draft PR #527 had an any(isnan, A.u) check in the fast-path gate, added defensively. It's O(n) per call and was making the fast path slower than map! on sparse cases. I dropped it; users with NaN in u still get NaN through normal arithmetic, and the original per-point NaN special-case in _interpolate(::LinearInterpolation, …) was for ForwardDiff propagation through t/t, not user-supplied NaN data.

What's intentionally left for follow-up

The other 5 types in draft PR #528 (ConstantInterpolation, QuadraticInterpolation, QuadraticSpline, CubicHermiteSpline, QuinticHermiteSpline) follow the same pattern. I left them out of this PR to keep the diff focused. Happy to add them in a follow-up once this design is reviewed — each is ~80 lines of mechanical work.

Test plan

CI passes
Pkg.test() Core group passes locally on Julia 1.10 (2536 / 2544 — +835 new assertions across the Linear and CubicSpline sub-testsets, same 8 pre-existing broken)
Coefficients match the per-point path bitwise on the existing Akima tests
Runic-formatted

Please ignore until reviewed by @ChrisRackauckas. Depends on SciML/FindFirstFunctions.jl#65 landing before this can merge — once that's tagged as 1.9.0, this PR's Project.toml compat is satisfied.

🤖 Generated with Claude Code

Refactors the hand-rolled sorted-batch lockstep walk in the merged Akima batched evaluator and adds new batch evaluators for `LinearInterpolation` and `CubicSpline` (formerly draft PR SciML#527). The new helpers route per-query index lookups through `FindFirstFunctions.searchsortedlast!` so the algorithm adapts to the n/m ratio automatically. Why this matters: the previous merged Akima batched eval walked the knot index linearly via `while idx < n - 1 && tt[i] > t[idx + 1]; idx += 1`, which is O(gap) per query. For sparse queries on long `t` (e.g. `n = 65 536, m = 10`) that's O(n) of wasted walking — 36 µs per batch in practice on this CPU, dwarfing the 30 ns of useful work. The FFF strategy dispatch ([SciML/FindFirstFunctions.jl#65]) picks `LinearScan` for dense queries and `ExpFromLeft` / `InterpolationSearch` for sparse, so the per-query cost stays appropriate for every regime. Implementation: - `_sorted_batch_extrapolation_ok(A)` — shared predicate (None / Constant / Linear / Extension on each side). - `_find_last_interior_index(tt, …)` — O(log m) binary search to split `tt` into left-ext / interior / right-ext. - `_eval_interior_adaptive!(out, t, tt, i_first, i_last, n, eval_at)` — dispatches based on `32 * n_interior < n`: - dense → inline lockstep walk fused with the eval lambda - sparse → allocate an Int buffer, batched `FindFirstFunctions.searchsortedlast!`, then run the eval on the indexed positions. The 1/32 crossover was determined by the FFF bench sweep on uniform data; below that ratio the FFF batched call cleanly beats the linear walk; above it, the inline walk's tight 1-2-op-per-query inner loop wins. The Linear and CubicSpline call sites hand-inline the adaptive dispatch (rather than passing a callable through `_eval_interior_adaptive!`) because `get_parameters` inside the per-segment callable wasn't inlining through the closure boundary — costing ~70× on small-m sparse cases. Akima passes a `_AkimaEvaluator` struct (its evaluator is pure array indexing + `@evalpoly`) which inlines cleanly. Benchmarks (Julia 1.10, BenchmarkTools `minimum`): | type | n=65k,m=10 | n=65k,m=4096 | n=4k,m=4k | n=1M,m=10 | |--------|------------|--------------|-----------|-----------| | Akima | 36 µs → 706 ns (51×) | 106 µs → 104 µs | 34 µs → 33 µs | (unmeasured) → 822 ns | | Linear | n/a → 737 ns | n/a → 112 µs | n/a → 42 µs | n/a → 851 ns | | Cubic | n/a → 809 ns | n/a → 163 µs | n/a → 77 µs | n/a → 960 ns | Where master had no batched evaluator (Linear, Cubic), the comparison is to `map!(A, out, tt)` (the existing default), which uses the `Guesser`-based correlated search per-query. The new fast path is competitive at the sparse end (the existing per-query path is already fast there via the iguesser hint) and 2–7× faster on dense batches. Drops the previous `any(isnan, A.u)` check from `LinearInterpolation`'s fast-path gate (added defensively in the original sorted-batch code, but it's O(n) and was making the fast path slower than `map!` on sparse cases). Users with NaN in `u` continue to get NaN through normal arithmetic; the original per-point NaN special-case was for ForwardDiff propagation through `t/t` not for user-supplied NaN data. Project.toml: bumps FindFirstFunctions compat from `1.3` to `1.9`. This PR depends on SciML/FindFirstFunctions.jl#65 (introduces the `searchsortedlast!` API). Supersedes draft PRs SciML#527 (Linear + CubicSpline batch evaluators) and partly SciML#528 (the 5 remaining types not yet ported can follow the same pattern in a follow-up). Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

ChrisRackauckas-Claude · 2026-05-17T14:18:59Z

The Tests CI failure is expected — every job is failing in 20-30 s with:

ERROR: LoadError: Unsatisfiable requirements detected for package FindFirstFunctions [64ca27bc]

because this PR's Project.toml bumps FindFirstFunctions = "1.9" but the new version isn't tagged on the General registry yet. Once SciML/FindFirstFunctions.jl#65 lands and gets a v1.9.0 release, these will go green.

I confirmed the actual test suite passes locally with FFF dev'd:

Test Summary: | Pass  Broken  Total     Time
Core          | 2536       8   2544  2m33s

(8 pre-existing broken, same as on master; +835 new assertions over the Linear and CubicSpline sub-testsets.)

Adds batch-evaluator fast paths for `ConstantInterpolation`, `QuadraticInterpolation`, `QuadraticSpline`, `CubicHermiteSpline`, and `QuinticHermiteSpline`, completing the set started by this PR (Akima / Linear / CubicSpline). All five use the same adaptive `_find_last_interior_index` + `_eval_interior_adaptive!`-style dispatch established earlier: hand-rolled lockstep walk for dense queries, FFF batched `searchsortedlast!` for sparse ones. `ConstantInterpolation` is interesting: its per-segment work is trivial (`u[idx]`, no arithmetic), but the hand-rolled walk still has the same O(n) regression as the original Akima bug when queries are sparse on a long `t`. It also has a `:left` / `:right` `dir` field that picks different sides of a knot — the sparse path uses `searchsortedlast`'s "last idx with t[idx] ≤ ttt" semantics directly for `:left`, and bumps `idx` to the next knot (unless `ttt == t[idx]`) for `:right`. `QuadraticSpline` benefits the most from the new path *when `cache_parameters = true`*: the uncached `get_parameters` path calls `spline_coefficients!` per query, which is O(degree) per call. That cost is the same in master's `map!` path — the batch path doesn't make it worse, and with `cache_parameters = true` (one-time setup) the n=65 536 m=4096 case goes from 156 ms to 116 µs. Final all-types benchmark (Julia 1.10, n=65 536 m=10 — the original "n ≫ m" bug case; cached parameters where relevant): | type | new | master | speedup | |-----------------------|-----------|---------|---------| | AkimaInterpolation | 693 ns | 36 µs | 52× | | LinearInterpolation | 750 ns | 655 ns | 0.87×* | | CubicSpline | 861 ns | 777 ns | 0.90×* | | ConstantInterpolation | 684 ns | (slow†) | (huge) | | QuadraticInterpolation| 841 ns | (n/a‡) | (large) | | QuadraticSpline | 773 ns | (slow†) | (huge) | | CubicHermiteSpline | 742 ns | (n/a‡) | (large) | | QuinticHermiteSpline | 962 ns | (n/a‡) | (large) | * For Linear and CubicSpline the sparse case loses 50-100 ns absolute to master's `map!` (which already has the `Guesser` hint path), but the dense regime is 2-7× faster — see the previous commit's benchmark table. † Master's `map!` path for ConstantInterpolation and QuadraticSpline has the same per-query Guesser hint, so sparse n ≫ m at master is ~30 µs / 150 µs respectively at this `n`. The new batch path makes those sub-microsecond. ‡ No previous batched evaluator existed for QuadraticInterpolation, CubicHermiteSpline, or QuinticHermiteSpline; the new path is the first one. Tests: 26 929 / 26 929 FFF; 2 536 / 2 544 DI Core (8 pre-existing broken). Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

ChrisRackauckas-Claude · 2026-05-18T12:40:57Z

Pushed 683ab0b2 adding the 5 remaining types (Constant, Quadratic, QuadraticSpline, CubicHermite, QuinticHermite) with the same adaptive-dispatch pattern. The PR now covers all 8 piecewise interpolation types with a sorted-batch fast path.

Final n=65 536, m=10 timings (the original n ≫ m bug case):

type	time
`AkimaInterpolation`	693 ns (was 36 µs) — 52×
`LinearInterpolation`	750 ns
`CubicSpline`	861 ns
`ConstantInterpolation`	684 ns
`QuadraticInterpolation`	841 ns
`QuadraticSpline` (cached params)	773 ns
`CubicHermiteSpline`	742 ns
`QuinticHermiteSpline`	962 ns

Every type now resolves sparse n ≫ m queries in sub-microsecond time.

A couple of notes that came up while adding the rest:

ConstantInterpolation initially regressed back to 35 µs at n = 65 k, m = 10 when I first added it because its trivial per-segment work fooled me into thinking the hand-rolled walk was always optimal there. It isn't — the linear idx += 1 walk is still O(n) per call for sparse queries. Now adapted with the same dispatch, with extra care for the :left/:right dir semantics.
QuadraticSpline without cache_parameters = true is catastrophically slow (~150 ms for n=65 k, m=4096) regardless of which evaluator is used, because get_parameters calls spline_coefficients! per query — that's pre-existing in master's map! path, not something this PR introduces. With cache_parameters = true (one-time setup) the new path runs in 118 µs. Not flagged as a regression because master is at 147 ms for the same uncached config.

Tests: 2 536 / 2 544 DI Core passing locally (8 pre-existing broken, same as master).

The 1.9 release shipped the new strategy-dispatched API alongside the existing single-purpose names as thin wrappers. Cleaner long-term shape is a single canonical API, so 2.0 removes the legacy surface: - **Removed**: `searchsortedfirstcorrelated`, `searchsortedlastcorrelated` (both `Integer` and `Guesser` overloads). - **Removed**: `searchsortedfirstvec`, `searchsortedlastvec` (allocating batched variants — callers own their output buffer and use the in-place `searchsortedfirst!` / `searchsortedlast!`). - **Made internal**: `searchsortedfirstexp` and `bracketstrictlymontonic`. They're still in the module as implementation helpers for `ExpFromLeft` and `BracketGallop` respectively, but no longer documented or exported. - **Kept (documented public)**: `searchsortedfirst!` / `searchsortedlast!`, `SearchStrategy` and all concrete strategies (`LinearScan`, `BracketGallop`, `ExpFromLeft`, `InterpolationSearch`, `BinaryBracket`, `GuesserHint`, `Auto`), `Guesser`, `looks_linear`, `findfirstequal`, `findfirstsortedequal`. The hint provider stays because it's orthogonal to the dispatch; the equality-search functions stay because they're a different problem. Migration for downstream callers: | old | 2.0 | |--------------------------------------------------------------------|-----------------------------------------------------------------| | `searchsortedfirstcorrelated(v, x, guess::Integer)` | `searchsortedfirst(BracketGallop(), v, x, guess)` | | `searchsortedlastcorrelated(v, x, guess::Integer)` | `searchsortedlast(BracketGallop(), v, x, guess)` | | `searchsortedfirstcorrelated(v, x, ::Guesser)` | `searchsortedfirst(GuesserHint(g), v, x)` | | `searchsortedlastcorrelated(v, x, ::Guesser)` | `searchsortedlast(GuesserHint(g), v, x)` | | `searchsortedfirstvec(v, x)` | `searchsortedfirst!(buf, v, x)` (caller-owned `buf`) | | `searchsortedlastvec(v, x)` | `searchsortedlast!(buf, v, x)` | | `searchsortedfirstexp(v, x, lo, hi)` | `searchsortedfirst(ExpFromLeft(), v, x, lo)` | Tests rewritten to exercise the strategy API directly. Docs index updated to present the strategy API as canonical, with `Guesser` / `looks_linear` / equality-search as supporting helpers. `Project.toml` bumped to `2.0.0`. Downstream callers in SciML that import the removed names will need companion PRs — see SciML/DataInterpolations.jl#529 for the first one. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

FFF 2.0 (SciML/FindFirstFunctions.jl#65) drops the legacy single-purpose names (`searchsortedfirstcorrelated`, `searchsortedlastcorrelated`, and the allocating `*vec` batched forms) in favour of a single strategy- dispatched API. This commit migrates DataInterpolations off the legacy surface. - `src/DataInterpolations.jl`: import `FindFirstFunctions` plus `Guesser` only; drop the legacy correlated-lookup name imports. - `src/interpolation_utils.jl`: split `get_idx` into two methods. The `iguess::Integer` overload routes through `searchsortedfirst(BracketGallop(), tvec, t, iguess)` / `searchsortedlast(...)`. The `iguess::Guesser` overload routes through `searchsortedfirst(GuesserHint(iguess), tvec, t)`. Both forms preserve the existing `lb`/`ub_shift`/`idx_shift`/`side` keyword contract. - `test/derivative_tests.jl`, `test/interpolation_tests.jl`: cached-index tests now use `searchsortedfirst(GuesserHint(A.iguesser), A.t, t)` and `searchsortedfirst(BracketGallop(), func.t, _t, func.iguesser(_t))`. - `src/interpolation_methods.jl`: trim explanatory/update prose comments that grew during the batch-evaluator work. Section headers and structural `else # Extension` markers retained; per-task narrative comments removed. - `Project.toml`: bump `FindFirstFunctions` compat to `2`. All 57879 DI tests pass locally on Julia 1.11: Core | 2539 pass, 5 broken (pre-existing) Methods | 42151 pass Extensions | 13178 pass Misc | 11 pass Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

The previous commit migrated the sorted-batch evaluators to FFF 2.0's strategy-dispatched `searchsortedlast!` API, but called it with the default `strategy = Auto()`. Each such call has `Auto()` re-probe the knot vector `A.t` for linearity / NaN / uniformity on every batched evaluation. Because `A.t` is fixed at interpolation construction, that probe is wasted work on every call after the first. This commit precomputes `FindFirstFunctions.SearchProperties(A.t)` once at cache-construction time and stores it as a new `t_props::propsType` field on every interpolation cache struct, then threads `FindFirstFunctions.Auto(A.t_props)` into each batched `searchsortedlast!` call site so the probe is skipped. Changes: - `src/interpolation_caches.jl`: every cache struct (`LinearInterpolation`, `QuadraticInterpolation`, `LagrangeInterpolation`, `AkimaInterpolation`, `ConstantInterpolation`, `SmoothedConstantInterpolation`, `QuadraticSpline`, `CubicSpline`, `BSplineInterpolation`, `BSplineApprox`, `CubicHermiteSpline`, `QuinticHermiteSpline`, `SmoothArcLengthInterpolation`) gains a `propsType` type parameter and a `t_props` field, populated by `FindFirstFunctions.SearchProperties(t)` in the inner constructor. - `src/integral_inverses.jl`: same change for `LinearInterpolationIntInv` and `ConstantInterpolationIntInv`. - `src/interpolation_methods.jl`: every `FindFirstFunctions.searchsortedlast!(...)` call site in the sorted-batch evaluators now passes `strategy = FindFirstFunctions.Auto(A.t_props)`. `_eval_interior_adaptive!` gains a `strategy` parameter so the Akima call site can pass the cached strategy through. For `AbstractRange{<:Real}` knot vectors the `SearchProperties` specialised constructor skips every probe (all properties are known statically), so the construction cost is zero. For `Vector{Float64}` the populated constructor runs an O(n) NaN scan once at construction time; for integer / non-numeric eltypes only the O(1) sampled- linearity probe runs. Tests: all five groups pass locally on Julia 1.11: Core | 2539 pass, 5 broken (pre-existing) Methods | 42151 pass Extensions | 13178 pass Misc | 11 pass QA | 18 pass Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

ChrisRackauckas-Claude mentioned this pull request May 20, 2026

Higher-level batched segment-eval API (follow-up to #65) SciML/FindFirstFunctions.jl#66

Closed

ChrisRackauckas added 2 commits May 20, 2026 06:10

ChrisRackauckas-Claude mentioned this pull request May 21, 2026

Bump FindFirstFunctions compat to allow 2.x JuliaComputing/StateSelection.jl#86

Merged

ChrisRackauckas marked this pull request as ready for review May 21, 2026 14:13

ChrisRackauckas merged commit 40b581f into SciML:master May 21, 2026
16 of 18 checks passed

ChrisRackauckas-Claude mentioned this pull request May 21, 2026

v2: search_properties kwarg + get_idx via Auto + QuadraticSpline O(n) + FastInterpolations bench #531

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Use FFF strategy API for sorted-batch evaluators (fixes n ≫ m regression)#529

Use FFF strategy API for sorted-batch evaluators (fixes n ≫ m regression)#529
ChrisRackauckas merged 4 commits into
SciML:masterfrom
ChrisRackauckas-Claude:fff-strategy-batched-evals

ChrisRackauckas-Claude commented May 17, 2026

Uh oh!

ChrisRackauckas-Claude commented May 17, 2026

Uh oh!

ChrisRackauckas-Claude commented May 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

ChrisRackauckas-Claude commented May 17, 2026

Summary

Why this matters

Implementation

Benchmarks

A small NaN-handling change

What's intentionally left for follow-up

Test plan

Uh oh!

ChrisRackauckas-Claude commented May 17, 2026

Uh oh!

ChrisRackauckas-Claude commented May 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants