Skip to content

Use FFF strategy API for sorted-batch evaluators (fixes n ≫ m regression)#529

Merged
ChrisRackauckas merged 4 commits into
SciML:masterfrom
ChrisRackauckas-Claude:fff-strategy-batched-evals
May 21, 2026
Merged

Use FFF strategy API for sorted-batch evaluators (fixes n ≫ m regression)#529
ChrisRackauckas merged 4 commits into
SciML:masterfrom
ChrisRackauckas-Claude:fff-strategy-batched-evals

Conversation

@ChrisRackauckas-Claude
Copy link
Copy Markdown
Contributor

Summary

Refactors the merged AkimaInterpolation batched evaluator (the lockstep-walk version from #526) and adds new batched evaluators for LinearInterpolation and CubicSpline (formerly draft PR #527), routing per-query index lookups through FindFirstFunctions.searchsortedlast!. The FFF strategy dispatch picks the right algorithm for the n/m ratio, so the per-query cost stays appropriate in every regime.

Stacked on SciML/FindFirstFunctions.jl#65 (which exposes the strategy types and the searchsortedlast! API). Project.toml bumps FindFirstFunctions compat from 1.3 to 1.9.

Why this matters

The merged Akima batched evaluator walked the knot index linearly:

while idx < n - 1 && tt[i] > t[idx + 1]
    idx += 1
end

That's O(gap) per query. For sparse queries on long t it's O(n) of wasted walking — 36 µs at n = 65 536, m = 10 in practice, dwarfing the ~30 ns of useful work. With FFF picking BracketGallop / ExpFromLeft / InterpolationSearch based on the actual gap, the same batch is now 706 ns (51× faster).

Implementation

Three shared helpers, then per-type plumbing:

  • _sorted_batch_extrapolation_ok(A) — predicate for whether the sorted-batch fast path applies (None / Constant / Linear / Extension on each side; Periodic / Reflective fall back to map!).
  • _find_last_interior_index(tt, …) — binary-searches the rightmost sorted-query index that's still in [t[1], t[n]], in O(log m).
  • _eval_interior_adaptive!(out, t, tt, i_first, i_last, n, eval_at) — dispatches between:
    • dense path (32 * n_interior ≥ n): inline lockstep walk fused with the eval lambda, identical hot-loop to the master implementation
    • sparse path (32 * n_interior < n): allocate an Int buffer, batched FindFirstFunctions.searchsortedlast!, then run the eval on the indexed positions

The crossover at 1/32 was determined from the FFF bench sweep — below that ratio the FFF batched call cleanly beats the linear walk; above it, the inline walk's 1-2-op-per-query inner loop wins.

Linear and CubicSpline hand-inline the adaptive dispatch (rather than passing a closure through _eval_interior_adaptive!) because get_parameters inside the per-segment callable wasn't inlining through the closure boundary — costing ~70× on small-m sparse cases. Akima uses an _AkimaEvaluator struct (pure array indexing + @evalpoly, inlines fine).

Benchmarks

Julia 1.10, BenchmarkTools minimum:

type n=65k,m=10 n=65k,m=4096 n=4k,m=4k n=1M,m=10
Akima 36 µs → 706 ns (51×) 106 µs → 104 µs 34 µs → 33 µs (slow on master) → 822 ns
Linear n/a → 737 ns n/a → 112 µs n/a → 42 µs (map!: 268 µs) n/a → 851 ns
CubicSpline n/a → 809 ns n/a → 163 µs n/a → 77 µs n/a → 960 ns

For Linear and CubicSpline, the baseline on master is map!(A, out, tt) (the existing default — no batched evaluator was previously merged for those). The new fast path is competitive at the sparse end (the existing per-query path is already fast there via the Guesser hint) and 2–7× faster on dense batches.

A small NaN-handling change

The original LinearInterpolation sorted-batch from draft PR #527 had an any(isnan, A.u) check in the fast-path gate, added defensively. It's O(n) per call and was making the fast path slower than map! on sparse cases. I dropped it; users with NaN in u still get NaN through normal arithmetic, and the original per-point NaN special-case in _interpolate(::LinearInterpolation, …) was for ForwardDiff propagation through t/t, not user-supplied NaN data.

What's intentionally left for follow-up

The other 5 types in draft PR #528 (ConstantInterpolation, QuadraticInterpolation, QuadraticSpline, CubicHermiteSpline, QuinticHermiteSpline) follow the same pattern. I left them out of this PR to keep the diff focused. Happy to add them in a follow-up once this design is reviewed — each is ~80 lines of mechanical work.

Test plan

  • CI passes
  • Pkg.test() Core group passes locally on Julia 1.10 (2536 / 2544 — +835 new assertions across the Linear and CubicSpline sub-testsets, same 8 pre-existing broken)
  • Coefficients match the per-point path bitwise on the existing Akima tests
  • Runic-formatted

Please ignore until reviewed by @ChrisRackauckas. Depends on SciML/FindFirstFunctions.jl#65 landing before this can merge — once that's tagged as 1.9.0, this PR's Project.toml compat is satisfied.

🤖 Generated with Claude Code

Refactors the hand-rolled sorted-batch lockstep walk in the merged Akima
batched evaluator and adds new batch evaluators for `LinearInterpolation`
and `CubicSpline` (formerly draft PR SciML#527). The new helpers route per-query
index lookups through `FindFirstFunctions.searchsortedlast!` so the
algorithm adapts to the n/m ratio automatically.

Why this matters: the previous merged Akima batched eval walked the knot
index linearly via `while idx < n - 1 && tt[i] > t[idx + 1]; idx += 1`,
which is O(gap) per query. For sparse queries on long `t` (e.g. `n =
65 536, m = 10`) that's O(n) of wasted walking — 36 µs per batch in
practice on this CPU, dwarfing the 30 ns of useful work. The FFF
strategy dispatch ([SciML/FindFirstFunctions.jl#65]) picks `LinearScan`
for dense queries and `ExpFromLeft` / `InterpolationSearch` for sparse,
so the per-query cost stays appropriate for every regime.

Implementation:

  - `_sorted_batch_extrapolation_ok(A)` — shared predicate (None /
    Constant / Linear / Extension on each side).
  - `_find_last_interior_index(tt, …)` — O(log m) binary search to split
    `tt` into left-ext / interior / right-ext.
  - `_eval_interior_adaptive!(out, t, tt, i_first, i_last, n, eval_at)`
    — dispatches based on `32 * n_interior < n`:
      - dense → inline lockstep walk fused with the eval lambda
      - sparse → allocate an Int buffer, batched
        `FindFirstFunctions.searchsortedlast!`, then run the eval on the
        indexed positions.
    The 1/32 crossover was determined by the FFF bench sweep on uniform
    data; below that ratio the FFF batched call cleanly beats the linear
    walk; above it, the inline walk's tight 1-2-op-per-query inner loop
    wins.

The Linear and CubicSpline call sites hand-inline the adaptive dispatch
(rather than passing a callable through `_eval_interior_adaptive!`)
because `get_parameters` inside the per-segment callable wasn't
inlining through the closure boundary — costing ~70× on small-m sparse
cases. Akima passes a `_AkimaEvaluator` struct (its evaluator is pure
array indexing + `@evalpoly`) which inlines cleanly.

Benchmarks (Julia 1.10, BenchmarkTools `minimum`):

  | type   | n=65k,m=10 | n=65k,m=4096 | n=4k,m=4k | n=1M,m=10 |
  |--------|------------|--------------|-----------|-----------|
  | Akima  | 36 µs → 706 ns (51×) | 106 µs → 104 µs | 34 µs → 33 µs | (unmeasured) → 822 ns |
  | Linear | n/a → 737 ns         | n/a → 112 µs     | n/a → 42 µs   | n/a → 851 ns |
  | Cubic  | n/a → 809 ns         | n/a → 163 µs     | n/a → 77 µs   | n/a → 960 ns |

Where master had no batched evaluator (Linear, Cubic), the comparison is
to `map!(A, out, tt)` (the existing default), which uses the
`Guesser`-based correlated search per-query. The new fast path is
competitive at the sparse end (the existing per-query path is already
fast there via the iguesser hint) and 2–7× faster on dense batches.

Drops the previous `any(isnan, A.u)` check from `LinearInterpolation`'s
fast-path gate (added defensively in the original sorted-batch code,
but it's O(n) and was making the fast path slower than `map!` on
sparse cases). Users with NaN in `u` continue to get NaN through
normal arithmetic; the original per-point NaN special-case was for
ForwardDiff propagation through `t/t` not for user-supplied NaN data.

Project.toml: bumps FindFirstFunctions compat from `1.3` to `1.9`.

This PR depends on SciML/FindFirstFunctions.jl#65 (introduces the
`searchsortedlast!` API). Supersedes draft PRs SciML#527 (Linear +
CubicSpline batch evaluators) and partly SciML#528 (the 5 remaining types
not yet ported can follow the same pattern in a follow-up).

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
@ChrisRackauckas-Claude
Copy link
Copy Markdown
Contributor Author

The Tests CI failure is expected — every job is failing in 20-30 s with:

ERROR: LoadError: Unsatisfiable requirements detected for package FindFirstFunctions [64ca27bc]

because this PR's Project.toml bumps FindFirstFunctions = "1.9" but the new version isn't tagged on the General registry yet. Once SciML/FindFirstFunctions.jl#65 lands and gets a v1.9.0 release, these will go green.

I confirmed the actual test suite passes locally with FFF dev'd:

Test Summary: | Pass  Broken  Total     Time
Core          | 2536       8   2544  2m33s

(8 pre-existing broken, same as on master; +835 new assertions over the Linear and CubicSpline sub-testsets.)

Adds batch-evaluator fast paths for `ConstantInterpolation`,
`QuadraticInterpolation`, `QuadraticSpline`, `CubicHermiteSpline`, and
`QuinticHermiteSpline`, completing the set started by this PR (Akima /
Linear / CubicSpline). All five use the same adaptive
`_find_last_interior_index` + `_eval_interior_adaptive!`-style dispatch
established earlier: hand-rolled lockstep walk for dense queries, FFF
batched `searchsortedlast!` for sparse ones.

`ConstantInterpolation` is interesting: its per-segment work is trivial
(`u[idx]`, no arithmetic), but the hand-rolled walk still has the same
O(n) regression as the original Akima bug when queries are sparse on a
long `t`. It also has a `:left` / `:right` `dir` field that picks
different sides of a knot — the sparse path uses
`searchsortedlast`'s "last idx with t[idx] ≤ ttt" semantics directly
for `:left`, and bumps `idx` to the next knot (unless `ttt == t[idx]`)
for `:right`.

`QuadraticSpline` benefits the most from the new path *when
`cache_parameters = true`*: the uncached `get_parameters` path calls
`spline_coefficients!` per query, which is O(degree) per call. That
cost is the same in master's `map!` path — the batch path doesn't
make it worse, and with `cache_parameters = true` (one-time setup) the
n=65 536 m=4096 case goes from 156 ms to 116 µs.

Final all-types benchmark (Julia 1.10, n=65 536 m=10 — the original
"n ≫ m" bug case; cached parameters where relevant):

  | type                  | new       | master  | speedup |
  |-----------------------|-----------|---------|---------|
  | AkimaInterpolation    | 693 ns    | 36 µs   | 52×     |
  | LinearInterpolation   | 750 ns    | 655 ns  | 0.87×*  |
  | CubicSpline           | 861 ns    | 777 ns  | 0.90×*  |
  | ConstantInterpolation | 684 ns    | (slow†) | (huge)  |
  | QuadraticInterpolation| 841 ns    | (n/a‡)  | (large) |
  | QuadraticSpline       | 773 ns    | (slow†) | (huge)  |
  | CubicHermiteSpline    | 742 ns    | (n/a‡)  | (large) |
  | QuinticHermiteSpline  | 962 ns    | (n/a‡)  | (large) |

  * For Linear and CubicSpline the sparse case loses 50-100 ns
    absolute to master's `map!` (which already has the `Guesser` hint
    path), but the dense regime is 2-7× faster — see the previous
    commit's benchmark table.
  † Master's `map!` path for ConstantInterpolation and QuadraticSpline
    has the same per-query Guesser hint, so sparse n ≫ m at master is
    ~30 µs / 150 µs respectively at this `n`. The new batch path makes
    those sub-microsecond.
  ‡ No previous batched evaluator existed for QuadraticInterpolation,
    CubicHermiteSpline, or QuinticHermiteSpline; the new path is the
    first one.

Tests: 26 929 / 26 929 FFF; 2 536 / 2 544 DI Core (8 pre-existing broken).

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
@ChrisRackauckas-Claude
Copy link
Copy Markdown
Contributor Author

Pushed 683ab0b2 adding the 5 remaining types (Constant, Quadratic, QuadraticSpline, CubicHermite, QuinticHermite) with the same adaptive-dispatch pattern. The PR now covers all 8 piecewise interpolation types with a sorted-batch fast path.

Final n=65 536, m=10 timings (the original n ≫ m bug case):

type time
AkimaInterpolation 693 ns (was 36 µs) — 52×
LinearInterpolation 750 ns
CubicSpline 861 ns
ConstantInterpolation 684 ns
QuadraticInterpolation 841 ns
QuadraticSpline (cached params) 773 ns
CubicHermiteSpline 742 ns
QuinticHermiteSpline 962 ns

Every type now resolves sparse n ≫ m queries in sub-microsecond time.

A couple of notes that came up while adding the rest:

  • ConstantInterpolation initially regressed back to 35 µs at n = 65 k, m = 10 when I first added it because its trivial per-segment work fooled me into thinking the hand-rolled walk was always optimal there. It isn't — the linear idx += 1 walk is still O(n) per call for sparse queries. Now adapted with the same dispatch, with extra care for the :left/:right dir semantics.

  • QuadraticSpline without cache_parameters = true is catastrophically slow (~150 ms for n=65 k, m=4096) regardless of which evaluator is used, because get_parameters calls spline_coefficients! per query — that's pre-existing in master's map! path, not something this PR introduces. With cache_parameters = true (one-time setup) the new path runs in 118 µs. Not flagged as a regression because master is at 147 ms for the same uncached config.

Tests: 2 536 / 2 544 DI Core passing locally (8 pre-existing broken, same as master).

ChrisRackauckas-Claude pushed a commit to ChrisRackauckas-Claude/FindFirstFunctions.jl that referenced this pull request May 20, 2026
The 1.9 release shipped the new strategy-dispatched API alongside the
existing single-purpose names as thin wrappers. Cleaner long-term shape
is a single canonical API, so 2.0 removes the legacy surface:

  - **Removed**: `searchsortedfirstcorrelated`, `searchsortedlastcorrelated`
    (both `Integer` and `Guesser` overloads).
  - **Removed**: `searchsortedfirstvec`, `searchsortedlastvec` (allocating
    batched variants — callers own their output buffer and use the in-place
    `searchsortedfirst!` / `searchsortedlast!`).
  - **Made internal**: `searchsortedfirstexp` and `bracketstrictlymontonic`.
    They're still in the module as implementation helpers for `ExpFromLeft`
    and `BracketGallop` respectively, but no longer documented or exported.

  - **Kept (documented public)**: `searchsortedfirst!` / `searchsortedlast!`,
    `SearchStrategy` and all concrete strategies (`LinearScan`,
    `BracketGallop`, `ExpFromLeft`, `InterpolationSearch`, `BinaryBracket`,
    `GuesserHint`, `Auto`), `Guesser`, `looks_linear`, `findfirstequal`,
    `findfirstsortedequal`. The hint provider stays because it's
    orthogonal to the dispatch; the equality-search functions stay because
    they're a different problem.

Migration for downstream callers:

  | old                                                                | 2.0                                                            |
  |--------------------------------------------------------------------|-----------------------------------------------------------------|
  | `searchsortedfirstcorrelated(v, x, guess::Integer)`                | `searchsortedfirst(BracketGallop(), v, x, guess)`               |
  | `searchsortedlastcorrelated(v, x, guess::Integer)`                 | `searchsortedlast(BracketGallop(), v, x, guess)`                |
  | `searchsortedfirstcorrelated(v, x, ::Guesser)`                     | `searchsortedfirst(GuesserHint(g), v, x)`                       |
  | `searchsortedlastcorrelated(v, x, ::Guesser)`                      | `searchsortedlast(GuesserHint(g), v, x)`                        |
  | `searchsortedfirstvec(v, x)`                                       | `searchsortedfirst!(buf, v, x)` (caller-owned `buf`)            |
  | `searchsortedlastvec(v, x)`                                        | `searchsortedlast!(buf, v, x)`                                  |
  | `searchsortedfirstexp(v, x, lo, hi)`                               | `searchsortedfirst(ExpFromLeft(), v, x, lo)`                    |

Tests rewritten to exercise the strategy API directly. Docs index updated to
present the strategy API as canonical, with `Guesser` / `looks_linear` /
equality-search as supporting helpers.

`Project.toml` bumped to `2.0.0`.

Downstream callers in SciML that import the removed names will need
companion PRs — see SciML/DataInterpolations.jl#529 for the first one.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
FFF 2.0 (SciML/FindFirstFunctions.jl#65) drops the legacy single-purpose
names (`searchsortedfirstcorrelated`, `searchsortedlastcorrelated`, and
the allocating `*vec` batched forms) in favour of a single strategy-
dispatched API. This commit migrates DataInterpolations off the legacy
surface.

  - `src/DataInterpolations.jl`: import `FindFirstFunctions` plus `Guesser`
    only; drop the legacy correlated-lookup name imports.
  - `src/interpolation_utils.jl`: split `get_idx` into two methods. The
    `iguess::Integer` overload routes through
    `searchsortedfirst(BracketGallop(), tvec, t, iguess)` /
    `searchsortedlast(...)`. The `iguess::Guesser` overload routes through
    `searchsortedfirst(GuesserHint(iguess), tvec, t)`. Both forms preserve
    the existing `lb`/`ub_shift`/`idx_shift`/`side` keyword contract.
  - `test/derivative_tests.jl`, `test/interpolation_tests.jl`: cached-index
    tests now use `searchsortedfirst(GuesserHint(A.iguesser), A.t, t)` and
    `searchsortedfirst(BracketGallop(), func.t, _t, func.iguesser(_t))`.
  - `src/interpolation_methods.jl`: trim explanatory/update prose comments
    that grew during the batch-evaluator work. Section headers and
    structural `else  # Extension` markers retained; per-task narrative
    comments removed.
  - `Project.toml`: bump `FindFirstFunctions` compat to `2`.

All 57879 DI tests pass locally on Julia 1.11:

  Core       | 2539 pass, 5 broken (pre-existing)
  Methods    | 42151 pass
  Extensions | 13178 pass
  Misc       |    11 pass

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
The previous commit migrated the sorted-batch evaluators to FFF 2.0's
strategy-dispatched `searchsortedlast!` API, but called it with the
default `strategy = Auto()`. Each such call has `Auto()` re-probe the
knot vector `A.t` for linearity / NaN / uniformity on every batched
evaluation. Because `A.t` is fixed at interpolation construction, that
probe is wasted work on every call after the first.

This commit precomputes `FindFirstFunctions.SearchProperties(A.t)`
once at cache-construction time and stores it as a new
`t_props::propsType` field on every interpolation cache struct, then
threads `FindFirstFunctions.Auto(A.t_props)` into each batched
`searchsortedlast!` call site so the probe is skipped.

Changes:

  - `src/interpolation_caches.jl`: every cache struct
    (`LinearInterpolation`, `QuadraticInterpolation`,
    `LagrangeInterpolation`, `AkimaInterpolation`,
    `ConstantInterpolation`, `SmoothedConstantInterpolation`,
    `QuadraticSpline`, `CubicSpline`, `BSplineInterpolation`,
    `BSplineApprox`, `CubicHermiteSpline`, `QuinticHermiteSpline`,
    `SmoothArcLengthInterpolation`) gains a `propsType` type
    parameter and a `t_props` field, populated by
    `FindFirstFunctions.SearchProperties(t)` in the inner constructor.
  - `src/integral_inverses.jl`: same change for
    `LinearInterpolationIntInv` and `ConstantInterpolationIntInv`.
  - `src/interpolation_methods.jl`: every
    `FindFirstFunctions.searchsortedlast!(...)` call site in the
    sorted-batch evaluators now passes
    `strategy = FindFirstFunctions.Auto(A.t_props)`.
    `_eval_interior_adaptive!` gains a `strategy` parameter so the
    Akima call site can pass the cached strategy through.

For `AbstractRange{<:Real}` knot vectors the `SearchProperties`
specialised constructor skips every probe (all properties are known
statically), so the construction cost is zero. For `Vector{Float64}`
the populated constructor runs an O(n) NaN scan once at construction
time; for integer / non-numeric eltypes only the O(1) sampled-
linearity probe runs.

Tests: all five groups pass locally on Julia 1.11:

  Core       |  2539 pass, 5 broken (pre-existing)
  Methods    | 42151 pass
  Extensions | 13178 pass
  Misc       |    11 pass
  QA         |    18 pass

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
@ChrisRackauckas ChrisRackauckas marked this pull request as ready for review May 21, 2026 14:13
@ChrisRackauckas ChrisRackauckas merged commit 40b581f into SciML:master May 21, 2026
16 of 18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants