Skip to content

Optimize AkimaInterpolation constructor (3.6-4.1x) and add sorted-batch evaluator (8-9x)#526

Merged
ChrisRackauckas merged 4 commits into
SciML:masterfrom
ChrisRackauckas-Claude:akima-perf-opts
May 15, 2026
Merged

Optimize AkimaInterpolation constructor (3.6-4.1x) and add sorted-batch evaluator (8-9x)#526
ChrisRackauckas merged 4 commits into
SciML:masterfrom
ChrisRackauckas-Claude:akima-perf-opts

Conversation

@ChrisRackauckas-Claude
Copy link
Copy Markdown
Contributor

Summary

Two AkimaInterpolation-specific perf improvements that together capture the speed wins from #524, applied to both standard Akima and the modified (makima) formula added in #525:

  1. Scalar in-place constructor kernel _akima_init!. Replaces the vectorized diff, abs.+, findall, and broadcasted-divide intermediates with a single pass over a length-(n+3) divided-difference buffer. The struct layout is unchanged.
  2. Sorted-batch callable (A::AkimaInterpolation)(out::AbstractVector, tt::AbstractVector). When tt is sorted and the extrapolation modes are simple (None, Constant, Linear, Extension), walks the knots and queries in lockstep in O(n+m) instead of running a per-query binary search. Falls back to map! when tt is unsorted or the extrapolation requires Periodic/Reflective wrapping.

The scalar _interpolate, derivative, integral, AD, and plotting paths are unchanged. Coefficients agree with the previous implementation to floating-point round-off (covered by the existing tests).

Stacking

This PR is stacked on top of #525. It branches off makima-via-akima-kwarg and the optimized constructor handles both the standard Akima and modified = true paths. If #525 is reviewed first I'll rebase this onto master once it lands. If reviewers prefer this lands first, that's also fine — I can drop the modified branch in _akima_init! and the change to #525 becomes a trivial follow-up.

Benchmarks

Julia 1.10, BenchmarkTools minimum:

Constructor before after speedup allocs
n=32 2.95 μs 0.81 μs 3.6x 36 → 7
n=256 14.97 μs 3.63 μs 4.1x 37 → 7
n=4096 136 μs 37.5 μs 3.6x 66 → 11
Batched eval (sorted) before after speedup
n=32, m=512 15.84 μs 1.75 μs 9.0x
n=32, m=4096 119 μs 13.5 μs 8.8x
n=256, m=512 18.41 μs 2.30 μs 8.0x
n=256, m=4096 124 μs 14.36 μs 8.6x

Memory reduction in the constructor: ~5-9x smaller working set (9.4 KB → 1.3 KB at n=32; 934 KB → 128 KB at n=4096).

These don't quite reach the 12-16x in #524's benchmark figure — the remaining gap is from #524 also fusing the divided-difference buffer into a sliding window (no m allocation), which I left as a buffer for clarity. The arithmetic cost of recomputing slopes twice (once for wmax, once for b) was a worse trade than keeping the single allocation.

What is in this PR

  • src/interpolation_caches.jl: replaces the body of the AkimaInterpolation constructor with a call to _akima_init!(b, c, d, u, t, Val(modified)). The new helper does a two-pass scalar sweep: first pass computes the maximum weight to set the small-weight cutoff (preserves the existing f12 > 1e-9 * maximum(f12) semantics exactly), second pass writes the b, c, d arrays in place.
  • src/interpolation_methods.jl: adds the (A::AkimaInterpolation{<:AbstractVector})(out, tt) callable, a _akima_eval_fast_applicable predicate, and the inner _akima_eval_sorted! that handles each extrapolation type explicitly in three regions (left ext, interior, right ext).
  • test/interpolation_tests.jl: new "Sorted-batch evaluator" sub-testset covering both akima and makima paths, all four fast-path extrapolation modes paired left × right, the Periodic/Reflective fallback, the None throwing behavior on out-of-range sorted queries, and the unsorted-input fallback.

Test plan

  • CI passes (full matrix: 1 × lts × pre, Core/Methods/Extensions/Misc/QA)
  • Pkg.test() all groups pass locally on Julia 1.10 (Core 1701 / Methods 42151 / Extensions 13178 / Misc 11)
  • Constructor coefficients match the prior vectorized implementation on the existing Akima testset
  • Sorted-batch output matches per-point output on randomized sorted/unsorted query sets
  • Runic-formatted

Please ignore until reviewed by @ChrisRackauckas.

🤖 Generated with Claude Code

Adds a `modified::Bool = false` keyword argument to `AkimaInterpolation`
that switches the slope computation at the knots to the modified Akima
(makima) formulation introduced by MathWorks
(https://blogs.mathworks.com/cleve/2019/04/29/makima-piecewise-cubic-interpolation/).

This is the same data as PR SciML#524 proposed adding via a separate set of
`makima_*` helper functions, but integrated through the existing
`AkimaInterpolation` type instead. The struct, evaluation, derivative,
and integral paths are unchanged — only the formula for the slope
coefficients `b` differs when `modified = true`. A simple-average
fallback parallel to the original Akima protects the rare case where
all four neighboring slopes vanish.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
The CI's spell checker flags the abbreviated `A_mak` as a likely typo
(suggests `make`/`mask`). Use the full word to make the test names
explicit and pass the check.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
The MathWorks blog returns 403 to the Documenter linkchecker (bot
filtering), even though the URL is reachable in a browser. Add it to
linkcheck_ignore alongside the existing moler/interp.pdf entry so the
docs build doesn't fail on this PR's added docstring reference.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Two AkimaInterpolation-specific performance improvements that together
capture the speed wins from PR SciML#524, applied to both the standard Akima
and the modified (makima) formula:

1. Scalar in-place constructor kernel `_akima_init!`. Replaces the
   vectorized `diff`, `abs.+`, `findall`, and broadcasted-divide
   intermediates with a single pass over a length-(n+3) divided-difference
   buffer, dropping the constructor from ~36 allocations to 7 and giving
   a 3.6-4.1x wall-time speedup across n = 32..4096.

2. Sorted-batch callable
   `(A::AkimaInterpolation)(out::AbstractVector, tt::AbstractVector)`.
   When `tt` is sorted and the extrapolation modes are simple (None,
   Constant, Linear, Extension), walks the knots and queries in lockstep
   in O(n+m) instead of running a per-query binary search via the
   iguesser, giving an 8-9x speedup on batched evaluation. Falls back to
   `map!` when `tt` is unsorted or the extrapolation requires
   Periodic/Reflective wrapping.

The struct layout, scalar `_interpolate`, derivative, integral, AD, and
plotting paths are unchanged. Coefficients agree with the previous
implementation to floating-point round-off.

Benchmarks (Julia 1.10, n knots, m sorted queries):

  Constructor   before        after       speedup
  n=32          2.95us 36a    0.81us 7a   3.6x
  n=256        14.97us 37a    3.63us 7a   4.1x
  n=4096      136.23us 66a   37.50us 11a  3.6x

  Batched eval  before        after       speedup
  n=32, m=512  15.84us       1.75us       9.0x
  n=32, m=4096 119.32us     13.50us       8.8x
  n=256, m=512 18.41us       2.30us       8.0x
  n=256, m=4096 124.18us    14.36us       8.6x

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants