You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
PR #65 added searchsortedlast! / searchsortedfirst! as the batched search interface — indices in, indices out. Downstream callers like SciML/DataInterpolations.jl#529 use that interface to drive per-segment polynomial evaluation. They end up writing ~80 lines of identical scaffolding around each per-type evaluator:
function _<type>_eval_sorted!(out, A, tt)
# left extrapolation block (type-specific)...
i_first_interior = i
i_last_interior =_find_last_interior_index(tt, i_first_interior, m, tn)
n_interior = i_last_interior - i_first_interior +1if n_interior >0if32* n_interior < n
idx_buf =Vector{Int}(undef, n_interior)
FindFirstFunctions.searchsortedlast!(idx_buf, t, view(tt, i_first_interior:i_last_interior))
@inboundsfor k in1:n_interior
idx =clamp_to_segment(idx_buf[k])
j = i_first_interior + k -1# >>> type-specific polynomial eval on (idx, tt[j]) <<<
out[j] =...endelselet i_local = i_first_interior, idx =1@inboundswhile i_local <= i_last_interior
ttt = tt[i_local]
while idx < n -1&& ttt > t[idx +1]
idx +=1end# >>> type-specific polynomial eval on (idx, ttt) <<<
out[i_local] =...
i_local +=1endendendend
i = i_last_interior +1# right extrapolation block (type-specific)...end
Across 8 piecewise interpolation types in #529 the only thing that varies is the two >>> ... <<< blocks (the polynomial eval) and the extrapolation handling. The rest — boundary detection, adaptive dispatch, dense/sparse selection — is identical.
Proposal
Lift the adaptive dispatch into FFF so consumers don't re-implement it.
Shape
\"\"\" segment_apply!(f, out, v, queries; strategy = Auto())Walk `queries` in lockstep with `v`'s segments and apply `f(idx, j, q)` foreach query `q = queries[j]` whose value lands in segment `idx`(i.e. `v[idx] ≤ q < v[idx+1]`, with `idx` clamped to `firstindex(v):lastindex(v)-1`).`out[j]` is the natural place to write the result.For sorted `queries` on a long `v` (the `n ≫ m` regime), this routes through[`searchsortedlast!`](@ref) so each query is located in O(log gap) ratherthan O(gap). For dense queries it walks `v` in a tight inline loop.Queries outside `[v[1], v[end]]` are *not* visited by `f` — the caller isresponsible for handling extrapolation before/after the call. (Equivalently:`f` is only called for interior queries.) Returns `(i_first_interior, i_last_interior)`so the caller can slot extrapolation into the right ranges.\"\"\"function segment_apply!(f::F, out, v, queries; strategy = Auto()) where {F} ...end
A consumer (DataInterpolations style) would then write:
function_linear_eval_sorted!(out, A, tt)
# ... left extrapolation, sets i to first interior index
i_first, i_last =segment_apply!(out, A.t, tt) do idx, j, q
slope =get_parameters(A, idx)
@inbounds A.u[idx] + slope * (q - A.t[idx])
end# ... right extrapolation from i_last + 1end
That drops ~50 lines of scaffolding per type.
The inline footgun
While prototyping the helper in #529 I tried this shape:
function_eval_interior_adaptive!(out, t, tt, ..., eval_at::F) where {F}
# ... fill idx_buf via searchsortedlast! ...for k in1:n_interior
out[j] =eval_at(idx, q)
endend
For AkimaInterpolation (evaluator is pure array indexing + @evalpoly) it inlined perfectly and matched the hand-inlined version. For LinearInterpolation, whose eval_at calls get_parameters(A, idx) inside, Julia did not inline through the closure and I measured a ~70× slowdown on small-m sparse cases (700 ns → 44 µs at n=65 536, m=10). I ended up hand-inlining for every type except Akima.
Concrete data:
eval shape
n=65k, m=10
hand-inlined (no helper, no closure)
700 ns
via eval_at::F closure helper
44 µs
So the naive 'pass a callback' API will appear to work, then bite anyone whose per-segment evaluator does anything non-trivial (calls into a polymorphic helper, accesses a captured struct field through more than one layer, etc.).
Two API designs that side-step the footgun
Option A — strategy struct + multiple-dispatch contract. Callers define evaluate_segment(strategy, idx, q)::U on their own type, the helper dispatches on it:
This dispatches through a method, not a closure, which gives Julia a much better shot at inlining. I tried this for Akima in #529 and it worked. The downside is some ceremony (one struct per type) and it still failed for me on Linear — needs deeper investigation into whether the issue was the closure or get_parameters itself.
Option B — @segment_apply macro. Splice the body inline so there's no callable boundary:
@segment_apply!(out, A.t, tt) do idx, j, q
out[j] = A.u[idx] +get_parameters(A, idx) * (q - A.t[idx])
end
The macro expands to the same scaffolding code, with the body literally pasted in place of the inner loop. Inlining is guaranteed because there's no function call. Trade-off: macro debuggability (stack traces point at the expansion site, errors inside the body get harder to locate) and the usual macro hygiene concerns.
Concrete asks
If we agree this is worth pursuing, the work would be:
Decide between Option A (struct + multiple dispatch) and Option B (macro). Probably worth prototyping both against a representative DI type (LinearInterpolation with get_parameters) and measuring before committing.
Implement the chosen variant in FFF as a new exported function (or macro).
(Out of scope for FFF but worth tracking) port DataInterpolations.jl to the new API once it ships — drops ~50 lines × 8 types = ~400 lines of duplication.
Not blocking the merge of #65; this is a quality-of-life improvement for downstream callers.
Background
PR #65 added
searchsortedlast!/searchsortedfirst!as the batched search interface — indices in, indices out. Downstream callers like SciML/DataInterpolations.jl#529 use that interface to drive per-segment polynomial evaluation. They end up writing ~80 lines of identical scaffolding around each per-type evaluator:Across 8 piecewise interpolation types in #529 the only thing that varies is the two
>>> ... <<<blocks (the polynomial eval) and the extrapolation handling. The rest — boundary detection, adaptive dispatch, dense/sparse selection — is identical.Proposal
Lift the adaptive dispatch into FFF so consumers don't re-implement it.
Shape
A consumer (DataInterpolations style) would then write:
That drops ~50 lines of scaffolding per type.
The inline footgun
While prototyping the helper in #529 I tried this shape:
For
AkimaInterpolation(evaluator is pure array indexing +@evalpoly) it inlined perfectly and matched the hand-inlined version. ForLinearInterpolation, whoseeval_atcallsget_parameters(A, idx)inside, Julia did not inline through the closure and I measured a ~70× slowdown on small-m sparse cases (700 ns → 44 µs at n=65 536, m=10). I ended up hand-inlining for every type except Akima.Concrete data:
eval_at::Fclosure helperSo the naive 'pass a callback' API will appear to work, then bite anyone whose per-segment evaluator does anything non-trivial (calls into a polymorphic helper, accesses a captured struct field through more than one layer, etc.).
Two API designs that side-step the footgun
Option A — strategy struct + multiple-dispatch contract. Callers define
evaluate_segment(strategy, idx, q)::Uon their own type, the helper dispatches on it:This dispatches through a method, not a closure, which gives Julia a much better shot at inlining. I tried this for Akima in #529 and it worked. The downside is some ceremony (one struct per type) and it still failed for me on Linear — needs deeper investigation into whether the issue was the closure or
get_parametersitself.Option B —
@segment_applymacro. Splice the body inline so there's no callable boundary:The macro expands to the same scaffolding code, with the body literally pasted in place of the inner loop. Inlining is guaranteed because there's no function call. Trade-off: macro debuggability (stack traces point at the expansion site, errors inside the body get harder to locate) and the usual macro hygiene concerns.
Concrete asks
If we agree this is worth pursuing, the work would be:
get_parameters) and measuring before committing.Auto-strategy gains from Add SearchStrategy dispatch + in-place batched searchsorted! #65.Not blocking the merge of #65; this is a quality-of-life improvement for downstream callers.
cc @ChrisRackauckas — flagged as a follow-up while we were finishing #65 and SciML/DataInterpolations.jl#529.