Skip to content

Higher-level batched segment-eval API (follow-up to #65) #66

@ChrisRackauckas-Claude

Description

@ChrisRackauckas-Claude

Background

PR #65 added searchsortedlast! / searchsortedfirst! as the batched search interface — indices in, indices out. Downstream callers like SciML/DataInterpolations.jl#529 use that interface to drive per-segment polynomial evaluation. They end up writing ~80 lines of identical scaffolding around each per-type evaluator:

function _<type>_eval_sorted!(out, A, tt)
    # left extrapolation block (type-specific)
    ...
    i_first_interior = i
    i_last_interior = _find_last_interior_index(tt, i_first_interior, m, tn)
    n_interior = i_last_interior - i_first_interior + 1
    if n_interior > 0
        if 32 * n_interior < n
            idx_buf = Vector{Int}(undef, n_interior)
            FindFirstFunctions.searchsortedlast!(idx_buf, t, view(tt, i_first_interior:i_last_interior))
            @inbounds for k in 1:n_interior
                idx = clamp_to_segment(idx_buf[k])
                j = i_first_interior + k - 1
                # >>> type-specific polynomial eval on (idx, tt[j]) <<<
                out[j] = ...
            end
        else
            let i_local = i_first_interior, idx = 1
                @inbounds while i_local <= i_last_interior
                    ttt = tt[i_local]
                    while idx < n - 1 && ttt > t[idx + 1]
                        idx += 1
                    end
                    # >>> type-specific polynomial eval on (idx, ttt) <<<
                    out[i_local] = ...
                    i_local += 1
                end
            end
        end
    end
    i = i_last_interior + 1
    # right extrapolation block (type-specific)
    ...
end

Across 8 piecewise interpolation types in #529 the only thing that varies is the two >>> ... <<< blocks (the polynomial eval) and the extrapolation handling. The rest — boundary detection, adaptive dispatch, dense/sparse selection — is identical.

Proposal

Lift the adaptive dispatch into FFF so consumers don't re-implement it.

Shape

\"\"\"
    segment_apply!(f, out, v, queries; strategy = Auto())

Walk `queries` in lockstep with `v`'s segments and apply `f(idx, j, q)` for
each query `q = queries[j]` whose value lands in segment `idx`
(i.e. `v[idx] ≤ q < v[idx+1]`, with `idx` clamped to `firstindex(v):lastindex(v)-1`).
`out[j]` is the natural place to write the result.

For sorted `queries` on a long `v` (the `n ≫ m` regime), this routes through
[`searchsortedlast!`](@ref) so each query is located in O(log gap) rather
than O(gap). For dense queries it walks `v` in a tight inline loop.

Queries outside `[v[1], v[end]]` are *not* visited by `f` — the caller is
responsible for handling extrapolation before/after the call. (Equivalently:
`f` is only called for interior queries.) Returns `(i_first_interior, i_last_interior)`
so the caller can slot extrapolation into the right ranges.
\"\"\"
function segment_apply!(f::F, out, v, queries; strategy = Auto()) where {F}
    ...
end

A consumer (DataInterpolations style) would then write:

function _linear_eval_sorted!(out, A, tt)
    # ... left extrapolation, sets i to first interior index
    i_first, i_last = segment_apply!(out, A.t, tt) do idx, j, q
        slope = get_parameters(A, idx)
        @inbounds A.u[idx] + slope * (q - A.t[idx])
    end
    # ... right extrapolation from i_last + 1
end

That drops ~50 lines of scaffolding per type.

The inline footgun

While prototyping the helper in #529 I tried this shape:

function _eval_interior_adaptive!(out, t, tt, ..., eval_at::F) where {F}
    # ... fill idx_buf via searchsortedlast! ...
    for k in 1:n_interior
        out[j] = eval_at(idx, q)
    end
end

For AkimaInterpolation (evaluator is pure array indexing + @evalpoly) it inlined perfectly and matched the hand-inlined version. For LinearInterpolation, whose eval_at calls get_parameters(A, idx) inside, Julia did not inline through the closure and I measured a ~70× slowdown on small-m sparse cases (700 ns → 44 µs at n=65 536, m=10). I ended up hand-inlining for every type except Akima.

Concrete data:

eval shape n=65k, m=10
hand-inlined (no helper, no closure) 700 ns
via eval_at::F closure helper 44 µs

So the naive 'pass a callback' API will appear to work, then bite anyone whose per-segment evaluator does anything non-trivial (calls into a polymorphic helper, accesses a captured struct field through more than one layer, etc.).

Two API designs that side-step the footgun

Option A — strategy struct + multiple-dispatch contract. Callers define evaluate_segment(strategy, idx, q)::U on their own type, the helper dispatches on it:

abstract type SegmentEvaluator end

function segment_apply!(eval::SegmentEvaluator, out, v, queries; ...) end

# Caller side:
struct LinearEval{A}; A::A; end
@inline evaluate_segment(e::LinearEval, idx, q) = e.A.u[idx] + get_parameters(e.A, idx) * (q - e.A.t[idx])

segment_apply!(LinearEval(A), out, A.t, tt)

This dispatches through a method, not a closure, which gives Julia a much better shot at inlining. I tried this for Akima in #529 and it worked. The downside is some ceremony (one struct per type) and it still failed for me on Linear — needs deeper investigation into whether the issue was the closure or get_parameters itself.

Option B — @segment_apply macro. Splice the body inline so there's no callable boundary:

@segment_apply!(out, A.t, tt) do idx, j, q
    out[j] = A.u[idx] + get_parameters(A, idx) * (q - A.t[idx])
end

The macro expands to the same scaffolding code, with the body literally pasted in place of the inner loop. Inlining is guaranteed because there's no function call. Trade-off: macro debuggability (stack traces point at the expansion site, errors inside the body get harder to locate) and the usual macro hygiene concerns.

Concrete asks

If we agree this is worth pursuing, the work would be:

  1. Decide between Option A (struct + multiple dispatch) and Option B (macro). Probably worth prototyping both against a representative DI type (LinearInterpolation with get_parameters) and measuring before committing.
  2. Implement the chosen variant in FFF as a new exported function (or macro).
  3. Add tests covering: dense vs sparse query patterns; correct extrapolation-range reporting; preservation of the Auto-strategy gains from Add SearchStrategy dispatch + in-place batched searchsorted! #65.
  4. (Out of scope for FFF but worth tracking) port DataInterpolations.jl to the new API once it ships — drops ~50 lines × 8 types = ~400 lines of duplication.

Not blocking the merge of #65; this is a quality-of-life improvement for downstream callers.

cc @ChrisRackauckas — flagged as a follow-up while we were finishing #65 and SciML/DataInterpolations.jl#529.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions