Higher-level batched segment-eval API (follow-up to #65)

## Background

PR #65 added `searchsortedlast!` / `searchsortedfirst!` as the batched search interface — indices in, indices out. Downstream callers like SciML/DataInterpolations.jl#529 use that interface to drive per-segment polynomial evaluation. They end up writing ~80 lines of identical scaffolding around each per-type evaluator:

```julia
function _<type>_eval_sorted!(out, A, tt)
    # left extrapolation block (type-specific)
    ...
    i_first_interior = i
    i_last_interior = _find_last_interior_index(tt, i_first_interior, m, tn)
    n_interior = i_last_interior - i_first_interior + 1
    if n_interior > 0
        if 32 * n_interior < n
            idx_buf = Vector{Int}(undef, n_interior)
            FindFirstFunctions.searchsortedlast!(idx_buf, t, view(tt, i_first_interior:i_last_interior))
            @inbounds for k in 1:n_interior
                idx = clamp_to_segment(idx_buf[k])
                j = i_first_interior + k - 1
                # >>> type-specific polynomial eval on (idx, tt[j]) <<<
                out[j] = ...
            end
        else
            let i_local = i_first_interior, idx = 1
                @inbounds while i_local <= i_last_interior
                    ttt = tt[i_local]
                    while idx < n - 1 && ttt > t[idx + 1]
                        idx += 1
                    end
                    # >>> type-specific polynomial eval on (idx, ttt) <<<
                    out[i_local] = ...
                    i_local += 1
                end
            end
        end
    end
    i = i_last_interior + 1
    # right extrapolation block (type-specific)
    ...
end
```

Across 8 piecewise interpolation types in #529 the only thing that varies is the two `>>> ... <<<` blocks (the polynomial eval) and the extrapolation handling. The rest — boundary detection, adaptive dispatch, dense/sparse selection — is identical.

## Proposal

Lift the adaptive dispatch into FFF so consumers don't re-implement it.

### Shape

```julia
\"\"\"
    segment_apply!(f, out, v, queries; strategy = Auto())

Walk `queries` in lockstep with `v`'s segments and apply `f(idx, j, q)` for
each query `q = queries[j]` whose value lands in segment `idx`
(i.e. `v[idx] ≤ q < v[idx+1]`, with `idx` clamped to `firstindex(v):lastindex(v)-1`).
`out[j]` is the natural place to write the result.

For sorted `queries` on a long `v` (the `n ≫ m` regime), this routes through
[`searchsortedlast!`](@ref) so each query is located in O(log gap) rather
than O(gap). For dense queries it walks `v` in a tight inline loop.

Queries outside `[v[1], v[end]]` are *not* visited by `f` — the caller is
responsible for handling extrapolation before/after the call. (Equivalently:
`f` is only called for interior queries.) Returns `(i_first_interior, i_last_interior)`
so the caller can slot extrapolation into the right ranges.
\"\"\"
function segment_apply!(f::F, out, v, queries; strategy = Auto()) where {F}
    ...
end
```

A consumer (DataInterpolations style) would then write:

```julia
function _linear_eval_sorted!(out, A, tt)
    # ... left extrapolation, sets i to first interior index
    i_first, i_last = segment_apply!(out, A.t, tt) do idx, j, q
        slope = get_parameters(A, idx)
        @inbounds A.u[idx] + slope * (q - A.t[idx])
    end
    # ... right extrapolation from i_last + 1
end
```

That drops ~50 lines of scaffolding per type.

### The inline footgun

While prototyping the helper in #529 I tried this shape:

```julia
function _eval_interior_adaptive!(out, t, tt, ..., eval_at::F) where {F}
    # ... fill idx_buf via searchsortedlast! ...
    for k in 1:n_interior
        out[j] = eval_at(idx, q)
    end
end
```

For `AkimaInterpolation` (evaluator is pure array indexing + `@evalpoly`) it inlined perfectly and matched the hand-inlined version. For `LinearInterpolation`, whose `eval_at` calls `get_parameters(A, idx)` inside, Julia did **not** inline through the closure and I measured a **~70× slowdown** on small-m sparse cases (700 ns → 44 µs at n=65 536, m=10). I ended up hand-inlining for every type except Akima.

Concrete data:

| eval shape                          | n=65k, m=10 |
|-------------------------------------|-------------|
| hand-inlined (no helper, no closure) | 700 ns      |
| via `eval_at::F` closure helper      | 44 µs       |

So the naive 'pass a callback' API will appear to work, then bite anyone whose per-segment evaluator does anything non-trivial (calls into a polymorphic helper, accesses a captured struct field through more than one layer, etc.).

### Two API designs that side-step the footgun

**Option A — strategy struct + multiple-dispatch contract.** Callers define `evaluate_segment(strategy, idx, q)::U` on their own type, the helper dispatches on it:

```julia
abstract type SegmentEvaluator end

function segment_apply!(eval::SegmentEvaluator, out, v, queries; ...) end

# Caller side:
struct LinearEval{A}; A::A; end
@inline evaluate_segment(e::LinearEval, idx, q) = e.A.u[idx] + get_parameters(e.A, idx) * (q - e.A.t[idx])

segment_apply!(LinearEval(A), out, A.t, tt)
```

This dispatches through a method, not a closure, which gives Julia a much better shot at inlining. I tried this for Akima in #529 and it worked. The downside is some ceremony (one struct per type) and it still failed for me on Linear — needs deeper investigation into whether the issue was the closure or `get_parameters` itself.

**Option B — `@segment_apply` macro.** Splice the body inline so there's no callable boundary:

```julia
@segment_apply!(out, A.t, tt) do idx, j, q
    out[j] = A.u[idx] + get_parameters(A, idx) * (q - A.t[idx])
end
```

The macro expands to the same scaffolding code, with the body literally pasted in place of the inner loop. Inlining is guaranteed because there's no function call. Trade-off: macro debuggability (stack traces point at the expansion site, errors inside the body get harder to locate) and the usual macro hygiene concerns.

## Concrete asks

If we agree this is worth pursuing, the work would be:

1. Decide between Option A (struct + multiple dispatch) and Option B (macro). Probably worth prototyping both against a representative DI type (LinearInterpolation with `get_parameters`) and measuring before committing.
2. Implement the chosen variant in FFF as a new exported function (or macro).
3. Add tests covering: dense vs sparse query patterns; correct extrapolation-range reporting; preservation of the `Auto`-strategy gains from #65.
4. (Out of scope for FFF but worth tracking) port DataInterpolations.jl to the new API once it ships — drops ~50 lines × 8 types = ~400 lines of duplication.

Not blocking the merge of #65; this is a quality-of-life improvement for downstream callers.

cc @ChrisRackauckas — flagged as a follow-up while we were finishing #65 and SciML/DataInterpolations.jl#529.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Higher-level batched segment-eval API (follow-up to #65) #66

Background

Proposal

Shape

The inline footgun

Two API designs that side-step the footgun

Concrete asks

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

eval shape	n=65k, m=10
hand-inlined (no helper, no closure)	700 ns
via `eval_at::F` closure helper	44 µs

Uh oh!

Higher-level batched segment-eval API (follow-up to #65) #66

Description

Background

Proposal

Shape

The inline footgun

Two API designs that side-step the footgun

Concrete asks

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions