
Refactor MIRK k_discrete from Matrix to Vector{Vector} storage #458

Open

ChrisRackauckas-Claude wants to merge 4 commits into SciML:master from ChrisRackauckas-Claude:cr/mirk-voa-k-discrete

Conversation

@ChrisRackauckas-Claude
Contributor

Summary

Refactors k_discrete storage from Vector{DiffCache{Matrix{T}}} (N×stage matrix per mesh interval) to Vector{Vector{DiffCache{Vector{T}}}} (vector of per-stage DiffCache vectors). This eliminates all SubArray view allocations from K[:, r] column access in the hot collocation inner loop.

Depends on #453 (allocation fixes) — this branch includes those commits.

Before (matrix columns)

K = get_tmp(k_discrete[i], u)     # returns N×stage Matrix
f!(K[:, r], tmp, p, t)            # K[:, r] creates a SubArray view
__maybe_matmul!(tmp, K[:, 1:(r-1)], x[r, 1:(r-1)], h, T(1))  # more views

After (vector of vectors)

K_r = get_tmp(k_discrete[i][r], u)  # returns N-Vector directly
f!(K_r, tmp, p, t)                   # no view needed
# Inline matmul loop — no views
for j in 1:(r-1)
    Kⱼ = get_tmp(k_discrete[i][j], u)
    @simd ivdep for k in 1:N
        @inbounds tmp[k] += Kⱼ[k] * h * x[r, j]
    end
end
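The storage change can be seen in isolation with a minimal, self-contained sketch (plain Vectors stand in for DiffCache; all names here are illustrative, not the package's API):

```julia
# Minimal sketch of the storage change, with plain Vectors standing in
# for DiffCache (hypothetical names, not the package's actual API).
N, stage, n_intervals = 3, 4, 5

# Before: one N×stage matrix per mesh interval; stage access needs a column view.
k_mat = [zeros(N, stage) for _ in 1:n_intervals]
col = @view k_mat[1][:, 2]        # SubArray wrapper — the allocation being removed

# After: per-stage vectors per mesh interval; stage access is a plain Vector.
k_vov = [[zeros(N) for _ in 1:stage] for _ in 1:n_intervals]
K_r = k_vov[1][2]                 # no view, no wrapper
K_r .= 1.0
```

Indexing `k_vov[i][r]` returns an existing `Vector` by reference, so writes through `K_r` mutate the stored stage vector with no intermediate object.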

Changes

  • mirk.jl: k_discrete allocation uses nested vector comprehension
  • collocation.jl: All 5 Φ!/Φ methods rewritten with inline matmul loops
  • adaptivity.jl: interp_setup! and sum_stages! use VoV patterns
  • interpolation.jl: sum_stages! and EvalSol use VoV patterns
  • utils.jl: Add __resize! for Vector{Vector{DiffCache}} and __maybe_matmul! for AbstractVector{<:AbstractVector}
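For the utils.jl bullet, a hypothetical standalone sketch of what a VoV `__maybe_matmul!`-style overload has to compute (following 5-arg `mul!(C, A, B, α, β)` semantics, i.e. `tmp = α * Σⱼ xⱼ*Kⱼ + β*tmp`; the real method lives in utils.jl and may differ):

```julia
# Hypothetical sketch of a VoV "matmul": tmp = α * Σⱼ xⱼ*Kⱼ + β*tmp,
# where Ks is a vector of stage vectors. No matrix, no views.
function vov_matmul!(tmp, Ks, xs, α, β)
    @inbounds for k in eachindex(tmp)
        tmp[k] *= β
    end
    for (Kj, xj) in zip(Ks, xs)
        @simd ivdep for k in eachindex(tmp)
            @inbounds tmp[k] += α * xj * Kj[k]
        end
    end
    return tmp
end

tmp = ones(2)
vov_matmul!(tmp, [[1.0, 2.0], [3.0, 4.0]], [0.5, 2.0], 1.0, 1.0)
```

With `α = h` and `β = one(T)` this matches the accumulation pattern shown in the "After" snippet above.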

Results

  • Loss function: 0 bytes/call (verified via @timed over 100k calls)
  • All MIRK2/4/5/6 solvers pass for StandardBVP, TwoPointBVP, and adaptive modes

Test plan

  • MIRK2/4/5/6 × StandardBVP/TwoPointBVP × adaptive/non-adaptive all pass
  • Solutions match expected values (cos(1.0) ≈ 0.540302)
  • Zero per-step allocations verified
  • CI

🤖 Generated with Claude Code

ChrisRackauckas and others added 4 commits March 25, 2026 05:23
Three key changes to reduce allocations in the hot path (loss function
and Jacobian evaluation called every Newton iteration):

1. Remove Logging.with_logger(NullLogger()) wrapper from get_tmp
   - The wrapper allocated a closure + task-local context on EVERY call
   - get_tmp is called 50+ times per loss evaluation
   - The warning it suppressed only occurs during adaptive cache resize
   - This single change reduced total solve allocations by ~60%

2. Eliminate array comprehensions in MIRK loss functions
   - Replaced `[get_tmp(r, u) for r in residual]` with direct DiffCache
     passing to Φ! and recursive_flatten!
   - Added _maybe_get_tmp helper for Φ! to handle both DiffCache and
     plain array residuals
   - Added recursive_flatten!/recursive_flatten_twopoint! overloads for
     AbstractVector{<:DiffCache}

3. Fix NoDiffCacheNeeded Φ! to reuse fᵢ_cache instead of similar()
   - `similar(fᵢ_cache)` allocated a new vector every call
   - fᵢ_cache is a scratch buffer that can be used directly

Results (dt=0.1, N=2, MIRK4, non-adaptive):
- Total solve: 3,290 → 1,180 allocations (64% reduction)
- Loss per call: 13,552 → 4,080 bytes (70% reduction)
- Jacobian per call: 20,448 → 8,784 bytes (57% reduction)

Adds allocation regression tests to verify loss function allocations
stay bounded.
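The "bytes per call" figures above can be reproduced with a harness along these lines (hypothetical helper; `loss` is a trivial stand-in for the actual loss function):

```julia
# Hypothetical measurement harness: warm up once, then average bytes over
# many calls via @timed, as described in the commit message.
function bytes_per_call(f, x; n = 100_000)
    f(x)                               # force compilation before measuring
    stats = @timed for _ in 1:n
        f(x)
    end
    return stats.bytes / n
end

loss(x) = sum(abs2, x)                 # allocation-free stand-in loss
```

Averaging over many calls washes out one-off GC bookkeeping, which is why a single `@allocated` sample can be misleading here.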

Reference: https://discourse.julialang.org/t/boundaryvaluediffeq-jl-reducing-allocations/136255

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…IRKN

The `EvalSol.u[1:end] .= x` and `EvalSol.cache.k_discrete[1:end] .=`
patterns create unnecessary SubArray views under @views. Using copyto!
avoids the view allocation entirely.

Applied to MIRK, FIRK, and MIRKN loss functions.
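The replaced pattern can be shown with a toy example (hypothetical `dst`/`src` names; only the broadcast-vs-copyto! distinction matters):

```julia
# Hypothetical toy version of the fix: `dst[1:end] .= src` wraps the
# destination in a SubArray before broadcasting; copyto! does not.
src = [1.0, 2.0, 3.0, 4.0]

dst_view = zeros(4)
@views dst_view[1:end] .= src      # goes through a SubArray wrapper

dst_copy = zeros(4)
copyto!(dst_copy, src)             # writes in place, no wrapper
```

Both produce the same result; `copyto!` simply skips the intermediate view object in the hot path.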

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… test

Add y_cache field to MIRKCache that pre-allocates the Vector{Vector{T}}
needed by recursive_unflatten! for the primal (Float64) path. This
eliminates the last 144 bytes/call from the `get_tmp` broadcast allocation.

For the Dual path (ForwardDiff Jacobian computation), falls back to the
existing broadcast since element types differ.

Loss function is now fully allocation-free at runtime (0 bytes/call
verified via @timed over 100k calls).

Add zero-allocation QA test in the qa test group.
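A sketch of the kind of pre-allocated unflatten this enables (hypothetical `unflatten!`; the real code path uses `recursive_unflatten!` with the new `y_cache` field):

```julia
# Hypothetical in-place unflatten into a pre-allocated Vector{Vector{T}},
# avoiding per-call vector construction on the Float64 (primal) path.
function unflatten!(y::Vector{Vector{T}}, flat::AbstractVector{T}) where {T}
    i = 0
    for yi in y
        for k in eachindex(yi)
            yi[k] = flat[i += 1]
        end
    end
    return y
end

y_cache = [zeros(2) for _ in 1:3]   # allocated once, at cache construction
unflatten!(y_cache, collect(1.0:6.0))
```

The signature constrains both arguments to the same element type `T`, which is why the Dual path (differing element types under ForwardDiff) falls back to the existing broadcast.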

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change k_discrete from Vector{DiffCache{Matrix{T}}} (N×stage matrix per
mesh interval) to Vector{Vector{DiffCache{Vector{T}}}} (vector of
per-stage DiffCache vectors per mesh interval).

This eliminates all SubArray view allocations from K[:, r] column access
in the collocation inner loop. Stage vectors are now accessed directly as
k_discrete[i][r] instead of via matrix column views.

Changes:
- mirk.jl: k_discrete allocation uses nested vector comprehension
- collocation.jl: All 5 Φ!/Φ methods rewritten with inline matmul loops
  using per-stage get_tmp(k_discrete[i][r], u) access
- adaptivity.jl: interp_setup! and sum_stages! use VoV patterns
- interpolation.jl: sum_stages! and EvalSol use VoV patterns
- utils.jl: Add __resize! for Vector{Vector{DiffCache}} and
  __maybe_matmul! for AbstractVector{<:AbstractVector}

Loss function is fully allocation-free (0 bytes/call verified via @timed
over 100k calls).

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

Benchmark Results

| Benchmark | master | d0092fb... | master / d0092fb... |
|---|---|---|---|
| Simple Pendulum/IIP/BoundaryValueDiffEqMIRK.MIRK2() | 0.569 ± 0.028 s | 0.644 ± 0.076 s | 0.883 ± 0.11 |
| Simple Pendulum/IIP/BoundaryValueDiffEqMIRK.MIRK3() | 12.1 ± 0.99 ms | 22.2 ± 3.8 ms | 0.546 ± 0.1 |
| Simple Pendulum/IIP/BoundaryValueDiffEqMIRK.MIRK4() | 2.41 ± 0.16 ms | 2.6 ± 0.34 ms | 0.928 ± 0.14 |
| Simple Pendulum/IIP/BoundaryValueDiffEqMIRK.MIRK5() | 2.88 ± 0.62 ms | 2.85 ± 0.95 ms | 1.01 ± 0.4 |
| Simple Pendulum/IIP/BoundaryValueDiffEqMIRK.MIRK6() | 1.18 ± 0.34 ms | 1.14 ± 0.24 ms | 1.04 ± 0.37 |
| Simple Pendulum/IIP/MultipleShooting(10, Tsit5; grid_coarsening = false) | 2.58 ± 0.76 ms | 3.59 ± 2.4 ms | 0.719 ± 0.53 |
| Simple Pendulum/IIP/MultipleShooting(10, Tsit5; grid_coarsening = true) | 4.73 ± 1.2 ms | 5.28 ± 1.4 ms | 0.896 ± 0.33 |
| Simple Pendulum/IIP/MultipleShooting(100, Tsit5; grid_coarsening = false) | 0.067 ± 0.011 s | 0.0948 ± 0.034 s | 0.707 ± 0.28 |
| Simple Pendulum/IIP/MultipleShooting(100, Tsit5; grid_coarsening = true) | 0.102 ± 0.033 s | 0.202 ± 0.11 s | 0.506 ± 0.32 |
| Simple Pendulum/IIP/Shooting(Tsit5()) | 0.314 ± 0.1 ms | 0.315 ± 0.097 ms | 0.996 ± 0.45 |
| Simple Pendulum/OOP/BoundaryValueDiffEqMIRK.MIRK2() | 0.758 ± 0.055 s | 0.881 ± 0.14 s | 0.861 ± 0.15 |
| Simple Pendulum/OOP/BoundaryValueDiffEqMIRK.MIRK3() | 15.8 ± 7.3 ms | 20.9 ± 12 ms | 0.758 ± 0.55 |
| Simple Pendulum/OOP/BoundaryValueDiffEqMIRK.MIRK4() | 2.66 ± 0.22 ms | 4.57 ± 1.3 ms | 0.582 ± 0.18 |
| Simple Pendulum/OOP/BoundaryValueDiffEqMIRK.MIRK5() | 3.55 ± 0.42 ms | 7.26 ± 1.3 ms | 0.489 ± 0.1 |
| Simple Pendulum/OOP/BoundaryValueDiffEqMIRK.MIRK6() | 1.4 ± 0.29 ms | 2.44 ± 1.4 ms | 0.573 ± 0.36 |
| Simple Pendulum/OOP/MultipleShooting(10, Tsit5; grid_coarsening = false) | 6.24 ± 3.7 ms | 6.02 ± 3.9 ms | 1.04 ± 0.91 |
| Simple Pendulum/OOP/MultipleShooting(10, Tsit5; grid_coarsening = true) | 14.9 ± 15 ms | 8.43 ± 8 ms | 1.76 ± 2.5 |
| Simple Pendulum/OOP/MultipleShooting(100, Tsit5; grid_coarsening = false) | 0.0838 ± 0.0067 s | 0.14 ± 0.024 s | 0.597 ± 0.11 |
| Simple Pendulum/OOP/MultipleShooting(100, Tsit5; grid_coarsening = true) | 0.136 ± 0.027 s | 0.189 ± 0.1 s | 0.719 ± 0.42 |
| Simple Pendulum/OOP/Shooting(Tsit5()) | 0.613 ± 0.22 ms | 1.31 ± 0.23 ms | 0.467 ± 0.19 |
| time_to_load | 7.45 ± 0.16 s | 9.83 ± 3.5 s | 0.758 ± 0.27 |
Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR. Go to "Actions" -> "Benchmark a pull request" -> [the most recent run] -> "Artifacts" (at the bottom).
