Skip to content

CompatHelper: bump compat for FastBroadcast to 1, (keep existing compat)#192

Merged
ranocha merged 4 commits into
mainfrom
compathelper/new_version/2026-03-31-00-44-53-208-03761993723
Apr 22, 2026
Merged

CompatHelper: bump compat for FastBroadcast to 1, (keep existing compat)#192
ranocha merged 4 commits into
mainfrom
compathelper/new_version/2026-03-31-00-44-53-208-03761993723

Conversation

@github-actions

Copy link
Copy Markdown
Contributor

This pull request changes the compat entry for the FastBroadcast package from 0.3.5 to 0.3.5, 1.
This keeps the compat entries for earlier versions.

Note: I have not tested your package with this new compat entry.
It is your responsibility to make sure that your package tests pass before you merge this pull request.

@SKopecz SKopecz force-pushed the compathelper/new_version/2026-03-31-00-44-53-208-03761993723 branch from 4df2438 to f2b9de0 Compare March 31, 2026 00:44
@codecov

codecov Bot commented Mar 31, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@coveralls

coveralls commented Mar 31, 2026

Copy link
Copy Markdown

Coverage Report for CI Build 24713319125

Coverage remained the same at 97.305%

Details

  • Coverage remained the same as the base build.
  • Patch coverage: No coverable lines changed in this PR.
  • No coverage regressions found.

Uncovered Changes

No uncovered changes found.

Coverage Regressions

No coverage regressions found.


Coverage Stats

Coverage Status
Relevant Lines: 1707
Covered Lines: 1661
Line Coverage: 97.31%
Coverage Strength: 113406360.48 hits per line

💛 - Coveralls

@ranocha ranocha left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoshuaLampert

Copy link
Copy Markdown
Member

This requires oxfordcontrol/Clarabel.jl#218.

@JoshuaLampert

Copy link
Copy Markdown
Member

CI is using FastBroadcast.jl v3 now. However, there are a few failing tests. They do not seem to be related to the update of FastBroadcast.jl though, but rather look like something changed in the OrdinaryDiffEq*.jl packages.

@JoshuaLampert

Copy link
Copy Markdown
Member

I have reduced the issue (for at least three of the five test failures) to the following MWE:

using PositiveIntegrators, OrdinaryDiffEqLowOrderRK
prod1! = (P, u, p, t) -> begin
    P[1, 1] = 0
    P[1, 2] = u[2]
    P[2, 1] = u[1]
    P[2, 2] = 0
    return nothing
end
dest1! = (D, u, p, t) -> begin
    fill!(D, 0)
    return nothing
end
u0 = [1.0, 0.0]
tspan = (0.0, 1.0)
prob_default = PDSProblem(prod1!, dest1!, u0, tspan)
solve(prob_default, Euler(); dt = 0.1)

With OrdinaryDiffEqCore.jl v3.11 this returns

retcode: Success
Interpolation: 3rd order Hermite
t: 11-element Vector{Float64}:
 0.0
 0.1
 0.2
 0.30000000000000004
 0.4
 0.5
 0.6
 0.7
 0.7999999999999999
 0.8999999999999999
 1.0
u: 11-element Vector{Vector{Float64}}:
 [1.0, 0.0]
 [0.9, 0.1]
 [0.8200000000000001, 0.18000000000000002]
 [0.756, 0.24400000000000002]
 [0.7048, 0.2952]
 [0.66384, 0.33616]
 [0.631072, 0.36892800000000003]
 [0.6048576, 0.3951424]
 [0.58388608, 0.41611392]
 [0.5671088639999999, 0.432891136]
 [0.5536870911999999, 0.4463129088]

and with OrdinaryDiffEqCore.jl v3.12 and newer it returns

retcode: Success
Interpolation: 3rd order Hermite
t: 12-element Vector{Float64}:
 0.0
 0.1
 0.2
 0.30000000000000004
 0.4
 0.5
 0.6
 0.7
 0.7999999999999999
 0.8999999999999999
 0.9999999999999999
 1.0
u: 12-element Vector{Vector{Float64}}:
 [1.0, 0.0]
 [0.9, 0.1]
 [0.8200000000000001, 0.18000000000000002]
 [0.756, 0.24400000000000002]
 [0.7048, 0.2952]
 [0.66384, 0.33616]
 [0.631072, 0.36892800000000003]
 [0.6048576, 0.3951424]
 [0.58388608, 0.41611392]
 [0.5671088639999999, 0.432891136]
 [0.5536870911999999, 0.4463129088]
 [0.5536870911999999, 0.4463129088]

i.e., there is an additional (unexpected) time step at 0.9999999999999999. Since OrdinaryDiffEqCore.jl is part of a monorepo, which does not have tags and releases for the sublibraries it is not so easy (I don't know of an easy way) to see the diff between OrdinaryDiffEqCore.jl v3.11 and v3.12 or the PRs, which were part of the release v3.12. The downstream tests in OrdinaryDiffEq.jl for PositiveIntegrators.jl are also not really useful because they usually don't run because of incompatibilities (like https://github.com/SciML/OrdinaryDiffEq.jl/actions/runs/24119695833/job/70370910063#step:6:19).
So to conclude the test failures are independent of this PR and only surfaced now because incompatibilities were resolved, such that for the first time we run tests with newer versions of OrdinaryDiffEqCore.jl.
@ChrisRackauckas, can you help here?

@ChrisRackauckas

Copy link
Copy Markdown
Contributor

oh, that's probably related to SciML/OrdinaryDiffEq.jl#2869. We made a change in order to make sure we very clearly hit every floating point exactly for superdense time, so multiple tstops at the same time and eps apart are handled well (this is required for some types of multi-event scenarios). But yeah this looks like an odd side effect of that. I think the best solution is to simply expand tstops to tspan[1]:dt:tspan[2] here, since Julia's range semantics has some complex logic that makes that take into account floating point error and give accurate tstops. We were relying on a bit of a hack before that tspan[1] + dt + dt + dt + dt + ... if it gets within 100eps(tstop) that it would snap, but that hack was removed (and there are ways to break it, just by doing this beyond 1e6 steps), so this should be strictly more robust. Let me turn this into a test case and get this in. I think this wasn't caught because there's tests that sol.t[end] is floating point exact, but not that there's a small step in the end.

ChrisRackauckas-Claude pushed a commit to ChrisRackauckas-Claude/OrdinaryDiffEq.jl that referenced this pull request Apr 15, 2026
When solving with a fixed-dt method (e.g. `solve(prob, Euler(); dt = 0.1)`),
the accumulated `t + dt + dt + ...` drifts and used to produce a spurious
trailing step at `1 - eps` followed by `1.0`. PR SciML#2869 removed the
`100eps(tstop)` snap hack that previously masked this in
`fixed_t_for_floatingpoint_error!`.

Two coordinated changes restore the expected 11-step result without
reintroducing the old hack:

1. `_ode_init` expands `tstops` to `tspan[1]:dt:tspan[end]` when the user
   specifies `dt` for a fixed-time method and did not supply their own
   `tstops`/`d_discontinuities`. Julia's range semantics use TwicePrecision
   internally, so each range element is the exact floating-point
   representative of `k*dt` and lands on `tspan[end]` cleanly. The
   expansion is skipped whenever the user supplied any tstops, which
   preserves the existing "continue at dtcache between user tstops"
   behavior that tests like `sol.t == [0, 1/3, 1/2, 5/6, 1]` depend on.

2. `modify_dt_for_tstops!` compares `dt` against `distance_to_tstop` with
   a small floating-point tolerance (`100 * eps(max(t, tstop))`). Without
   this tolerance, `dtcache = 0.1` reads as strictly less than a
   `distance_to_tstop` of `0.10000000000000009` and the integrator takes
   a full step that overshoots the tstop by one ulp, then takes a
   matching tiny corrective step. The tolerance guard is gated on
   `isfinite(tdir_tstop) && isfinite(integrator.t)` so that semi-infinite
   `tspan = (0.0, Inf)` integrations still work.

Adds a regression test covering forward, reverse, and non-evenly-dividing
`dt` on `tspan = (0.0, 1.0)`.

Fixes the PositiveIntegrators.jl CI failure noted in
NumericalMathematics/PositiveIntegrators.jl#192 where
`solve(prob, Euler(); dt = 0.1)` began returning 12 steps instead of 11.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
ChrisRackauckas-Claude pushed a commit to ChrisRackauckas-Claude/OrdinaryDiffEq.jl that referenced this pull request Apr 16, 2026
When solving with a fixed-dt method (e.g. `solve(prob, Euler(); dt = 0.1)`),
the accumulated `t + dt + dt + ...` drifts past `tspan[end]` by one ulp,
producing a spurious trailing micro-step. PR SciML#2869 removed the
`100eps(tstop)` snap hack that previously masked this.

Fix: for non-adaptive, dtchangeable algorithms with no user-supplied
tstops / d_discontinuities / callbacks, expand `tstops` to the range
`tspan[1]:dt:tspan[end]` whose TwicePrecision arithmetic gives exact
floating-point tstops, and inflate `dt` by 10 ulps so that
`modify_dt_for_tstops!` always takes the tstop branch and snaps `t`
exactly via `fixed_t_for_tstop_error!`.

The expansion is intentionally skipped for adaptive algorithms (even
when used with `adaptive=false`), `CompositeAlgorithm`, non-finite
tspans, and any solve with callbacks — preserving all existing
stepping semantics in those cases.

Adds a regression test covering forward, reverse, and non-evenly-
dividing `dt` on `tspan = (0.0, 1.0)`.

Fixes the PositiveIntegrators.jl CI failure noted in
NumericalMathematics/PositiveIntegrators.jl#192.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
ChrisRackauckas-Claude pushed a commit to ChrisRackauckas-Claude/OrdinaryDiffEq.jl that referenced this pull request Apr 16, 2026
When solving with a fixed-dt method (e.g. `solve(prob, Euler(); dt = 0.1)`),
the accumulated `t + dt + dt + ...` drifts past `tspan[end]` by one ulp,
producing a spurious trailing micro-step.  PR SciML#2869 removed the old
`100eps(tstop)` snap in `fixed_t_for_floatingpoint_error!` that masked this.

Fix: add a floating-point tolerance (`100 * eps(max(|t|, |tstop|))`) to
the `dt < distance_to_tstop` comparison in `modify_dt_for_tstops!`.
When `dt ≈ distance` within rounding the integrator now takes the tstop
branch, and `fixed_t_for_tstop_error!` snaps `t` to the exact tstop
value — eliminating the extra step.

Adds a regression test covering forward, reverse, and non-evenly-
dividing `dt` on `tspan = (0.0, 1.0)`.

Fixes the PositiveIntegrators.jl CI failure noted in
NumericalMathematics/PositiveIntegrators.jl#192.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
ChrisRackauckas added a commit to SciML/OrdinaryDiffEq.jl that referenced this pull request Apr 16, 2026
When solving with a fixed-dt method (e.g. `solve(prob, Euler(); dt = 0.1)`),
the accumulated `t + dt + dt + ...` drifts past `tspan[end]` by one ulp,
producing a spurious trailing micro-step.  PR #2869 removed the old
`100eps(tstop)` snap in `fixed_t_for_floatingpoint_error!` that masked this.

Fix: add a floating-point tolerance (`100 * eps(max(|t|, |tstop|))`) to
the `dt < distance_to_tstop` comparison in `modify_dt_for_tstops!`.
When `dt ≈ distance` within rounding the integrator now takes the tstop
branch, and `fixed_t_for_tstop_error!` snaps `t` to the exact tstop
value — eliminating the extra step.

Adds a regression test covering forward, reverse, and non-evenly-
dividing `dt` on `tspan = (0.0, 1.0)`.

Fixes the PositiveIntegrators.jl CI failure noted in
NumericalMathematics/PositiveIntegrators.jl#192.

Co-authored-by: ChrisRackauckas-Claude <accounts@chrisrackauckas.com>
@JoshuaLampert

Copy link
Copy Markdown
Member

Thanks @ChrisRackauckas for the fix in OrdinaryDiffEqCore.jl, which fixed 3 of the 5 test failures here. For the two remaining ones here is an MWE:

using PositiveIntegrators
using OrdinaryDiffEqTsit5
using Test

prob = prob_pds_npzd
alg = Tsit5()

dt = (last(prob.tspan) - first(prob.tspan)) / 1e4
sol = solve(prob, alg; dt, isoutofdomain = isnegative) # use explicit f
sol2 = solve(ConservativePDSProblem(prob.f.p, prob.u0, prob.tspan), alg; dt,
             isoutofdomain = isnegative) # use p and d to compute f

@test sol.t  sol2.t

This test passes until OrdinaryDiffEqCore.jl v3.9 and fails with OrdinaryDiffEqCore.jl v3.10 and up. Any ideas what might cause this and how to fix it? A similar test for other time integrators is also successful with newer versions of OrdinaryDiffEqCore.jl, but it is the version change from v3.9 to v3.10 in OrdinaryDiffEqCore.jl, which makes the Tsit5() one fail.
The following gives the same as sol:

sol3 = solve(ODEProblem(prob.f.std_rhs, prob.u0, prob.tspan), alg; dt,
                             isoutofdomain = isnegative) # use f to create ODEProblem

as expected. So it is sol2, which unexpectedly gives another result.

@ChrisRackauckas

Copy link
Copy Markdown
Contributor

Okay this case is pretty understood, though I don't know what you want to do with it so I'll just give you the information. Your two problems are not identical, your fs are not identical. Problem 1 computes via dpn + dzn + ddn - dnp ≈ ((dpn + dzn) + ddn) - dnp while Problem 2 uses (((-dnp) + dpn) + dzn) + ddn with @fastmath @inbounds @simd (which allows reordering). Because floating point is not associative, these differ by 2.22e-16, so for the stepper EEst1 = 7.012e-17 and EEst2 = 6.858e-17. That leads to different time steps. So in a strict sense, your test of "are they floating point the same" is simply false because your f is not floating point the same.

So then your question will be, why were they same before? Good question. Before the qmax acceleration PR, the maximum q was always 10, i.e. dt_new = q*dt had a maximum of 10 per step. So basically what would happen on this problem, because it is trivial enough that the ODE solver solves it to effectively floating point tolerance (i.e. EEst < eps(Float64)), you effectively have 0 error in each step because the solver is hitting the analytical solution, and so it grows by 10 each step. Because they match to 16 digits, dt would then always be the same because they would match to effectively floating point accuracy + 1 digit, and multiplying by 10 shifts the digit, so it is exactly the same. But then with qmax acceleration, the first step now is able to be larger, to 10_000. That shifts by 5 digits, and so because they are the same to 17 digits (the absolute highest possible in 64-bit floats), this means dt is the same to 12 digits, meaning that difference of EEst1 = 7.012e-17 and EEst2 = 6.858e-17 shows up in the 12th digit.

So you could pass in controller = NewPIController(Float64, alg; qmax_first_step = 10) to remove qmax acceleration (and then update it in v7 to just PIController) and that would give you the old behavior. Though I'd argue that the real issue here is that you have a test that things are floating point exact, when it's not the ODE solver that is the cause of the difference but your problem definitions, because they are not accumulating in the same order. So I'd either relax that to an approximate equality, understanding that they will differ in the 12th digit due to a fundamental accuracy limit in floating point accumulation, or flip the definition of one of the problems so that they associate in the same order and thus are actually floating point the same. But this mixture of them not being floating point the same but expecting the ODE solver to give floating point the same time steps is not going to be very robust for what I hope is clear reasons.

@JoshuaLampert

Copy link
Copy Markdown
Member

Thanks for the explanation! What do you think, @ranocha, @SKopecz?

@JoshuaLampert

JoshuaLampert commented Apr 19, 2026

Copy link
Copy Markdown
Member

So just to be clear two clarifications:

So I'd either relax that to an approximate equality

We do that already (we use as you see from the MWE I posted above).

understanding that they will differ in the 12th digit due to a fundamental accuracy limit in floating point accumulation

We are not talking about differences in the 12th digit, but differences of the order of 1e-4 to 1e-2.
From the MWE above:

julia> sol.t .- sol2.t
421-element Vector{Float64}:
  0.0
  0.0
 -0.0002446514639678071
 -0.0005306752357510658
 -0.0006660263933044863
 -0.0008960739422696484
  0.009118037912073884
  0.009250671172130298
  0.00860234178977426
  0.0032171897383779235
  0.0021411554781194386
  0.0016643597373147134
  0.0006729809782541896
 -2.0721123352496207e-5
  0.00032009719766801226
  4.709706773042832e-5
  0.00017986037279871248
  9.638448074444916e-5
  
  0.006162175210169707
  0.006898746292788083
  0.00779818506312413
  0.008905355941839943
  0.010282000114095524
  0.012011926086444191
  0.014213548895940953
  0.017046988673076413
  0.020728832292429722
  0.025517463366548476
  0.031671296450147324
  0.039366079933075504
  0.04866429536678396
  0.059558828030842115
  0.07190310256362853
  0.08538681438158413
  0.09985857665201259
  0.0

@ChrisRackauckas

Copy link
Copy Markdown
Contributor

The first step is 1e-12 different, but yes over 1e4 steps that will compound until you have a point where adaptivity takes a different branch.

... Is there a reason to not just fix f if that's what you're trying to test?

@JoshuaLampert

JoshuaLampert commented Apr 19, 2026

Copy link
Copy Markdown
Member

I didn't write this test. I'm just trying to help debugging. So I'll wait for @ranocha or @SKopecz to chime in.
The time steps (if that is what you mean with "over 1e4 steps") aren't of the order of 1e4 though (the whole time span is only (0, 10)) and we have 421 time steps.

@ranocha

ranocha commented Apr 20, 2026

Copy link
Copy Markdown
Member

PositiveIntegrators.jl requires a special form of the ODE to apply positivity-preserving time integration methods. To make it easy to use, we can deduce the standard RHS from the form we need here. To allow a fair comparison, we also have the option to implement the standard ODE RHS directly. The tests we are talking about shall check whether the automatic computation of the RHS and the specialized implementation will give approximately the same results.
We know that both ways of computing the RHS will differ up to floating-point errors (say 1.0e-16). An ODE solver should represent a continuous mapping (since otherwise tiny errors in the initial data coming from arbitrarily small measurement errors may change the result drastically). Before the change, this expectation was met by the implementation from OrdinaryDiffEq.jl; the time series was approximately the same (when measured with isapprox with default tolerances for a few dozen values between 1.0e-3 and 10.0). However, after the update, this is not the case anymore; we see relative differences in the times of order 0.005.

This kind of difference persists even if we use very strict tolerances: building on the MWE of @JoshuaLampert above, I get

julia> using PositiveIntegrators, OrdinaryDiffEqTsit5, Test

julia> begin
       prob = prob_pds_npzd
       alg = Tsit5()
       abstol = 1.0e-14
       reltol = 1.0e-14

       dt = (last(prob.tspan) - first(prob.tspan)) / 1e4
       sol1 = solve(prob, alg; dt, abstol, reltol) # use explicit f
       sol2 = solve(ConservativePDSProblem(prob.f.p, prob.u0, prob.tspan), alg; dt, abstol, reltol) # use p and d to compute f

       @test sol1.t  sol2.t
       end
Test Failed at REPL[4]:11

julia> maximum(filter(isfinite, @. abs(sol1.t - sol2.t) / sol1.t))
0.004146117491549062

Thus, differences of the order of floating-point accuracy in evaluating the RHS lead to differences of order 1.0e-3 in the times, even for extremely strict tolerances. I do not understand this.

@ChrisRackauckas

Copy link
Copy Markdown
Contributor

Before the change, this expectation was met by the implementation from OrdinaryDiffEq.jl; the time series was approximately the same (when measured with isapprox with default tolerances for a few dozen values between 1.0e-3 and 10.0).

I don't know if this was an error from an AI hallucination or something, but this test does not test the difference in the time series. The test is not on the solution or it's continuous convergence but on what time points are used. The solution had effectively zero error in the steps, it satisfies a relative tolerance of 2e-12, which is the maximum accuracy that can be achieved in 1e4 steps due to the 2e-16 tolerance of 64-bit floating point error. The maximum error of the interpolated solution is 8e-14 and at all grid points is approximately 1e-15, mostly due to just pure floating point errors at that level. It hits this level because the solver effectively has super convergence on the problem.

So you have a continuous solution that has a relative continuous accuracy of 1e-12 at any point evaluated.

This kind of difference persists even if we use very strict tolerances:

Yes, because as described the solution is already at the edge of floating point accuracy, so the tolerance isn't doing much.

But it is still interesting that the steps diverge that much. I'll find the exact line

@ChrisRackauckas-Claude

Copy link
Copy Markdown

Full reconstruction of the sol.t divergence

I traced this end-to-end. The solutions are identical — only the adaptive time grid differs, and that difference is a consequence of floating-point noise in the error estimate, not a solver bug.

1. The solutions agree to machine precision

Interpolating both solutions at 1000 common time points:

t_common = range(prob.tspan[1], prob.tspan[2], length=1000)
u1 = reduce(hcat, sol1.(t_common))
u2 = reduce(hcat, sol2.(t_common))
maximum(abs.(u1 .- u2) ./ max.(abs.(u1), 1e-30))  # 2.35e-12
isapprox(u1, u2; rtol=1e-12)  # true

The sol.t grid differs by up to 0.4% relative, but the underlying continuous solution trajectories match to ~2e-12.

2. Root cause: 1 ULP difference in one RK stage, amplified by catastrophic cancellation

The two f implementations evaluate identically at u0 and at every solution knot point. But at one intermediate Runge-Kutta stage point (stage 6 of Tsit5, component 1), they differ by exactly 1 ULP due to non-associative floating-point accumulation in the RHS:

k6_prob1 = -1.956863951793336
k6_prob2 = -1.9568639517933362   # differs by eps(k6) = 2.2e-16

All other stages (k1–k5, k7) are bit-identical.

3. The amplification chain

Tsit5's error estimate computes utilde = dt * Σ btilde_i * k_i using @muladd (fused multiply-add). Here is the exact reconstruction for component 1:

using MuladdMacro

# The accumulator after stages 1–5 (identical for both problems):
s_after_k5 = -0.8667549061757538

btilde6 = -0.45808210592918697
btilde7 =  0.015151515151515152
k7 = -1.956863955990825

# Stage 6 FMA — the 1-ULP diff enters:
s6_prob1 = muladd(btilde6, -1.956863951793336,  s_after_k5)  # 0.029649453878648542
s6_prob2 = muladd(btilde6, -1.9568639517933362, s_after_k5)  # 0.029649453878648643
# absolute diff: 1.006e-16,  relative diff: 3.4e-15 — tiny

# Stage 7 FMA — catastrophic cancellation:
# btilde7 * k7 ≈ -0.02965 nearly cancels s6 ≈ +0.02965
s7_prob1 = muladd(btilde7, k7, s6_prob1)  # -3.219e-16
s7_prob2 = muladd(btilde7, k7, s6_prob2)  # -2.213e-16
# absolute diff: 1.006e-16 (UNCHANGED)
# relative diff: 31%  ← magnitude dropped by 9.2e13×

# Without FMA, the diff is exactly zero:
s6_nofma_1 = btilde6 * -1.956863951793336  + s_after_k5
s6_nofma_2 = btilde6 * -1.9568639517933362 + s_after_k5
s6_nofma_1 - s6_nofma_2  # 0.0

The btilde coefficients encode the difference between the 4th and 5th order Tsit5 solutions — they're designed to cancel, producing O(dt⁶) ≈ 1e-18 for dt=0.001. That means the sum cancels terms of magnitude ~1 down to ~1e-16: 16 digits of cancellation. The 1e-16 absolute diff from the 1-ULP k6 difference is unchanged by the cancellation, but becomes 31% of the near-zero result.

FMA makes this visible because muladd(btilde6, k6, accumulator) preserves the full precision of btilde6 * k6 before adding. Without FMA, the intermediate round after btilde6 * k6 absorbs the 1-ULP difference and both problems give identical results.

4. From 31% utilde to 4% EEst to 0.6% dt

utilde[1]: -3.22e-19 vs -2.21e-19  (31% relative diff, ~1e-19 absolute)
   ÷ tolerance (1e-14 + 1e-14*|u|) ≈ 9e-14
   = atmp[1]: -3.58e-6 vs -2.46e-6

EEst = norm(atmp): 4.59e-6 vs 4.40e-6  (4.1% diff)

PI controller: EEst^0.14 / errold^0.08
   → q ratio ≈ 1.006
   → dt_new: 0.002408 vs 0.002422  (0.59% diff)

The 0.59% dt difference on step 2 then compounds — not chaotically (the controller is stable and the offset damps: 2806 shrinks vs 1641 grows over 4448 steps), but enough to produce a 0.4% peak time-grid divergence.

5. Why there's no fix in the summation

  • Reordering moves the cancellation point but doesn't eliminate it — the btilde coefficients must cancel by construction.
  • Compensated summation (Kahan) makes it worse: computes the near-zero sum more accurately (~1e-18), but the btilde6 × Δk6 ≈ 1e-16 contribution stays, so the relative error goes from 31% to ~10000%.
  • Removing @muladd from the error estimate fixes this specific case (shown above: diff becomes exactly zero) but is fragile — a 2-ULP diff or different problem would break through.

The fundamental issue: when local error is at machine epsilon, the error estimate is pure noise. No summation trick can extract signal from noise when 16 digits have cancelled.

6. Recommendation

The solutions are correct and agree to ~1e-12. I'd suggest comparing interpolated u at shared time points rather than comparing sol.t:

@test sol1.(experiment.t, idxs=1)  sol2.(experiment.t, idxs=1)

The adaptive time grid is an implementation detail of the step size controller responding to noise-floor error estimates — it's not part of the solution in the mathematical sense.

@ranocha

ranocha commented Apr 20, 2026

Copy link
Copy Markdown
Member

I don't know if this was an error from an AI hallucination or something, but this test does not test the difference in the time series.

Just to clarify: sol.t was what I referred to as "time series".

Thanks for your input. We will see what we get from CI when checking interpolated values.

@JoshuaLampert

Copy link
Copy Markdown
Member

CI is still failing, but with explicit tolerances it passes locally. So let's try that.

julia> using PositiveIntegrators, OrdinaryDiffEqTsit5, Test

julia> begin
       prob = prob_pds_npzd
       alg = Tsit5()
       
       dt = (last(prob.tspan) - first(prob.tspan)) / 1e4
       sol1 = solve(prob, alg; dt, isoutofdomain = isnegative) # use explicit f
       sol2 = solve(ConservativePDSProblem(prob.f.p, prob.u0, prob.tspan), alg; dt, isoutofdomain = isnegative) # use p and d to compute f
       
       t = range(prob.tspan..., length = 10_000)
       @test sol1.(t)  sol2.(t)
       end
Test Failed at REPL[2]:10

julia> begin
       prob = prob_pds_npzd
       alg = Tsit5()
       abstol = 1.0e-5
       reltol = 1.0e-5
       
       dt = (last(prob.tspan) - first(prob.tspan)) / 1e4
       sol1 = solve(prob, alg; dt, abstol, reltol, isoutofdomain = isnegative) # use explicit f
       sol2 = solve(ConservativePDSProblem(prob.f.p, prob.u0, prob.tspan), alg; dt, abstol, reltol, isoutofdomain = isnegative) # use p and d to compute f
       
       t = range(prob.tspan..., length = 10_000)
       @test sol1.(t)  sol2.(t)
       end
Test Passed

@JoshuaLampert JoshuaLampert requested a review from ranocha April 21, 2026 10:03

@ranocha ranocha left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks to everybody involved for their help.

@ranocha ranocha merged commit 795f2d1 into main Apr 22, 2026
11 checks passed
@ranocha ranocha deleted the compathelper/new_version/2026-03-31-00-44-53-208-03761993723 branch April 22, 2026 06:49
@ChrisRackauckas

Copy link
Copy Markdown
Contributor

I had to sit on this a bit to better understand why this property exists, and I think there's a very good way to see that it's fundamental to the system if you think about it as a classical control problem. First of all, a high order integrator with a high order interpolation is another way of stating that the system is insensitive to the chosen dts, since the solution is more accurate and the interpolation is accurate, moving a chosen t over and interpolating only effects the solution a tiny bit, since the solution is already so accurate. So then by classical control arguments, the optimal feedforward controller, which is the inverse of the plant model, is highly sensitive.

It's similar to how if you have a controller for a system that quickly goes to equilibrium (it's highly insensitive), the inverse of the system is highly sensitive since the history of what things looked like before equilibrium is very quickly dampened out and forgotten. Due to this, the control strategy ends up being bang-bang controls and very sensitive to the current values, since any errors are dampened out quickly, the controller should act hard.

In the same sense, "medium" (1e-3) differences in what t is chosen changes something in the 13th digit here, so the controller strategy ends up being highly sensitive because the system is not sensitive to the controller's response. Basically, it doesn't have much to sense: when it makes mistakes, there's not really telling it to correct to anything, because the controller doing random things does not make much of a difference to what it is sensing. Therefore it effectively ends up noisy.

So I think this is pretty fundamental: high order methods (in lower tolerance limits) will effectively give noisy t choices. Low order methods and higher tolerances will be much more sensitive to hitting the correct values based on dt, so dt is much more constrained and so the solver will more readily be required to hit "the right" dt. But in this case, there is no "right" dt, since pretty everything gives the same solution, all being below tolerance, so the controller is basically spitting out noise that is filtered out in the plant, the plant model not being the solution but the error of the solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants