Skip to content

Widen UJacobianWrapper.p for nested ForwardDiff through Rosenbrock (#3381)#3389

Closed
ChrisRackauckas-Claude wants to merge 1 commit intoSciML:masterfrom
ChrisRackauckas-Claude:fix-nested-forwarddiff-uf-p
Closed

Widen UJacobianWrapper.p for nested ForwardDiff through Rosenbrock (#3381)#3389
ChrisRackauckas-Claude wants to merge 1 commit intoSciML:masterfrom
ChrisRackauckas-Claude:fix-nested-forwarddiff-uf-p

Conversation

@ChrisRackauckas-Claude
Copy link
Copy Markdown
Contributor

Summary

  • Fixes ForwardDiff errors for NonlinearSolve over ODE solve #3381. When a Rosenbrock integrator is invoked with a Vector{<:Dual} p (because we are inside an outer ForwardDiff layer — e.g. a NonlinearLeastSquares parameter fit), the inner Jacobian widens u into a deeper nested-Dual type via its JacobianConfig, but the stored p in UJacobianWrapper is still at the outer Dual level. The user body then multiplies p[i] * u[i] across two different Dual nesting levels, which dispatches through ForwardDiff's tagcount-based tag precedence. That precedence is a @generated function whose literal value is baked at first compile, so its ordering depends on which package's precompile cache first instantiated each tag type — in this nested-AD scenario it inverts the nesting and produces a triple-nested Dual that crashes setindex!(du, ...) with MethodError: no method matching Float64(::Dual{..., Dual{..., Float64, 2}, 2}).
  • jacobian! now inspects the prepared ForwardDiff.JacobianConfig to recover the inner xdual buffer's element type and lifts uf.p into that nested-Dual type via a fresh UJacobianWrapper before delegating to DI.jacobian!. The widened p carries zero inner partials (correct — p does not depend on u). The fast path (_widen_uf_p_for_jac(f, prep) = f) is a single type-stable dispatch so non-nested calls incur no overhead.
  • Adds a regression test that builds an outer Dual p with a throwaway tag type and solves a Rosenbrock23 ODE, exercising the exact call graph that crashes pre-fix and succeeds post-fix. The test is self-contained within OrdinaryDiffEqDifferentiation's test deps (no NonlinearSolve needed).
  • Bumps OrdinaryDiffEqDifferentiation patch version 2.8.0 → 2.8.1.

Root cause (abbreviated)

The user's MWE from #3381:

using OrdinaryDiffEqRosenbrock, NonlinearSolve
function ode(du, u, p, t)
    du[1] = -p[1]*u[1]
    du[2] = -u[1] - p[2]*u[2]
end
ode_f = ODEFunction{true, SciMLBase.FullSpecialize}(ode)
solve_ode(p) = solve(ODEProblem(ode_f, [1.0, 1.0], (0.0, 10.0), p),
                     Rosenbrock23(autodiff=AutoForwardDiff(chunksize=2)))
experiment = solve_ode([1.5, 2.0])
function resid!(du, p, _)
    du .= sol.(experiment.t, idxs=1) .- experiment[1, :] where sol = solve_ode(p)
end
nls = NonlinearFunction{true, SciMLBase.FullSpecialize}(resid!, resid_prototype=zeros(length(experiment.t)))
solve(NonlinearLeastSquaresProblem(nls, [1.0, 1.0]), LevenbergMarquardt())

fails with FirstAutodiffJacError wrapping
MethodError: no method matching Float64(::ForwardDiff.Dual{Tag{OrdinaryDiffEqTag, Dual{NonlinearSolveTag, Float64, 2}}, Dual{NonlinearSolveTag, Float64, 2}, 2})

The nested dual ends up being assigned into a vector whose eltype is only the outer single-level dual. Bisecting shows the broken tagcount literal order for the outer Tag{NonlinearSolveTag, Float64} vs. the inner Tag{OrdinaryDiffEqTag, Dual{NonlinearSolveTag, Float64, 2}} — the former is baked by NonlinearSolveBase's precompile workload at precompile time while the latter is generated lazily at runtime, which inverts for the cross-tag p[i]*u[i] multiplication inside ode. That inversion is what this patch defends against by normalizing p on the inner solver's side so the cross-tag arithmetic never happens.

The NonlinearSolveBase-side root cause (hardcoded Tag{NonlinearSolveTag, Float64} stamped via isa Vector{Float64} specialization) is addressed in a companion PR to SciML/NonlinearSolve.jl. This OrdinaryDiffEq PR is the robust bug-class fix: it protects any caller that hands UJacobianWrapper a Vector{<:Dual} p, regardless of how that caller's tag was registered.

Test plan

  • Added lib/OrdinaryDiffEqDifferentiation/test/nested_forwarddiff_tests.jl. Reproduces the crash pre-fix (FirstAutodiffJacError on an unpatched repro env) and passes post-fix on the same env with just the derivative_wrappers.jl change developed in.
  • Existing Autodiff Error Tests / Differentiation Trait Tests / No Jac Tests still pass locally (no behavior change for single-level AD — _widen_uf_p_for_jac returns f unchanged in that path).
  • CI on OrdinaryDiffEqDifferentiation test group.

🤖 Generated with Claude Code

ChrisRackauckas-Claude pushed a commit to ChrisRackauckas-Claude/NonlinearSolve.jl that referenced this pull request Apr 11, 2026
`standardize_forwarddiff_tag` previously stamped `Tag{NonlinearSolveTag,
Float64}` on any `AutoForwardDiff{CS, Nothing}` backend whose problem had
`Vector{Float64}` state, regardless of whether `prob.f.f` was actually
wrapped via `AutoSpecialize`. Combined with the precompile workload in
`NonlinearSolveBaseForwardDiffExt` exercising `Dual{Tag{NonlinearSolveTag,
Float64}, Float64, 1}`, that canonical tag's `@generated tagcount`
literal ended up baked into the NSB precompile cache at a low value.

When a nonlinear solve is nested inside a deeper ForwardDiff layer (or
vice versa — e.g. a NonlinearLeastSquares parameter fit whose residual
calls into a Rosenbrock ODE solve), the inner AD layer creates a
compound tag like `Tag{OrdinaryDiffEqTag, Dual{NonlinearSolveTag,
Float64, N}}` lazily at runtime. That runtime-generated tagcount is
higher than the precompile-baked `Tag{NonlinearSolveTag, Float64}`
literal, so ForwardDiff's `tagcount`-based `≺` declares the outer tag
"newer" than the inner — inverting nesting and producing a triple-nested
`Dual` that crashes the user body's `setindex!(du, p[i]*u[i])` with a
`MethodError: no method matching Float64(::nested_dual)`.

This commit removes the `isa Vector{Float64}` gates from both
`standardize_forwarddiff_tag` (ForwardDiff and PolyesterForwardDiff
dispatches) and `maybe_wrap_nonlinear_f`, so the tag is stamped if and
only if the user function was actually wrapped. Under FullSpecialize the
AD backend is now returned unchanged, letting DifferentiationInterface
generate a fresh runtime tag from the function type — which orders
correctly against inner-solve tags generated later.

The hardcoded `Float64` in `_wrapped_forwarddiff_ad()` is replaced with
`eltype(prob.u0)` so the canonical tag is parameterized by the actual
problem eltype rather than always `Float64`.

The precompile workload's `const dualT = Dual{Tag{NonlinearSolveTag,
Float64}, Float64, 1}` is replaced by a throwaway `_PrecompileTag`
struct local to the workload. This preserves the precompile speedups
for Dual arithmetic / broadcast / SubArray patterns while keeping
`NonlinearSolveTag`-parameterized tagcounts runtime-lazy, so they are
ordered relative to whatever outer AD layer actually calls into NSB.

Adds a regression test that verifies `standardize_forwarddiff_tag`
returns the backend unchanged for a FullSpecialize NonlinearFunction
with `Vector{Float64}` state (AutoForwardDiff and
AutoPolyesterForwardDiff). This test fails on unpatched master and
passes post-fix.

Addresses the NonlinearSolveBase side of SciML/OrdinaryDiffEq.jl#3381.
A companion PR to SciML/OrdinaryDiffEq.jl#3389 adds defense in depth on
the inner-solver side by widening `UJacobianWrapper.p` to the nested
Dual type before calling `DI.jacobian!`.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ChrisRackauckas-Claude pushed a commit to ChrisRackauckas-Claude/NonlinearSolve.jl that referenced this pull request Apr 11, 2026
`standardize_forwarddiff_tag` previously stamped `Tag{NonlinearSolveTag,
Float64}` on any `AutoForwardDiff{CS, Nothing}` backend whose problem had
`Vector{Float64}` state, regardless of whether `prob.f.f` was actually
wrapped via `AutoSpecialize`. Under FullSpecialize — where no wrapping
happens and there is no reason to substitute a canonical tag — this
still replaced the user's AD backend with the pre-baked
`Tag{NonlinearSolveTag, Float64}` from the precompile workload, which
carries a stale `@generated tagcount` literal relative to tags generated
lazily at runtime for inner ForwardDiff layers. The cross-tag `p[i] *
u[i]` dispatch in the user body then inverts nesting and crashes
`setindex!(du, ...)` with `Float64(::nested_dual)` / MethodError, as
reported in SciML/OrdinaryDiffEq.jl#3381.

This commit removes the `isa Vector{Float64}` gates from both
`standardize_forwarddiff_tag` dispatches (AutoForwardDiff and
AutoPolyesterForwardDiff) and from `maybe_wrap_nonlinear_f`. The tag is
now stamped if and only if the user function was actually wrapped.
Under FullSpecialize (or any path where wrapping was opted out), the AD
backend is returned unchanged, letting DifferentiationInterface generate
a fresh runtime tag from the function type — which orders correctly
against inner-solve tags generated later.

The hardcoded `Float64` in `_wrapped_forwarddiff_ad()` is replaced with
`eltype(prob.u0)`, so the stamped tag is parameterized by the actual
problem eltype rather than always Float64.

The existing `const dualT = Dual{Tag{NonlinearSolveTag, Float64}, Float64,
1}` precompile workload is kept as-is. It is only reached via the
AutoSpecialize-wrapped path, whose nested-AD failure mode is addressed on
the inner-solver side by widening `UJacobianWrapper.p` to the nested
Dual type in SciML/OrdinaryDiffEq.jl#3389 — which sidesteps the
cross-tag multiplication entirely, so the precompile-baked tagcount
literal is no longer load-bearing for correctness and the TTFX win from
precompiling the real `NonlinearSolveTag` Dual operations is preserved.

Adds a regression test that verifies `standardize_forwarddiff_tag`
returns the backend unchanged for a FullSpecialize `NonlinearFunction`
with `Vector{Float64}` state (AutoForwardDiff and
AutoPolyesterForwardDiff dispatches). Confirmed to fail on unpatched
master (1/18 fail in the testset) and pass post-fix.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a Rosenbrock integrator runs inside an outer ForwardDiff layer, the
stored `p` in `UJacobianWrapper` is a `Vector{<:Dual}` at the outer Dual
level, but the inner Jacobian computation widens `u` into a deeper
nested-Dual type via its prepared `JacobianConfig`. The user
`f(du, u, p, t)` body then multiplies `p[i] * u[i]` across two different
Dual nesting levels, which dispatches through ForwardDiff's
`tagcount`-based tag precedence. That precedence is a `@generated`
function whose literal value is baked at first compile, so its ordering
depends on which package's precompile cache first instantiated a given
tag type — in nested scenarios this can invert nesting and produce a
triple-nested `Dual` that crashes `setindex!(du, ...)` with
`Float64(::nested_dual)`.

Fix: in `jacobian!`, inspect the prepared `ForwardDiff.JacobianConfig`
for the inner xdual buffer's element type and lift `uf.p` into that
nested-Dual type ahead of time via a fresh `UJacobianWrapper`. The
widened `p` carries zero inner partials (correct — `p` does not depend
on `u`), and the fast path is a single type-stable dispatch returning
`f` unchanged so there is no overhead for non-nested calls.

Add a regression test exercising the precise call graph
(outer-Dual `p`, inner Rosenbrock Jacobian) that crashes pre-fix with a
`FirstAutodiffJacError(MethodError(Float64, (nested_dual,)))` and
succeeds post-fix.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ChrisRackauckas-Claude ChrisRackauckas-Claude force-pushed the fix-nested-forwarddiff-uf-p branch from 4d3f0d6 to d72f69b Compare April 12, 2026 01:44
@ChrisRackauckas-Claude
Copy link
Copy Markdown
Contributor Author

Triaged the CI failures on the initial push. Breakdown:

Caused by this PR (fixed in force-push):

  • test (AD, 1.11)test/ad/ad_tests.jl:301, @test !iszero(DI.gradient(of_a, AutoForwardDiff(), [1.0])). This is a nested ForwardDiff over an ODE solve where the inner ODEProblem(f_a, u0, tspan, p) uses the default DEFAULT_SPECIALIZATION === AutoSpecialize, so prob.f.f is a FunctionWrappersWrapper. DiffEqBase pre-compiles the FWW slot (Vector{nested_u_dual}, Vector{nested_du_dual}, Vector{outer_p_dual}, Float64) exactly for this case, and the unwidened call path matches it. My helper was widening uf.p to a nested Dual, which produced (nested_u, nested_du, *nested_p*, Float64) — a signature no FWW slot covers — which FWW v1's AllowNonIsBits policy then rejects with NoFunctionWrapperFoundError (surfacing as the FirstAutodiffJacError in the test output).

    Fix: gate _widen_uf_p_for_jac on f.f not being a FunctionWrappersWrapper. Under AutoSpecialize we leave uf.p alone and let the DiffEqBase-precompiled FWW slot take the call; under FullSpecialize (the call path from ForwardDiff errors for NonlinearSolve over ODE solve #3381 MWE and from the new regression test, where ode_f.f is a plain function) we still widen and defuse the cross-tag multiplication. Both repro_fix's MWE run and the nested_forwarddiff_tests.jl regression test still pass locally, and the minimized ad_tests.jl:301 repro now returns g = [-0.3487867031164756] instead of throwing.

Pre-existing, not caused by this PR:

  • runic — drift in lib/OrdinaryDiffEqAMF/test/test_adjoint_fd2d.jl, lib/OrdinaryDiffEqCore/src/alg_utils.jl, lib/OrdinaryDiffEqNonlinearSolve/test/nested_ad_nlsolvealg_tests.jl, lib/OrdinaryDiffEqRosenbrock/test/ode_rosenbrock_tests.jl. None touched by this PR.
  • test (Regression_II, *) — same 4 versions failing on current master (run 24293597671 on a6c6e192).
  • test (OrdinaryDiffEqRosenbrock, 1) / test (OrdinaryDiffEqNonlinearSolve, 1)@inferred failures of the shape return type ODESolution{...,Nothing} does not match inferred return type Union{ODESolution{...,Nothing}, ODESolution{...,JacReuseState{Float64}}}. The JacReuseState Union is inferred from the Add Jacobian reuse for Rosenbrock-W methods commit (Add Jacobian reuse for Rosenbrock-W methods #3075 / 689d0b7), not from anything in this PR.
  • ModelingToolkit.jl/*, test (Integrators_II, *), test (OrdinaryDiffEqSDIRK*, *), test (StochasticDiffEq_WeakConvergence1) — from a quick pass these are either the same pre-existing inference regressions or master flakes; I have not seen this PR's diff reach any of those code paths in a local bisect. If any of them are still failing on the new run and do look PR-adjacent I'll dig in further.

Force-pushed d72f69b8a with the gated widener + the updated inline comment explaining the AutoSpecialize short-circuit.

ChrisRackauckas-Claude pushed a commit to ChrisRackauckas-Claude/NonlinearSolve.jl that referenced this pull request Apr 15, 2026
`standardize_forwarddiff_tag` previously stamped `Tag{NonlinearSolveTag,
Float64}` on any `AutoForwardDiff{CS, Nothing}` backend whose problem had
`Vector{Float64}` state, regardless of whether `prob.f.f` was actually
wrapped via `AutoSpecialize`. Under FullSpecialize — where no wrapping
happens and there is no reason to substitute a canonical tag — this
still replaced the user's AD backend with the pre-baked
`Tag{NonlinearSolveTag, Float64}` from the precompile workload, which
carries a stale `@generated tagcount` literal relative to tags generated
lazily at runtime for inner ForwardDiff layers. The cross-tag `p[i] *
u[i]` dispatch in the user body then inverts nesting and crashes
`setindex!(du, ...)` with `Float64(::nested_dual)` / MethodError, as
reported in SciML/OrdinaryDiffEq.jl#3381.

This commit removes the `isa Vector{Float64}` gates from both
`standardize_forwarddiff_tag` dispatches (AutoForwardDiff and
AutoPolyesterForwardDiff) and from `maybe_wrap_nonlinear_f`. The tag is
now stamped if and only if the user function was actually wrapped.
Under FullSpecialize (or any path where wrapping was opted out), the AD
backend is returned unchanged, letting DifferentiationInterface generate
a fresh runtime tag from the function type — which orders correctly
against inner-solve tags generated later.

The hardcoded `Float64` in `_wrapped_forwarddiff_ad()` is replaced with
`eltype(prob.u0)`, so the stamped tag is parameterized by the actual
problem eltype rather than always Float64.

The existing `const dualT = Dual{Tag{NonlinearSolveTag, Float64}, Float64,
1}` precompile workload is kept as-is. It is only reached via the
AutoSpecialize-wrapped path, whose nested-AD failure mode is addressed on
the inner-solver side by widening `UJacobianWrapper.p` to the nested
Dual type in SciML/OrdinaryDiffEq.jl#3389 — which sidesteps the
cross-tag multiplication entirely, so the precompile-baked tagcount
literal is no longer load-bearing for correctness and the TTFX win from
precompiling the real `NonlinearSolveTag` Dual operations is preserved.

Adds a regression test that verifies `standardize_forwarddiff_tag`
returns the backend unchanged for a FullSpecialize `NonlinearFunction`
with `Vector{Float64}` state (AutoForwardDiff and
AutoPolyesterForwardDiff dispatches). Confirmed to fail on unpatched
master (1/18 fail in the testset) and pass post-fix.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ChrisRackauckas added a commit to SciML/NonlinearSolve.jl that referenced this pull request Apr 16, 2026
`standardize_forwarddiff_tag` previously stamped `Tag{NonlinearSolveTag,
Float64}` on any `AutoForwardDiff{CS, Nothing}` backend whose problem had
`Vector{Float64}` state, regardless of whether `prob.f.f` was actually
wrapped via `AutoSpecialize`. Under FullSpecialize — where no wrapping
happens and there is no reason to substitute a canonical tag — this
still replaced the user's AD backend with the pre-baked
`Tag{NonlinearSolveTag, Float64}` from the precompile workload, which
carries a stale `@generated tagcount` literal relative to tags generated
lazily at runtime for inner ForwardDiff layers. The cross-tag `p[i] *
u[i]` dispatch in the user body then inverts nesting and crashes
`setindex!(du, ...)` with `Float64(::nested_dual)` / MethodError, as
reported in SciML/OrdinaryDiffEq.jl#3381.

This commit removes the `isa Vector{Float64}` gates from both
`standardize_forwarddiff_tag` dispatches (AutoForwardDiff and
AutoPolyesterForwardDiff) and from `maybe_wrap_nonlinear_f`. The tag is
now stamped if and only if the user function was actually wrapped.
Under FullSpecialize (or any path where wrapping was opted out), the AD
backend is returned unchanged, letting DifferentiationInterface generate
a fresh runtime tag from the function type — which orders correctly
against inner-solve tags generated later.

The hardcoded `Float64` in `_wrapped_forwarddiff_ad()` is replaced with
`eltype(prob.u0)`, so the stamped tag is parameterized by the actual
problem eltype rather than always Float64.

The existing `const dualT = Dual{Tag{NonlinearSolveTag, Float64}, Float64,
1}` precompile workload is kept as-is. It is only reached via the
AutoSpecialize-wrapped path, whose nested-AD failure mode is addressed on
the inner-solver side by widening `UJacobianWrapper.p` to the nested
Dual type in SciML/OrdinaryDiffEq.jl#3389 — which sidesteps the
cross-tag multiplication entirely, so the precompile-baked tagcount
literal is no longer load-bearing for correctness and the TTFX win from
precompiling the real `NonlinearSolveTag` Dual operations is preserved.

Adds a regression test that verifies `standardize_forwarddiff_tag`
returns the backend unchanged for a FullSpecialize `NonlinearFunction`
with `Vector{Float64}` state (AutoForwardDiff and
AutoPolyesterForwardDiff dispatches). Confirmed to fail on unpatched
master (1/18 fail in the testset) and pass post-fix.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ForwardDiff errors for NonlinearSolve over ODE solve

2 participants