Widen UJacobianWrapper.p for nested ForwardDiff through Rosenbrock (#3381)#3389
Conversation
`standardize_forwarddiff_tag` previously stamped `Tag{NonlinearSolveTag,
Float64}` on any `AutoForwardDiff{CS, Nothing}` backend whose problem had
`Vector{Float64}` state, regardless of whether `prob.f.f` was actually
wrapped via `AutoSpecialize`. Combined with the precompile workload in
`NonlinearSolveBaseForwardDiffExt` exercising `Dual{Tag{NonlinearSolveTag,
Float64}, Float64, 1}`, that canonical tag's `@generated tagcount`
literal ended up baked into the NSB precompile cache at a low value.
When a nonlinear solve is nested inside a deeper ForwardDiff layer (or
vice versa — e.g. a NonlinearLeastSquares parameter fit whose residual
calls into a Rosenbrock ODE solve), the inner AD layer creates a
compound tag like `Tag{OrdinaryDiffEqTag, Dual{NonlinearSolveTag,
Float64, N}}` lazily at runtime. That runtime-generated tagcount is
higher than the precompile-baked `Tag{NonlinearSolveTag, Float64}`
literal, so ForwardDiff's `tagcount`-based `≺` declares the outer tag
"newer" than the inner — inverting nesting and producing a triple-nested
`Dual` that crashes the user body's `setindex!(du, p[i]*u[i])` with a
`MethodError: no method matching Float64(::nested_dual)`.
This commit removes the `isa Vector{Float64}` gates from both
`standardize_forwarddiff_tag` (ForwardDiff and PolyesterForwardDiff
dispatches) and `maybe_wrap_nonlinear_f`, so the tag is stamped if and
only if the user function was actually wrapped. Under FullSpecialize the
AD backend is now returned unchanged, letting DifferentiationInterface
generate a fresh runtime tag from the function type — which orders
correctly against inner-solve tags generated later.
The hardcoded `Float64` in `_wrapped_forwarddiff_ad()` is replaced with
`eltype(prob.u0)` so the canonical tag is parameterized by the actual
problem eltype rather than always `Float64`.
The precompile workload's `const dualT = Dual{Tag{NonlinearSolveTag,
Float64}, Float64, 1}` is replaced by a throwaway `_PrecompileTag`
struct local to the workload. This preserves the precompile speedups
for Dual arithmetic / broadcast / SubArray patterns while keeping
`NonlinearSolveTag`-parameterized tagcounts runtime-lazy, so they are
ordered relative to whatever outer AD layer actually calls into NSB.
Adds a regression test that verifies `standardize_forwarddiff_tag`
returns the backend unchanged for a FullSpecialize NonlinearFunction
with `Vector{Float64}` state (AutoForwardDiff and
AutoPolyesterForwardDiff). This test fails on unpatched master and
passes post-fix.
Addresses the NonlinearSolveBase side of SciML/OrdinaryDiffEq.jl#3381.
A companion PR to SciML/OrdinaryDiffEq.jl#3389 adds defense in depth on
the inner-solver side by widening `UJacobianWrapper.p` to the nested
Dual type before calling `DI.jacobian!`.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
`standardize_forwarddiff_tag` previously stamped `Tag{NonlinearSolveTag,
Float64}` on any `AutoForwardDiff{CS, Nothing}` backend whose problem had
`Vector{Float64}` state, regardless of whether `prob.f.f` was actually
wrapped via `AutoSpecialize`. Under FullSpecialize — where no wrapping
happens and there is no reason to substitute a canonical tag — this
still replaced the user's AD backend with the pre-baked
`Tag{NonlinearSolveTag, Float64}` from the precompile workload, which
carries a stale `@generated tagcount` literal relative to tags generated
lazily at runtime for inner ForwardDiff layers. The cross-tag `p[i] *
u[i]` dispatch in the user body then inverts nesting and crashes
`setindex!(du, ...)` with `Float64(::nested_dual)` / MethodError, as
reported in SciML/OrdinaryDiffEq.jl#3381.
This commit removes the `isa Vector{Float64}` gates from both
`standardize_forwarddiff_tag` dispatches (AutoForwardDiff and
AutoPolyesterForwardDiff) and from `maybe_wrap_nonlinear_f`. The tag is
now stamped if and only if the user function was actually wrapped.
Under FullSpecialize (or any path where wrapping was opted out), the AD
backend is returned unchanged, letting DifferentiationInterface generate
a fresh runtime tag from the function type — which orders correctly
against inner-solve tags generated later.
The hardcoded `Float64` in `_wrapped_forwarddiff_ad()` is replaced with
`eltype(prob.u0)`, so the stamped tag is parameterized by the actual
problem eltype rather than always Float64.
The existing `const dualT = Dual{Tag{NonlinearSolveTag, Float64}, Float64,
1}` precompile workload is kept as-is. It is only reached via the
AutoSpecialize-wrapped path, whose nested-AD failure mode is addressed on
the inner-solver side by widening `UJacobianWrapper.p` to the nested
Dual type in SciML/OrdinaryDiffEq.jl#3389 — which sidesteps the
cross-tag multiplication entirely, so the precompile-baked tagcount
literal is no longer load-bearing for correctness and the TTFX win from
precompiling the real `NonlinearSolveTag` Dual operations is preserved.
Adds a regression test that verifies `standardize_forwarddiff_tag`
returns the backend unchanged for a FullSpecialize `NonlinearFunction`
with `Vector{Float64}` state (AutoForwardDiff and
AutoPolyesterForwardDiff dispatches). Confirmed to fail on unpatched
master (1/18 fail in the testset) and pass post-fix.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a Rosenbrock integrator runs inside an outer ForwardDiff layer, the
stored `p` in `UJacobianWrapper` is a `Vector{<:Dual}` at the outer Dual
level, but the inner Jacobian computation widens `u` into a deeper
nested-Dual type via its prepared `JacobianConfig`. The user
`f(du, u, p, t)` body then multiplies `p[i] * u[i]` across two different
Dual nesting levels, which dispatches through ForwardDiff's
`tagcount`-based tag precedence. That precedence is a `@generated`
function whose literal value is baked at first compile, so its ordering
depends on which package's precompile cache first instantiated a given
tag type — in nested scenarios this can invert nesting and produce a
triple-nested `Dual` that crashes `setindex!(du, ...)` with
`Float64(::nested_dual)`.
Fix: in `jacobian!`, inspect the prepared `ForwardDiff.JacobianConfig`
for the inner xdual buffer's element type and lift `uf.p` into that
nested-Dual type ahead of time via a fresh `UJacobianWrapper`. The
widened `p` carries zero inner partials (correct — `p` does not depend
on `u`), and the fast path is a single type-stable dispatch returning
`f` unchanged so there is no overhead for non-nested calls.
Add a regression test exercising the precise call graph
(outer-Dual `p`, inner Rosenbrock Jacobian) that crashes pre-fix with a
`FirstAutodiffJacError(MethodError(Float64, (nested_dual,)))` and
succeeds post-fix.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4d3f0d6 to
d72f69b
Compare
|
Triaged the CI failures on the initial push. Breakdown: Caused by this PR (fixed in force-push):
Pre-existing, not caused by this PR:
Force-pushed |
`standardize_forwarddiff_tag` previously stamped `Tag{NonlinearSolveTag,
Float64}` on any `AutoForwardDiff{CS, Nothing}` backend whose problem had
`Vector{Float64}` state, regardless of whether `prob.f.f` was actually
wrapped via `AutoSpecialize`. Under FullSpecialize — where no wrapping
happens and there is no reason to substitute a canonical tag — this
still replaced the user's AD backend with the pre-baked
`Tag{NonlinearSolveTag, Float64}` from the precompile workload, which
carries a stale `@generated tagcount` literal relative to tags generated
lazily at runtime for inner ForwardDiff layers. The cross-tag `p[i] *
u[i]` dispatch in the user body then inverts nesting and crashes
`setindex!(du, ...)` with `Float64(::nested_dual)` / MethodError, as
reported in SciML/OrdinaryDiffEq.jl#3381.
This commit removes the `isa Vector{Float64}` gates from both
`standardize_forwarddiff_tag` dispatches (AutoForwardDiff and
AutoPolyesterForwardDiff) and from `maybe_wrap_nonlinear_f`. The tag is
now stamped if and only if the user function was actually wrapped.
Under FullSpecialize (or any path where wrapping was opted out), the AD
backend is returned unchanged, letting DifferentiationInterface generate
a fresh runtime tag from the function type — which orders correctly
against inner-solve tags generated later.
The hardcoded `Float64` in `_wrapped_forwarddiff_ad()` is replaced with
`eltype(prob.u0)`, so the stamped tag is parameterized by the actual
problem eltype rather than always Float64.
The existing `const dualT = Dual{Tag{NonlinearSolveTag, Float64}, Float64,
1}` precompile workload is kept as-is. It is only reached via the
AutoSpecialize-wrapped path, whose nested-AD failure mode is addressed on
the inner-solver side by widening `UJacobianWrapper.p` to the nested
Dual type in SciML/OrdinaryDiffEq.jl#3389 — which sidesteps the
cross-tag multiplication entirely, so the precompile-baked tagcount
literal is no longer load-bearing for correctness and the TTFX win from
precompiling the real `NonlinearSolveTag` Dual operations is preserved.
Adds a regression test that verifies `standardize_forwarddiff_tag`
returns the backend unchanged for a FullSpecialize `NonlinearFunction`
with `Vector{Float64}` state (AutoForwardDiff and
AutoPolyesterForwardDiff dispatches). Confirmed to fail on unpatched
master (1/18 fail in the testset) and pass post-fix.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
`standardize_forwarddiff_tag` previously stamped `Tag{NonlinearSolveTag,
Float64}` on any `AutoForwardDiff{CS, Nothing}` backend whose problem had
`Vector{Float64}` state, regardless of whether `prob.f.f` was actually
wrapped via `AutoSpecialize`. Under FullSpecialize — where no wrapping
happens and there is no reason to substitute a canonical tag — this
still replaced the user's AD backend with the pre-baked
`Tag{NonlinearSolveTag, Float64}` from the precompile workload, which
carries a stale `@generated tagcount` literal relative to tags generated
lazily at runtime for inner ForwardDiff layers. The cross-tag `p[i] *
u[i]` dispatch in the user body then inverts nesting and crashes
`setindex!(du, ...)` with `Float64(::nested_dual)` / MethodError, as
reported in SciML/OrdinaryDiffEq.jl#3381.
This commit removes the `isa Vector{Float64}` gates from both
`standardize_forwarddiff_tag` dispatches (AutoForwardDiff and
AutoPolyesterForwardDiff) and from `maybe_wrap_nonlinear_f`. The tag is
now stamped if and only if the user function was actually wrapped.
Under FullSpecialize (or any path where wrapping was opted out), the AD
backend is returned unchanged, letting DifferentiationInterface generate
a fresh runtime tag from the function type — which orders correctly
against inner-solve tags generated later.
The hardcoded `Float64` in `_wrapped_forwarddiff_ad()` is replaced with
`eltype(prob.u0)`, so the stamped tag is parameterized by the actual
problem eltype rather than always Float64.
The existing `const dualT = Dual{Tag{NonlinearSolveTag, Float64}, Float64,
1}` precompile workload is kept as-is. It is only reached via the
AutoSpecialize-wrapped path, whose nested-AD failure mode is addressed on
the inner-solver side by widening `UJacobianWrapper.p` to the nested
Dual type in SciML/OrdinaryDiffEq.jl#3389 — which sidesteps the
cross-tag multiplication entirely, so the precompile-baked tagcount
literal is no longer load-bearing for correctness and the TTFX win from
precompiling the real `NonlinearSolveTag` Dual operations is preserved.
Adds a regression test that verifies `standardize_forwarddiff_tag`
returns the backend unchanged for a FullSpecialize `NonlinearFunction`
with `Vector{Float64}` state (AutoForwardDiff and
AutoPolyesterForwardDiff dispatches). Confirmed to fail on unpatched
master (1/18 fail in the testset) and pass post-fix.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Vector{<:Dual}p(because we are inside an outer ForwardDiff layer — e.g. a NonlinearLeastSquares parameter fit), the inner Jacobian widensuinto a deeper nested-Dual type via itsJacobianConfig, but the storedpinUJacobianWrapperis still at the outer Dual level. The user body then multipliesp[i] * u[i]across two different Dual nesting levels, which dispatches through ForwardDiff'stagcount-based tag precedence. That precedence is a@generatedfunction whose literal value is baked at first compile, so its ordering depends on which package's precompile cache first instantiated each tag type — in this nested-AD scenario it inverts the nesting and produces a triple-nestedDualthat crashessetindex!(du, ...)withMethodError: no method matching Float64(::Dual{..., Dual{..., Float64, 2}, 2}).jacobian!now inspects the preparedForwardDiff.JacobianConfigto recover the inner xdual buffer's element type and liftsuf.pinto that nested-Dual type via a freshUJacobianWrapperbefore delegating toDI.jacobian!. The widenedpcarries zero inner partials (correct —pdoes not depend onu). The fast path (_widen_uf_p_for_jac(f, prep) = f) is a single type-stable dispatch so non-nested calls incur no overhead.pwith a throwaway tag type and solves a Rosenbrock23 ODE, exercising the exact call graph that crashes pre-fix and succeeds post-fix. The test is self-contained withinOrdinaryDiffEqDifferentiation's test deps (no NonlinearSolve needed).OrdinaryDiffEqDifferentiationpatch version 2.8.0 → 2.8.1.Root cause (abbreviated)
The user's MWE from #3381:
fails with
FirstAutodiffJacErrorwrappingMethodError: no method matching Float64(::ForwardDiff.Dual{Tag{OrdinaryDiffEqTag, Dual{NonlinearSolveTag, Float64, 2}}, Dual{NonlinearSolveTag, Float64, 2}, 2})The nested dual ends up being assigned into a vector whose eltype is only the outer single-level dual. Bisecting shows the broken
tagcountliteral order for the outerTag{NonlinearSolveTag, Float64}vs. the innerTag{OrdinaryDiffEqTag, Dual{NonlinearSolveTag, Float64, 2}}— the former is baked byNonlinearSolveBase's precompile workload at precompile time while the latter is generated lazily at runtime, which inverts≺for the cross-tagp[i]*u[i]multiplication insideode. That inversion is what this patch defends against by normalizingpon the inner solver's side so the cross-tag arithmetic never happens.The NonlinearSolveBase-side root cause (hardcoded
Tag{NonlinearSolveTag, Float64}stamped viaisa Vector{Float64}specialization) is addressed in a companion PR toSciML/NonlinearSolve.jl. This OrdinaryDiffEq PR is the robust bug-class fix: it protects any caller that handsUJacobianWrapperaVector{<:Dual}p, regardless of how that caller's tag was registered.Test plan
lib/OrdinaryDiffEqDifferentiation/test/nested_forwarddiff_tests.jl. Reproduces the crash pre-fix (FirstAutodiffJacErroron an unpatchedreproenv) and passes post-fix on the same env with just thederivative_wrappers.jlchange developed in.Autodiff Error Tests/Differentiation Trait Tests/No Jac Testsstill pass locally (no behavior change for single-level AD —_widen_uf_p_for_jacreturnsfunchanged in that path).OrdinaryDiffEqDifferentiationtest group.🤖 Generated with Claude Code