feat: AutoSpecialize norecompile infrastructure for NonlinearSolveBase #838
CI Fix: ImmutableNonlinearProblem wrapping
The adjoint test in SimpleNonlinearSolve failed because
Fix: Skip FunctionWrapper wrapping for
The other CI failures (runic, alloc_check, wrappers) are pre-existing/infrastructure issues:
Reverted automatic wrapping - infrastructure only for now
The
These bypass
Current state: The AutoSpecialize infrastructure is fully implemented and ready:
What's needed for automatic wrapping: All ForwardDiff call sites need to be coordinated to use chunksize=1 when the function is wrapped. This is analogous to how DiffEqBase/OrdinaryDiffEq controls all internal ForwardDiff calls.
Update: AutoSpecialize wrapping now fully activated
This commit activates the wrapping infrastructure end-to-end. Key design decisions:
Architecture
What gets wrapped
Tag standardization
Known trade-off
Local test results (all clean)
Adjoint/Reverse-Mode AD Fix (commit 7abe63b)
The
Root Cause
When
Fix
Two-pronged approach:
Test Results (local, Julia 1.10)
Also updated the IIP
5f8115f to b6917fb
Fix: Extend Enzyme unwrap to JacobianOperator path
The previous push fixed Enzyme compatibility for the concrete Jacobian path (
Root cause analysis from CI logs: When
Fix: In
Verified locally:
Additional Enzyme fix: TrustRegion VecJac/JacVec operators
The previous commit fixed the
Root cause from CI log:
Fix: Added Enzyme unwrap check before creating VecJac/JacVec operators in
Note: Enzyme workaround simplification
The Enzyme-specific workarounds in this PR (
A companion PR has been opened at EnzymeAD/Enzyme.jl#2980 that adds an
Once that Enzyme PR is merged and released, the following code can be removed from this PR (~73 lines):
All references to
Waiting on the Enzyme PR
Session State Summary (for continuation)
What was done
Enzyme.jl PR #2980: EnzymeAD/Enzyme.jl#2980
NonlinearSolve.jl PR #838: #838
Simplification attempt (reverted)
Attempted to remove ~73 lines of Enzyme workaround code from 4 files:
Result: 116
Dependency chain: Enzyme PR #2980 must be merged and released BEFORE these workarounds can be removed.
Key technical details
Remotes
NonlinearSolve.jl:
Enzyme.jl:
Next steps
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
5796679 to a7d0c8f
Rebase and fix: Julia 1.10 precompilation failure
Rebased onto current master (59 commits behind). The rebase was clean: 1 commit was dropped as already upstream (Runic formatting for bounds_transform.jl).
Bug found and fixed
After rebase, master commit 487c839 ("Add precompile workload for Dual and SubArray broadcast operations") introduced a
On Julia 1.10 this causes a hard error:
Fix: Removed the duplicate
Local test results (Julia 1.10.11)
Review changes applied
Addressed all review feedback:
Net: -48 lines. All tests pass locally (Julia 1.10).
FunctionWrappersWrappers v1.0.0 has been registered (JuliaRegistries/General#151705). Bumped the compat bound in
…Base
Port the FunctionWrappersWrappers-based norecompile pattern from DiffEqBase
to NonlinearSolveBase. For standard problem types (Vector{Float64} state,
Vector{Float64} or NullParameters parameters), the problem function is
wrapped in a FunctionWrappersWrapper with precompiled type signatures for
both Float64 and ForwardDiff.Dual arguments, avoiding recompilation for
each unique user function type.
Key components:
- src/autospecialize.jl: NonlinearSolveTag, wrapfun_iip/oop base methods,
maybe_wrap_nonlinear_f, standardize_forwarddiff_tag fallback
- ForwardDiff extension: dual-aware wrapfun dispatches with 6 type
combinations (Float64, Dual, NullParameters), tag standardization that
stamps NonlinearSolveTag on AutoForwardDiff and forces chunksize=1 when
the function is wrapped
- solve.jl: maybe_wrap_f wired into get_concrete_problem for all problem
types (NonlinearProblem, NonlinearLeastSquaresProblem,
ImmutableNonlinearProblem), using EvalFunc wrapper for invokelatest
- jacobian.jl: standardize_forwarddiff_tag called in
construct_jacobian_cache so DI produces correctly-tagged duals
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
ImmutableNonlinearProblem (used by SimpleNonlinearSolve) doesn't support Setfield reconstruction with wrapped function types. Skip wrapping since SimpleNonlinearSolve's lighter solvers don't benefit from the norecompile pathway. Fixes CI adjoint test failure in SimpleNonlinearSolve.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The FunctionWrapper wrapping cannot be automatically applied at solve time because multiple code paths (∂f/∂p, ∂f/∂u, bounds transform) call ForwardDiff directly with default chunk sizes, bypassing the standardized chunksize=1 path. This caused "No matching function wrapper found!" errors whenever ForwardDiff used chunksize > 1.
The infrastructure (autospecialize.jl, extension wrappers, tag standardization) remains available for targeted use. Automatic wrapping requires coordinating ALL ForwardDiff call sites to use chunksize=1.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
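The chunk-size mismatch described in this commit message can be illustrated with ForwardDiff's `Dual` type alone. This is a minimal sketch, not code from the PR: `DemoTag` and `WrappedDual` are hypothetical names standing in for `NonlinearSolveTag` and for whichever dual element type the wrapper signatures were built with. It only shows that the number of partials is part of the concrete type, which is why a wrapper built for chunksize 1 cannot accept duals produced with a larger chunk.

```julia
using ForwardDiff: Dual

# Hypothetical tag type standing in for NonlinearSolveTag.
struct DemoTag end

# Hypothetical element type a wrapper signature could be built for:
# DemoTag, Float64 values, exactly one partial (chunksize 1).
const WrappedDual = Dual{DemoTag, Float64, 1}

# One partial: the concrete type matches the wrapper signature.
d1 = Dual{DemoTag}(1.0, 1.0)

# Two partials (chunksize 2): a different concrete type, so a wrapper
# keyed on WrappedDual has no matching signature for it.
d2 = Dual{DemoTag}(1.0, 1.0, 0.0)

matches_chunk1 = d1 isa WrappedDual
matches_chunk2 = d2 isa WrappedDual
```

Here `matches_chunk1` is `true` and `matches_chunk2` is `false`, which mirrors why every ForwardDiff call site has to agree on both the tag and the chunk size before automatic wrapping is safe.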
The standardize_forwarddiff_tag calls in autodiff.jl and jacobian.jl cause dual tag ordering errors when nested ForwardDiff is used (e.g., NLLS sensitivity + inner VJP). Remove these call sites and the unused maybe_wrap_f function since automatic wrapping is not yet active.
The autospecialize infrastructure (NonlinearSolveTag, wrapfun_iip/oop, ForwardDiff extension wrappers) remains available for future activation when all direct ForwardDiff call sites are standardized.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Wire `maybe_wrap_f` into `get_concrete_problem` for NonlinearProblem and
NonlinearLeastSquaresProblem (IIP). Functions are wrapped in
`AutoSpecializeCallable{FW}` which holds a `FunctionWrappersWrapper` for
precompiled dispatch and the original function (type-erased as `Any`) for
try-catch fallback when dual tags mismatch (JVP paths, external packages).
Key changes:
- AutoSpecializeCallable uses `orig::Any` for type erasure (no EvalFunc)
- Skip OOP NLLS wrapping (return type may differ from u0)
- Standardize JVP/VJP autodiff tags in construct_jacobian_cache
- Replace AutoPolyesterForwardDiff with AutoForwardDiff{1,tag} when wrapped
- Use get_raw_f for nested ForwardDiff in NLLS VJP generation
- ForwardDiff sensitivity functions use chunksize=1 + tag when wrapped
Tests: core 727/0/0, wrapper 195/0/0, ForwardDiff 135636/0/0
OOP @inferred regresses (expected, same trade-off as DiffEqBase)
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Reverse-mode AD backends (Zygote, Mooncake, Enzyme) cannot differentiate through FunctionWrapper internals (llvmcall). This adds:
- ChainRulesCore rrule for AutoSpecializeCallable that redirects reverse-mode AD through the original unwrapped callable
- _DISABLE_AUTOSPECIALIZE flag set in the solve_up rrule to prevent wrapping entirely during the adjoint code path
- @test_broken for IIP @inferred (same wrapping-induced regression as OOP case)
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
…type
- Remove task-local `_DISABLE_AUTOSPECIALIZE` flag entirely
- Replace with `@set prob.f.f = get_raw_f(prob.f.f)` unwrapping in rrule
- Remove parameter type restriction (any p works, mismatches fall back)
- Add idempotency check to prevent double-wrapping
- Remove `_DISABLE_AUTOSPECIALIZE` from public API
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
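The idempotency check and the unwrap-before-adjoint step listed in this commit can be sketched in plain Julia. Everything below is hypothetical and for illustration only: `DemoWrapped`, `demo_wrap`, and `demo_raw` are invented names, and the real `AutoSpecializeCallable` and `get_raw_f` may be implemented differently.

```julia
# Hypothetical wrapper type; the PR's actual type is AutoSpecializeCallable.
struct DemoWrapped
    orig::Any          # original user function, stored type-erased
end

# Calling the wrapper forwards to the original function.
(w::DemoWrapped)(args...) = w.orig(args...)

# Idempotent wrap: wrapping an already-wrapped function returns it unchanged,
# so repeated passes through the wrapping step never nest wrappers.
demo_wrap(f) = DemoWrapped(f)
demo_wrap(w::DemoWrapped) = w

# Unwrap helper in the spirit of get_raw_f: identity on plain functions.
demo_raw(w::DemoWrapped) = w.orig
demo_raw(f) = f

g(u, p) = u .- p

w = demo_wrap(demo_wrap(g))   # second call is a no-op
raw = demo_raw(w)             # recovers g for code that must see the raw function
```

Dispatching on the wrapper type keeps both operations free of runtime flags, which is the same motivation the commit gives for dropping the task-local switch.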
These are internal implementation details, not public API.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
OOP wrapping requires guessing return types which doesn't always work. Only wrap IIP functions where the return type is always Nothing.
IIP TTFX improvement (2nd/3rd function, same types):
- NewtonRaphson: 2.2-2.5x faster
- TrustRegion: 18x faster
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
The existing workload used scalar p=2.0, which produces different
FunctionWrapper types than the common user case of Vector{Float64}
parameters. This caused the precompiled wrappers to miss the user path.
TTFX for IIP Vector{Float64} first solve: 2.7s → 1.0s
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
The try-catch in AutoSpecializeCallable prevented inlining and added
~32 bytes per call, exceeding the 64-byte @ballocated budget in
NonlinearSolveFirstOrder, QuasiNewton, and SpectralMethods tests.
Replace with explicit dispatch methods for known argument types
(Vector{Float64}, Float64, NullParameters, and ForwardDiff duals),
routing to f.fw for zero-allocation calls. Unsupported types fall
back to f.orig via vararg dispatch. Also fix @test_broken -> @test
for @inferred solve(prob) which now passes.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
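The switch from try-catch to explicit dispatch described in this commit can be sketched as follows. This is an illustrative stand-in, not the PR's code: `DemoCallable`, `fast!`, `orig!`, and the `calls` counter are hypothetical, and a plain function plays the role of the precompiled wrapper `fw`.

```julia
# Hypothetical stand-in for AutoSpecializeCallable: `fast` plays the role of the
# precompiled wrapper, `orig` keeps the user's function for every other case.
struct DemoCallable{FW}
    fast::FW
    orig::Any
end

# Fast path: a concrete method for the supported argument types, no try-catch.
(f::DemoCallable)(du::Vector{Float64}, u::Vector{Float64}, p::Vector{Float64}) =
    f.fast(du, u, p)

# Fallback: any other argument types are forwarded to the original function.
(f::DemoCallable)(args...) = f.orig(args...)

# Counters record which path each call took.
calls = Dict(:fast => 0, :orig => 0)
resid!(du, u, p) = (du .= u .- p; nothing)
fast!(du, u, p) = (calls[:fast] += 1; resid!(du, u, p))
orig!(du, u, p) = (calls[:orig] += 1; resid!(du, u, p))

f = DemoCallable(fast!, orig!)
f(zeros(2), [1.0, 2.0], [0.5, 0.5])   # all Vector{Float64}: fast path
f(zeros(2), [1.0, 2.0], 0.5)          # scalar parameter: vararg fallback
```

Because method selection happens through dispatch, the supported case contains no exception handling, which is what the commit credits for restoring inlining and the allocation budget.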
Enzyme cannot differentiate through FunctionWrappers' llvmcall, causing
EnzymeMutabilityException in all IIP Vector{Float64} tests with AutoEnzyme.
Unwrap the function in construct_jacobian_cache when the AD backend is
Enzyme-based (including AutoSparse(AutoEnzyme(...))), so DI sees the raw
user function. Also apply Runic formatting to SCCNonlinearSolve files.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…d type stability
- Remove `<: Function` from `AutoSpecializeCallable` — it was preventing
specialization. The struct now dispatches directly through `fw` (the
FunctionWrappersWrapper) without a varargs fallback to `orig`.
- Remove all explicit fast-path dispatch methods (Float64 and Dual variants)
from both autospecialize.jl and NonlinearSolveBaseForwardDiffExt.jl.
These only existed to avoid the `<: Function` varargs fallback.
- Remove redundant `wrapfun_iip` overloads for Vector{Float64}/NullParameters
that duplicated the generic typed method.
- Only wrap when `AutoSpecialize` specialization is requested (opt-in).
`FullSpecialize` (the default) preserves the exact function type.
- Fix bounds test: verify FullSpecialize preserves identity, AutoSpecialize
with bounds errors (user should use FullSpecialize).
- Restore strict `@inferred` checks in core_tests.jl (no longer regressed
since `<: Function` is removed).
- Fix type stability in QuasiNewton: replace `Any[]` array + splat with
direct varargs call to `maybe_unwrap_prob_for_enzyme`.
Local test results (Julia 1.10):
| Suite | Pass | Broken | Fail | Error |
|----------|---------|--------|------|-------|
| Core | 748 | 86 | 0 | 0 |
| Bounds | 45 | 0 | 0 | 0 |
| Wrappers | 195 | 7 | 0 | 0 |
| nopre | 135,757 | 0 | 0 | 0 |
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…{Float64}
Exercises the FunctionWrapper + ForwardDiff dual code paths at precompile
time for both parameter types. When users call solve() with AutoSpecialize
and a specific algorithm (e.g., NewtonRaphson), the first solve drops from
~4s to ~0.8s because the solver infrastructure is already compiled.
The key benefit remains: each new function with AutoSpecialize costs ~0.4s
vs ~2.5s with FullSpecialize (84% reduction in per-function recompilation).
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two changes:
1. Add NLLS default algorithm (FastShortcutNLLSPolyalg) to precompile
workload, plus AutoSpecialize NLLS with Vector{Float64} params.
Previously NLLS only precompiled explicit algorithms (GaussNewton,
TrustRegion, LevenbergMarquardt), not the default `solve(prob)` path.
2. Fix NoFunctionWrapperFoundError in TrustRegion with Bastin scheme:
The VecJac/JacVec operators created by Bastin received unstandardized
vjp_autodiff/jvp_autodiff, so ForwardDiff used its default tag instead
of NonlinearSolveTag. Now standardize_forwarddiff_tag is called before
passing to InternalAPI.init for the trust region.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The explicit dispatch methods that used AutoSpecializeCallable were removed in the review cleanup, but the import was left behind.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…stants
These constants were leftovers from the removed explicit dispatch methods. The generic wrapfun_iip method builds its own arglists from input types.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SciMLBase v2.153.0 makes DEFAULT_SPECIALIZATION (AutoSpecialize) the
default for all SciMLFunction constructors. Update accordingly:
- Bump SciMLBase compat to 2.153 across all sub-packages
- Simplify precompile workloads: use NonlinearFunction{true}(f!) instead
of explicit AutoSpecialize annotation (it's now the default)
- Update bounds test: FullSpecialize is now the opt-out for bounds
compatibility, default (AutoSpecialize) correctly errors with bounds
- Update comment in autospecialize.jl
Core test time improved from ~2m40s to ~1m56s thanks to reduced
recompilation when AutoSpecialize is active by default.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…loads
All precompile workloads now use the default specialization (AutoSpecialize) instead of NoSpecialize, so the precompiled code matches what users hit at runtime.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ialize
With AutoSpecialize as the default, the FunctionWrapper must handle
argument types not in the precompiled wrapper signatures (scalar params,
non-standard dual tags, bounds transforms, etc.). Use FunctionWrappersWrapper's
built-in fallback mode (Val{true}()) which calls the original function
when no matching wrapper is found, instead of throwing
NoFunctionWrapperFoundError.
This makes AutoSpecialize fully transparent — supported types get
zero-allocation precompiled dispatch, unsupported types fall back to
the original function with normal Julia specialization.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ensions
Instead of relying on FunctionWrappersWrapper fallback mode, explicitly
unwrap AutoSpecializeCallable in all code paths where ForwardDiff uses
non-standard tags:
1. ForwardDiff sensitivity path (nonlinearsolve_forwarddiff_solve):
unwrap prob.f.f before computing ∂f/∂p and ∂f/∂u Jacobians, which
use closure-based ForwardDiff tags incompatible with the wrappers.
Removes the now-unnecessary _is_wrapped_nlf checks and _nls_tag
config overrides.
2. External wrapper extensions (LeastSquaresOptim, NLsolve, NLSolvers,
SIAM, FastLevenbergMarquardt, MINPACK, PETSc, FixedPointAcceleration,
SpeedMapping, Sundials): unwrap at the top of __solve since these
packages do their own AD with arbitrary tags.
3. Bounds transform: unwrap before wrapping in BoundedWrapper since the
transform changes argument types.
4. Restrict wrapping to supported param types only (Vector{Float64},
NullParameters) — scalar params skip wrapping entirely.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The switch from remake() to @set for unwrapping left unused remake imports in 9 extension files.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FunctionWrappersWrappers v1.0.0 has been registered (JuliaRegistries/General#151705). Update the lower bound to allow both 0.1 and 1.x series.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.1 has a bug where sparsity detection can fail. v1.0 includes the fix, so set the lower bound to 1.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1da1c90 to b198cfc
DiffEqBase v6.213.0 (JuliaRegistries/General#151730) includes the FunctionWrappersWrappers v1 compat fix needed for sparsity detection.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
OrdinaryDiffEqCore still requires FunctionWrappersWrappers 0.1, which conflicts with NonlinearSolveBase requiring FWW v1. Move the shooting test that uses OrdinaryDiffEqTsit5 to the downstream group and add it via dynamic Pkg.add so it doesn't block resolution for other test groups.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
SciMLBase 2.154 is the first version that allows FunctionWrappersWrappers v1. Without this bump, transitive deps (SciMLSensitivity -> DiffEqBase -> OrdinaryDiffEqCore) could resolve to versions requiring FWW 0.1, causing resolver conflicts in nopre/adjoint CI groups.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
…gistry
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
…27.0
FWW v1 did not toggle strict mode, so it is backwards-compatible. This unblocks NonlinearSolve's AutoSpecialize support (SciML/NonlinearSolve.jl#838) which requires FWW v1 via NonlinearSolveBase.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
PR #838 introduced AutoSpecialize infrastructure (maybe_unwrap_prob_for_enzyme, AutoSpecializeCallable, etc.) in NonlinearSolveBase v2.20.0. The sub-packages that use these functions had a lower bound of 2.2, which allowed the resolver to pick incompatible older versions.
Fixes #893
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Ports the FunctionWrappersWrappers-based norecompile infrastructure from DiffEqBase to NonlinearSolveBase, following the approach described in https://sciml.ai/news/2022/09/21/compile_time/.
This PR adds the infrastructure only: the wrappers, tag types, and extension methods needed for the norecompile/AutoSpecialize pattern. The automatic wrapping at solve-time is not yet activated because NonlinearSolve has several direct `ForwardDiff.jacobian` call sites (sensitivity analysis df/dp, df/du, bounds transforms) that bypass the DI-based Jacobian path and would receive duals with mismatched chunk sizes.
What's included
- `src/autospecialize.jl` (new): `NonlinearSolveTag`, `wrapfun_iip`/`wrapfun_oop` stub methods, `maybe_wrap_nonlinear_f`, `standardize_forwarddiff_tag` fallback
- ForwardDiff extension (`NonlinearSolveBaseForwardDiffExt.jl`): Dual-aware `wrapfun_iip`/`wrapfun_oop` dispatches with 6 type combinations each (`Float64`, `Dual{NonlinearSolveTag}`, `NullParameters`). Tag standardization that stamps `NonlinearSolveTag` on `AutoForwardDiff` and forces `chunksize=1` when the function is wrapped via `EvalFunc`.
- `NonlinearSolveBase.jl`: FunctionWrappers/FunctionWrappersWrappers imports, exports for the new public API
- `Project.toml`: FunctionWrappers and FunctionWrappersWrappers dependencies
What's NOT included (deferred)
- Automatic wrapping in the `get_concrete_problem` / solve path: requires standardizing all direct ForwardDiff call sites first (`nonlinearsolve_∂f_∂p`, `nonlinearsolve_∂f_∂u`, bounds transform code)
Design notes
The infrastructure mirrors DiffEqBase's pattern:
- For standard problem types (`Vector{Float64}` state, `Vector{Float64}` or `NullParameters` parameters), `maybe_wrap_nonlinear_f` wraps the function in a `FunctionWrappersWrapper` with precompiled dual type signatures
- `standardize_forwarddiff_tag` coordinates the ForwardDiff tag (`NonlinearSolveTag`) and chunk size (N=1) so duals match the wrapper signatures
Next steps to activate
- Route direct `ForwardDiff.jacobian`/`derivative`/`gradient` calls through DI or standardize their tag/chunksize
- Wire `maybe_wrap_f` into `get_concrete_problem`
- Call `standardize_forwarddiff_tag` before `construct_concrete_adtype` in jacobian.jl
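The wrapping idea from the design notes can be pictured with plain FunctionWrappers.jl. The sketch below is illustrative only and rests on assumptions: `IIPWrapper`, `f1!`, and `f2!` are hypothetical names, and the PR itself uses a `FunctionWrappersWrapper` covering several signatures (including dual types) rather than the single `FunctionWrapper` shown here.

```julia
using FunctionWrappers: FunctionWrapper

# One fixed signature for in-place residuals f!(du, u, p) on the standard types.
const IIPWrapper = FunctionWrapper{
    Nothing, Tuple{Vector{Float64}, Vector{Float64}, Vector{Float64}},
}

# Two unrelated user functions (hypothetical examples).
f1!(du, u, p) = (du .= u .^ 2 .- p; nothing)
f2!(du, u, p) = (du .= sin.(u) .- p; nothing)

w1 = IIPWrapper(f1!)
w2 = IIPWrapper(f2!)

# Both wrappers share one concrete type, so solver code specialized on
# IIPWrapper is compiled once and reused for either function.
same_type = typeof(w1) === typeof(w2)

du = zeros(2)
w1(du, [2.0, 3.0], [1.0, 1.0])   # writes u.^2 .- p into du
```

After the call, `du` holds `[3.0, 8.0]`, and `same_type` is `true` even though `f1!` and `f2!` have distinct function types. That shared type is what lets a second or third user function skip recompilation of the solver internals.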