feat: AutoSpecialize norecompile infrastructure for NonlinearSolveBase#838

Merged: ChrisRackauckas merged 33 commits into SciML:master from ChrisRackauckas-Claude:autospecialize on Apr 2, 2026

@ChrisRackauckas-Claude
Contributor

@ChrisRackauckas-Claude ChrisRackauckas-Claude commented Feb 19, 2026

Summary

Ports the FunctionWrappersWrappers-based norecompile infrastructure from DiffEqBase to NonlinearSolveBase, following the approach described in https://sciml.ai/news/2022/09/21/compile_time/.

This PR adds the infrastructure only — the wrappers, tag types, and extension methods needed for the norecompile/AutoSpecialize pattern. The automatic wrapping at solve-time is not yet activated because NonlinearSolve has several direct ForwardDiff.jacobian call sites (sensitivity analysis df/dp, df/du, bounds transforms) that bypass the DI-based Jacobian path and would receive duals with mismatched chunk sizes.

What's included

  • src/autospecialize.jl (new): NonlinearSolveTag, wrapfun_iip/wrapfun_oop stub methods, maybe_wrap_nonlinear_f, standardize_forwarddiff_tag fallback
  • ForwardDiff extension (NonlinearSolveBaseForwardDiffExt.jl): Dual-aware wrapfun_iip/wrapfun_oop dispatches with 6 type combinations each (Float64, Dual{NonlinearSolveTag}, NullParameters). Tag standardization that stamps NonlinearSolveTag on AutoForwardDiff and forces chunksize=1 when the function is wrapped via EvalFunc.
  • NonlinearSolveBase.jl: FunctionWrappers/FunctionWrappersWrappers imports, exports for the new public API
  • Project.toml: FunctionWrappers and FunctionWrappersWrappers dependencies

What's NOT included (deferred)

  • Automatic wrapping in get_concrete_problem / solve path — requires standardizing all direct ForwardDiff call sites first (nonlinearsolve_∂f_∂p, nonlinearsolve_∂f_∂u, bounds transform code)
  • Tag standardization at Jacobian construction or NLLS VJP generation — same reason

Design notes

The infrastructure mirrors DiffEqBase's pattern:

  • For standard problem types (Vector{Float64} state, Vector{Float64} or NullParameters parameters), maybe_wrap_nonlinear_f wraps the function in a FunctionWrappersWrapper with precompiled dual type signatures
  • standardize_forwarddiff_tag coordinates the ForwardDiff tag (NonlinearSolveTag) and chunk size (N=1) so duals match the wrapper signatures
  • Non-standard types (Float32, StaticArray, scalar) pass through unchanged
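
The wrapping rule in the design notes can be sketched with plain FunctionWrappers.jl. This is a minimal illustration, not the PR's implementation: `demo_maybe_wrap` and `DemoIIPWrapper` are hypothetical names, only the `Vector{Float64}` parameter case is shown, and the dual-number signatures that the real wrapper also carries are omitted.

```julia
using FunctionWrappers: FunctionWrapper

# One fixed in-place signature: (du, u, p) -> nothing on Vector{Float64}.
const VF = Vector{Float64}
const DemoIIPWrapper = FunctionWrapper{Nothing, Tuple{VF, VF, VF}}

# Hypothetical stand-in for maybe_wrap_nonlinear_f: wrap only the standard types.
demo_maybe_wrap(f, u0::VF, p::VF) = DemoIIPWrapper(f)
demo_maybe_wrap(f, u0, p) = f  # Float32, StaticArray, scalar, ...: pass through

f1!(du, u, p) = (du .= u .^ 2 .- p; nothing)
f2!(du, u, p) = (du .= u .- p; nothing)

u0 = [1.0, 2.0]
p = [1.0, 1.0]
w1 = demo_maybe_wrap(f1!, u0, p)
w2 = demo_maybe_wrap(f2!, u0, p)

# Two different user functions now share one concrete callable type, so
# solver code compiled for w1 can be reused for w2.
same_type = typeof(w1) === typeof(w2)

du = zeros(2)
w1(du, u0, p)  # fills du with u0 .^ 2 .- p

# Non-standard element types are returned unchanged.
passthrough = demo_maybe_wrap(f1!, Float32[1, 2], Float32[1, 1]) === f1!
```

Because the wrapper type no longer depends on the user function, downstream code specializes once per signature rather than once per function.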

Next steps to activate

  1. Route all direct ForwardDiff.jacobian/derivative/gradient calls through DI or standardize their tag/chunksize
  2. Wire maybe_wrap_f into get_concrete_problem
  3. Add standardize_forwarddiff_tag before construct_concrete_adtype in jacobian.jl

@ChrisRackauckas-Claude
Contributor Author

CI Fix: ImmutableNonlinearProblem wrapping

The adjoint test in SimpleNonlinearSolve failed because @set! (Setfield) couldn't reconstruct ImmutableNonlinearProblem with the wrapped function type. The constructor doesn't accept EvalFunc{FunctionWrappersWrapper{...}}.

Fix: Skip FunctionWrapper wrapping for ImmutableNonlinearProblem. SimpleNonlinearSolve's lighter solvers don't benefit from the norecompile pathway anyway.

The other CI failures (runic, alloc_check, wrappers) are pre-existing/infrastructure issues:

  • runic: Formatting issue in lib/SCCNonlinearSolve/ (not our files)
  • alloc_check, wrappers: Unable to locate executable file: julia (CI runner issue)

@ChrisRackauckas-Claude
Contributor Author

Reverted automatic wrapping - infrastructure only for now

The core and wrappers CI tests failed with No matching function wrapper was found! because the automatic wrapping at get_concrete_problem time applies the N=1 FunctionWrappers to ALL code paths, but several paths call ForwardDiff directly with default chunk sizes:

  1. nonlinearsolve_∂f_∂p → ForwardDiff.jacobian(f2, p) (direct call)
  2. nonlinearsolve_∂f_∂u → ForwardDiff.jacobian(...) (direct call)
  3. Bounds transform code wraps and applies ForwardDiff

These bypass standardize_forwarddiff_tag, so they use chunksize based on problem dimension (N=2 for 2-element vectors), which doesn't match the N=1 wrappers.
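
The chunk-size mismatch described above can be reproduced with ForwardDiff alone. The sketch below assumes a toy out-of-place function `f` and a hypothetical `DemoTag`; it only shows that the default chunk for a 2-element input is 2, that duals with different chunk sizes are distinct types, and that pinning the chunk to 1 still yields the same Jacobian.

```julia
using ForwardDiff

struct DemoTag end  # hypothetical tag type, used only at the type level here

f(x) = [x[1]^2 + x[2], x[1] * x[2]]
x = [1.0, 2.0]

# With no explicit chunk, ForwardDiff derives it from the input length.
default_chunk = ForwardDiff.Chunk(x)

# Duals with different chunk sizes are different concrete types, so a wrapper
# compiled only for N = 1 has no matching signature for N = 2.
Dual1 = ForwardDiff.Dual{DemoTag, Float64, 1}
Dual2 = ForwardDiff.Dual{DemoTag, Float64, 2}

# Pinning the chunk size to 1 keeps every dual on the N = 1 signature.
cfg1 = ForwardDiff.JacobianConfig(f, x, ForwardDiff.Chunk{1}())
J1 = ForwardDiff.jacobian(f, x, cfg1)
J = ForwardDiff.jacobian(f, x)
```

A smaller chunk means more passes over `f` for the same Jacobian, which is the cost paid for matching a single precompiled signature.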

Current state: The AutoSpecialize infrastructure is fully implemented and ready:

  • autospecialize.jl: NonlinearSolveTag, maybe_wrap_nonlinear_f, wrapfun_iip/oop
  • ForwardDiff extension: Dual-aware wrappers, standardize_forwarddiff_tag
  • Jacobian cache: Tag standardization integrated

What's needed for automatic wrapping: All ForwardDiff call sites need to be coordinated to use chunksize=1 when the function is wrapped. This is analogous to how DiffEqBase/OrdinaryDiffEq controls all internal ForwardDiff calls.

@ChrisRackauckas-Claude
Contributor Author

Update: AutoSpecialize wrapping now fully activated

This commit activates the wrapping infrastructure end-to-end. Key design decisions:

Architecture

  • AutoSpecializeCallable{FW} holds both the FunctionWrappersWrapper (precompiled dispatch) and the original function (orig::Any, type-erased for precompilation)
  • No EvalFunc/invokelatest — wrapping sets prob.f.f directly, avoiding the world-age indirection that would break IIP @inferred
  • Try-catch fallback in AutoSpecializeCallable.__call__: when FunctionWrappersWrapper throws NoFunctionWrapperFoundError (mismatched dual tags), falls back to the original function

What gets wrapped

  • IIP NonlinearProblem: wrapped (both Vector{Float64} state + params)
  • OOP NonlinearProblem: wrapped
  • IIP NonlinearLeastSquaresProblem: wrapped
  • OOP NonlinearLeastSquaresProblem: NOT wrapped (return type may differ from u0)
  • ImmutableNonlinearProblem: NOT wrapped

Tag standardization

  • Main autodiff → standardize_forwarddiff_tag stamps NonlinearSolveTag + forces chunksize=1 when wrapped
  • jvp_autodiff/vjp_autodiff → also standardized in construct_jacobian_cache before passing to JacobianOperator
  • AutoPolyesterForwardDiff → replaced with AutoForwardDiff{1, tag} when wrapped (no custom tag support)
  • Sensitivity analysis (∂f/∂u, ∂f/∂p) → use ForwardDiff.JacobianConfig with chunksize=1 + NonlinearSolveTag when wrapped
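
As a rough illustration of the standardization rule for the main autodiff, the following sketch rewrites an ADTypes backend when the function is wrapped. `demo_standardize` and `DemoTag` are hypothetical; whether the real code stores a bare tag struct or a `ForwardDiff.Tag` instance is not stated in this thread, so the tag value here is an assumption.

```julia
using ADTypes: AutoForwardDiff, AutoFiniteDiff

struct DemoTag end  # hypothetical stand-in for NonlinearSolveTag

# Hypothetical stand-in for standardize_forwarddiff_tag: only ForwardDiff
# backends are rewritten, and only when the function is wrapped.
demo_standardize(ad::AutoForwardDiff, wrapped::Bool) =
    wrapped ? AutoForwardDiff(; chunksize = 1, tag = DemoTag()) : ad
demo_standardize(ad, wrapped::Bool) = ad  # other backends are left alone

ad_wrapped = demo_standardize(AutoForwardDiff(), true)
ad_plain = demo_standardize(AutoForwardDiff(), false)
ad_other = demo_standardize(AutoFiniteDiff(), true)
```

The chunk size and tag both live in the type parameters of `AutoForwardDiff`, which is why the rewritten backend produces duals of one predictable type.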

Known trade-off

  • OOP @inferred regresses — try-catch with orig::Any makes return type Any. Same trade-off as DiffEqBase. Updated test to @test_broken.

Local test results (all clean)

| Suite       | Pass    | Broken | Fail | Error |
|-------------|---------|--------|------|-------|
| Core        | 727     | 88     | 0    | 0     |
| Wrapper     | 195     | 7      | 0    | 0     |
| ForwardDiff | 135,636 | 0      | 0    | 0     |

@ChrisRackauckas-Claude
Contributor Author

Adjoint/Reverse-Mode AD Fix (commit 7abe63b)

The nopre CI jobs were failing with llvmcall must be compiled to be called because reverse-mode AD backends (Zygote, Mooncake, Enzyme) cannot differentiate through FunctionWrapper internals.

Root Cause

When solve_up is called inside an rrule, SciMLSensitivity._concrete_solve_adjoint internally calls solve, which goes through get_concrete_problem → maybe_wrap_f → wraps with FunctionWrapper again. Mooncake in particular compiles tangent rules for ALL types during the forward pass, so the FunctionWrapper type must never appear in the computation graph.

Fix

Two-pronged approach:

  1. ChainRulesCore.rrule for AutoSpecializeCallable: Redirects reverse-mode AD through f.orig (the unwrapped callable) instead of f.fw (the FunctionWrapper)
  2. _DISABLE_AUTOSPECIALIZE flag: Set to true in the solve_up rrule before calling _solve_adjoint. This prevents maybe_wrap_nonlinear_f from wrapping during the entire adjoint code path, ensuring FunctionWrapper types never enter the AD computation graph.

Test Results (local, Julia 1.10)

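| Suite       | Pass    | Broken | Fail | Error |
|-------------|---------|--------|------|-------|
| Core        | 727     | 89     | 0    | 0     |
| Wrapper     | 195     | 7      | 0    | 0     |
| Adjoint     | 3       | 0      | 0    | 0     |
| ForwardDiff | 135,636 | 0      | 0    | 0     |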

Also updated the IIP @inferred test to @test_broken (same wrapping-induced type inference regression as OOP).

@ChrisRackauckas-Claude
Contributor Author

Fix: Extend Enzyme unwrap to JacobianOperator path

The previous push fixed Enzyme compatibility for the concrete Jacobian path (DI.prepare_jacobian/DI.jacobian), but CI still had 112 errors in NonlinearSolveFirstOrder and NonlinearSolveQuasiNewton (ubuntu 1.10/1.11) from the JacobianOperator path.

Root cause analysis from CI logs:
The EnzymeMutabilityException stacktrace showed:

FunctionWrappers.jl:137 → AutoSpecializeCallable → NonlinearFunction → 
SciMLJacobianOperators.jl:404 (DI.pushforward!) → JacobianOperator → 
mul! → Krylov.gmres!

When concrete_jac=false (used with Krylov solvers like KrylovJL_GMRES() and \ linsolve), construct_jacobian_cache creates a JacobianOperator by passing prob directly. The SciMLJacobianOperators package then calls DI.pushforward!/DI.pullback! with prob.f (still containing AutoSpecializeCallable), triggering Enzyme's mutability exception.

Fix: In construct_jacobian_cache's !needs_jac branch, check if either jvp_autodiff or vjp_autodiff uses Enzyme, and if so, create a modified prob with the unwrapped raw function via @set prob.f.f = get_raw_f(f.f) before passing to JacobianOperator.
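
The unwrap-before-handing-off pattern can be sketched with Setfield.jl on toy structs. Everything prefixed `Demo`/`demo_` is a hypothetical stand-in (for AutoSpecializeCallable, NonlinearFunction, the problem type, and get_raw_f); only the `@set prob.f.f = ...` shape is taken from the fix described above.

```julia
using Setfield

# Stand-in for AutoSpecializeCallable: holds the original function.
struct DemoWrapper{F}
    orig::F
end
(w::DemoWrapper)(args...) = w.orig(args...)

# Stand-in for get_raw_f: unwrap if wrapped, otherwise return as-is.
demo_raw_f(w::DemoWrapper) = w.orig
demo_raw_f(f) = f

# Stand-ins for NonlinearFunction and the problem type.
struct DemoFunction{F}
    f::F
end
struct DemoProblem{FN, U}
    f::FN
    u0::U
end

residual(u, p) = u .^ 2 .- p
prob = DemoProblem(DemoFunction(DemoWrapper(residual)), [1.0, 2.0])

# @set builds a modified copy; the original problem keeps its wrapper.
unwrapped = @set prob.f.f = demo_raw_f(prob.f.f)
```

Only the operator construction sees `unwrapped`; the solver itself can keep using the wrapped `prob`.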

Verified locally:

  • NewtonRaphson(linsolve=\, autodiff=AutoEnzyme(), concrete_jac=Val(false)) → Success ✓
  • NewtonRaphson(linsolve=KrylovJL_GMRES(), autodiff=AutoEnzyme(), concrete_jac=Val(false)) → Success ✓
  • TrustRegion(linsolve=KrylovJL_GMRES(), autodiff=AutoEnzyme()) → Success ✓

@ChrisRackauckas-Claude
Contributor Author

Additional Enzyme fix: TrustRegion VecJac/JacVec operators

The previous commit fixed the JacobianOperator path in construct_jacobian_cache, but CI still showed Enzyme failures because the TrustRegion scheme creates its own VecJacOperator and JacVecOperator directly from the problem at trust_region.jl:227-230, bypassing construct_jacobian_cache entirely.

Root cause from CI log:

SciMLJacobianOperators.jl:335  # VJP closure with AutoSpecializeCallable
  ← trust_region.jl:233        # StatefulJacobianOperator * fu
  ← trust_region.jl:227        # VecJacOperator(prob, fu, u; autodiff = vjp_autodiff)

Fix: Added Enzyme unwrap check before creating VecJac/JacVec operators in trust_region.jl, using the same pattern as in jacobian.jl.

@ChrisRackauckas-Claude
Contributor Author

Note: Enzyme workaround simplification

The Enzyme-specific workarounds in this PR (_uses_enzyme_ad, maybe_unwrap_prob_for_enzyme, and the _ad_prob unwrapping in __init methods) exist because Enzyme cannot differentiate through FunctionWrappers' llvmcall/ccall.

A companion PR has been opened at EnzymeAD/Enzyme.jl#2980 that adds an EnzymeFunctionWrappersExt extension. This extension defines EnzymeRules for FunctionWrapper that automatically extract the wrapped function and delegate to autodiff_deferred, making Enzyme transparently differentiate through FunctionWrappers.

Once that Enzyme PR is merged and released, the following code can be removed from this PR (~73 lines):

  • _uses_enzyme_ad() and maybe_unwrap_prob_for_enzyme() in autospecialize.jl
  • Two Enzyme unwrap blocks in jacobian.jl
  • _ad_prob construction in NonlinearSolveFirstOrder/src/solve.jl
  • _ad_autodiffs array + _ad_prob construction in NonlinearSolveQuasiNewton/src/solve.jl

All references to _ad_prob would revert back to prob.
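
For readers unfamiliar with the workaround, a backend check of this kind can be written as a few dispatch methods over ADTypes. This is a sketch under assumptions: `demo_uses_enzyme` and `demo_any_enzyme` are hypothetical names, and the real `_uses_enzyme_ad` may differ in its methods.

```julia
using ADTypes
using ADTypes: AutoEnzyme, AutoForwardDiff, AutoSparse

# Hypothetical stand-in for _uses_enzyme_ad.
demo_uses_enzyme(::AutoEnzyme) = true
demo_uses_enzyme(ad::AutoSparse) = demo_uses_enzyme(ADTypes.dense_ad(ad))
demo_uses_enzyme(ad) = false  # any other backend, or `nothing`

# Varargs form: true if any of the supplied backends is Enzyme-based.
demo_any_enzyme(ads...) = any(demo_uses_enzyme, ads)

direct = demo_uses_enzyme(AutoEnzyme())
sparse = demo_uses_enzyme(AutoSparse(AutoEnzyme()))
forward = demo_uses_enzyme(AutoForwardDiff())
mixed = demo_any_enzyme(AutoForwardDiff(), nothing, AutoEnzyme())
```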

@ChrisRackauckas
Member

Waiting on the Enzyme PR

@ChrisRackauckas-Claude
Contributor Author

Session State Summary (for continuation)

What was done

Enzyme.jl PR #2980 (EnzymeAD/Enzyme.jl#2980)

  • Created EnzymeFunctionWrappersExt extension that defines EnzymeRules for FunctionWrapper
  • Forward mode rule: extracts fw.obj[] and delegates to Enzyme.autodiff_deferred
  • Reverse mode rules: augmented_primal + reverse that extract and differentiate the unwrapped function
  • Handles both IIP (FunctionWrapper{Nothing,...}) and OOP (FunctionWrapper{Ret,...}) patterns
  • All 8 tests pass locally (IIP/OOP × Forward/Reverse)
  • Files: ext/EnzymeFunctionWrappersExt.jl, test/ext/functionwrappers.jl, Project.toml edits
  • Branch: functionwrappers-ext on EnzymeAD/Enzyme.jl (pushed from ChrisRackauckas-Claude fork)
  • CI status: action_required — needs maintainer to approve workflow run
  • Local repo: /home/crackauc/sandbox/tmp_20260218_173108_96644/Enzyme.jl

NonlinearSolve.jl PR #838

  • Branch: autospecialize on SciML/NonlinearSolve.jl (pushed via ChrisRackauckas-Claude fork remote botfork)
  • Adds AutoSpecializeCallable / FunctionWrappersWrapper infrastructure for norecompile mode
  • Includes Enzyme-specific workarounds: _uses_enzyme_ad(), maybe_unwrap_prob_for_enzyme(), _ad_prob indirection in FirstOrder/QuasiNewton solvers
  • 105/105 test jobs passing on CI (3 "failures" are infrastructure: doc deploy, codecov, trim — not code)
  • Local repo: /home/crackauc/sandbox/tmp_20260218_173108_96644/NonlinearSolve.jl
  • Working tree: clean

Simplification attempt (reverted)

Attempted to remove ~73 lines of Enzyme workaround code from 4 files:

  1. lib/NonlinearSolveBase/src/autospecialize.jl — remove _uses_enzyme_ad() (3 methods) and maybe_unwrap_prob_for_enzyme()
  2. lib/NonlinearSolveBase/src/jacobian.jl — remove 2 Enzyme unwrap blocks
  3. lib/NonlinearSolveFirstOrder/src/solve.jl — remove _ad_prob construction
  4. lib/NonlinearSolveQuasiNewton/src/solve.jl — remove _ad_autodiffs/_ad_prob construction

Result: 116 EnzymeMutabilityException failures across ALL AD backends (not just Enzyme) because BackTracking{AutoEnzyme} line search uses Enzyme even when the main Jacobian uses ForwardDiff/FiniteDiff. Reverted all changes with git checkout --.

Dependency chain: Enzyme PR #2980 must be merged and released BEFORE these workarounds can be removed.

Key technical details

  • FunctionWrapper stores original function in fw.obj[] (a Ref)
  • Enzyme cannot differentiate through ccall/llvmcall used by FunctionWrappers — throws EnzymeMutabilityException
  • The workarounds detect Enzyme AD via _uses_enzyme_ad() and unwrap AutoSpecializeCallable back to the raw function
  • BackTracking{AutoEnzyme} is the default line search, so ALL solver configurations (even ForwardDiff Jacobians) hit the Enzyme code path
  • ForwardDiff-only tests pass fine without workarounds (ForwardDiff doesn't use ccall)
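
The first point above, that a FunctionWrapper keeps the original function in `fw.obj[]`, can be checked directly. Note that `obj` is an internal field of FunctionWrappers.jl rather than documented API, so this sketch may not hold across versions.

```julia
using FunctionWrappers: FunctionWrapper

square(x) = x^2

# Wrap with a fixed Float64 -> Float64 signature.
fw = FunctionWrapper{Float64, Tuple{Float64}}(square)
y = fw(3.0)

# The wrapped callable is held in a Ref; dereferencing recovers it.
recovered = fw.obj[]
```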

Remotes

NonlinearSolve.jl:

  • origin = SciML/NonlinearSolve.jl
  • botfork = ChrisRackauckas-Claude/NonlinearSolve.jl

Enzyme.jl:

  • origin = ChrisRackauckas-Claude/Enzyme.jl
  • upstream = EnzymeAD/Enzyme.jl

Next steps

  1. Wait for Enzyme PR #2980 to get CI approved and reviewed/merged
  2. Once merged and a new Enzyme.jl version is released, re-apply the simplification to NonlinearSolve PR #838 (remove the ~73 lines from the 4 files listed above)
  3. Both PRs need maintainer review

Co-Authored-By: Chris Rackauckas accounts@chrisrackauckas.com

@ChrisRackauckas-Claude
Contributor Author

Rebase and fix: Julia 1.10 precompilation failure

Rebased onto current master (59 commits behind). The rebase was clean — 1 commit dropped as already upstream (Runic formatting for bounds_transform.jl).

Bug found and fixed

After rebase, master commit 487c839 ("Add precompile workload for Dual and SubArray broadcast operations") introduced a struct NonlinearSolveTag end + const dualT = ... in the precompile workload section of NonlinearSolveBaseForwardDiffExt.jl. This conflicts with the NonlinearSolveTag imported from NonlinearSolveBase by this PR's autospecialize infrastructure.

On Julia 1.10 this causes a hard error: cannot assign a value to imported variable NonlinearSolveBaseForwardDiffExt.NonlinearSolveTag, breaking all ForwardDiff-based tests.

Fix: Removed the duplicate struct NonlinearSolveTag end and const dualT definitions from the precompile workload block, since both are already defined at the top of the extension module.

Local test results (Julia 1.10.11)

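| Suite                                  | Pass    | Broken | Fail | Error |
|----------------------------------------|---------|--------|------|-------|
| Core                                   | 749     | 86     | 0    | 0     |
| Bounds                                 | 44      | 0      | 0    | 0     |
| Wrappers                               | 195     | 7      | 0    | 0     |
| nopre (ForwardDiff + Enzyme + Adjoint) | 135,757 | 0      | 0    | 0     |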

@ChrisRackauckas-Claude
Contributor Author

Review changes applied

Addressed all review feedback:

  1. Removed <: Function from AutoSpecializeCallable — this was preventing specialization. Now dispatches directly through fw without a varargs fallback to orig.

  2. Removed all explicit fast-path dispatch methods — the Float64 and Dual-typed methods in both autospecialize.jl and NonlinearSolveBaseForwardDiffExt.jl only existed to avoid the <: Function varargs fallback. With <: Function gone, they're unnecessary (-47 lines from the extension).

  3. Removed redundant wrapfun_iip overloads — the Vector{Float64} and NullParameters overloads duplicated the generic typed method.

  4. Only wrap when AutoSpecialize is requested: FullSpecialize (the default) now correctly skips wrapping. Bounds test updated to verify: FullSpecialize preserves identity, AutoSpecialize with bounds errors (user should use FullSpecialize).

  5. Restored strict @inferred checks — no longer regressed since <: Function is removed.

  6. Fixed type stability in QuasiNewton — replaced _ad_autodiffs = Any[...] array + splat with direct varargs call to maybe_unwrap_prob_for_enzyme.

Net: -48 lines. All tests pass locally (Julia 1.10).

@ChrisRackauckas-Claude
Contributor Author

FunctionWrappersWrappers v1.0.0 has been registered (JuliaRegistries/General#151705). Bumped the compat bound in lib/NonlinearSolveBase/Project.toml to "0.1, 1" to allow the new version.
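
To see which versions a compat entry of "0.1, 1" admits, the string can be parsed with Pkg's semver parser. This sketch assumes `Pkg.Types.semver_spec`, an internal Pkg helper rather than documented API, is available in the Julia version in use.

```julia
using Pkg

# Parse the compat string the same way a Project.toml [compat] entry is read.
spec = Pkg.Types.semver_spec("0.1, 1")

allows_old = v"0.1.3" in spec  # 0.1.x series
allows_new = v"1.0.0" in spec  # 1.x series
allows_02 = v"0.2.0" in spec   # 0.2.x is a breaking pre-1.0 release
```

Under Julia's semver rules, "0.1" covers only the 0.1.x patch releases, so both series must be listed explicitly.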

ChrisRackauckas and others added 12 commits March 30, 2026 16:54
…Base

Port the FunctionWrappersWrappers-based norecompile pattern from DiffEqBase
to NonlinearSolveBase. For standard problem types (Vector{Float64} state,
Vector{Float64} or NullParameters parameters), the problem function is
wrapped in a FunctionWrappersWrapper with precompiled type signatures for
both Float64 and ForwardDiff.Dual arguments, avoiding recompilation for
each unique user function type.

Key components:
- src/autospecialize.jl: NonlinearSolveTag, wrapfun_iip/oop base methods,
  maybe_wrap_nonlinear_f, standardize_forwarddiff_tag fallback
- ForwardDiff extension: dual-aware wrapfun dispatches with 6 type
  combinations (Float64, Dual, NullParameters), tag standardization that
  stamps NonlinearSolveTag on AutoForwardDiff and forces chunksize=1 when
  the function is wrapped
- solve.jl: maybe_wrap_f wired into get_concrete_problem for all problem
  types (NonlinearProblem, NonlinearLeastSquaresProblem,
  ImmutableNonlinearProblem), using EvalFunc wrapper for invokelatest
- jacobian.jl: standardize_forwarddiff_tag called in
  construct_jacobian_cache so DI produces correctly-tagged duals

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
ImmutableNonlinearProblem (used by SimpleNonlinearSolve) doesn't support
Setfield reconstruction with wrapped function types. Skip wrapping since
SimpleNonlinearSolve's lighter solvers don't benefit from the norecompile
pathway.

Fixes CI adjoint test failure in SimpleNonlinearSolve.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The FunctionWrapper wrapping cannot be automatically applied at solve time
because multiple code paths (∂f/∂p, ∂f/∂u, bounds transform) call
ForwardDiff directly with default chunk sizes, bypassing the standardized
chunksize=1 path. This caused "No matching function wrapper found!" errors
whenever ForwardDiff used chunksize > 1.

The infrastructure (autospecialize.jl, extension wrappers, tag
standardization) remains available for targeted use. Automatic wrapping
requires coordinating ALL ForwardDiff call sites to use chunksize=1.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The standardize_forwarddiff_tag calls in autodiff.jl and jacobian.jl
cause dual tag ordering errors when nested ForwardDiff is used (e.g.,
NLLS sensitivity + inner VJP). Remove these call sites and the unused
maybe_wrap_f function since automatic wrapping is not yet active.

The autospecialize infrastructure (NonlinearSolveTag, wrapfun_iip/oop,
ForwardDiff extension wrappers) remains available for future activation
when all direct ForwardDiff call sites are standardized.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Wire `maybe_wrap_f` into `get_concrete_problem` for NonlinearProblem and
NonlinearLeastSquaresProblem (IIP). Functions are wrapped in
`AutoSpecializeCallable{FW}` which holds a `FunctionWrappersWrapper` for
precompiled dispatch and the original function (type-erased as `Any`) for
try-catch fallback when dual tags mismatch (JVP paths, external packages).

Key changes:
- AutoSpecializeCallable uses `orig::Any` for type erasure (no EvalFunc)
- Skip OOP NLLS wrapping (return type may differ from u0)
- Standardize JVP/VJP autodiff tags in construct_jacobian_cache
- Replace AutoPolyesterForwardDiff with AutoForwardDiff{1,tag} when wrapped
- Use get_raw_f for nested ForwardDiff in NLLS VJP generation
- ForwardDiff sensitivity functions use chunksize=1 + tag when wrapped

Tests: core 727/0/0, wrapper 195/0/0, ForwardDiff 135636/0/0
OOP @inferred regresses (expected, same trade-off as DiffEqBase)

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Reverse-mode AD backends (Zygote, Mooncake, Enzyme) cannot
differentiate through FunctionWrapper internals (llvmcall). This adds:

- ChainRulesCore rrule for AutoSpecializeCallable that redirects
  reverse-mode AD through the original unwrapped callable
- _DISABLE_AUTOSPECIALIZE flag set in the solve_up rrule to prevent
  wrapping entirely during the adjoint code path
- @test_broken for IIP @inferred (same wrapping-induced regression
  as OOP case)

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
…type

- Remove task-local `_DISABLE_AUTOSPECIALIZE` flag entirely
- Replace with `@set prob.f.f = get_raw_f(prob.f.f)` unwrapping in rrule
- Remove parameter type restriction (any p works, mismatches fall back)
- Add idempotency check to prevent double-wrapping
- Remove `_DISABLE_AUTOSPECIALIZE` from public API

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
These are internal implementation details, not public API.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
OOP wrapping requires guessing return types which doesn't always work.
Only wrap IIP functions where the return type is always Nothing.

IIP TTFX improvement (2nd/3rd function, same types):
- NewtonRaphson: 2.2-2.5x faster
- TrustRegion: 18x faster

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
The existing workload used scalar p=2.0, which produces different
FunctionWrapper types than the common user case of Vector{Float64}
parameters. This caused the precompiled wrappers to miss the user path.

TTFX for IIP Vector{Float64} first solve: 2.7s → 1.0s

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
The try-catch in AutoSpecializeCallable prevented inlining and added
~32 bytes per call, exceeding the 64-byte @ballocated budget in
NonlinearSolveFirstOrder, QuasiNewton, and SpectralMethods tests.

Replace with explicit dispatch methods for known argument types
(Vector{Float64}, Float64, NullParameters, and ForwardDiff duals),
routing to f.fw for zero-allocation calls. Unsupported types fall
back to f.orig via vararg dispatch. Also fix @test_broken -> @test
for @inferred solve(prob) which now passes.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
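
The explicit-dispatch design described in the commit message above can be sketched as follows. `DemoCallable` is a hypothetical stand-in for AutoSpecializeCallable as of that commit (a later commit in this PR removes the fallback), and only the `Vector{Float64}` fast path is shown.

```julia
using FunctionWrappers: FunctionWrapper

const VF = Vector{Float64}

# Hypothetical stand-in for AutoSpecializeCallable at this commit.
struct DemoCallable{FW}
    fw::FW     # fixed-signature wrapper used on the fast path
    orig::Any  # original function, type-erased
end
DemoCallable(f) = DemoCallable(FunctionWrapper{Nothing, Tuple{VF, VF, VF}}(f), f)

# Fast path: known argument types go through the wrapper, with no try-catch.
(c::DemoCallable)(du::VF, u::VF, p::VF) = c.fw(du, u, p)
# Fallback: anything else goes to the original function via vararg dispatch.
(c::DemoCallable)(args...) = c.orig(args...)

f!(du, u, p) = (du .= u .- p; nothing)
c = DemoCallable(f!)

du64 = zeros(2)
c(du64, [3.0, 4.0], [1.0, 1.0])               # wrapper path

du32 = zeros(Float32, 2)
c(du32, Float32[3, 4], Float32[1, 1])         # fallback path
```

Method dispatch picks the path at compile time, which is what avoids the try-catch frame the commit message identifies as the source of the extra allocations.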
Enzyme cannot differentiate through FunctionWrappers' llvmcall, causing
EnzymeMutabilityException in all IIP Vector{Float64} tests with AutoEnzyme.
Unwrap the function in construct_jacobian_cache when the AD backend is
Enzyme-based (including AutoSparse(AutoEnzyme(...))), so DI sees the raw
user function. Also apply Runic formatting to SCCNonlinearSolve files.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ChrisRackauckas and others added 12 commits March 30, 2026 16:54
…d type stability

- Remove `<: Function` from `AutoSpecializeCallable` — it was preventing
  specialization. The struct now dispatches directly through `fw` (the
  FunctionWrappersWrapper) without a varargs fallback to `orig`.
- Remove all explicit fast-path dispatch methods (Float64 and Dual variants)
  from both autospecialize.jl and NonlinearSolveBaseForwardDiffExt.jl.
  These only existed to avoid the `<: Function` varargs fallback.
- Remove redundant `wrapfun_iip` overloads for Vector{Float64}/NullParameters
  that duplicated the generic typed method.
- Only wrap when `AutoSpecialize` specialization is requested (opt-in).
  `FullSpecialize` (the default) preserves the exact function type.
- Fix bounds test: verify FullSpecialize preserves identity, AutoSpecialize
  with bounds errors (user should use FullSpecialize).
- Restore strict `@inferred` checks in core_tests.jl (no longer regressed
  since `<: Function` is removed).
- Fix type stability in QuasiNewton: replace `Any[]` array + splat with
  direct varargs call to `maybe_unwrap_prob_for_enzyme`.

Local test results (Julia 1.10):
| Suite    | Pass    | Broken | Fail | Error |
|----------|---------|--------|------|-------|
| Core     | 748     | 86     | 0    | 0     |
| Bounds   | 45      | 0      | 0    | 0     |
| Wrappers | 195     | 7      | 0    | 0     |
| nopre    | 135,757 | 0      | 0    | 0     |

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…{Float64}

Exercises the FunctionWrapper + ForwardDiff dual code paths at precompile
time for both parameter types. When users call solve() with AutoSpecialize
and a specific algorithm (e.g., NewtonRaphson), the first solve drops from
~4s to ~0.8s because the solver infrastructure is already compiled.

The key benefit remains: each new function with AutoSpecialize costs ~0.4s
vs ~2.5s with FullSpecialize (84% reduction in per-function recompilation).

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two changes:

1. Add NLLS default algorithm (FastShortcutNLLSPolyalg) to precompile
   workload, plus AutoSpecialize NLLS with Vector{Float64} params.
   Previously NLLS only precompiled explicit algorithms (GaussNewton,
   TrustRegion, LevenbergMarquardt), not the default `solve(prob)` path.

2. Fix NoFunctionWrapperFoundError in TrustRegion with Bastin scheme:
   The VecJac/JacVec operators created by Bastin received unstandardized
   vjp_autodiff/jvp_autodiff, so ForwardDiff used its default tag instead
   of NonlinearSolveTag. Now standardize_forwarddiff_tag is called before
   passing to InternalAPI.init for the trust region.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The explicit dispatch methods that used AutoSpecializeCallable were removed
in the review cleanup, but the import was left behind.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…stants

These constants were leftovers from the removed explicit dispatch methods.
The generic wrapfun_iip method builds its own arglists from input types.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SciMLBase v2.153.0 makes DEFAULT_SPECIALIZATION (AutoSpecialize) the
default for all SciMLFunction constructors. Update accordingly:

- Bump SciMLBase compat to 2.153 across all sub-packages
- Simplify precompile workloads: use NonlinearFunction{true}(f!) instead
  of explicit AutoSpecialize annotation (it's now the default)
- Update bounds test: FullSpecialize is now the opt-out for bounds
  compatibility, default (AutoSpecialize) correctly errors with bounds
- Update comment in autospecialize.jl

Core test time improved from ~2m40s to ~1m56s thanks to reduced
recompilation when AutoSpecialize is active by default.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…loads

All precompile workloads now use the default specialization
(AutoSpecialize) instead of NoSpecialize, so the precompiled code
matches what users hit at runtime.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ialize

With AutoSpecialize as the default, the FunctionWrapper must handle
argument types not in the precompiled wrapper signatures (scalar params,
non-standard dual tags, bounds transforms, etc.). Use FunctionWrappersWrapper's
built-in fallback mode (Val{true}()) which calls the original function
when no matching wrapper is found, instead of throwing
NoFunctionWrapperFoundError.

This makes AutoSpecialize fully transparent — supported types get
zero-allocation precompiled dispatch, unsupported types fall back to
the original function with normal Julia specialization.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ensions

Instead of relying on FunctionWrappersWrapper fallback mode, explicitly
unwrap AutoSpecializeCallable in all code paths where ForwardDiff uses
non-standard tags:

1. ForwardDiff sensitivity path (nonlinearsolve_forwarddiff_solve):
   unwrap prob.f.f before computing ∂f/∂p and ∂f/∂u Jacobians, which
   use closure-based ForwardDiff tags incompatible with the wrappers.
   Removes the now-unnecessary _is_wrapped_nlf checks and _nls_tag
   config overrides.

2. External wrapper extensions (LeastSquaresOptim, NLsolve, NLSolvers,
   SIAM, FastLevenbergMarquardt, MINPACK, PETSc, FixedPointAcceleration,
   SpeedMapping, Sundials): unwrap at the top of __solve since these
   packages do their own AD with arbitrary tags.

3. Bounds transform: unwrap before wrapping in BoundedWrapper since the
   transform changes argument types.

4. Restrict wrapping to supported param types only (Vector{Float64},
   NullParameters) — scalar params skip wrapping entirely.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The switch from remake() to @set for unwrapping left unused remake
imports in 9 extension files.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FunctionWrappersWrappers v1.0.0 has been registered (JuliaRegistries/General#151705).
Update the lower bound to allow both 0.1 and 1.x series.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.1 has a bug where sparsity detection can fail. v1.0 includes the fix,
so set the lower bound to 1.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ChrisRackauckas and others added 2 commits March 30, 2026 18:30
DiffEqBase v6.213.0 (JuliaRegistries/General#151730) includes the
FunctionWrappersWrappers v1 compat fix needed for sparsity detection.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
OrdinaryDiffEqCore still requires FunctionWrappersWrappers 0.1, which
conflicts with NonlinearSolveBase requiring FWW v1. Move the shooting
test that uses OrdinaryDiffEqTsit5 to the downstream group and add it
via dynamic Pkg.add so it doesn't block resolution for other test groups.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
SciMLBase 2.154 is the first version that allows FunctionWrappersWrappers
v1. Without this bump, transitive deps (SciMLSensitivity -> DiffEqBase ->
OrdinaryDiffEqCore) could resolve to versions requiring FWW 0.1, causing
resolver conflicts in nopre/adjoint CI groups.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
…gistry

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
@ChrisRackauckas ChrisRackauckas merged commit f47d870 into SciML:master Apr 2, 2026
116 of 128 checks passed
ChrisRackauckas-Claude pushed a commit to ChrisRackauckas-Claude/ModelingToolkit.jl that referenced this pull request Apr 2, 2026
…27.0

FWW v1 did not toggle strict mode, so it is backwards-compatible.
This unblocks NonlinearSolve's AutoSpecialize support (SciML/NonlinearSolve.jl#838)
which requires FWW v1 via NonlinearSolveBase.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
ChrisRackauckas added a commit that referenced this pull request Apr 4, 2026
PR #838 introduced AutoSpecialize infrastructure (maybe_unwrap_prob_for_enzyme,
AutoSpecializeCallable, etc.) in NonlinearSolveBase v2.20.0. The sub-packages
that use these functions had a lower bound of 2.2, which allowed the resolver
to pick incompatible older versions.

Fixes #893

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>