Use native gradient API for ForwardDiff, Enzyme, Mooncake #458
Conversation
- Remove DifferentiationInterface from [deps]; add ADTypes
- Move Enzyme to [weakdeps]; add BijectorsEnzymeExt extension
- Add src/ad_utils.jl defining _value_and_gradient/_value_and_jacobian generic functions
- Implement native backends in each pkg ext: ForwardDiff, ReverseDiff (compiled + non-compiled), Mooncake (reverse + forward JVP), Enzyme (reverse + forward)
- Update src/vector/test_utils.jl to use ADTypes backend types and B._value_and_* API

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
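For orientation, here is a rough sketch of the generic-function-plus-extension pattern this commit describes; the names mirror the commit message but the bodies are illustrative, not the PR's actual code.

```julia
# Illustrative sketch only (not the PR code).
import ADTypes

# In src/ad_utils.jl: a generic function with an informative fallback error.
function _value_and_gradient(f, backend::ADTypes.AbstractADType, x::AbstractVector)
    return error(
        "No `_value_and_gradient` method for $(typeof(backend)); load the matching AD package " *
        "(e.g. `using ForwardDiff`) so its extension is activated.",
    )
end

# In a [weakdeps] extension such as ext/BijectorsForwardDiffExt.jl (activated once ForwardDiff is loaded):
#     function Bijectors._value_and_gradient(f, ::ADTypes.AutoForwardDiff, x::AbstractVector)
#         return f(x), ForwardDiff.gradient(f, x)
#     end
```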
Bijectors.jl documentation for PR #458 is available at:
- Avoid double f(x) evaluation in gradient/jacobian for ForwardDiff and ReverseDiff by using DiffResults (GradientResult, JacobianResult) with the in-place ! variants
- For Enzyme reverse mode, use autodiff(ReverseWithPrimal, ...) to get value and gradient in one pass instead of calling f(x) separately
- Fix _enzyme_mode to guard against mode=nothing (AutoEnzyme() default) which previously threw a MethodError from set_runtime_activity(::Nothing)
- Pre-allocate dy/dx tangent buffers outside loops in Mooncake implementations and use fill! to zero them, avoiding one heap allocation per iteration
- Add fallback _value_and_gradient/_value_and_jacobian methods with a clear error message for backends without a loaded extension

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
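As a minimal sketch of the single-pass idea mentioned above (using ForwardDiff with DiffResults; the helper name here is hypothetical, not the PR's function):

```julia
# Sketch: compute value and gradient in one pass, avoiding a separate f(x) call.
import ForwardDiff, DiffResults

function value_and_gradient_forwarddiff(f, x::AbstractVector)
    result = DiffResults.GradientResult(x)   # holds both the primal value and the gradient
    ForwardDiff.gradient!(result, f, x)      # single in-place pass
    return DiffResults.value(result), DiffResults.gradient(result)
end

value_and_gradient_forwarddiff(x -> sum(abs2, x), [1.0, 2.0, 3.0])  # (14.0, [2.0, 4.0, 6.0])
```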
- Add AutoEnzyme{Nothing} to the ReverseWithPrimal dispatch union so the default (mode=nothing) backend also avoids double-evaluating f
- Remove redundant `return` before `error(...)` in ad_utils.jl fallback methods; error() returns Union{} so return is a no-op

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add return before error() in ad_utils.jl (JuliaFormatter)
- Use ReverseDiff.DiffResults instead of ForwardDiff.DiffResults so the extension triggers on ReverseDiff alone
- Keep Bijectors in test/Project.toml for B._value_and_jacobian calls

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
b3577f0 to 669d4a8
gdalle left a comment
There seems to be a lot of duplication of effort compared to DI, along with some forgotten aspects. All in all, I'm not sure what we gain here.
}

function _annotate_function(f, backend::AutoEnzyme, mode)
    annotation = typeof(backend).parameters[2]
Accessing type parameters this way is not recommended since the field is internal (AFAICT)
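One possible alternative, sketched here under the assumption that AutoEnzyme{M,A} carries the function annotation as its second type parameter (as in current ADTypes), is to recover it through dispatch rather than by indexing the internal `parameters` field:

```julia
# Sketch: read the annotation via dispatch instead of `typeof(backend).parameters[2]`.
import ADTypes: AutoEnzyme

_function_annotation(::AutoEnzyme{<:Any,A}) where {A} = A

# _function_annotation(AutoEnzyme())                                        # Nothing
# _function_annotation(AutoEnzyme(; function_annotation=EnzymeCore.Const))  # EnzymeCore.Const
```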
    EnzymeCore.Duplicated, EnzymeCore.DuplicatedNoNeed, EnzymeCore.MixedDuplicated
}

function _annotate_function(f, backend::AutoEnzyme, mode)
This looks a lot like https://github.com/JuliaDiff/DifferentiationInterface.jl/blob/a5ecbe0b2bc97eaaac53a1c5c0b13c17f22f1ae9/DifferentiationInterface/ext/DifferentiationInterfaceEnzymeExt/utils.jl#L42-L62, so I'm not sure where we save engineering effort
    backend::Union{AutoEnzyme{Nothing},AutoEnzyme{<:EnzymeCore.ReverseMode}},
    x::AbstractVector,
)
    mode = if backend isa AutoEnzyme{Nothing}
    for i in eachindex(x)
        dx = zero(x)
        dx[i] = one(eltype(x))
        directional, primal = Enzyme.autodiff(mode, annotated_f, Enzyme.Duplicated(x, dx))
        grad[i] = directional
        if i == firstindex(x)
            value = primal
        end
    end
Enzyme has a built-in forward-mode gradient function, which DI already uses in such cases. Any reason not to use it here too?
Ping @wsmoses
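For reference, a rough sketch of the built-in call being suggested; the exact return shape varies across Enzyme versions, so treat the details as assumptions rather than the PR's code.

```julia
# Sketch of Enzyme's built-in forward-mode gradient.
import Enzyme

f(x) = sum(abs2, x)
x = [1.0, 2.0, 3.0]

grads = Enzyme.gradient(Enzyme.Forward, f, x)  # one call instead of a one-hot JVP loop
# In recent Enzyme releases `gradient` returns one entry per argument, e.g. ([2.0, 4.0, 6.0],).
```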
    for i in eachindex(x)
        dx = zero(x)
        dx[i] = one(eltype(x))
        directional, primal = Enzyme.autodiff(mode, annotated_f, Enzyme.Duplicated(x, dx))
        if i == firstindex(x)
            value = primal isa AbstractArray ? copy(primal) : primal
            J = Matrix{eltype(directional)}(undef, length(directional), length(x))
        end
        J[:, i] .= directional
    end
Enzyme has a built-in forward Jacobian function, which DI already uses in such cases. Any reason not to use it here too?
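Again for reference, a hedged sketch of the built-in forward Jacobian call; the return shape depends on the Enzyme version, so this is an assumption rather than the PR's code.

```julia
# Sketch of Enzyme's built-in forward-mode Jacobian.
import Enzyme

g(x) = [sum(abs2, x), prod(x)]
x = [1.0, 2.0, 3.0]

J = Enzyme.jacobian(Enzyme.Forward, g, x)  # replaces the manual column-by-column loop
```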
    if T === Nothing
        ForwardDiff.checktag(config, f, x)
    end
    ForwardDiff.gradient!(result, f, x, config, Val(false))
function _mooncake_zero_tangent_or_primal(
    x, backend::Union{AutoMooncake,AutoMooncakeForward}
)
    if _mooncake_config(backend).friendly_tangents
        return f(x), similar(x, 0)
    end
    tape = ReverseDiff.GradientTape(f, x)
    compiled = ReverseDiff.compile(tape)
Is it really worth compiling a tape you will only use once? I predict this slows things down significantly
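For comparison, a sketch of the uncompiled path (assuming ReverseDiff's standard `gradient!`/DiffResults API; the PR reaches the same module via `ReverseDiff.DiffResults`), which records and evaluates the tape once without paying the compilation cost:

```julia
# Sketch: uncompiled ReverseDiff value-and-gradient in one recording pass.
import ReverseDiff, DiffResults

function value_and_gradient_reversediff(f, x::AbstractVector)
    result = DiffResults.GradientResult(x)
    ReverseDiff.gradient!(result, f, x)   # no ReverseDiff.compile step
    return DiffResults.value(result), DiffResults.gradient(result)
end

value_and_gradient_reversediff(x -> sum(abs2, x), [1.0, 2.0, 3.0])  # (14.0, [2.0, 4.0, 6.0])
```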
function _value_and_jacobian(f, ::AutoReverseDiff{true}, x::AbstractVector)
    tape = ReverseDiff.JacobianTape(f, x)
    compiled = ReverseDiff.compile(tape)
Implementations are provided by package extensions for each AD backend.
"""
function _value_and_gradient(f, backend::ADTypes.AbstractADType, x::AbstractVector)
Sister PR for TuringLang/DynamicPPL.jl#1354