Use native gradient API for ForwardDiff, Enzyme, Mooncake#1354
Conversation
- Refactor `_prepare_gradient` and `_value_and_gradient` into overridable dispatch methods so backends can bypass DI entirely
- Implement `AutoMooncakeForward` in the Mooncake extension using Mooncake's native derivative cache and a column-by-column sweep
- Force `friendly_tangents=false` in `_cache_config` for both `AutoMooncake` and `AutoMooncakeForward` to keep caches valid across calls
- Declare `tangent_type(LogDensityAt) = NoTangent` so Mooncake treats the function object as a constant
- Relax the `ADP` type parameter on `LogDensityFunction` from `Union{Nothing,DI.GradientPrep}` to unconstrained, to accommodate custom prep objects (e.g. the `NamedTuple` used by `AutoMooncakeForward`)
- Add `AutoMooncakeForward` to the precompile workload and test suite, including a test that a `friendly_tangents=true` config is handled correctly
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
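The column-by-column sweep mentioned above computes a gradient from one forward-mode pass per coordinate, seeding a one-hot direction each time. A minimal standalone illustration of the idea (this is not Mooncake's actual cache API; `ForwardDiff.derivative` stands in for the per-direction derivative, and `gradient_by_sweep` is a hypothetical name):

```julia
using ForwardDiff

# Gradient of f at x via one directional derivative per coordinate:
# grad[i] = d/dt f(x + t * e_i) evaluated at t = 0.
function gradient_by_sweep(f, x::AbstractVector)
    grad = similar(x)
    for i in eachindex(x)
        e = zero(x)
        e[i] = one(eltype(x))  # one-hot seed for coordinate i
        grad[i] = ForwardDiff.derivative(t -> f(x .+ t .* e), zero(eltype(x)))
    end
    return grad
end

gradient_by_sweep(x -> sum(abs2, x), [1.0, 2.0, 3.0])  # ≈ [2.0, 4.0, 6.0]
```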
Benchmark Report
Computer Information | Benchmark Results
DynamicPPL.jl documentation for PR #1354 is available at:
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@yebai can you explain the rationale behind getting rid of DifferentiationInterface?
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Codecov Report
❌ Patch coverage report. Additional details and impacted files:

```
@@ Coverage Diff @@
##             main    #1354      +/-   ##
==========================================
- Coverage   78.62%   78.30%   -0.33%
==========================================
  Files          50       52       +2
  Lines        3631     3697      +66
==========================================
+ Hits         2855     2895      +40
- Misses        776      802      +26
==========================================
```
0fe3386 to
e70bf72
Compare
Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit
[JuliaFormatter v1.0.62] reported by reviewdog 🐶
DynamicPPL.jl/docs/src/ldf/models.md
Line 128 in e44a2d4
…ional

- Move DifferentiationInterface to [weakdeps]; add DynamicPPLDifferentiationInterfaceExt as fallback for backends without native implementations
- Add native ForwardDiff gradient via GradientConfig (DynamicPPLForwardDiffExt)
- Add native Enzyme gradient via autodiff(ReverseWithPrimal, ...) (new DynamicPPLEnzymeExt)
- Keep native Mooncake reverse/forward gradient (DynamicPPLMooncakeExt)
- Add Enzyme to test env; drop DI from test env

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
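The native Enzyme path is built on `autodiff(ReverseWithPrimal, ...)`, which returns the primal value while accumulating the gradient into a shadow buffer. A hedged standalone sketch of that call pattern (a simplified toy example, not the extension's actual code):

```julia
using Enzyme

f(x) = sum(abs2, x)

x = [1.0, 2.0, 3.0]
dx = zero(x)  # shadow buffer; must be zeroed first, since Enzyme accumulates into it

# ReverseWithPrimal returns (derivatives, primal); the gradient lands in dx.
_, val = Enzyme.autodiff(
    Enzyme.ReverseWithPrimal, f, Enzyme.Active, Enzyme.Duplicated(x, dx)
)

val  # 14.0
dx   # [2.0, 4.0, 6.0]
```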
@yebai can you please open issues on the DI repository to describe the "robustness issues" you're referring to above?
```julia
f = DynamicPPL.LogDensityAt(
    model, getlogdensity, varname_ranges, transform_strategy, accs
)
dx = prep.dx
```
@penelopeysm @yebai you should use the version without a closure here [otherwise it's breaking]
- ForwardDiff: use DiffResults (via ForwardDiff.DiffResults) for single-pass value+gradient, removing the double primal evaluation
- ForwardDiff: remove redundant chunk_size guard in _prepare_gradient (tweak_adtype already normalises it to a concrete positive integer)
- AutoMooncakeForward: handle empty params edge case (loop doesn't execute)
- Mooncake _cache_config: use Accessors.@set to preserve all Config fields when overriding friendly_tangents=false, instead of forwarding only two known fields
- Mooncake @compile_workload: remove redundant single-element for-loop
- EnzymeExt: document that adtype.mode is intentionally ignored (always reverse)
- src/logdensityfunction.jl: add fallback error for _value_and_gradient with unknown AD backends, pointing users to ForwardDiff (the default) or DI
- test/logdensityfunction.jl: revert formatter noise (accumulate_assume!!, accumulate_observe!!, ::Type{T}=... syntax)
- test/Project.toml: remove accidentally-added DynamicPPL dep

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
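The single-pass value+gradient avoids evaluating the primal twice: `ForwardDiff.gradient!` writes both the value and the gradient into one DiffResults object in a single sweep. A minimal sketch of the pattern (accessing DiffResults through ForwardDiff's namespace as the commit describes; `value_and_gradient_singlepass` is a hypothetical name):

```julia
using ForwardDiff
const DiffResults = ForwardDiff.DiffResults  # accessed via ForwardDiff's namespace

function value_and_gradient_singlepass(f, x::AbstractVector)
    result = DiffResults.GradientResult(x)        # holds value and gradient together
    result = ForwardDiff.gradient!(result, f, x)  # one pass fills both
    return DiffResults.value(result), DiffResults.gradient(result)
end

value_and_gradient_singlepass(x -> sum(abs2, x), [1.0, 2.0])  # (5.0, [2.0, 4.0])
```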
```julia
    transform_strategy::DynamicPPL.AbstractTransformStrategy,
    accs::DynamicPPL.AccumulatorTuple,
)
# Pass the plain function plus Const arguments; Enzyme is brittle with closure-like callables.
```
This is not true: Enzyme (like Julia generally) is higher performance without closures, but it is not brittle with them.
```julia
        Enzyme.Const(accs),
    ),
)
return val, copy(dx)
```
@wsmoses, feel free to suggest concrete code changes for Enzyme. Coding agents don't always give optimal solutions.
Please don't do this. We want more interoperability, not less. This will hurt the ecosystem |
This seems like a somewhat strange decision. What were the robustness issues? |
```diff
@@ -0,0 +1,65 @@
+module DynamicPPLDifferentiationInterfaceExt
```
Is there a reason for making this a package extension instead of keeping it as a dependency? DI only depends on ADTypes and LinearAlgebra, it doesn't get much more lightweight
```julia
fill!(dx, zero(eltype(dx)))
_, val = Enzyme.autodiff(
    _enzyme_gradient_mode(adtype),
    logdensity_at,
```
Is the function annotation in the AutoEnzyme backend taken into account?
```julia
# DiffResults is a direct dependency of ForwardDiff; access it through ForwardDiff's namespace
# rather than listing it as a separate (weak)dep of DynamicPPL.
```
This is bad practice because you cannot version-bound DiffResults separately, and nothing guarantees that ForwardDiff will keep it as a dep
```julia
@inbounds for i in eachindex(grad, dx)
    dx[i] = one(eltype(dx))
    result = value_and_derivative!!(cache, Dual(f, NoTangent()), Dual(params, dx))
    value = primal(result)
    grad[i] = tangent(result)
    dx[i] = zero(eltype(dx))
end
```
I thought there was a chunked forward mode for this kind of stuff?
Related:
```julia
function _value_and_gradient(adtype::ADTypes.AbstractADType, args...)
    throw(
        ArgumentError(
            "No gradient implementation found for AD backend $adtype. " *
            "If you intended to use the default (ForwardDiff), ensure that ForwardDiff is " *
            "loaded (e.g. `using ForwardDiff`). For other backends, load the corresponding " *
            "package (e.g. `using Mooncake`, `using Enzyme`) or load " *
            "DifferentiationInterface as a fallback.",
        ),
    )
end
```
And it makes ReverseDiff no longer supported by Turing, if I understand properly?
Technically it is still supported but you need to import DI separately to trigger the extension
I think this PR also removes the relevant tests anyway?
It's still tested, but even though DI is removed as a test dep, it still gets pulled into the test env via Bijectors and MarginalLogDensities, which is why the tests pass. So I guess this is dangerous since it implicitly relies on them still having DI as a dep.
```
pkg> why DifferentiationInterface
  Bijectors → DifferentiationInterface
  MarginalLogDensities → DifferentiationInterface
  MarginalLogDensities → Optimization → OptimizationBase → DifferentiationInterface
  MarginalLogDensities → OptimizationOptimJL → Optim → LineSearches → NLSolversBase → DifferentiationInterface
  MarginalLogDensities → OptimizationOptimJL → Optim → NLSolversBase → DifferentiationInterface
  MarginalLogDensities → OptimizationOptimJL → OptimizationBase → DifferentiationInterface
```
Especially since @yebai is also removing DI from Bijectors as part of the great spring cleaning 😅
Calls the ForwardDiff, Enzyme, and Mooncake APIs directly, as DI has robustness issues with both Enzyme and Mooncake. This PR improves the performance of all benchmarking cases.

To support backend-specific prep and evaluation, `_prepare_gradient` and `_value_and_gradient` are extracted into overridable dispatch methods. The `ADP` type parameter on `LogDensityFunction` is unconstrained accordingly, since `AutoMooncakeForward`'s prep is a `NamedTuple` rather than a `GradientPrep`. A `tangent_type(LogDensityAt) = NoTangent` declaration tells Mooncake to treat the function object as a constant.
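The prepare/evaluate split described above can be illustrated with plain ForwardDiff: build a `GradientConfig` once, then reuse it on every call so the dual-number buffers are not reallocated. A sketch under hypothetical names (`prepare_grad`, `value_and_grad` are illustrative, not DynamicPPL's actual methods):

```julia
using ForwardDiff

# One-time prep: allocates the dual-number work buffers for f at inputs shaped like x.
prepare_grad(f, x) = ForwardDiff.GradientConfig(f, x)

# Hot path: reuse the prepared config on every evaluation.
value_and_grad(f, x, cfg) = (f(x), ForwardDiff.gradient(f, x, cfg))

f(x) = sum(abs2, x)
x = [1.0, 2.0]
cfg = prepare_grad(f, x)
value_and_grad(f, x, cfg)  # (5.0, [2.0, 4.0])
```

Note that this sketch still evaluates the primal separately; the PR's ForwardDiff extension instead uses DiffResults to get value and gradient in a single pass.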