Skip to content

OptimizationAdjoint for constrained optimization solution sensitivities #1444

Draft
jClugstor wants to merge 19 commits into
SciML:masterfrom
jClugstor:full_optimization_adjoint
Draft

OptimizationAdjoint for constrained optimization solution sensitivities #1444
jClugstor wants to merge 19 commits into
SciML:masterfrom
jClugstor:full_optimization_adjoint

Conversation

@jClugstor
Copy link
Copy Markdown
Member

Checklist

  • Appropriate tests were added
  • Any code changes were done in a way that does not break public API
  • All documentation related to code changes were updated
  • The new code follows the
    contributor guidelines, in particular the SciML Style Guide and
    COLPRAC.
  • Any new documentation only uses public API

Additional context

Add any other context about the problem here.

@jClugstor
Copy link
Copy Markdown
Member Author

jClugstor commented May 19, 2026

Summary

Adds a new OptimizationAdjoint sensitivity algorithm that computes $dG/dp$ for a downstream loss G evaluated at the optimum $x^*(p)$ of a (possibly constrained) OptimizationProblem, without re-solving the optimization. Handles equality constraints, two-sided inequality constraints (lcons/ucons), and variable box bounds (lb/ub). Sits alongside the existing UnconstrainedOptimizationAdjoint.

opt_sol = solve(prob, NLopt.LD_SLSQP())                # solve once
dgdu!(out, u, _, _, _) = (out .= dG_dx)                # cotangent of loss w.r.t. x*
dp = adjoint_sensitivities(opt_sol, nothing;
                           sensealg = OptimizationAdjoint(),
                           dgdu = dgdu!)

@jClugstor
Copy link
Copy Markdown
Member Author

jClugstor commented May 19, 2026

The Math

A simple way to say this is that this is basically the same thing as SteadyStateAdjoint, but just doing the implicit differentiation on the Nonlinear system that is made up of the gradient of the lagrangian

$$ \mathcal{L}(x,y,z,p) = f(x,p) + y^\top g(x,p) + z^\top h(x,p) $$

and the constraint equations all set to zero.

First of all though, we do need all of the KKT variables/ dual variables / lagrange multipliers, the variables that are multiplied by the constraint functions in the lagrangian that satisfy the KKT conditions. We get the optimal $x^*$ from the optimization forward pass, but not every solver will return the dual variables, so we need to recover them from the stationarity condition:

$$ [J_g; J_{h_I}]^\top \mu = -\nabla f $$

This is typically overdetermined (more x's than active constraints), so we solve via LinearSolve.QRFactorization — least-squares, which under LICQ recovers the exact multipliers. LICQ says that at the optimal point, the gradients of the active constraints are all linearly independent.

QR decomposition finds the least squares solution, so it's finding

$$ \min ||J^\top \mu^* - (-\nabla f)||^2 $$

but if we found a viable local minimum, the KKT theorem states that $\mu$ exists , so the residual there will be zero. Plus, LICQ means that the Jacobian has full column rank, so the $\mu$ is unique. So a QR solve finds the unique solution.

If any returned z_I is negative (KKT violation), the corresponding constraint was wrongly classified as active; drop it and redo the multiplier solve. The sign-check + redo handles boundary degeneracy that proximity-based active-set detection alone gets wrong.

Now for the sensitivity, the idea is that a solution of the constrained optimization problem satisfies the Karush-Kuhn Tucker (KKT) conditions:

$$\nabla_x f(x,p) + \sum_{i=1}^{m_e} y_i \nabla_x g_i(x,p) + \sum_{i=1}^{m_i}z_i \nabla_x h_i(x,p) = 0$$

$$g(x,p) = 0, \quad h(x,p) \le 0, \quad z \ge 0, \quad z^\top h(x,p) = 0$$

Taking only the set of inequality constraints that are actually active, $h_I$, the KKT conditions define a nonlinear system of equations:

$$F(w, p) = \begin{pmatrix} \nabla \mathcal{L}(w, p) \ g(w,p) \ h_\mathcal{I}(w,p) \end{pmatrix} = 0$$ where $w$ is $$w := (x, y, z_\mathcal{I})$$

The solution of the optimization problem satisfies this nonlinear system of equations.
Differentiate by $p$, using implicit differentiation:

$$ \frac{\partial F}{\partial w}\frac{\partial w}{\partial{p}} + \frac{\partial F}{\partial p } = 0 $$

Our end goal is to find $\frac{\partial w}{\partial p}$ , which is the sensitivity of the solution of the optimization problem $w$, with respect to the parameters.

Using the above,

$$\frac{\partial w}{\partial p} = -\frac{\partial F}{\partial w}^{-1}\frac{\partial F}{\partial p}$$

Now consider that we have some scalar cost function that will be a function of the solution of the optimization problem, $C(w)$.

$$\frac{dC}{dp} = \frac{\partial C}{\partial w}\frac{\partial w}{\partial p } + \frac{\partial C}{\partial p} = \frac{\partial C}{\partial w}\left(- \frac{\partial F}{\partial w}^{-1} \frac{\partial F}{\partial p} \right) + \frac{\partial C}{\partial p}$$

$$\frac{dC}{dp} = \frac{\partial C}{\partial p} - \left(\frac{\partial C}{\partial w} \frac{\partial F}{\partial w}^{-1} \right)\frac{\partial F}{\partial p} $$

But this term $\frac{\partial C}{\partial w} \frac{\partial F}{\partial w}^{-1}$ can be represented as a linear solve,

$$\frac{\partial F}{\partial w} \lambda = \frac{\partial C}{\partial w}$$

so

$$\frac{dC}{dp} = \frac{\partial C}{\partial p} - \lambda \frac{\partial F}{\partial p}$$

that $\lambda \frac{\partial F}{\partial p}$ is what vecjacobian! does.

@jClugstor jClugstor force-pushed the full_optimization_adjoint branch from 908f0f0 to 9c6375c Compare May 20, 2026 19:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant