Following an initially-unrelated discussion with @andrewrosemberg, I think there might be room for (substantial 🤔🤞) performance improvements in the `NonLinearProgram` code.
Note: what I write below might apply to other classes of problems; I have only checked the nonlinear code so far.
TLDR: I think we can replace {a lot of linear system solves} with {a single linear system solve} during forward/reverse diff
Medium-long explanation:
Forward/reverse diff compute Jacobian-vector products of the form `J*v` or `J'*v`, where the Jacobian `J` is of the form `K \ N` for matrices `K` and `N`. Currently, to obtain `w = J*v`, we solve `K*J = N` (a solve with a matrix right-hand side) and then compute `J*v`. I believe we can instead compute `w = J*v` by solving `K*w = N*v`, which requires a single linear system solve (with a vector right-hand side).
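A minimal standalone sketch of the idea (plain Julia, not DiffOpt's actual code; `K`, `N`, and `v` are stand-ins for the objects above):

```julia
using LinearAlgebra, SparseArrays

n, p = 1_000, 200
K = sprandn(n, n, 0.01) + 10.0I   # square system matrix (assumed nonsingular)
N = sprandn(n, p, 0.05)           # n x p right-hand-side matrix
v = randn(p)                      # direction for the Jacobian-vector product

F = lu(K)                         # factorize K once; reusable across products

# Current approach: materialize the full Jacobian J = K \ N (p vector solves),
# then multiply by v.
J = F \ Matrix(N)
w_dense = J * v

# Proposed approach: a single vector solve, never forming the n x p Jacobian.
w_lazy = F \ (N * v)

@assert w_dense ≈ w_lazy
```

The factorization cost is paid once either way; the saving is in performing one triangular solve instead of `p`, and in never storing the `n x p` Jacobian.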
Why it matters: for the ACOPF example I work with, a 300-bus case with loads as parameters, `K` has size 13k x 13k and `N` is 13k x 200. Replacing `K \ N` with `K \ (N*v)` potentially yields a ~200x reduction in memory and time.
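A back-of-the-envelope check on that factor (assuming dense `Float64` storage for the materialized Jacobian):

```julia
n, p = 13_000, 200

# Materialized Jacobian J = K \ N: an n x p dense matrix.
n * p * 8 / 2^20   # ≈ 19.8 MiB

# Lazy product K \ (N*v): a single length-n solution vector.
n * 8 / 2^10       # ≈ 101.6 KiB, i.e. p = 200x smaller

# Time side: the matrix RHS costs p triangular-solve pairs against the
# factorization of K; the vector RHS costs exactly one.
```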
Complete thought process and stack traversal:
I'm focusing on forward diff for the sake of example; the same remarks apply to reverse diff.
- Forward diff returns primal/dual sensitivities in a `ForwCache` structure here (`DiffOpt.jl/src/NonLinearProgram/NonLinearProgram.jl`, lines 534 to 537 in `df4a7dc`):

```julia
model.forw_grad_cache = ForwCache(;
    primal_Δs = Dict(model.cache.primal_vars .=> primal_Δs),
    dual_Δs = dual_Δs,
)
```
- These are obtained from a left multiplication with the Jacobian here (`DiffOpt.jl/src/NonLinearProgram/NonLinearProgram.jl`, lines 530 to 532 in `df4a7dc`):

```julia
# Extract primal and dual sensitivities
primal_Δs = Δs[1:length(model.cache.primal_vars), :] * Δp  # Exclude slacks
dual_Δs = Δs[cache.index_duals, :] * Δp  # Includes constraints and bounds
```
- Said Jacobian `Δs` is computed by the `_compute_sensitivity` function. Stepping into that call, said Jacobian is obtained from `_compute_derivatives_no_relax` (`DiffOpt.jl/src/NonLinearProgram/nlp_utilities.jl`, line 457 in `df4a7dc`):

```julia
∂s, K, N = _compute_derivatives_no_relax(
```
- which then points us to a linear system solve (`DiffOpt.jl/src/NonLinearProgram/nlp_utilities.jl`, lines 425 to 427 in `df4a7dc`):

```julia
∂s = zeros(size(M, 1), size(N, 2))
# ∂s = - (K \ N) # Sensitivity
ldiv!(∂s, K, N)
```
The alternative implementation I propose would cache the factorization of `K` (which seems already to be the case) and lazily compute Jacobian-vector products `w = J*v` by solving `K*w = N*v`. The math would be similar (I think) for Jacobian-transpose-vector products: since `J = K \ N`, we have `J'*u = N' * (K' \ u)`, which reuses the same factorization through its adjoint.
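For concreteness, a hypothetical sketch of the two lazy products (the names `lazy_jvp`/`lazy_vjp` are made up, not DiffOpt API; `F` is the cached factorization of `K`):

```julia
using LinearAlgebra

# Forward mode: w = J*v = (K \ N) * v = K \ (N*v) -- one vector solve.
lazy_jvp(F, N, v) = F \ (N * v)

# Reverse mode: w = J'*u = N' * (K' \ u) -- one vector solve against the
# adjoint of the same cached factorization (e.g. F = lu(K), then F' \ u).
lazy_vjp(F, N, u) = N' * (F' \ u)
```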