Fixing compilation times for AD#1297
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1297 +/- ##
==========================================
+ Coverage 88.90% 88.92% +0.02%
==========================================
Files 228 228
Lines 30030 30034 +4
==========================================
+ Hits 26698 26708 +10
+ Misses 3332 3326 -6
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@JordiManyer This should be ready to go. Let me know if you want any changes |
|
Has anyone though that compilation times for AD being so high is maybe because of these tags? Isn't there a better way of doing things? Just to illustrate: By putting tags out of the inner functions and making them only once per call, we have reduced the multifield tests from 129 mins (master) down to 82 mins (in this PR). This is crazy, and we have basically only reduced the number of tags to one per AD call. Could we push this further? For instance: I don't see why Going even further: Could we fully infer the "AD level" from the code, and then create a predictable, reusable tag? Something like struct FDTag{L} end
FDTag(level::Integer) = FDTag{Val{level}}()If we can reliably predict the level, this means we would only compile things once and thats it. It would also mean that we could precompile the code up to a certain level of AD and the precompiled code would be useful, not like now. Does this make sense? @zjwegert @ConnorMallon |
| autodiff_array_gradient(a,i_to_x,tag) | ||
| end | ||
|
|
||
| function autodiff_array_gradient(a,i_to_x,tag::Function) |
There was a problem hiding this comment.
In all functions: Could we unify into a single signature? Something like
function default_tag(f,a)
tag = x -> f(a,x)
end
function autodiff_array_gradient(a,i_to_x,tag=default_tag(ForwardDiff.gradient,a))
...
endNote it is still an argument, so the call to this function stays the same. I would also not restrict the tag to be a function, it could be anything.
|
I was curious about these tags and investigated a bit. The "tag" is used as the first argument of The docstring mentions the output of |
|
Some notes after discussion with @JordiManyer:
|
|
@zjwegert I was thinking about this this weekend, and I think I found an elegant way to make this automatic. Could we add a single integer to We instantiate the original contribution when we evaluate the gradients right? That is, The only thing that we need to be careful of is Any comments? Do you see any issues with this? I think we can fully infer the level from this right? |
Hmm interesting idea. I think we can probably use a reference. I'll look into this -- hopefully this week! |
|
@JordiManyer @Antoinemarteau, I think this solution is on the right track. Let me know what you think and if there are any improvements that can be made! |
|
Minor: |
|
Some benchmarking:
|
|
Hi @zjwegert Ill probably change some stuff because I am a bit anal about things, but I would say the general gist is what I imagined. Curiously enough, the test times are all over the place (some up, some down, mostly similar) compared to the first attempt at the fix but I assume this might be because of github ci servers. I presume with precompilation we should really get a speedup now that we can mostly predict the tags. At least the goods new is that we dont seem to get confusion on nested AD calls, right? |
|
Sure! You have access to my fork so feel free to make any changes. I think this is probably because of the github servers. The tests that take a long time are generally better. Looking forward to the precompilation changes! Yes, these changes seem to handle tag confusion. I still need to test in GridapTopOpt |
Fixes and updates for #1296 as I don't have access to branch.