Add profiling/tracing module and speed improvements to correspondence function of optimizer #2409
Merged
…s) instead of matrix multiply
This PR adds a very basic profiling and tracing capability to ShapeWorks. It is controlled by the environment variables SW_TIME_PROFILE=1 and SW_TIME_TRACE=1. When profiling is on, a file, profile.txt, is written at the end of execution and also displayed on the screen. It looks roughly like this:
All times are in milliseconds.
Exclusive total time : Time spent in this timer and not any child timers/calls
Inclusive total time : Time spent in this timer and also any child timers/calls
#Calls : Number of times this timer/function is called
#Child : Number of child/sub-calls to other timers/functions
Exclusive ms/call : Mean exclusive time per call
Inclusive ms/call : Mean inclusive time per call
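To make the relationship between the columns concrete, here is a minimal sketch of how exclusive time can be derived from inclusive time. It assumes a simple tree of timers; `TimerNode`, `exclusive_ms`, and `exclusive_ms_per_call` are hypothetical names for illustration and are not taken from the ShapeWorks code.

```cpp
#include <string>
#include <vector>

// Hypothetical record for one timer (not the ShapeWorks data structure).
struct TimerNode {
  std::string name;
  double inclusive_ms = 0.0;        // time in this timer plus all child timers
  int calls = 0;                    // number of times this timer was entered
  std::vector<TimerNode> children;  // direct child timers
};

// Exclusive time: inclusive time minus the inclusive time of direct children.
double exclusive_ms(const TimerNode& node) {
  double child_total = 0.0;
  for (const TimerNode& child : node.children) {
    child_total += child.inclusive_ms;
  }
  return node.inclusive_ms - child_total;
}

// Mean exclusive time per call; returns 0 when the timer was never called.
double exclusive_ms_per_call(const TimerNode& node) {
  return node.calls > 0 ? exclusive_ms(node) / node.calls : 0.0;
}
```

For example, a timer with 100 ms inclusive time over 2 calls and one child that accounts for 30 ms has 70 ms exclusive time, or 35 ms per call.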
The API is quite simple:
Or:
TIME_SCOPE uses RAII to stop the timer at the end of the scope.
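As a rough illustration of the RAII idea behind TIME_SCOPE, here is a minimal sketch of a scope timer. It is not the ShapeWorks implementation: `TimerStats`, `timer_registry`, `ScopedTimer`, and `SKETCH_TIME_SCOPE` are hypothetical names, and the sketch is single-threaded and tracks only total time and call count.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical per-timer totals (illustration only).
struct TimerStats {
  double total_ms = 0.0;
  int calls = 0;
};

// Hypothetical global registry keyed by timer name. Not thread-safe.
inline std::map<std::string, TimerStats>& timer_registry() {
  static std::map<std::string, TimerStats> registry;
  return registry;
}

// Starts timing on construction and records the elapsed time on destruction,
// so the timer stops automatically when the enclosing scope exits.
class ScopedTimer {
 public:
  explicit ScopedTimer(std::string name)
      : name_(std::move(name)), start_(std::chrono::steady_clock::now()) {}

  ~ScopedTimer() {
    const auto end = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = end - start_;
    TimerStats& stats = timer_registry()[name_];
    stats.total_ms += elapsed.count();
    stats.calls += 1;
  }

  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

 private:
  std::string name_;
  std::chrono::steady_clock::time_point start_;
};

// Hypothetical convenience macro; the two-step concat gives each timer
// variable a unique name based on the source line.
#define SKETCH_CONCAT_INNER(a, b) a##b
#define SKETCH_CONCAT(a, b) SKETCH_CONCAT_INNER(a, b)
#define SKETCH_TIME_SCOPE(name) \
  ScopedTimer SKETCH_CONCAT(sketch_scope_timer_, __LINE__)(name)
```

Placing `SKETCH_TIME_SCOPE("some label");` at the top of a function or block would time everything until that block exits, including early returns and exceptions, which is the main reason to prefer the RAII form over paired start/stop calls.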
Tracing produces a trace.json file at the end of execution. It uses the Google Trace Event Format (https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/preview?tab=t.0#heading=h.yr4qxyxotyw) and can be viewed in tools such as Perfetto UI (https://ui.perfetto.dev/).
Additionally, the correspondence function was profiled and numerous improvements were made, primarily by converting VNL code to Eigen:
Before:
After:
This reduced the runtime by 46% (1 hour 25 minutes down to 46 minutes).
These timings are from one particular fixed-domains run with 2048 particles plus normals and roughly 150 shapes. Similar speedups will not be seen at lower particle or shape counts, and there is no improvement for initialization/sampling.