Skip to content

Add profiling/tracing module and speed improvments to correspondence function of optimizer#2409

Merged
akenmorris merged 15 commits into
masterfrom
profiler
Aug 16, 2025
Merged

Add profiling/tracing module and speed improvments to correspondence function of optimizer#2409
akenmorris merged 15 commits into
masterfrom
profiler

Conversation

@akenmorris

@akenmorris akenmorris commented Aug 16, 2025

Copy link
Copy Markdown
Contributor
  • Add basic profiling and tracing infrastructure
  • Speed improvements for correspondence function of optimizer

This PR adds a very basic profile and trace capability to ShapeWorks. It is controlled with environment variables SW_TIME_PROFILE=1 and SW_TIME_TRACE=1. When profiling is on, a file, profile.txt is written at the end of execution and displayed on the screen as well. It looks roughly like this:

THREAD 0:
------------------------------------------------------------------------------------------------------
   %Time    Exclusive    Inclusive   #Calls   #Child    Exclusive    Inclusive Name
             total ms     total ms                        ms/call      ms/call
------------------------------------------------------------------------------------------------------
   100.0           96      2785689        1        4           96      2785689 shapeworks
    96.4         1046      2684521        1        1         1046      2684521 Optimize::RunOptimize
    96.3            4      2683476        1     5000            4      2683476 GradientDescentOptimizer
    96.3         5283      2683472     5000    10000            1          537 optimizer_iteration
    91.3           21      2542943     5000     5000            0          509 gradient_before_iteration
    91.3       743136      2542922     5000    25000          149          509 CorrespondenceFunction::ComputeUpdates
    49.0      1363987      1363987     5000        0          273          273 correspondence::covariance_multiply
    10.4       290396       290396     5000        0           58           58 correspondence::gramMat
     4.9       135245       135245     5000        0           27           27 parallel_sampling
     3.5        97351        97351        1        0        97351        97351 Optimize::SetParameters
     3.4        95425        95425     5000        0           19           19 correspondence::lhs_rhs
     1.4        40078        40078     5000        0            8            8 correspondence::svd
     0.4         9899         9899     5000        0            2            2 correspondence::pinvMat
     0.1         2216         2216        1        0         2216         2216 set_up_optimize
     0.1         1504         1504        1        0         1504         1504 Optimize::Initialize
------------------------------------------------------------------------------------------------------

All times are in milliseconds.

Exclusive total time : Time spent in this timer and not any child timers/calls
Inclusive total time : Time spent in this timer and also any child timers/calls
#Calls : Number of times this timer/function is called
#Child : Number of child/sub-calls to other times/functions
Exclusive ms/call : Mean exclusive time per call
Inclusive ms/call : Mean inclusive time per call

The API is quite simple:

    TIME_START("correspondence::gramMat");
     ...
    TIME_STOP("correspondence::gramMat");

Or:

{
TIME_SCOPE("CorrespondenceFunction::ComputeUpdates");
...
}

TIME_SCOPE uses RAII to stop the timer at the end of the scope.

Tracing produces a trace.json file at the end. This uses the google trace event format (https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/preview?tab=t.0#heading=h.yr4qxyxotyw) and can be viewed in tools such as Perfetto UI (https://ui.perfetto.dev/)

image

Additionally, profiling of the correspondence function was performed and numerous improvements were made, primarily by converting VNL code to Eigen:

Before:

--------------------------------------------------------------------------------------
   %Time    Exclusive    Inclusive    #Call   #Subrs    Inclusive  Name
                 msec   total msec                        ms/call
--------------------------------------------------------------------------------------
   100.0           95      5115882        1        4      5115882 shapeworks
    98.0         1025      5014928        1        1      5014928 Optimize::RunOptimize
    98.0            4      5013903        1     5000      5013903 GradientDescentOptimizer
    98.0         6766      5013899     5000    10000         1003 optimizer_iteration
    95.2           22      4871999     5000     5000          974 gradient_before_iteration
    95.2      1495283      4871977     5000    10000          974 CorrespondenceFunction::ComputeUpdates
    39.9      2040669      2040669     5000        0          408 correspondence::gramMat
    26.1      1336025      1336025     5000        0          267 correspondence::covariance_multiply
     2.6       135134       135134     5000        0           27 parallel_sampling
     1.9        97173        97173        1        0        97173 Optimize::SetParameters
     0.0         2215         2215        1        0         2215 set_up_optimize
     0.0         1471         1471        1        0         1471 Optimize::Initialize
--------------------------------------------------------------------------------------

After:

------------------------------------------------------------------------------------------------------
   %Time    Exclusive    Inclusive   #Calls   #Child    Exclusive    Inclusive Name
             total ms     total ms                        ms/call      ms/call
------------------------------------------------------------------------------------------------------
   100.0           96      2785689        1        4           96      2785689 shapeworks
    96.4         1046      2684521        1        1         1046      2684521 Optimize::RunOptimize
    96.3            4      2683476        1     5000            4      2683476 GradientDescentOptimizer
    96.3         5283      2683472     5000    10000            1          537 optimizer_iteration
    91.3           21      2542943     5000     5000            0          509 gradient_before_iteration
    91.3       743136      2542922     5000    25000          149          509 CorrespondenceFunction::ComputeUpdates
    49.0      1363987      1363987     5000        0          273          273 correspondence::covariance_multiply
    10.4       290396       290396     5000        0           58           58 correspondence::gramMat
     4.9       135245       135245     5000        0           27           27 parallel_sampling
     3.5        97351        97351        1        0        97351        97351 Optimize::SetParameters
     3.4        95425        95425     5000        0           19           19 correspondence::lhs_rhs
     1.4        40078        40078     5000        0            8            8 correspondence::svd
     0.4         9899         9899     5000        0            2            2 correspondence::pinvMat
     0.1         2216         2216        1        0         2216         2216 set_up_optimize
     0.1         1504         1504        1        0         1504         1504 Optimize::Initialize
------------------------------------------------------------------------------------------------------

This reduced the runtime by 46% (1 hour 25 minutes down to 46 minutes).

This was a particular fixed domains runtime with 2048 particles + normals, about 150-ish shapes. Similar speedups will not be seen at lower particle/shape counts and no improvement occurs for initialization/sampling.

@akenmorris akenmorris merged commit 39fc1b4 into master Aug 16, 2025
5 checks passed
@akenmorris akenmorris deleted the profiler branch August 16, 2025 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant