[tmva][sofie] Profiler code generation for SOFIE#19829
Conversation
|
Is there perhaps a way in which we could test this new nice feature? |
Test Results 22 files 22 suites 3d 10h 52m 8s ⏱️ Results for commit e82aefc. ♻️ This comment has been updated with latest results. |
|
Yes, I think we should add a tutorial for this, generting code instrumented with timers and then producing the results. |
|
Yes, thank you. I will look how to add some testing and will add a tutorial. |
1585356 to
32d38a2
Compare
a881cd3 to
f9ed58b
Compare
f9ed58b to
89635fa
Compare
guitargeek
left a comment
There was a problem hiding this comment.
I didn't look at the full diff yet, but this definitely requires a change in the RModel class number. Or - which I would prefer - remove IO capability from RModel altogether. What's the use case for that IO? If ROOT with SOFIE is installed to read a possible RModel from a file, you can also just recreate the model from ONNX on the fly. I think sticking to ONNX as the persistent model format is cleaner.
| size_t fConstantTensorSize = 0; // size (in Bytes) of the allocated constant tensors | ||
| size_t fWeightsTensorSize = 0; // size (in Bytes) of the allocated weight tensors | ||
| size_t fOtherTensorSize = 0; // size (in Bytes) of intermediate tensors which are not managed by the memory pool | ||
|
|
||
| std::string fProfilerGC = ""; |
There was a problem hiding this comment.
Adding a data member meas you have to increase the class version, otherwise you break backwards compatible IO.
But I'm happy that I have the opportunity to comment on this here! Because I think in general, supporting IO for RModel is not optimal. It bars us from changing the schema as we like, and I'm not sure what's the use case?
Furthermore, as SOFIE is still in the Experimental namespace, I think supporting persistent IO for these classes is in contradiction with the experimental namespace.
be27193 to
58b566e
Compare
- After the changes for CLAD the mlpf modek could not be parsed anymore. Handle now correctly the variable defining the number of non zero elements coming from Non_Zero - Fixes also TMVA::SOFIE::Copy for different types than float making it a template function - Add also output shape definition in generated code as it is done for the input
The casting to bool was incorrect since it was done a cast to uint8. Fix also the special case of NonZero dynamic parameter which is defindef by NonZero operator. Add at the end a Session data member for the parameter which is then used in creating the output vector Fix a bug introduced in softmax generated code in the generic case Fix the writing of the data in initializer lists for uint8_t types Add correctly new version in RModel.hxx (version 4)
- Fix Where for initialized and Shape tensors. New impelmentation was not taking into account the Shape tensors. This caused a failure to parse the ATLAS Gnn tracking model - Fix Slice for trivial copying. Use now std::copy since we cannot use alias tensor anymore after the change of using a free function with a const Session - Avoid printing tensor names in the comment of Softmax generated code. There is a issue in the function RModel::CollectTensorMemberNames used to get tensor members from Session. The problem if a tensor is gaving as name "tensor_X" and used as member "tensor_tensor_X" the function assume exists a tensor with name "X". This was causing teh Keras parser to crash. - Fix an issue writing the initialized data when are inf or NaN. Use the function from limits in this case
…pe tensor When output is a param shape tensor the tensor values were not assigned in initialization as in a constant tensor, they need to be set at run time in the infer function because they depend on the provided dynamic shale values Fix also a issue on Windows in the new COnvertValuesTOString implementation dealing with inf values. Create a specialisation for float or double which will handle the infinity values in numerical limits.
Fix some operators when input and/or output is a shape tensors Fix also in Gemm when broadcasting dynamic shape for the bias
When computing the last usage of tensors, the loop on the input tensors of the added operator was not performed correctly. The loop was stopping if an initialized tensor was an input
58b566e to
775985e
Compare
Remove check to include standard library header only if within a list of allowed ones. There is no need for that check and this was excluding addition of extra headers like chrono
Compute also the error on the average when printing results and sort them in decreasing order in time
Split generation of code of RModelProfiler in different funcitons to make it more indipendent of RModel and the rest of teh code generation. This was needed to take into account many changes in the code performed to make the Sofie generated code understantable by Clad for AD. Fix also a compiler warning due to the name of ROperator.
775985e to
e82aefc
Compare
This PR introduces a time-based profiler for the SOFIE inference engine. It provides developers with the tools to analyze the performance of generated C++ models by measuring the execution time of each individual operator, as well as the total inference time.
Changes or fixes:
A new option, SOFIE::Options::kProfile, is added to the RModel::Generate() method to enable the feature.
When enabled, a new RModelProfiler helper class instruments the generated doInfer() method with std::chrono timers.
Utility functions like PrintProfilingResults() and GetOpAvgTime() are added to the generated Session struct to access the collected timing data.
Checklist:
Tested changes locally by running inference in a loop and verifying the timing results.
updated the docs (if necessary)
This PR replaces #19558 by fixing the conflicts with rebasing to master