
Ai perf improvement / Fix perf regressions from key support #581

Open

jmachowinski wants to merge 5 commits into ros2:rolling from cellumation:ai_perf_improvement

@jmachowinski jmachowinski commented Apr 13, 2026

Description

This is a vibe coded rewrite of the serialization, deserialization and size computation code.

Depending on the message, this implementation is 2-3x faster in serialization and deserialization,
and about 10x faster in size computation. Note that the size computation must be done before
serialization, and it was taking ~50% of the CPU time during publish in the old implementation.
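A minimal sketch of why the size pass sits on the publish path: the buffer is sized by one full walk over the message, and a second walk writes it. The `Point`/`Marker` types and the `serialized_size`, `Writer` and `serialize` names are hypothetical illustrations, not the rmw_cyclonedds_cpp type support. The sketch assumes classic CDR alignment (8-byte doubles, alignment relative to the payload start), writes host byte order, and omits the encapsulation header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical message types, for illustration only.
struct Point { double x, y, z; };
static_assert(sizeof(Point) == 3 * sizeof(double), "points are copied as packed doubles");
struct Marker {
  std::string ns;
  int32_t id;
  std::vector<Point> points;
};

// Round `offset` up to a multiple of `align` (must be a power of two).
inline std::size_t align_to(std::size_t offset, std::size_t align) {
  return (offset + align - 1) & ~(align - 1);
}

// Pass 1: walk the message and compute the exact serialized size.
std::size_t serialized_size(const Marker & m) {
  std::size_t off = 0;
  off = align_to(off, 4) + 4 + m.ns.size() + 1;  // string: uint32 length, chars, '\0'
  off = align_to(off, 4) + 4;                    // int32 id
  off = align_to(off, 4) + 4;                    // uint32 element count
  if (!m.points.empty()) {
    off = align_to(off, 8) + m.points.size() * sizeof(Point);  // 3 doubles per point
  }
  return off;
}

// Minimal bounds-checked writer over a pre-sized buffer.
class Writer {
public:
  explicit Writer(std::vector<unsigned char> & buf) : buf_(buf) {}
  void pad(std::size_t align) { off_ = align_to(off_, align); }  // padding stays zero
  void put(const void * src, std::size_t n) {
    assert(off_ + n <= buf_.size());
    std::memcpy(buf_.data() + off_, src, n);
    off_ += n;
  }

private:
  std::vector<unsigned char> & buf_;
  std::size_t off_ = 0;
};

// Pass 2: allocate exactly once, then walk the message again and write it.
std::vector<unsigned char> serialize(const Marker & m) {
  std::vector<unsigned char> buf(serialized_size(m));  // zero-initialized
  Writer w(buf);
  const uint32_t len = static_cast<uint32_t>(m.ns.size() + 1);
  w.pad(4); w.put(&len, 4); w.put(m.ns.c_str(), len);  // c_str() includes the '\0'
  w.pad(4); w.put(&m.id, 4);
  const uint32_t count = static_cast<uint32_t>(m.points.size());
  w.pad(4); w.put(&count, 4);
  if (count != 0) {
    w.pad(8); w.put(m.points.data(), count * sizeof(Point));
  }
  return buf;
}
```

Because both passes visit every member, for nested messages the size pass costs a traversal comparable to serialization itself, which plausibly explains how it could account for a large share of publish time.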

The bottom line is: the more nested and complex the message, the bigger the speedup of the new implementation.
For 'bulk types' like images there is almost no difference.

With the performance roundtrip benchmarks, using marker arrays, I see a CPU time reduction
from ~60% to ~16%. The time it takes to call publish, and the CPU time usage, are now close to those of
fastRTPS.

---------------------------------------------------------------------------------
Benchmark                                       Time             CPU   Iterations
---------------------------------------------------------------------------------
BM_OldImpl_SizeBound/Bool                    1.78 ns         1.78 ns    336018102
BM_NewImpl_SizeBound/Bool                    1.42 ns         1.42 ns    489091256
BM_OldImpl_SizeBound/Int32                   1.78 ns         1.78 ns    392841783
BM_NewImpl_SizeBound/Int32                   1.42 ns         1.42 ns    485654865
BM_OldImpl_SizeBound/String                  1.79 ns         1.79 ns    392450706
BM_NewImpl_SizeBound/String                  1.43 ns         1.43 ns    491473704
BM_OldImpl_SizeBound/Header                  1.79 ns         1.79 ns    391639186
BM_NewImpl_SizeBound/Header                  1.42 ns         1.42 ns    467252090
BM_OldImpl_SizeBound/Float64MultiArray       1.97 ns         1.96 ns    356346563
BM_NewImpl_SizeBound/Float64MultiArray       1.43 ns         1.43 ns    490866785
BM_OldImpl_SizeBound/Marker                  1.79 ns         1.79 ns    392071872
BM_NewImpl_SizeBound/Marker                  1.43 ns         1.43 ns    491146113
BM_OldImpl_SizeBound/MarkerArray             1.97 ns         1.97 ns    356228339
BM_NewImpl_SizeBound/MarkerArray             1.43 ns         1.43 ns    491258794
BM_OldImpl_SizeBound/PointCloud2             1.97 ns         1.97 ns    356165057
BM_NewImpl_SizeBound/PointCloud2             1.43 ns         1.43 ns    490669743
BM_OldImpl_SizeBound/LaserScan               1.79 ns         1.79 ns    391888883
BM_NewImpl_SizeBound/LaserScan               1.43 ns         1.43 ns    486648872
BM_OldImpl_SizeBound/Image                   1.79 ns         1.79 ns    392130868
BM_NewImpl_SizeBound/Image                   1.42 ns         1.42 ns    479932678
BM_OldImpl_SerializedSize                    9650 ns         9650 ns        70004
BM_NewImpl_SerializedSize                     716 ns          716 ns       979024
BM_OldImpl_SerializedSizeEstimate            9767 ns         9766 ns        70222
BM_NewImpl_SerializedSizeEstimate             757 ns          757 ns       891830
BM_OldImpl_Serialize                        12769 ns        12768 ns        56815
BM_NewImpl_Serialize                         5194 ns         5194 ns       134505
BM_NewImpl_Deserialize                       5259 ns         5259 ns       134461
BM_OldImpl_Deserialize                      12643 ns        12643 ns        54640
BM_Marker_Serialize_Old                       539 ns          539 ns      1320461
BM_Marker_Serialize                           205 ns          205 ns      3423652
BM_Marker_Deserialize_Old                     562 ns          562 ns      1279175
BM_Marker_Deserialize                         199 ns          199 ns      3507109
BM_TFMessage_Dynamic_Serialize_Old            167 ns          167 ns      4327852
BM_TFMessage_Dynamic_Serialize               63.8 ns         63.8 ns     11137447
BM_TFMessage_Dynamic_Deserialize_Old          165 ns          165 ns      4240901
BM_TFMessage_Dynamic_Deserialize             63.9 ns         63.9 ns     10920823
BM_TFMessage_Static_Serialize_Old            2443 ns         2442 ns       290161
BM_TFMessage_Static_Serialize                1033 ns         1033 ns       678791
BM_TFMessage_Static_Deserialize_Old          2434 ns         2434 ns       268255
BM_TFMessage_Static_Deserialize               993 ns          993 ns       707200
BM_PointCloud2_SerializedSize_Old             438 ns          438 ns      1612023
BM_PointCloud2_SerializedSize                43.9 ns         43.9 ns     16023671
BM_PointCloud2_Serialize_Old               447257 ns       447221 ns         1572 bytes_per_second=20.4718Gi/s
BM_PointCloud2_Serialize                   454261 ns       454244 ns         1538 bytes_per_second=20.1553Gi/s
BM_PointCloud2_Deserialize_Old             437504 ns       437466 ns         1602 bytes_per_second=20.9283Gi/s
BM_PointCloud2_Deserialize                 436420 ns       436368 ns         1607 bytes_per_second=20.9809Gi/s
BM_LaserScan_SerializedSize_Old               147 ns          147 ns      4866572
BM_LaserScan_SerializedSize                  12.8 ns         12.8 ns     55122595
BM_LaserScan_Serialize_Old                    221 ns          221 ns      3072972 bytes_per_second=36.6873Gi/s
BM_LaserScan_Serialize                       94.7 ns         94.7 ns      7369829 bytes_per_second=85.5507Gi/s
BM_LaserScan_Deserialize_Old                  228 ns          228 ns      3195329 bytes_per_second=35.5899Gi/s
BM_LaserScan_Deserialize                     93.3 ns         93.3 ns      7508956 bytes_per_second=86.8809Gi/s
BM_Image_SerializedSize_Old                   135 ns          135 ns      5128440
BM_Image_SerializedSize                      10.3 ns         10.3 ns     67626332
BM_Image_Serialize_Old                     116602 ns       116586 ns         6012 bytes_per_second=22.0864Gi/s
BM_Image_Serialize                         116059 ns       116050 ns         6016 bytes_per_second=22.1884Gi/s
BM_Image_Deserialize_Old                   115698 ns       115693 ns         5991 bytes_per_second=22.257Gi/s
BM_Image_Deserialize                       115590 ns       115584 ns         6031 bytes_per_second=22.2779Gi/s
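The "2-3x" and "about 10x" figures in the description can be checked against the CPU column of the table above. The small helper below (all names hypothetical) just divides old by new CPU time for a selection of rows; the row labels are shortened from the benchmark names.

```cpp
#include <cassert>
#include <cstdio>

// CPU times in ns, copied from the benchmark table above.
struct Row {
  const char * name;
  double old_ns;
  double new_ns;
};

const Row kRows[] = {
  {"SerializedSize", 9650, 716},
  {"Serialize", 12768, 5194},
  {"Deserialize", 12643, 5259},
  {"Marker_Serialize", 539, 205},
  {"TFMessage_Static_Serialize", 2442, 1033},
  {"LaserScan_SerializedSize", 147, 12.8},
  {"PointCloud2_Serialize", 447221, 454244},
  {"Image_Serialize", 116586, 116050},
};

// Speedup of the new implementation: > 1 means less CPU time than the old one.
inline double speedup(const Row & r) {
  assert(r.new_ns > 0.0);
  return r.old_ns / r.new_ns;
}

void print_speedups() {
  for (const Row & r : kRows) {
    std::printf("%-28s %6.2fx\n", r.name, speedup(r));
  }
}
```

Computed this way, the size rows come out at roughly 11-13.5x, the serialize/deserialize rows at roughly 2.4-2.6x, and PointCloud2/Image stay within about 2% of 1x (PointCloud2 serialization is even marginally slower).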

Fixes # (issue)

Performance regressions introduced with the key support patches.

Is this a user-facing behavior change?

No

Did you use Generative AI?

Yes, this is completely vibe coded.

Additional Information

All tests in test_communication are passing.

As to why this is actually faster:

  • The recent changes for key support combined the type support for serialization and deserialization, making
    the types that must be loaded while traversing the data bigger. I suspected cache line issues, as some experiments showed that the speed was sensitive to the size of the types in the type system.
  • Therefore I went with an 'Array of Structures (AoS) and Structure of Arrays (SoA)' approach and told the AI to split the code accordingly.
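One plausible reading of the AoS/SoA split, as a minimal sketch: the fields the traversal actually reads (offset, kind) live in a compact array of small records, while cold per-member data (names, key flags) moves into parallel arrays, so the hot loop streams fewer bytes through the cache. All types and functions here (`MemberInfo`, `HotOp`, `SplitType`, `size_of`, `serialize`, `Sample`) are hypothetical, handle only fixed-size primitives, and are not the layout used in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical primitive kinds and example message, for illustration only.
enum class Kind : uint8_t { Int32, Float64 };
struct Sample { int32_t id; double value; };

// Wide per-member descriptor (array of structures): a traversal that needs only
// kind/offset still pulls names and flags into the cache.
struct MemberInfo {
  std::string name;
  bool is_key;
  Kind kind;
  uint32_t offset;
};

// Hot record: only the fields the size/serialize loops read.
struct HotOp {
  uint32_t offset;
  Kind kind;
};
static_assert(sizeof(HotOp) <= 8, "keep hot records small so many fit in a cache line");

// Split layout: a compact array of hot records plus parallel cold arrays (SoA style).
struct SplitType {
  std::vector<HotOp> ops;          // walked on every size/serialize call
  std::vector<std::string> names;  // cold, same index as ops
  std::vector<bool> is_key;        // cold, same index as ops
};

SplitType split(const std::vector<MemberInfo> & members) {
  SplitType t;
  for (const MemberInfo & m : members) {
    t.ops.push_back({m.offset, m.kind});
    t.names.push_back(m.name);
    t.is_key.push_back(m.is_key);
  }
  return t;
}

inline std::size_t width(Kind k) { return k == Kind::Float64 ? 8 : 4; }

// Size pass over fixed-size primitives: touches only the hot array.
std::size_t size_of(const SplitType & t) {
  std::size_t off = 0;
  for (const HotOp & op : t.ops) {
    const std::size_t n = width(op.kind);
    off = (off + n - 1) / n * n + n;  // align to the primitive's size, then add it
  }
  return off;
}

// Serialize pass: copies each member from the object at its recorded offset.
std::vector<unsigned char> serialize(const SplitType & t, const void * obj) {
  std::vector<unsigned char> out(size_of(t));
  const auto * src = static_cast<const unsigned char *>(obj);
  std::size_t off = 0;
  for (const HotOp & op : t.ops) {
    const std::size_t n = width(op.kind);
    off = (off + n - 1) / n * n;
    assert(off + n <= out.size());
    std::memcpy(out.data() + off, src + op.offset, n);
    off += n;
  }
  return out;
}
```

Key handling, which needs `is_key`, can still index the cold arrays by member position, while the per-message loops never load that data.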

Note that there are still 2 places in the rmw_node implementation where the old type system is used.

I must say I am a bit torn about this. I don't fully trust AI-written code, and I must admit this is a code bomb drop.
On the other hand, the benchmark results speak for themselves.

I wonder how we should go forward with this.

  • Merge as is?
  • Merge, but as a new Cyclone flavor, e.g. "rmw_cyclonedds_cpp_ai_slop_fast"?

@eboasson @mjcarroll Opinions?

@jmachowinski jmachowinski changed the title Ai perf improvement Ai perf improvement / Fix perf regressions from key support Apr 13, 2026
@@ -0,0 +1,54 @@
// Copyright 2019 Rover Robotics via Dan Rose
Contributor

Is this copyright generated by AI? Where the heck does this come from? Is the AI just referring to some other source and copying it here?
