perf(mavlink): replace forward_message mutex with atomic reader counter #27076
dakejahl wants to merge 1 commit into
Conversation
Use a lightweight atomic reader counter instead of the global `mavlink_module_mutex` in `forward_message()`. Deletion paths null the array entry, then spin-wait for the counter to drain before freeing. Fixes a use-after-free race in `stop_command()` from #26927, where an instance could be freed while another receiver thread held a pointer to it.

Signed-off-by: Jacob Dahl <dahl.jakejacob@gmail.com>
Pull request overview
Reworks MAVLink instance forwarding to remove `mavlink_module_mutex` from the `forward_message()` hot path by introducing an atomic "in-flight forwarders" counter, and updates instance teardown paths to wait for in-flight forwards before freeing instances (aiming to fix the stop/delete use-after-free race from #26927).
Changes:
- Add a global atomic `mavlink_forward_msg_active` counter for tracking active `forward_message()` iterations.
- Remove the global mutex from `Mavlink::forward_message()` and use the counter instead.
- Update `start_helper()` and `stop_command()` teardown to null out the instance slot under `mavlink_module_mutex`, then wait for the counter to drain before `delete`.
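For concreteness, here is a minimal sketch of the scheme described above, with `std::atomic` standing in for PX4's atomic type; the names follow the PR description, but the real code differs in detail:

```cpp
#include <atomic>
#include <chrono>
#include <thread>

struct Mavlink { /* stand-in for the real class */ };

static std::atomic<int> mavlink_forward_msg_active{0}; // in-flight forward_message() calls
static Mavlink *mavlink_module_instances[4] {};        // simplified instance array

// Hot path: count this reader in, iterate lock-free, count it out.
void forward_message_sketch(Mavlink *self)
{
	mavlink_forward_msg_active.fetch_add(1);

	for (Mavlink *inst : mavlink_module_instances) {
		if (inst && inst != self) {
			// ... forward the message to inst ...
		}
	}

	mavlink_forward_msg_active.fetch_sub(1);
}

// Teardown: null the slot (under mavlink_module_mutex in the real code),
// then spin-wait for in-flight forwarders to drain before freeing.
void teardown_sketch(unsigned idx)
{
	Mavlink *inst = mavlink_module_instances[idx];
	mavlink_module_instances[idx] = nullptr;

	while (mavlink_forward_msg_active.load() > 0) {
		std::this_thread::sleep_for(std::chrono::microseconds(100)); // px4_usleep(100) in PX4
	}

	delete inst;
}
```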
Comments suppressed due to low confidence (1)
src/modules/mavlink/mavlink_main.cpp:524
`mavlink_module_instances` is a plain `Mavlink*` array, but `forward_message()` now iterates over it without `mavlink_module_mutex` while other threads can concurrently write `nullptr` into the array under the mutex (e.g. `stop_command()` / `start_helper()`). This is a C++ data race/UB regardless of pointer-size atomicity on the CPU. To make this safe, make the array entries atomic (e.g. `px4::atomic<Mavlink*>` with `.load()`/`.store()`), or otherwise ensure readers and writers synchronize via the same mechanism.
```cpp
mavlink_forward_msg_active.fetch_add(1);

for (Mavlink *inst : mavlink_module_instances) {
	if (inst && (inst != self) && (inst->get_forwarding_on())) {
		// Pass message only if target component was seen before
```
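A rough sketch of the suggested fix (hypothetical names, `std::atomic` standing in for `px4::atomic`); note that atomic slots only remove the data race on the pointer read, the drain-before-delete step is still needed before the instance can be freed:

```cpp
#include <atomic>

struct Mavlink { /* stand-in */ };

// Atomic slots: lock-free reader loads and mutex-holding writer stores
// now synchronize on the same object, so the read is no longer UB.
static std::atomic<Mavlink *> instances_atomic[4] {};

void reader_sketch(Mavlink *self)
{
	for (auto &slot : instances_atomic) {
		Mavlink *inst = slot.load(std::memory_order_acquire);

		if (inst && inst != self) {
			// ... non-racy pointer read; forwarding logic goes here ...
		}
	}
}

void remove_sketch(unsigned idx)
{
	// Publish removal; pairs with the acquire load above.
	instances_atomic[idx].store(nullptr, std::memory_order_release);
	// ... then wait for in-flight readers before delete, as elsewhere in the PR ...
}
```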
```cpp
// Wait for any in-flight forward_message() to finish before freeing
while (mavlink_forward_msg_active.load() > 0) {
	px4_usleep(100);
}
```
The global reader counter is waited on in deletion paths, but nothing prevents new `forward_message()` calls from other still-running MAVLink instances from incrementing `mavlink_forward_msg_active` while this loop is waiting. With continuous traffic this can starve and potentially block thread cleanup indefinitely. Consider adding a writer gate (e.g. a global "forwarding blocked" flag checked before incrementing the counter), or switch to a per-instance refcount so you only wait for readers that could still hold a pointer to the instance being deleted; also consider adding a bounded timeout/error path to avoid infinite waits.
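One possible shape for the suggested writer gate (a hypothetical sketch, not code from the PR):

```cpp
#include <atomic>
#include <chrono>
#include <thread>

static std::atomic<bool> forwarding_blocked{false}; // writer gate
static std::atomic<int>  forward_msg_active{0};     // reader counter

// Reader entry: refuse to start a new forward while a deletion is draining.
bool try_enter_forward()
{
	if (forwarding_blocked.load()) {
		return false;
	}

	forward_msg_active.fetch_add(1);

	// Re-check after incrementing so a gate set between the first
	// check and the increment cannot be missed.
	if (forwarding_blocked.load()) {
		forward_msg_active.fetch_sub(1);
		return false;
	}

	return true;
}

// Writer: close the gate, drain existing readers, then freeing is safe.
void gated_delete_sketch()
{
	forwarding_blocked.store(true);

	while (forward_msg_active.load() > 0) {
		std::this_thread::sleep_for(std::chrono::microseconds(100));
	}

	// ... delete the instance ...
	forwarding_blocked.store(false);
}
```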
```cpp
// Wait for any in-flight forward_message() to finish before freeing
while (mavlink_forward_msg_active.load() > 0) {
	px4_usleep(100);
}
```
Same starvation risk as in `start_helper()`: this waits for the global `mavlink_forward_msg_active` to reach 0 while other MAVLink receiver threads can continue calling `forward_message()` and keep the counter non-zero. This can make `mavlink stop` hang under sustained traffic. Use a mechanism that prevents new readers from entering while waiting (writer gate), or a per-instance reader count that drains once the instance is removed from `mavlink_module_instances`, and consider a timeout to avoid blocking indefinitely.
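A per-instance refcount could look roughly like the sketch below (hypothetical names, not from the PR). The gap between loading a slot pointer and pinning the instance is exactly what hazard-pointer schemes exist to close, so a real design would still need to cover that window:

```cpp
#include <atomic>
#include <chrono>
#include <thread>

struct Mavlink {
	std::atomic<int> in_use{0}; // readers currently forwarding to this instance
	// ...
};

// Reader: pin only the instance actually being written to.
// NOTE: the window between loading the slot pointer and this fetch_add
// must be covered by some other mechanism (hazard pointers, or keeping
// the global counter just for that short window).
void forward_to(Mavlink *inst)
{
	inst->in_use.fetch_add(1);
	// ... write the message to inst ...
	inst->in_use.fetch_sub(1);
}

// Teardown: the slot was already nulled under mavlink_module_mutex, so no
// new pins start; only this instance's readers are waited on, not all traffic.
void teardown_sketch(Mavlink *inst)
{
	while (inst->in_use.load() > 0) {
		std::this_thread::sleep_for(std::chrono::microseconds(100));
	}

	delete inst;
}
```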
```diff
@@ -523,6 +527,8 @@ Mavlink::forward_message(const mavlink_message_t *msg, Mavlink *self)
 			}
 		}
 	}
+
+	mavlink_forward_msg_active.fetch_sub(1);
```
`mavlink_forward_msg_active.fetch_add(1)` / `fetch_sub(1)` is manually paired; if future edits add early returns or error paths inside the loop, it's easy to miss the decrement and wedge the stop/start wait loops. Consider using an RAII scope guard so the counter is decremented on all exits from `forward_message()`.
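A minimal sketch of such a guard (hypothetical, not in the PR):

```cpp
#include <atomic>

// RAII guard: increments the counter on construction, decrements on every
// exit path (normal return, early return, exception), so the pairing can
// no longer be broken by future edits.
class ForwardScopeGuard
{
public:
	explicit ForwardScopeGuard(std::atomic<int> &counter) : _counter(counter)
	{
		_counter.fetch_add(1);
	}

	~ForwardScopeGuard()
	{
		_counter.fetch_sub(1);
	}

	// Non-copyable: one increment must pair with exactly one decrement.
	ForwardScopeGuard(const ForwardScopeGuard &) = delete;
	ForwardScopeGuard &operator=(const ForwardScopeGuard &) = delete;

private:
	std::atomic<int> &_counter;
};

// Usage inside forward_message():
//   ForwardScopeGuard guard{mavlink_forward_msg_active};
//   ... any early return now still decrements the counter ...
```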
🔎 FLASH Analysis
px4_fmu-v5x [Total VM Diff: 152 byte (0.01 %)]
px4_fmu-v6x [Total VM Diff: 88 byte (0 %)]
Updated: 2026-04-13T17:19:12
```cpp
// Wait for any in-flight forward_message() to finish before freeing
while (mavlink_forward_msg_active.load() > 0) {
	px4_usleep(100);
}
```
This wait loop has no timeout and can block `mavlink stop` indefinitely if `mavlink_forward_msg_active` never reaches 0 (e.g. continuous forwarding load across other instances). Please add a bounded wait with a clear error path, and consider waiting only for references to this specific `inst` (per-instance refcount / hazard-pointer style) rather than a global counter.
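A bounded wait might look roughly like this (hypothetical sketch; `std::this_thread::sleep_for` stands in for `px4_usleep`):

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

static std::atomic<int> mavlink_forward_msg_active{0};

// Returns true once the counter drained, false after timeout_ms of waiting,
// so the stop path can report an error instead of hanging forever.
bool wait_for_forwarders(unsigned timeout_ms)
{
	unsigned waited_us = 0;

	while (mavlink_forward_msg_active.load() > 0) {
		if (waited_us >= timeout_ms * 1000u) {
			fprintf(stderr, "mavlink stop: timed out waiting for in-flight forwards\n");
			return false; // caller decides: retry, abort the stop, or leak rather than free
		}

		std::this_thread::sleep_for(std::chrono::microseconds(100));
		waited_us += 100;
	}

	return true;
}
```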
```diff
@@ -523,6 +527,8 @@ Mavlink::forward_message(const mavlink_message_t *msg, Mavlink *self)
 			}
 		}
 	}
```
`forward_message()` now iterates over `mavlink_module_instances` without holding `mavlink_module_mutex`, while other code paths (e.g. `stop_command()` / `start_helper()`) still write `mavlink_module_instances[...] = nullptr` under that mutex. Because the array elements are plain `Mavlink*`, this introduces a C++ data race (UB) between unsynchronized reads and writes even if the pointer write is naturally atomic on the CPU. Consider switching `mavlink_module_instances` to an atomic pointer array (e.g. `px4::atomic<Mavlink*>`) and using atomic load/store (or atomic exchange on removal), or otherwise ensure readers and writers use the same synchronization primitive.
```cpp
while (mavlink_forward_msg_active.load() > 0) {
	px4_usleep(100);
}
```
It seems like we are replacing the mutex with a spinlock which in my experience isn't a great performance trade-off. Do you have any benchmark results?
Also the complexity has increased significantly and one can easily cause critical bugs when working in this module. The syncing has to be done carefully and it's all tied to manual checks and loops.
I haven't benchmarked it, and yeah, I'm not thrilled with the read counter and spin lock... maybe we can resolve this without needing that or the mutex. Need to think about it more.
The issue is that the mutex is in the hot path, whereas the spin lock is only on `stop()`, which is rare.
In my opinion, this is an architectural issue and needs a broader redesign. Do we need to fix that urgently, or can it wait for the mavlink redesign work?
@onur-ozkan It can wait 👍
Summary
- Replaces the global `mavlink_module_mutex` lock in `forward_message()` with a lightweight atomic reader counter (`mavlink_forward_msg_active`), removing per-message lock overhead from the forwarding hot path
- Teardown paths (`stop_command`, `start_helper`) null the array entry under the existing mutex, then spin-wait for the counter to drain before freeing the instance
- Fixes the use-after-free race in `stop_command()` identified in "perf(mavlink): remove mutex from forward_message hot path" #26927, where an instance could be freed while another receiver thread's `forward_message()` still held a pointer to it