Hello everyone,
I have been facing an issue for a long time that can probably be explained by a use-after-free in the phosphor-software-manager source code.
Context
- OpenBMC SHA1 : 4da8af5d4aaa13694ece0f6cfb25fda18be0eda8 (12/30/2025)
- phosphor-bmc-code-mgmt SHA1: e634411 (11/26/2025)
- Use case: firmware update from bmcweb interface, on an eMMC running system (ext4 mmc tarball)
Observation
When looking at journalctl on the OpenBMC board during a FW update, I sometimes see an error such as:
Mar 06 15:44:04 evb-stm32mp257f-dk systemd[1]: obmc-flash-mmc@7d9ab9e4.service: Deactivated successfully.
Mar 06 15:44:04 evb-stm32mp257f-dk systemd[1]: Finished Write image 7d9ab9e4 to BMC storage.
Mar 06 15:44:04 evb-stm32mp257f-dk systemd[1]: obmc-flash-mmc@7d9ab9e4.service: Consumed 14.264s CPU time, 292.3M memory peak.
Mar 06 15:44:04 evb-stm32mp257f-dk systemd[1]: Starting Set b as primary partition...
Mar 06 15:44:05 evb-stm32mp257f-dk systemd[1]: obmc-flash-mmc-setprimary@b.service: Deactivated successfully.
Mar 06 15:44:05 evb-stm32mp257f-dk systemd[1]: Finished Set b as primary partition.
Mar 06 15:44:07 evb-stm32mp257f-dk phosphor-software-manager[412]: BMC activation has ended - BMC reboots are re-enabled.
Mar 06 15:44:07 evb-stm32mp257f-dk phosphor-software-manager[412]: BMC image ready; need reboot to get activated.
Mar 06 15:44:07 evb-stm32mp257f-dk phosphor-software-manager[412]: corrupted size vs. prev_size
or
Mar 06 14:54:54 evb-stm32mp257f-dk systemd[1]: obmc-flash-mmc-remove@a.service: Deactivated successfully.
Mar 06 14:54:54 evb-stm32mp257f-dk systemd[1]: Finished Delete image a from BMC storage.
Mar 06 14:54:57 evb-stm32mp257f-dk systemd[1]: Created slice Slice /system/obmc-flash-mmc.
Mar 06 14:54:57 evb-stm32mp257f-dk systemd[1]: Starting Write image beecf802 to BMC storage...
Mar 06 14:55:03 evb-stm32mp257f-dk obmc-flash-bmc[527]: 131072+0 records in
Mar 06 14:55:03 evb-stm32mp257f-dk obmc-flash-bmc[527]: 131072+0 records out
Mar 06 14:55:28 evb-stm32mp257f-dk obmc-flash-bmc[544]: 566440+0 records in
Mar 06 14:55:28 evb-stm32mp257f-dk obmc-flash-bmc[544]: 566440+0 records out
Mar 06 14:55:30 evb-stm32mp257f-dk phosphor-software-manager[414]: malloc(): mismatching next->prev_size (unsorted)
In both cases, the phosphor-software-manager process ends with a core dump, which can block the reboot if the reboot guard service did not have time to finish.
It looks like heap memory corruption, possibly induced by an earlier event.
Investigations
- Kept debug symbols and disabled stripping for the phosphor-software-manager binary
- Enabled core dumps for GDB analysis
- Modified the Software Manager service to launch phosphor-software-manager under Valgrind
GDB logs and Valgrind outputs are attached:
valgrind.txt
gdb.log
It seems (according to Valgrind) that we are freeing something here, in phosphor-bmc-code-mgmt/bmc/item_updater.cpp, line 557 in e634411:
updateManagers.erase(entryId);
The object seems to be allocated by phosphor::software::updater::ItemUpdater::processBMCImage(), freed, then reused by phosphor::software::update::Manager::processImage(xxx)
Just as a test, I commented out this erase to see whether the issue would go away. It did, so I think I have found the root cause.
I am not sure that the heap corruption causes a crash on every system (besides, it is often followed directly by a reboot, so it can go unnoticed). It is possible that this bug has been hidden for a long time.
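To illustrate the pattern I suspect, here is a minimal sketch. It is not the actual phosphor-bmc-code-mgmt code: FakeManager, FakeItemUpdater, pendingErase and drain() are hypothetical names, and I am assuming the container owns its entries through std::unique_ptr. The idea is that erasing the entry while the manager object is still executing destroys it under its own feet, whereas deferring the erase until the manager has returned avoids that.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-image update manager object.
struct FakeManager
{
    std::string id;
    // Completion callback installed by the owner.
    std::function<void(const std::string&)> onDone;

    void processImage()
    {
        if (onDone)
        {
            onDone(id); // the owner reacts while we are still running
        }
        // This member access is only valid if the owner did not
        // destroy *this inside the callback above.
        id += "-done";
    }
};

// Hypothetical stand-in for the object owning the managers.
struct FakeItemUpdater
{
    std::map<std::string, std::unique_ptr<FakeManager>> updateManagers;
    std::vector<std::string> pendingErase;

    void add(const std::string& entryId)
    {
        auto mgr = std::make_unique<FakeManager>();
        mgr->id = entryId;
        // Suspected unsafe variant: calling updateManagers.erase(id)
        // directly in this callback would free the manager while
        // processImage() is still on the stack.
        mgr->onDone = [this](const std::string& id) {
            pendingErase.push_back(id); // only record the request
        };
        updateManagers.emplace(entryId, std::move(mgr));
    }

    // Called once no manager method is running anymore.
    void drain()
    {
        for (const auto& id : pendingErase)
        {
            updateManagers.erase(id);
        }
        pendingErase.clear();
    }
};
```

In a real daemon, drain() would have to be scheduled from the event loop after the manager has returned. Whether that matches the actual object lifetimes in item_updater.cpp still has to be confirmed against the Valgrind trace.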
Kind regards,
Erwan.