Use after free - phosphor-software-manager ? #17

@eszSTM

Hello everyone,
I have been facing an issue for a long time now that can probably be explained by a use-after-free in the phosphor-software-manager source code.

Context

  • OpenBMC SHA1 : 4da8af5d4aaa13694ece0f6cfb25fda18be0eda8 (12/30/2025)
  • phosphor-bmc-code-mgmt SHA1: e634411 (11/26/2025)
  • Use case: firmware update from bmcweb interface, on an eMMC running system (ext4 mmc tarball)

Observation
When looking at the journalctl output on the board running OpenBMC during a firmware update, I sometimes see an error such as:

Mar 06 15:44:04 evb-stm32mp257f-dk systemd[1]: obmc-flash-mmc@7d9ab9e4.service: Deactivated successfully.
Mar 06 15:44:04 evb-stm32mp257f-dk systemd[1]: Finished Write image 7d9ab9e4 to BMC storage.
Mar 06 15:44:04 evb-stm32mp257f-dk systemd[1]: obmc-flash-mmc@7d9ab9e4.service: Consumed 14.264s CPU time, 292.3M memory peak.
Mar 06 15:44:04 evb-stm32mp257f-dk systemd[1]: Starting Set b as primary partition...
Mar 06 15:44:05 evb-stm32mp257f-dk systemd[1]: obmc-flash-mmc-setprimary@b.service: Deactivated successfully.
Mar 06 15:44:05 evb-stm32mp257f-dk systemd[1]: Finished Set b as primary partition.
Mar 06 15:44:07 evb-stm32mp257f-dk phosphor-software-manager[412]: BMC activation has ended - BMC reboots are re-enabled.
Mar 06 15:44:07 evb-stm32mp257f-dk phosphor-software-manager[412]: BMC image ready; need reboot to get activated.
Mar 06 15:44:07 evb-stm32mp257f-dk phosphor-software-manager[412]: corrupted size vs. prev_size

or

Mar 06 14:54:54 evb-stm32mp257f-dk systemd[1]: obmc-flash-mmc-remove@a.service: Deactivated successfully.
Mar 06 14:54:54 evb-stm32mp257f-dk systemd[1]: Finished Delete image a from BMC storage.
Mar 06 14:54:57 evb-stm32mp257f-dk systemd[1]: Created slice Slice /system/obmc-flash-mmc.
Mar 06 14:54:57 evb-stm32mp257f-dk systemd[1]: Starting Write image beecf802 to BMC storage...
Mar 06 14:55:03 evb-stm32mp257f-dk obmc-flash-bmc[527]: 131072+0 records in
Mar 06 14:55:03 evb-stm32mp257f-dk obmc-flash-bmc[527]: 131072+0 records out
Mar 06 14:55:28 evb-stm32mp257f-dk obmc-flash-bmc[544]: 566440+0 records in
Mar 06 14:55:28 evb-stm32mp257f-dk obmc-flash-bmc[544]: 566440+0 records out
Mar 06 14:55:30 evb-stm32mp257f-dk phosphor-software-manager[414]: malloc(): mismatching next->prev_size (unsorted)

In both cases, the phosphor-software-manager process terminates with a core dump, which can block the reboot if the reboot guard service did not have time to finish.
This looks like heap memory corruption, which may have been caused by an earlier event.

Investigations

  • Debug symbols kept and stripping disabled for the phosphor-software-manager binary
  • Core dumps enabled for GDB analysis
  • Software Manager service modified to launch phosphor-software-manager under Valgrind

The GDB log and Valgrind output are attached:

valgrind.txt

gdb.log

According to Valgrind, it seems that we are freeing something here:

updateManagers.erase(entryId);

The object seems to be allocated by phosphor::software::updater::ItemUpdater::processBMCImage(), freed, and then reused by phosphor::software::update::Manager::processImage(xxx).
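
To make the suspected pattern concrete, here is a minimal, self-contained sketch of an object being dropped from its owning container while one of its own member functions is still running, together with one possible way to avoid it (deferring the destruction). This is not the phosphor-software-manager code: the names Manager, Owner, onDone, retire and retired are hypothetical, and the assumption that the real call chain has this shape comes only from the Valgrind trace.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an update manager object
// (NOT the real phosphor::software::update::Manager).
struct Manager
{
    std::string id;
    std::function<void(const std::string&)> onDone; // callback into the owner

    void processImage()
    {
        // Tell the owner that this entry is no longer needed...
        onDone(id);
        // ...and then keep touching members. This is only valid if the
        // owner did not destroy *this inside onDone().
        std::cout << "finished " << id << "\n";
    }
};

// Hypothetical owner holding the managers in a map keyed by entry id.
struct Owner
{
    std::map<std::string, std::unique_ptr<Manager>> managers;
    std::vector<std::unique_ptr<Manager>> retired; // deferred deletions

    void add(const std::string& id)
    {
        auto m = std::make_unique<Manager>();
        m->id = id;
        m->onDone = [this](const std::string& doneId) { retire(doneId); };
        managers[id] = std::move(m);
    }

    // Calling managers.erase(id) directly here would destroy the Manager
    // whose processImage() is still on the call stack (use-after-free).
    // Instead, move ownership aside and drop only the map entry.
    void retire(const std::string& id)
    {
        auto it = managers.find(id);
        if (it == managers.end())
        {
            return;
        }
        retired.push_back(std::move(it->second));
        managers.erase(it);
    }

    void run(const std::string& id)
    {
        Manager* m = managers.at(id).get();
        assert(m != nullptr);
        m->processImage();
        // No member function of a retired Manager is running any more,
        // so the deferred objects can be destroyed now.
        retired.clear();
    }
};
```

Creating an Owner, calling add() with an entry id and then run() with the same id exercises the whole path. The entry disappears from the map during the callback, but the object itself is only destroyed after processImage() has returned. Whether a deferred deletion like this, shared ownership, or simply reordering the erase is the right fix for the real code depends on how updateManagers is actually used.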

As a test, I commented out this erase call to see whether the issue went away. It did, so I think I have identified the root cause.
I am not sure that the heap memory corruption causes a crash on every single system (and since it is often followed directly by a reboot, it can go unnoticed). It is possible that this bug has been hidden for a long time.

Kind regards,
Erwan.
