feat: add zombie-reaper.py to handle zombie qemu left-overs#588

Merged
mergify[bot] merged 2 commits into os-autoinst:master from okurz:feature/poo201144_zombie_reaper
Jun 2, 2026

Conversation

@okurz okurz commented May 21, 2026

Member

@Martchus Martchus left a comment

Contributor

The script name should probably reflect that this is about s390x and qemu processes specifically.

I think it is part of the normal operation that qemu processes are in the zombie state for a very short time (until the parent process reads the exit status). It is probably unlikely that the script will take action in those cases but it isn't ideal.

okurz commented May 22, 2026

Member Author

> The script name should probably reflect that this is about s390x and qemu processes specifically.

done

> I think it is part of the normal operation that qemu processes are in the zombie state for a very short time (until the parent process reads the exit status). It is probably unlikely that the script will take action in those cases but it isn't ideal.

Hm, not sure if that is realistic, but OK :) I made the script somewhat more complicated; it now does the following:

  • look for zombies
  • if any found, sleep 10s
  • look for same zombie PIDs again, if found, trigger reboot + job restart
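
The three steps above can be sketched roughly as follows. This is an illustration only, not the code merged in this PR: the helper names (`parse_stat`, `find_zombies`, `persistent_zombies`), the `proc_root`, `scan` and `sleep` parameters, and the matching of "qemu" against the command name are all assumptions made for the example. It reads the process state from `/proc/<pid>/stat` and reports only the PIDs seen as zombies in both passes.

```python
import time
from pathlib import Path


def parse_stat(text):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the last ')'.
    """
    head, _, tail = text.rpartition(")")
    pid, _, comm = head.partition(" (")
    return int(pid), comm, tail.split()[0]


def find_zombies(name="qemu", proc_root="/proc"):
    """Return PIDs in state 'Z' whose command name contains `name`."""
    zombies = set()
    for stat_file in Path(proc_root).glob("[0-9]*/stat"):
        try:
            pid, comm, state = parse_stat(stat_file.read_text())
        except (OSError, ValueError, IndexError):
            continue  # process vanished or line was unparsable
        if state == "Z" and name in comm:
            zombies.add(pid)
    return zombies


def persistent_zombies(scan=find_zombies, grace=10, sleep=time.sleep):
    """Scan, wait `grace` seconds if anything was found, scan again.

    Only PIDs that are zombies in both scans are returned, so a process
    that is merely waiting to be reaped by its parent is ignored.
    """
    first = scan()
    if not first:
        return set()
    sleep(grace)
    return first & scan()
```

A caller would trigger the reboot and job restart only if `persistent_zombies()` returns a non-empty set. Passing `scan` and `sleep` as parameters is purely a convenience for testing without real zombies or a real 10s wait.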

@perlpunk

Contributor
>   • look for same zombie PIDs again, if found, trigger reboot + job restart

and then, to be safe, check that it is not a process newly started in the meantime that happens to have the same PID ;-)
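
One way to guard against PID reuse, sketched here under assumptions rather than taken from the merged script, is to remember each zombie's start time along with its PID. Field 22 of `/proc/<pid>/stat` holds the process start time in clock ticks since boot, so a recycled PID shows a different value. The function names and the `proc_root` parameter are hypothetical.

```python
from pathlib import Path


def zombie_identity(pid, proc_root="/proc"):
    """Return (pid, starttime) if `pid` is currently a zombie, else None."""
    try:
        text = (Path(proc_root) / str(pid) / "stat").read_text()
    except OSError:
        return None  # process no longer exists
    # Fields after the closing ')' start at field 3 (state),
    # so field 22 (starttime) is at index 19.
    fields = text.rpartition(")")[2].split()
    if len(fields) < 20 or fields[0] != "Z":
        return None
    return pid, int(fields[19])


def still_same_zombie(identity, proc_root="/proc"):
    """True if the PID is still a zombie and has the same start time."""
    return identity is not None and zombie_identity(identity[0], proc_root) == identity
```

The first scan would record `zombie_identity(pid)` for each zombie, and the second scan would act only on those for which `still_same_zombie(...)` holds.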

@okurz okurz force-pushed the feature/poo201144_zombie_reaper branch from 43437f6 to 97414bc Compare May 24, 2026 07:22
okurz commented May 24, 2026

Member Author
>   • look for same zombie PIDs again, if found, trigger reboot + job restart
>
> and then, to be safe, check if it's not a meanwhile newly started process with the same PID ;-)

done

okurz added 2 commits June 2, 2026 12:57
Motivation:
The kernel team requested a crash dump (vmcore) to debug the `exit_mmap` deadlock
on s390x. A standard reboot clears the system state but does not preserve the
memory state needed for post-mortem analysis.

Design Choices:
We replace the `sudo reboot` command with a `sysrq-trigger` kernel panic
(`echo c > /proc/sysrq-trigger`). This forces the kernel to trigger kdump, save
the vmcore, and then automatically reboot the machine as configured in the
kdump settings.

Benefits:
Automates the collection of invaluable debug data for the kernel team while
maintaining the automated recovery of the openQA hypervisors.

Related issue: https://bugzilla.suse.com/show_bug.cgi?id=1265624
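
The design choice in the commit message above, writing `c` to `/proc/sysrq-trigger` instead of rebooting, might look roughly like this in Python. This is a sketch, not the merged code: the function names are hypothetical, and the check of `/sys/kernel/kexec_crash_loaded` (which reads `1` when a crash kernel is loaded) is an extra guard added for illustration that the commit message does not mention. The trigger path is a required argument so that the function cannot panic a machine by accident.

```python
from pathlib import Path


def kdump_loaded(flag_path="/sys/kernel/kexec_crash_loaded"):
    """Return True if the kernel reports a loaded crash (kdump) kernel."""
    try:
        return Path(flag_path).read_text().strip() == "1"
    except OSError:
        return False


def trigger_crash(trigger_path, flag_path="/sys/kernel/kexec_crash_loaded"):
    """Write 'c' to `trigger_path` to force a kernel crash.

    Assumed guard: refuse to panic if no crash kernel is loaded, since
    no vmcore would be saved in that case.
    """
    if not kdump_loaded(flag_path):
        raise RuntimeError("no crash kernel loaded; refusing to panic without kdump")
    with open(trigger_path, "w") as trigger:
        trigger.write("c")
```

On a real hypervisor this would be called as root with `trigger_crash("/proc/sysrq-trigger")`; the call does not return, because the kernel panics immediately and kdump takes over.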
@okurz okurz force-pushed the feature/poo201144_zombie_reaper branch from 97414bc to 9b4ae46 Compare June 2, 2026 10:57
@mergify mergify Bot merged commit 95896c8 into os-autoinst:master Jun 2, 2026
7 checks passed
@okurz okurz deleted the feature/poo201144_zombie_reaper branch June 2, 2026 11:05
3 participants