Today we tested drive failure and replacement and noticed a couple of shortcomings in the device.pp manifest:
- When a disk is replaced, ceph.conf changes, and this triggers a restart of every OSD service on the server (because each osd service has subscribe => Concat['/etc/ceph/ceph.conf']). These restarts cause a noticeable disruption. Ideally only the affected service would be started, not all of them restarted!
- Using the /dev/sdX names for disks isn't ideal, since a replacement drive gets a new name when it is inserted (e.g. today we pulled sdq, then reinserted it, and it came back as sdab). We then have to do one of:
(a) change our host manifests to add osd::device (sdab), but this isn't good since the device will return to sdq after a reboot, or
(b) reboot the server, so that the device is called sdq once again.
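
To make the renaming visible, here is a minimal Python sketch (not part of the manifest) that lists which kernel name each persistent symlink currently points at. It assumes a Linux host where udev populates /dev/disk/by-id; the function name persistent_to_kernel_names is just illustrative, and we have not tried wiring such names into osd::device.

```python
import os


def persistent_to_kernel_names(link_dir="/dev/disk/by-id"):
    """Map each symlink in link_dir to the device path it currently resolves to.

    Assumption: link_dir is a udev-managed directory of persistent names
    (e.g. /dev/disk/by-id). Returns an empty dict if it does not exist.
    """
    mapping = {}
    if not os.path.isdir(link_dir):
        return mapping
    for name in sorted(os.listdir(link_dir)):
        path = os.path.join(link_dir, name)
        # Only symlinks are persistent aliases; skip anything else.
        if os.path.islink(path):
            mapping[name] = os.path.realpath(path)
    return mapping


if __name__ == "__main__":
    # Print "persistent-name -> current kernel device" for every alias found.
    for name, dev in persistent_to_kernel_names().items():
        print(f"{name} -> {dev}")
```

Running this before and after a hot swap should show the same persistent name pointing at sdq and then at sdab, which is the mismatch described above.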
Does anyone already have experience with better practices that avoid these two problems? Help is much appreciated!
Cheers, Dan
CERN IT