A while back we removed two damaged OSDs, osd.0 and osd.8, from our Ceph cluster. They no longer appear in the output of most Ceph commands, but they still show up in the CRUSH map with odd device names:
# devices
device 0 device0 <-----
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 device8 <-----
device 9 osd.9
Can someone please explain why device0 and device8 are still there, whether they have any effect on the cluster, and whether we should remove them?
device0 and device8 do not show up anywhere else in the CRUSH map.
We used the procedure from the web site here:
http://docs.ceph.com/docs/jewel/rados/operations/add-or-rm-osds/#removing-osds-manual
Basically:
ceph osd crush remove 8
ceph auth del osd.8
ceph osd rm 8
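For reference, the fuller sequence on that page is roughly as follows (using osd.8 as the example; the exact service command depends on the init system, and our own steps may have differed slightly):

ceph osd out 8                      # mark it out and let data migrate off
sudo systemctl stop ceph-osd@8      # stop the daemon on the OSD's host
ceph osd crush remove osd.8         # remove it from the CRUSH map
ceph auth del osd.8                 # delete its cephx key
ceph osd rm 8                       # remove it from the OSD map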
I am mainly asking because we are dealing with some stuck PGs (incomplete) that still reference id "8" in various places, and I am wondering whether this is related.
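The sort of output where "8" shows up for us comes from commands like the following (the PG id is just a placeholder):

ceph health detail        # lists which PGs are incomplete
ceph pg <pgid> query      # peering info; OSD ids appear in the up/acting sets and past intervals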
Otherwise, "ceph osd tree" looks how I would expect (no osd.8 and no osd.0):
djakubiec@dev:~$ ceph osd tree
ID WEIGHT   TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 58.19960 root default
-2  7.27489     host node24
 1  7.27489         osd.1        up  1.00000          1.00000
-3  7.27489     host node25
 2  7.27489         osd.2        up  1.00000          1.00000
-4  7.27489     host node26
 3  7.27489         osd.3        up  1.00000          1.00000
-5  7.27489     host node27
 4  7.27489         osd.4        up  1.00000          1.00000
-6  7.27489     host node28
 5  7.27489         osd.5        up  1.00000          1.00000
-7  7.27489     host node29
 6  7.27489         osd.6        up  1.00000          1.00000
-8  7.27539     host node30
 9  7.27539         osd.9        up  1.00000          1.00000
-9  7.27489     host node31
 7  7.27489         osd.7        up  1.00000          1.00000
Thanks,
-- Dan
I had the same problem after a node failure and solved it by manually removing the extra devices from the crush map. I had already removed the OSDs and the failed node using the standard procedures, but for some reason I was left with ghost devices in my crush map.
Export the crush map and edit it:
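Something along these lines (the filenames here are just examples):

ceph osd getcrushmap -o crushmap.bin        # grab the compiled CRUSH map from the monitors
crushtool -d crushmap.bin -o crushmap.txt   # decompile it into an editable text file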
This is what my crush map's devices section looked like before:
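Roughly like this (the ids here are illustrative; mine had ghost entries in the same style as the output quoted above):

# devices
device 0 device0     <-- ghost entry
device 1 osd.1
...
device 8 device8     <-- ghost entry
device 9 osd.9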
I changed it to this - note I had to renumber, not just remove the extra lines.
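Again purely illustrative, but the idea is that the surviving entries get contiguous device ids while the osd.N names stay the same:

# devices
device 0 osd.1
device 1 osd.2
device 2 osd.3
device 3 osd.4
device 4 osd.5
device 5 osd.6
device 6 osd.7
device 7 osd.9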
Then, recompile the crush map and apply it:
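For example (matching the filenames above):

crushtool -c crushmap.txt -o crushmap.new   # compile the edited text map
ceph osd setcrushmap -i crushmap.new        # inject it back into the cluster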
This kicked off the recovery process again and the ghost devices are now gone.
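If you want to double-check afterwards, you can re-export and decompile the map, or dump it as JSON and look at the devices array:

ceph osd getcrushmap -o check.bin
crushtool -d check.bin -o check.txt
ceph osd crush dump                         # JSON dump; check the "devices" section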