A while back we removed two damaged OSDs, osd.0 and osd.8, from our Ceph cluster. They no longer appear in the output of most Ceph commands, but they still show up in the CRUSH map with odd device names:

# devices
device 0 device0  <-----
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 device8  <-----
device 9 osd.9

Can someone please explain why device0 and device8 are still there, whether they have any effect on the cluster, and whether we should remove them?

device0 and device8 do not show up anywhere else in the CRUSH map.
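(For what it's worth, this is roughly how we checked for other references; the /tmp paths are just scratch files:)

ceph osd getcrushmap -o /tmp/cm
crushtool -d /tmp/cm -o /tmp/cm.txt
grep -n -e device0 -e device8 /tmp/cm.txt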

We used the removal procedure documented here:

http://docs.ceph.com/docs/jewel/rados/operations/add-or-rm-osds/#removing-osds-manual

Basically:

ceph osd crush remove 8
ceph auth del osd.8
ceph osd rm 8
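(As I recall, the full sequence on that page is roughly the following; the systemd unit name is just whatever your distro uses:)

ceph osd out 8
systemctl stop ceph-osd@8      # on the node that hosted osd.8
ceph osd crush remove osd.8
ceph auth del osd.8
ceph osd rm 8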

I am mainly asking because we are dealing with some stuck (incomplete) PGs that still reference id "8" in various places, and I am wondering whether this is related.
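(This is roughly how we are seeing those references; the PG id below is just an example:)

ceph health detail                # lists the incomplete PGs
ceph pg dump_stuck inactive       # stuck PGs and the OSDs they map to
ceph pg 1.23 query                # example PG id; "8" appears in the peering / past-interval details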

Otherwise, "ceph osd tree" looks the way I would expect (no osd.8 and no osd.0):

djakubiec@dev:~$ ceph osd tree
ID WEIGHT   TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 58.19960 root default
-2  7.27489     host node24
 1  7.27489         osd.1        up  1.00000          1.00000
-3  7.27489     host node25
 2  7.27489         osd.2        up  1.00000          1.00000
-4  7.27489     host node26
 3  7.27489         osd.3        up  1.00000          1.00000
-5  7.27489     host node27
 4  7.27489         osd.4        up  1.00000          1.00000
-6  7.27489     host node28
 5  7.27489         osd.5        up  1.00000          1.00000
-7  7.27489     host node29
 6  7.27489         osd.6        up  1.00000          1.00000
-8  7.27539     host node30
 9  7.27539         osd.9        up  1.00000          1.00000
-9  7.27489     host node31
 7  7.27489         osd.7        up  1.00000          1.00000

Thanks,

-- Dan

1 Answer

I had the same problem after a node failure and solved it by manually removing the extra devices from the CRUSH map. I had already removed the OSDs and the failed node using the standard procedures, but for some reason I still had ghost devices left in my CRUSH map.

Export the CRUSH map and edit it:

~# ceph osd getcrushmap -o /tmp/crushmap
~# crushtool -d /tmp/crushmap -o crush_map
~# vi crush_map

This is what my CRUSH map's devices section looked like before:

# devices
device 0 osd.0
device 1 device1
device 2 osd.2
device 3 osd.3
device 4 device4
device 5 osd.5
device 6 osd.6
device 7 osd.7

I changed it to this - note that I had to renumber the devices, not just remove the extra lines.

# devices
device 0 osd.0
device 1 osd.2
device 2 osd.3
device 3 osd.5
device 4 osd.6
device 5 osd.7

Then, recompile the CRUSH map and apply it:

~# crushtool -c crush_map -o /tmp/crushmap
~# ceph osd setcrushmap -i /tmp/crushmap
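One thing I would add in hindsight: before the setcrushmap step, it is worth checking that the compiled map decompiles back to what you edited and still produces sane mappings, roughly like this (rule 0 and two replicas are just example values for my pool):

~# crushtool -d /tmp/crushmap -o /tmp/crush_check      # scratch file; compare against the edited crush_map
~# crushtool -i /tmp/crushmap --test --show-statistics --rule 0 --num-rep 2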

This kicked off the recovery process again and the ghost devices are now gone.
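Nothing special is needed to watch the recovery afterwards; the usual status commands are enough:

~# ceph -s          # overall health and recovery progress
~# ceph -w          # follow the cluster log while the PGs recover
~# ceph osd tree    # confirm the tree still matches what you expect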