I have this situation and can't access the Ceph dashboard. I had 5 mons, but 2 of them went down, and one of them is the bootstrap mon node, which also runs the mgr. This is what I get from that node:
2020-10-14T18:59:46.904+0330 7f9d2e8e9700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
cluster:
id: e97c1944-e132-11ea-9bdd-e83935b1c392
health: HEALTH_WARN
no active mgr
services:
mon: 3 daemons, quorum srv4,srv5,srv6 (age 2d)
mgr: no daemons active (since 2d)
mds: heyatfs:1 {0=heyfs.srv10.lxizhc=up:active} 1 up:standby
osd: 54 osds: 54 up (since 47h), 54 in (since 3w)
task status:
scrub status:
mds.heyfs.srv10.lxizhc: idle
data:
pools: 3 pools, 65 pgs
objects: 223.95k objects, 386 GiB
usage: 1.2 TiB used, 97 TiB / 98 TiB avail
pgs: 65 active+clean
io:
client: 105 KiB/s rd, 328 KiB/s wr, 0 op/s rd, 0 op/s wr
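About that handle_auth_bad_method line at the top: as far as I understand, auth method 2 is cephx and method 1 is none, so the monitors only accept cephx and my client's handshake is getting rejected, which I assume points at a key/keyring problem on the client side rather than at the cluster itself. In case it matters, this is how I understand the explicit form of what a bare ceph -s does with the default admin identity (the name and path here are the defaults, not taken from my config):
# the same as a bare "ceph -s", with the client name and keyring spelled out
ceph -s --name client.admin --keyring /etc/ceph/ceph.client.admin.keyring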
I have to tell the whole story. I used cephadm to create the cluster at first, and I'm very new to Ceph. I have 15 servers: 14 of them have an OSD container, 5 of them had a mon, and my bootstrap mon node, srv2, has the mgr. Two of these servers have public IPs and I use one of them as a client (I know this structure raises a lot of questions, but my company forces me to do it, and I'm new to Ceph, so that's how it is for now).

Two weeks ago I lost 2 OSDs, and I asked the datacenter that provides these servers to replace those 2 HDDs. They restarted those servers, and unfortunately those servers were my mon servers. After the restart, one of them, srv5, came back, but I could see srv3 was out of quorum, so I began trying to fix that. Inside cephadm shell --fsid ... I ran:
ceph orch apply mon srv3
ceph mon remove srv3
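(For context, I ran those two commands inside the containerized shell on srv2. If I remember right, the invocation was roughly the following, with the fsid being the cluster id from the ceph -s output above; I may also have passed the config and keyring explicitly:)
# enter the cephadm shell scoped to this cluster (fsid taken from ceph -s)
cephadm shell --fsid e97c1944-e132-11ea-9bdd-e83935b1c392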
After a while I saw in my dashboard that srv2 (my bootstrap mon) and the mgr were not working, and when I ran ceph -s, srv2 wasn't there anymore. I can see the srv2 mon in the removed directory:
root@srv2:/var/lib/ceph/e97c1944-e132-11ea-9bdd-e83935b1c392# ls
crash crash.srv2 home mgr.srv2.xpntaf osd.0 osd.1 osd.2 osd.3 removed
mgr.srv2.xpntaf is still running, but unfortunately I have now lost my access to the Ceph dashboard.
I tried to add srv2 and srv3 back to the monmap with:
576 ceph orch daemon add mon srv2:172.32.X.3
577 history | grep dump
578 ceph mon dump
579 ceph -s
580 ceph mon dump
581 ceph mon add srv3 172.32.X.4:6789
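(In case it matters, my understanding, which may be wrong, is that those two commands work at different levels: ceph orch daemon add mon asks cephadm/the orchestrator to actually deploy a mon daemon on the host, while ceph mon add only registers the monitor's address in the monmap and doesn't start anything. Roughly:)
# orchestrator level: deploy a mon container on the host (needs a working mgr/orchestrator)
ceph orch daemon add mon srv2:172.32.X.3
# monmap level: only adds the monitor's address to the monmap, no daemon is deployed
ceph mon add srv3 172.32.X.4:6789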
And now:
root@srv2:/# ceph -s
cluster:
id: e97c1944-e132-11ea-9bdd-e83935b1c392
health: HEALTH_WARN
no active mgr
2/5 mons down, quorum srv4,srv5,srv6
services:
mon: 5 daemons, quorum srv4,srv5,srv6 (age 16h), out of quorum: srv2, srv3
mgr: no daemons active (since 2d)
mds: heyatfs:1 {0=heyatfs.srv10.lxizhc=up:active} 1 up:standby
osd: 54 osds: 54 up (since 2d), 54 in (since 3w)
task status:
scrub status:
mds.heyatfs.srv10.lxizhc: idle
data:
pools: 3 pools, 65 pgs
objects: 223.95k objects, 386 GiB
usage: 1.2 TiB used, 97 TiB / 98 TiB avail
pgs: 65 active+clean
io:
client: 105 KiB/s rd, 328 KiB/s wr, 0 op/s rd, 0 op/s wr
I must also say that ceph orch host ls doesn't work; it hangs when I run it, and I think that's because of the "no active mgr" error. Also, when I look in that removed directory, mon.srv2 is there and you can see its unit.run file, so I used that to run the container again, but it says mon.srv2 isn't in the monmap and doesn't have a specific IP. And by the way, after ceph orch apply mon srv3
I could see a new container with a new fsid on the srv3 server.
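(For reference, the way to check which daemons and which fsid cephadm has on a host is, I assume, something like this, run on the host itself rather than inside the shell:)
# list the ceph daemons cephadm manages on this host, including their fsid
cephadm ls
# the raw containers can also be checked with the container runtime, e.g.
podman ps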
I now know my whole problem is because I ran this command: ceph orch apply mon srv3
because when you look at the installation documentation:
To deploy monitors on a specific set of hosts:
# ceph orch apply mon *<host1,host2,host3,...>*
Be sure to include the first (bootstrap) host in this list.
and I didn't see that line!
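So, if I read that correctly, what I should have run looks something like this instead, with srv2 (the bootstrap host) included; these are simply the five hosts that originally had mons:
ceph orch apply mon "srv2,srv3,srv4,srv5,srv6"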
Now I have managed to get another mgr running, but I got this:
root@srv2:/var/lib/ceph/mgr# ceph -s
2020-10-15T13:11:59.080+0000 7f957e9cd700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
cluster:
id: e97c1944-e132-11ea-9bdd-e83935b1c392
health: HEALTH_ERR
1 stray daemons(s) not managed by cephadm
2 mgr modules have failed
2/5 mons down, quorum srv4,srv5,srv6
services:
mon: 5 daemons, quorum srv4,srv5,srv6 (age 20h), out of quorum: srv2, srv3
mgr: srv4(active, since 8m)
mds: heyatfs:1 {0=heyatfs.srv10.lxizhc=up:active} 1 up:standby
osd: 54 osds: 54 up (since 2d), 54 in (since 3w)
task status:
scrub status:
mds.heyatfs.srv10.lxizhc: idle
data:
pools: 3 pools, 65 pgs
objects: 301.77k objects, 537 GiB
usage: 1.6 TiB used, 97 TiB / 98 TiB avail
pgs: 65 active+clean
io:
client: 180 KiB/s rd, 597 B/s wr, 0 op/s rd, 0 op/s wr
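I haven't figured out the "2 mgr modules have failed" and "1 stray daemons(s) not managed by cephadm" parts yet; I assume the details behind them show up with something like:
# show the detailed text behind each health warning/error
ceph health detail
# list mgr modules and whether they are enabled
ceph mgr module ls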
And when I run ceph orch host ls, I see this:
root@srv2:/var/lib/ceph/mgr# ceph orch host ls
HOST ADDR LABELS STATUS
srv10 172.32.x.11
srv11 172.32.x.12
srv12 172.32.x.13
srv13 172.32.x.14
srv14 172.32.x.15
srv15 172.32.x.16
srv2 srv2
srv3 172.32.x.4
srv4 172.32.x.5
srv5 172.32.x.6
srv6 172.32.x.7
srv7 172.32.x.8
srv8 172.32.x.9
srv9 172.32.x.10
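One more thing I notice in that output: for srv2 the ADDR column shows the hostname instead of an IP like all the other hosts. If that needs fixing, I assume it would be something like the following, with the same IP I used for srv2 earlier, though I'm not sure whether it's safe to run in the current state:
# update the address the orchestrator has recorded for this host
ceph orch host set-addr srv2 172.32.X.3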