MongoDB can't establish a primary in a replica set

I have 2 servers (productive and bi), each running 2 mongod services, and all four are members of one replica set. Today the VLAN went down, so the servers can no longer see each other. The problem is that MongoDB didn't pick any member to become primary, and I don't know how to force one of them to become primary.

I've tried restarting the server, with no success. I've also tried to reconfigure the replica set to change priorities, but that has to be done from a primary node, and I can't connect to a primary... I'm really stuck...

I've also read this question https://stackoverflow.com/a/59851668/513570 but I'm not sure I understand it well. I expected that when there is a problem with the primary, one of the secondary nodes would be picked to become primary, but that didn't happen. How could I configure the 4 nodes to do so?

So 2 questions here: how can I force one of the secondary members to become primary? And how do I configure a replica set so that it always has a primary online? Any help will be really appreciated.

rs.status():

{
  set: 'repset',
  date: ISODate("2022-04-07T08:22:14.569Z"),
  myState: 2,
  term: Long("6"),
  syncSourceHost: '',
  syncSourceId: -1,
  heartbeatIntervalMillis: Long("2000"),
  majorityVoteCount: 3,
  writeMajorityCount: 3,
  votingMembersCount: 4,
  writableVotingMembersCount: 4,
  optimes: {
    lastCommittedOpTime: { ts: Timestamp({ t: 0, i: 0 }), t: Long("-1") },
    lastCommittedWallTime: ISODate("1970-01-01T00:00:00.000Z"),
    appliedOpTime: { ts: Timestamp({ t: 1649214911, i: 2 }), t: Long("6") },
    durableOpTime: { ts: Timestamp({ t: 1649214911, i: 2 }), t: Long("6") },
    lastAppliedWallTime: ISODate("2022-04-06T03:15:11.013Z"),
    lastDurableWallTime: ISODate("2022-04-06T03:15:11.013Z")
  },
  lastStableRecoveryTimestamp: Timestamp({ t: 1649214911, i: 2 }),
  members: [
    {
      _id: 0,
      name: 'productive.vlan.local:27017',
      health: 1,
      state: 2,
      stateStr: 'SECONDARY',
      uptime: 418,
      optime: { ts: Timestamp({ t: 1649214911, i: 2 }), t: Long("6") },
      optimeDate: ISODate("2022-04-06T03:15:11.000Z"),
      syncSourceHost: '',
      syncSourceId: -1,
      infoMessage: '',
      configVersion: 13,
      configTerm: 6,
      self: true,
      lastHeartbeatMessage: ''
    },
    {
      _id: 1,
      name: 'productive.vlan.local:37017',
      health: 1,
      state: 2,
      stateStr: 'SECONDARY',
      uptime: 410,
      optime: { ts: Timestamp({ t: 1649214911, i: 2 }), t: Long("6") },
      optimeDurable: { ts: Timestamp({ t: 1649214911, i: 2 }), t: Long("6") },
      optimeDate: ISODate("2022-04-06T03:15:11.000Z"),
      optimeDurableDate: ISODate("2022-04-06T03:15:11.000Z"),
      lastHeartbeat: ISODate("2022-04-07T08:22:14.565Z"),
      lastHeartbeatRecv: ISODate("2022-04-07T08:22:14.305Z"),
      pingMs: Long("0"),
      lastHeartbeatMessage: '',
      syncSourceHost: '',
      syncSourceId: -1,
      infoMessage: '',
      configVersion: 13,
      configTerm: 6
    },
    {
      _id: 2,
      name: 'bi.vlan.local:37017',
      health: 0,
      state: 8,
      stateStr: '(not reachable/healthy)',
      uptime: 0,
      optime: { ts: Timestamp({ t: 0, i: 0 }), t: Long("-1") },
      optimeDurable: { ts: Timestamp({ t: 0, i: 0 }), t: Long("-1") },
      optimeDate: ISODate("1970-01-01T00:00:00.000Z"),
      optimeDurableDate: ISODate("1970-01-01T00:00:00.000Z"),
      lastHeartbeat: ISODate("2022-04-07T08:22:09.867Z"),
      lastHeartbeatRecv: ISODate("1970-01-01T00:00:00.000Z"),
      pingMs: Long("0"),
      lastHeartbeatMessage: 'Error connecting to bi.vlan.local:37017 (10.0.130.209:37017) :: caused by :: No route to host',
      syncSourceHost: '',
      syncSourceId: -1,
      infoMessage: '',
      configVersion: -1,
      configTerm: -1
    },
    {
      _id: 3,
      name: 'bi.vlan.local:47017',
      health: 0,
      state: 8,
      stateStr: '(not reachable/healthy)',
      uptime: 0,
      optime: { ts: Timestamp({ t: 0, i: 0 }), t: Long("-1") },
      optimeDurable: { ts: Timestamp({ t: 0, i: 0 }), t: Long("-1") },
      optimeDate: ISODate("1970-01-01T00:00:00.000Z"),
      optimeDurableDate: ISODate("1970-01-01T00:00:00.000Z"),
      lastHeartbeat: ISODate("2022-04-07T08:22:09.867Z"),
      lastHeartbeatRecv: ISODate("1970-01-01T00:00:00.000Z"),
      pingMs: Long("0"),
      lastHeartbeatMessage: 'Error connecting to bi.vlan.local:47017 (10.0.130.209:47017) :: caused by :: No route to host',
      syncSourceHost: '',
      syncSourceId: -1,
      infoMessage: '',
      configVersion: -1,
      configTerm: -1
    }
  ],
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1649214911, i: 2 }),
    signature: {
      hash: Binary(Buffer.from("0000000000000000000000000000000000000000", "hex"), 0),
      keyId: Long("0")
    }
  },
  operationTime: Timestamp({ t: 1649214911, i: 2 })
}

There is 1 solution below

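Thanks to @Wernfried-Domscheit's comment: the problem is that an election cannot succeed (see this question). The set has 4 voting members, so 3 votes are needed to elect a primary (majorityVoteCount: 3 in the rs.status() output), but only the 2 members on the productive server can reach each other.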
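The arithmetic behind this can be sketched in a few lines of JavaScript. The function names `majorityNeeded` and `canElectPrimary` are made up for illustration; the numbers come from the rs.status() output in the question (votingMembersCount: 4, majorityVoteCount: 3, two members reachable).

```javascript
// Votes needed to elect a primary: a strict majority of all voting members.
function majorityNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// An election can succeed only if the reachable voters reach that majority.
function canElectPrimary(votingMembers, reachableVoters) {
  return reachableVoters >= majorityNeeded(votingMembers);
}

// Situation in the question: 4 voting members, only 2 of them reachable.
console.log(majorityNeeded(4));      // 3, matches majorityVoteCount
console.log(canElectPrimary(4, 2));  // false, so every member stays SECONDARY

// If only the 2 members on the productive server vote, the majority is 2.
console.log(canElectPrimary(2, 2));  // true
```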

As the bi server is only used for backups and BI operations, I ended up changing the votes and priority of the members on the bi server:

  members: [
    {
      _id: 0,
      host: 'productive.vlan.local:27017',
      priority: 100,
      votes: 1
    },
    {
      _id: 1,
      host: 'productive.vlan.local:37017',
      priority: 10,
      votes: 1
    },
    {
      _id: 2,
      host: 'bi.vlan.local:37017',
      priority: 0,
      votes: 0
    },
    {
      _id: 3,
      host: 'bi.vlan.local:47017',
      priority: 0,
      votes: 0
    }
  ],
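While no primary is reachable, a plain rs.reconfig() is rejected, so a change like this has to be forced from one of the surviving members with `rs.reconfig(cfg, { force: true })`. The following is only a sketch under that assumption: `demoteHost` is a hypothetical helper, not part of mongosh, and it only zeroes the bi members (the productive priorities of 100 and 10 shown above would be set separately). A forced reconfig can lead to rollback of writes, so it is best treated as an emergency measure.

```javascript
// Hypothetical helper: returns a copy of a replica set config in which every
// member running on `hostname` gets priority 0 and votes 0.
function demoteHost(cfg, hostname) {
  const members = cfg.members.map((m) =>
    m.host.split(":")[0] === hostname
      ? { ...m, priority: 0, votes: 0 }
      : { ...m }
  );
  return { ...cfg, members };
}

// Runs only inside mongosh, where the `rs` helper object exists.
// Connect to a reachable member (for example productive.vlan.local:27017).
if (typeof rs !== "undefined") {
  const cfg = demoteHost(rs.conf(), "bi.vlan.local");
  // force: true allows the reconfig to run on a member that is not primary.
  rs.reconfig(cfg, { force: true });
}
```

Once the two bi members no longer vote, the two productive members form a majority on their own and should be able to elect a primary between them.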

Another option would be to add an arbiter, but in this situation it is not required at all.