Apache Geode + Spring Boot: Suspect member host (reason=member unexpectedly shut down)

I started one Geode locator with gfsh and one Spring Boot application acting as the cache server.
Right after the application starts, the locator marks the cache server as a suspect member and removes it from the cluster. Because quorum is then lost, the whole service shuts down.
I'd like to understand why the cache server is considered a suspect member. I believe it is related to this message: "Member host(security-service:19372):41000 is not equivalent or in the same redundancy zone."

Details:

  • gfsh (apache-geode-1.15.1)
  • Java 17
  • Spring Boot 2.6.8
  • spring-geode-starter:1.6.11

Logs:

[info 2023/10/19 17:27:34.514 EDT locator-ip-10-106-10-157 <locator request thread 2> tid=0x29] Peer locator: coordinator from view is ip-10-106-10-157(locator-ip-10-106-10-157:7452:locator)<ec><v0>:41000
[info 2023/10/19 17:27:34.554 EDT locator-ip-10-106-10-157 <unicast receiver,ip-10-106-10-157-602> tid=0x21] Received a join request from host(security-service:19372):41000
[info 2023/10/19 17:27:34.854 EDT locator-ip-10-106-10-157 <Geode Membership View Creator> tid=0x2a] View Creator is processing 1 requests for the next membership view ([JoinRequestMessage(host(security-service:19372):41000) failureDetectionPort:50297])
[info 2023/10/19 17:27:34.855 EDT locator-ip-10-106-10-157 <Geode Membership View Creator> tid=0x2a] preparing new view View[ip-10-106-10-157(locator-ip-10-106-10-157:7452:locator)<ec><v0>:41000|1] members: [ip-10-106-10-157(locator-ip-10-106-10-157:7452:locator)<ec><v0>:41000, host(security-service:19372)<v1>:41000{lead}]
[info 2023/10/19 17:27:34.871 EDT locator-ip-10-106-10-157 <Geode Membership View Creator> tid=0x2a] finished waiting for responses to view preparation
[info 2023/10/19 17:27:34.872 EDT locator-ip-10-106-10-157 <Geode Membership View Creator> tid=0x2a] received new view: View[ip-10-106-10-157(locator-ip-10-106-10-157:7452:locator)<ec><v0>:41000|1] members: [ip-10-106-10-157(locator-ip-10-106-10-157:7452:locator)<ec><v0>:41000, host(security-service:19372)<v1>:41000{lead}]
old view is: View[ip-10-106-10-157(locator-ip-10-106-10-157:7452:locator)<ec><v0>:41000|0] members: [ip-10-106-10-157(locator-ip-10-106-10-157:7452:locator)<ec><v0>:41000]
[info 2023/10/19 17:27:34.872 EDT locator-ip-10-106-10-157 <Geode Membership View Creator> tid=0x2a] Peer locator received new membership view: View[ip-10-106-10-157(locator-ip-10-106-10-157:7452:locator)<ec><v0>:41000|1] members: [ip-10-106-10-157(locator-ip-10-106-10-157:7452:locator)<ec><v0>:41000, host(security-service:19372)<v1>:41000{lead}]
[info 2023/10/19 17:27:34.883 EDT locator-ip-10-106-10-157 <Geode Membership View Creator> tid=0x2a] sending new view View[ip-10-106-10-157(locator-ip-10-106-10-157:7452:locator)<ec><v0>:41000|1] members: [ip-10-106-10-157(locator-ip-10-106-10-157:7452:locator)<ec><v0>:41000, host(security-service:19372)<v1>:41000{lead}]
[info 2023/10/19 17:27:34.885 EDT locator-ip-10-106-10-157 <Geode View Processor1> tid=0x69] Membership: Processing addition <host(security-service:19372)<v1>:41000>
[info 2023/10/19 17:27:34.914 EDT locator-ip-10-106-10-157 <FederatingManager1> tid=0x6a] Initializing region _monitoringRegion_10.152.40.12<v1>41000
[info 2023/10/19 17:27:35.304 EDT locator-ip-10-106-10-157 <Pooled Waiting Message Processor 1> tid=0x2b] Member host(security-service:19372)<v1>:41000 is not equivalent or in the same redundancy zone.
[info 2023/10/19 17:27:35.692 EDT locator-ip-10-106-10-157 <P2P message reader for host(security-service:19372)<v1>:41000 shared unordered sender uid=1 local port=60589 remote port=62710> tid=0x6c] Performing availability check for suspect member host(security-service:19372)<v1>:41000 reason=member unexpectedly shut down shared, unordered connection
[info 2023/10/19 17:27:35.717 EDT locator-ip-10-106-10-157 <P2P message reader for host(security-service:19372)<v1>:41000 shared unordered sender uid=1 local port=60589 remote port=62710> tid=0x6c] Availability check failed for member host(security-service:19372)<v1>:41000
[info 2023/10/19 17:27:35.721 EDT locator-ip-10-106-10-157 <P2P message reader for host(security-service:19372)<v1>:41000 shared unordered sender uid=1 local port=60589 remote port=62710> tid=0x6c] received suspect message from myself for host(security-service:19372)<v1>:41000: member unexpectedly shut down shared, unordered connection
[info 2023/10/19 17:27:35.724 EDT locator-ip-10-106-10-157 <P2P message reader for host(security-service:19372)<v1>:41000 shared unordered sender uid=1 local port=60589 remote port=62710> tid=0x6c] received suspect message from myself for host(security-service:19372)<v1>:41000: failed availability check
[info 2023/10/19 17:27:35.725 EDT locator-ip-10-106-10-157 <Geode Failure Detection thread 2> tid=0x6f] Performing availability check for suspect member host(security-service:19372)<v1>:41000 reason=member unexpectedly shut down shared, unordered connection
[info 2023/10/19 17:27:35.732 EDT locator-ip-10-106-10-157 <Geode Failure Detection thread 2> tid=0x6f] All other members are suspect at this point
[info 2023/10/19 17:27:35.733 EDT locator-ip-10-106-10-157 <Geode Failure Detection thread 3> tid=0x70] Performing availability check for suspect member host(security-service:19372)<v1>:41000 reason=failed availability check
[info 2023/10/19 17:27:35.734 EDT locator-ip-10-106-10-157 <Geode Failure Detection thread 3> tid=0x70] All other members are suspect at this point
[info 2023/10/19 17:27:40.755 EDT locator-ip-10-106-10-157 <Geode Failure Detection thread 3> tid=0x70] Availability check failed for member host(security-service:19372)<v1>:41000
[info 2023/10/19 17:27:40.755 EDT locator-ip-10-106-10-157 <Geode Failure Detection thread 2> tid=0x6f] Availability check failed for member host(security-service:19372)<v1>:41000
[info 2023/10/19 17:27:40.755 EDT locator-ip-10-106-10-157 <Geode Failure Detection thread 3> tid=0x70] Requesting removal of suspect member host(security-service:19372)<v1>:41000
[info 2023/10/19 17:27:40.756 EDT locator-ip-10-106-10-157 <Geode Failure Detection thread 2> tid=0x6f] Requesting removal of suspect member host(security-service:19372)<v1>:41000
[info 2023/10/19 17:27:41.056 EDT locator-ip-10-106-10-157 <Geode Membership View Creator> tid=0x2a] View Creator is processing 1 requests for the next membership view ([RemoveMemberMessage(host(security-service:19372)<v1>:41000; reason=failed availability check)])
[info 2023/10/19 17:27:41.058 EDT locator-ip-10-106-10-157 <Geode Membership View Creator> tid=0x2a]   host(security-service:19372)<v1>:41000 had a weight of 15
[warn 2023/10/19 17:27:41.058 EDT locator-ip-10-106-10-157 <Geode Membership View Creator> tid=0x2a] total weight lost in this view change is 15 of 18.  Quorum has been lost!
[fatal 2023/10/19 17:27:41.059 EDT locator-ip-10-106-10-157 <Geode Membership View Creator> tid=0x2a] Possible loss of quorum due to the loss of 1 cache processes: [host(security-service:19372)<v1>:41000]
[fatal 2023/10/19 17:27:42.063 EDT locator-ip-10-106-10-157 <Geode Membership View Creator> tid=0x2a] Membership service failure: Exiting due to possible network partition event due to loss of 1 cache processes: [host(security-service:19372)<v1>:41000]
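The quorum arithmetic in the last lines of the log can be reproduced by hand. According to the Apache Geode documentation on member weighting, a locator weighs 3, a member that hosts a cache weighs 10, and the lead member gets an extra 5. That matches the log: the cache server (tagged `{lead}`) "had a weight of 15" and the view totalled 18. The sketch below is illustrative only. `QuorumWeightSketch`, `weightOf`, `totalWeight` and `quorumLost` are hypothetical names, not Geode APIs. Deriving weights from the printed member ids is a simplification. The strict "more than 51%" comparison is an assumption about exactly where Geode draws the boundary.

```java
import java.util.List;

public class QuorumWeightSketch {

    // Default weights as described in the Geode member-weighting docs
    // (assumed values; this is not Geode's own implementation).
    static final int LOCATOR_WEIGHT = 3;
    static final int CACHE_MEMBER_WEIGHT = 10;
    static final int LEAD_MEMBER_BONUS = 5;
    // Assumed loss threshold, as a percentage of the previous view's weight.
    static final int LOSS_THRESHOLD_PERCENT = 51;

    // Hypothetical helper: infers a member's weight from its id as printed in the log.
    static int weightOf(String memberId) {
        int weight = memberId.contains(":locator)") ? LOCATOR_WEIGHT : CACHE_MEMBER_WEIGHT;
        if (memberId.endsWith("{lead}")) {
            weight += LEAD_MEMBER_BONUS; // the lead member carries extra weight
        }
        return weight;
    }

    // Sums the weights of every member in a view.
    static int totalWeight(List<String> members) {
        return members.stream().mapToInt(QuorumWeightSketch::weightOf).sum();
    }

    // Simplified check: quorum counts as lost when the crashed weight
    // exceeds 51% of the previous view's weight (integer math avoids rounding).
    static boolean quorumLost(int lostWeight, int previousWeight) {
        return lostWeight * 100 > previousWeight * LOSS_THRESHOLD_PERCENT;
    }

    public static void main(String[] args) {
        // The two members of view 1, copied from the log above.
        List<String> view = List.of(
                "ip-10-106-10-157(locator-ip-10-106-10-157:7452:locator)<ec><v0>:41000",
                "host(security-service:19372)<v1>:41000{lead}");
        int total = totalWeight(view);
        int lost = weightOf(view.get(1)); // the cache server is the member being removed
        System.out.printf("lost %d of %d, quorum lost: %b%n", lost, total, quorumLost(lost, total));
    }
}
```

Running `main` prints `lost 15 of 18, quorum lost: true`. With only one locator and one cache server, removing the cache server alone takes away about 83% of the view's weight. So a single failed availability check on that server is enough to trigger the "possible network partition" exit at the end of the log.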