Kafka without ZooKeeper: brokers fail after a couple of minutes


I am using Apache Kafka 3.5 to create a 3-node Kafka cluster; the goal is to have 3 broker machines without ZooKeeper.

I am facing a problem after installing and configuring the Kafka machines: the brokers stop after a couple of minutes on all machines.

First, I want to share the configuration in server.properties.

The server.properties, after rendering from our template, looks as follows (only node.id changes across the 3 brokers):

process.roles=broker,controller
node.id=1
[email protected]:19092,[email protected]:19093,[email protected]:19094
listeners=PLAINTEXT://:9092,CONTROLLER://:19092
inter.broker.listener.name=PLAINTEXT
advertised.listeners=PLAINTEXT://:9092
controller.listener.names=CONTROLLER
listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/tmp/kraft-combined-logs
num.partitions=1
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
initial.broker.registration.timeout.ms=240000

However, the cluster is not able to start, and it seems to be a connection issue.

[2023-06-23 10:45:57,443] WARN [RaftManager id=1] Connection to node 3 (/120.201.245.121:19094) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2023-06-23 10:45:57,468] INFO [MetadataLoader id=1] initializeNewPublishers: the loader is still catching up because we still don't know the high water mark yet. (org.apache.kafka.image.loader.MetadataLoader)
[2023-06-23 10:45:57,769] INFO [MetadataLoader id=1] initializeNewPublishers: the loader is still catching up because we still don't know the high water mark yet. (org.apache.kafka.image.loader.MetadataLoader)
[2023-06-23 10:45:57,869] INFO [MetadataLoader id=1] initializeNewPublishers: the loader is still catching up because we still don't know the high water mark yet. (org.apache.kafka.image.loader.MetadataLoader)
[2023-06-23 10:45:57,900] INFO [RaftManager id=1] Node 3 disconnected. (org.apache.kafka.clients.NetworkClient)
[2023-06-23 10:45:57,900] WARN [RaftManager id=1] Connection to node 3 (/120.201.245.121:19094) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2023-06-23 10:45:57,914] INFO [RaftManager id=1] Node 2 disconnected. (org.apache.kafka.clients.NetworkClient)
[2023-06-23 10:45:57,914] WARN [RaftManager id=1] Connection to node 2 (/120.201.245.16:19093) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2023-06-23 10:45:57,969] INFO [MetadataLoader id=1] initializeNewPublishers: the loader is still catching up because we still don't know the high water mark yet. (org.apache.kafka.image.loader.MetadataLoader)
[2023-06-23 10:45:58,170] INFO [MetadataLoader id=1] initializeNewPublishers: the loader is still catching up because we still don't know the high water mark yet. (org.apache.kafka.image.loader.MetadataLoader)
[2023-06-23 10:45:58,229] WARN [RaftManager id=1] Graceful shutdown timed out after 5000ms (org.apache.kafka.raft.KafkaRaftClient)
[2023-06-23 10:45:58,230] ERROR [kafka-1-raft-io-thread]: Graceful shutdown of RaftClient failed (kafka.raft.KafkaRaftManager$RaftIoThread)
java.util.concurrent.TimeoutException: Timeout expired before graceful shutdown completed
        at org.apache.kafka.raft.KafkaRaftClient$GracefulShutdown.failWithTimeout(KafkaRaftClient.java:2423)
        at org.apache.kafka.raft.KafkaRaftClient.maybeCompleteShutdown(KafkaRaftClient.java:2169)
        at org.apache.kafka.raft.KafkaRaftClient.poll(KafkaRaftClient.java:2234)
        at kafka.raft.KafkaRaftManager$RaftIoThread.doWork(RaftManager.scala:64)
        at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:127)
[2023-06-23 10:45:58,230] INFO [kafka-1-raft-io-thread]: Shutdown completed (kafka.raft.KafkaRaftManager$RaftIoThread)
[2023-06-23 10:45:58,231] INFO [kafka-1-raft-io-thread]: Stopped (kafka.raft.KafkaRaftManager$RaftIoThread)
[2023-06-23 10:45:58,240] INFO [kafka-1-raft-outbound-request-thread]: Shutting down (kafka.raft.RaftSendThread)
[2023-06-23 10:45:58,240] INFO [kafka-1-raft-outbound-request-thread]: Stopped (kafka.raft.RaftSendThread)
[2023-06-23 10:45:58,241] INFO [kafka-1-raft-outbound-request-thread]: Shutdown completed (kafka.raft.RaftSendThread)
[2023-06-23 10:45:58,252] INFO [SocketServer listenerType=CONTROLLER, nodeId=1] Stopping socket server request processors (kafka.network.SocketServer)
[2023-06-23 10:45:58,257] INFO [SocketServer listenerType=CONTROLLER, nodeId=1] Stopped socket server request processors (kafka.network.SocketServer)
[2023-06-23 10:45:58,270] INFO [QuorumController id=1] closed event queue. (org.apache.kafka.queue.KafkaEventQueue)
[2023-06-23 10:45:58,270] INFO [MetadataLoader id=1] initializeNewPublishers: the loader is still catching up because we still don't know the high water mark yet. (org.apache.kafka.image.loader.MetadataLoader)
[2023-06-23 10:45:58,273] INFO [SharedServer id=1] Stopping SharedServer (kafka.server.SharedServer)
[2023-06-23 10:45:58,273] INFO [MetadataLoader id=1] beginShutdown: shutting down event queue. (org.apache.kafka.queue.KafkaEventQueue)
[2023-06-23 10:45:58,273] INFO [SnapshotGenerator id=1] close: shutting down event queue. (org.apache.kafka.queue.KafkaEventQueue)
[2023-06-23 10:45:58,273] INFO [SnapshotGenerator id=1] closed event queue. (org.apache.kafka.queue.KafkaEventQueue)
[2023-06-23 10:45:58,277] INFO [MetadataLoader id=1] closed event queue. (org.apache.kafka.queue.KafkaEventQueue)
[2023-06-23 10:45:58,277] INFO [SnapshotGenerator id=1] closed event queue. (org.apache.kafka.queue.KafkaEventQueue)
[2023-06-23 10:45:58,279] INFO Metrics scheduler closed (org.apache.kafka.common.metrics.Metrics)
[2023-06-23 10:45:58,279] INFO Closing reporter org.apache.kafka.common.metrics.JmxReporter (org.apache.kafka.common.metrics.Metrics)
[2023-06-23 10:45:58,279] INFO Metrics reporters closed (org.apache.kafka.common.metrics.Metrics)
[2023-06-23 10:45:58,283] INFO App info kafka.server for 1 unregistered (org.apache.kafka.common.utils.AppInfoParser)
[2023-06-23 10:45:58,284] INFO App info kafka.server for 1 unregistered (org.apache.kafka.common.utils.AppInfoParser)

I've spent a few days on this puzzle, changing server.properties several times, but I am currently out of ideas. I would appreciate any help with this issue. (All machines are connected: ping and ssh work across the machines.)
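(A side note on the connectivity check: ping and ssh only prove that ICMP and port 22 work; the Raft quorum needs the controller ports themselves to be reachable over TCP. A hypothetical quick check from kafka01, using only bash built-ins and the peer addresses from the quorum list above, might look like this:)

```shell
# Test each peer's controller port directly over TCP (adjust IPs/ports per node).
for target in 120.201.245.16:19093 120.201.245.121:19094; do
  host=${target%:*}   # strip the ":port" suffix
  port=${target#*:}   # strip the "host:" prefix
  if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "OK   $host:$port is reachable"
  else
    echo "FAIL $host:$port is blocked or nothing is listening"
  fi
done

# On RHEL 8, firewalld is a common culprit; if it is running, the broker and
# controller ports may need to be opened on every node, e.g.:
#   firewall-cmd --permanent --add-port=9092/tcp --add-port=19092-19094/tcp
#   firewall-cmd --reload
```

If any of these checks fails while ping succeeds, the problem is a firewall or a service that never bound the port, not name resolution.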

The installation and configuration steps for this Kafka version are straightforward; the machines run RHEL 8.4.

tar xzf kafka_2.13-3.5.0.tgz
cd /home/kafka_2.13-3.5.0/config/kraft

Note: we changed node.id in server.properties on each broker machine.

[root@kafka01 kraft]# KAFKA_CLUSTER_ID="$(bash /home/kafka_2.13-3.5.0/bin/kafka-storage.sh random-uuid)"
[root@kafka01 kraft]# echo $KAFKA_CLUSTER_ID
kx9G7XV3TOiagO4Fsu1FYQ
[root@kafka01 kraft]# bash /home/kafka_2.13-3.5.0/bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c /home/kafka_2.13-3.5.0/config/kraft/server.properties
Formatting /tmp/kraft-combined-logs with metadata.version 3.5-IV2.
[root@kafka01 kraft]# bash /home/kafka_2.13-3.5.0/bin/kafka-server-start.sh -daemon /home/kafka_2.13-3.5.0/config/kraft/server.properties
[root@kafka01 kraft]# ls -ltr  /tmp/kraft-combined-logs
total 8
-rw-r--r-- 1 root root  86 Jun 23 11:07 meta.properties
-rw-r--r-- 1 root root 249 Jun 23 11:07 bootstrap.checkpoint
-rw-r--r-- 1 root root   0 Jun 23 11:07 recovery-point-offset-checkpoint
-rw-r--r-- 1 root root   0 Jun 23 11:07 log-start-offset-checkpoint
-rw-r--r-- 1 root root   0 Jun 23 11:07 replication-offset-checkpoint
drwxr-xr-x 2 root root 187 Jun 23 11:07 __cluster_metadata-0
[root@kafka01 kraft]# more /tmp/kraft-combined-logs/meta.properties
#
#Fri Jun 23 11:07:34 UTC 2023
cluster.id=kx9G7XV3TOiagO4Fsu1FYQ
version=1
node.id=1
[root@kafka01 kraft]#
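(One thing worth double-checking in the steps above: every node must be formatted with the *same* cluster id. If `kafka-storage.sh random-uuid` is run separately on each machine, each node gets a different cluster.id in meta.properties and the quorum can never form. A sketch, assuming the same install path on all three machines:)

```shell
# Generate the cluster id ONCE (e.g. on kafka01) and reuse the exact value
# when formatting storage on kafka02 and kafka03.
KAFKA_HOME=/home/kafka_2.13-3.5.0
KAFKA_CLUSTER_ID="$("$KAFKA_HOME"/bin/kafka-storage.sh random-uuid)"  # run once, copy the value

# Run on EVERY node, with the same id (node.id already differs per server.properties):
"$KAFKA_HOME"/bin/kafka-storage.sh format \
  -t "$KAFKA_CLUSTER_ID" \
  -c "$KAFKA_HOME"/config/kraft/server.properties

# Verify: cluster.id must be identical in meta.properties on all three nodes.
grep cluster.id /tmp/kraft-combined-logs/meta.properties
```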

references:

https://kafka.apache.org/35/documentation

https://kafka.apache.org/downloads

https://hevodata.com/learn/kafka-without-zookeeper

https://adityasridhar.com/posts/how-to-easily-install-kafka-without-zookeeper

Answer (by chribro):

I had a similar issue. As @OneCricketeer mentioned, each of the quorum members needs to be available.

In my case, I had:

controller.quorum.voters=1@kafka1:9093,2@kafka2:9093,3@kafka3:9093
listeners=PLAINTEXT://kafka1:9192,CONTROLLER://kafka1:9093
advertised.listeners=PLAINTEXT://kafka1:9192

I could ping kafka2 and kafka3 from kafka1 and vice versa, but occasionally a java.net.UnknownHostException was thrown.

In my Docker Compose file I needed to also set:

kafka1:
  hostname: kafka1 # this was missing
...

However, in your case you have the IPs set up but are only listening on the default interface:

[email protected]:19092,[email protected]:19093,[email protected]:19094
listeners=PLAINTEXT://:9092,CONTROLLER://:19092
advertised.listeners=PLAINTEXT://:9092

So it's not the hostname. Perhaps you could try binding the listener ports to all interfaces rather than the default. Note that advertised.listeners must be an address other nodes can actually route to; Kafka rejects the non-routable meta-address 0.0.0.0 there, so advertise each node's own IP. E.g. on node 1 (120.201.245.15):

listeners=PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:19092
advertised.listeners=PLAINTEXT://120.201.245.15:9092

Alternatively, you could use the respective IPs in listeners so that each node listens only on that interface.
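(Editor's note: the question's template keeps the listener lines identical on all three brokers, but the quorum list assigns a different controller port to each node: 19092, 19093, 19094. So the CONTROLLER listener port must also be rendered per node, not just node.id. A hedged sketch of what node 2's rendered config could look like, using only addresses from the question:)

```properties
# Node 2 (120.201.245.16): node.id, the CONTROLLER listener port, and the
# advertised IP all differ per node; the quorum list is identical everywhere.
node.id=2
[email protected]:19092,[email protected]:19093,[email protected]:19094
listeners=PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:19093
advertised.listeners=PLAINTEXT://120.201.245.16:9092
inter.broker.listener.name=PLAINTEXT
controller.listener.names=CONTROLLER
```

With the template as shown in the question, node 2 would bind its controller listener on 19092 while the quorum expects it on 19093, which matches the "Connection to node 2/3 could not be established" warnings in the log.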

Reference: https://kafka.apache.org/35/documentation/#brokerconfigs_listeners