We have set up a few tables with ENGINE = Kafka() across our three-node ClickHouse staging cluster.
In the logs we can frequently see this error:
[rdk:ERROR] [thrd:GroupCoordinator]: 15/15 brokers are down
I am sure the brokers are not down (not a single one). I also checked the topics: none of the 16 topics currently being consumed has developed any lag.
Restarting the ClickHouse service with:
systemctl restart clickhouse-server
makes the errors go away (as does deleting and recreating the tables).
I was hoping that disabling the DNS cache would help, but it didn't. Is it possible that these errors occur when there is not much data?
Or are there any other ideas I could try?
SELECT version()
┌─version()─┐
│ 23.1.3.5  │
└───────────┘
Check the Kafka status. Make sure that all of the brokers are up and running.
Check the /etc/clickhouse-server/config.xml file for the KAFKA_BROKERS setting.
Increase the KAFKA_RECONNECT_INTERVAL setting. This setting controls how often ClickHouse will try to reconnect to a down broker.
Increase the KAFKA_POLL_TIMEOUT setting. This setting controls how long ClickHouse will wait for a response from Kafka before it considers the broker to be down.
If none of these work, check the ClickHouse logs and post the relevant events for further troubleshooting.
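For the first step, it can help to test the brokers from each ClickHouse node rather than from the Kafka side, since the error is reported by the client. Below is a minimal sketch in Python, using only the standard library, that tries to open a TCP connection to each broker address. The function name check_brokers and the timeout value are assumptions made for illustration. A successful connection only shows that the name resolves and the port accepts connections from that host. It does not prove that the broker is healthy at the Kafka protocol level.

```python
import socket


def check_brokers(brokers, timeout=3.0):
    """Return a dict mapping each "host:port" string to True or False.

    True means a TCP connection could be opened within `timeout` seconds.
    This is a hypothetical helper, not part of ClickHouse or Kafka.
    """
    results = {}
    for broker in brokers:
        # Split on the last colon so the port is separated from the host.
        host, _, port = broker.rpartition(":")
        try:
            # create_connection resolves the name and connects; DNS failures,
            # refused connections, and timeouts all raise OSError subclasses.
            with socket.create_connection((host, int(port)), timeout=timeout):
                results[broker] = True
        except (OSError, ValueError):
            # ValueError covers a missing or non-numeric port.
            results[broker] = False
    return results
```

Call check_brokers with the same host:port entries that the tables use in kafka_broker_list, and run it on each of the three nodes while the error is being logged. If every address comes back True at that moment, plain network reachability is unlikely to be the cause.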