We are using ClickHouse's Kafka engine with the format_avro_schema_registry_url setting as described here.
After some time, the following errors show up in the logs (IP removed):
Code: 1000. DB::Exception: Timeout: connect timed out: XXX.XXX.XXX.XXX:443: while fetching schema id = 1566: while parsing Kafka message
It seems one of our three ClickHouse nodes caches the IP of the Kafka schema registry and keeps it for too long. When the registry's IP changes, the node keeps connecting to the stale address, resulting in the error.
Restarting the ClickHouse service with:
systemctl restart clickhouse-server
makes the errors go away.
Is there a better solution?
You can consider disabling the internal DNS cache in your environment with this setting: https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings#disable_internal_dns_cache
There is also the SYSTEM DROP DNS CACHE command for one-off operations: https://clickhouse.com/docs/en/sql-reference/statements/system#drop-dns-cache
Finally, the DNS cache update period is controlled by this setting: https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings#dns_cache_update_period
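As a rough sketch of what the two server settings could look like in practice, here is a config override. The file name dns_cache.xml and the value 5 are only illustrative assumptions, and older releases may expect a `<yandex>` root element instead of `<clickhouse>`. You would normally pick one of the two options rather than set both, and a server restart is probably needed for the change to take effect:

```xml
<!-- Hypothetical file: /etc/clickhouse-server/config.d/dns_cache.xml -->
<clickhouse>
    <!-- Option A: refresh cached DNS entries more often (value in seconds; 5 is an illustrative choice) -->
    <dns_cache_update_period>5</dns_cache_update_period>

    <!-- Option B: turn the internal DNS cache off entirely (uncomment to use) -->
    <!-- <disable_internal_dns_cache>1</disable_internal_dns_cache> -->
</clickhouse>
```

A shorter update period keeps the cache but narrows the window in which a stale registry IP can be used. Disabling the cache removes that window at the cost of a DNS lookup on every connection.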