I've never designed a distributed system before, so I'd like to ask for help here.
Suppose there is a third-party server that sends n types of messages once a minute, n > 5000.
I need to receive these messages and pass them into my system. Each message must be delivered to every consumer.
To protect against network outages or anything else that could cause these messages to be lost, I want to run several replicas of the receiving service, each subscribing to all n message types. Then I need to deduplicate somehow, so that consumers don't drown in a huge number of identical messages.
I have two ideas.
Idea 1: use a Redis cluster for deduplication, then either have consumers read the messages directly from Redis (a bit faster, but I'm not sure whether the same message can be read by multiple services) or send them to a Kafka topic via Kafka Connect. The downside of this approach is that I'm not sure a Redis cluster with a small number of replicas will handle a load of 17000 * (number of reading replicas) rps.
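To make idea 1 concrete, here is a rough sketch of the dedup step I have in mind, assuming each message carries a stable unique id. All the names here (`forward_if_first`, the `dedup:` key prefix, the `sink` callback) are made up for illustration; `client` would be a redis-py `Redis` instance, whose `set(..., nx=True, ex=...)` returns a truthy value only for the first caller that sets the key:

```python
def forward_if_first(client, message_id, payload, sink, ttl_seconds=300):
    """Forward payload downstream at most once per message_id across replicas.

    client: a Redis-like object; client.set(key, value, nx=True, ex=ttl)
    must succeed only if the key does not already exist.
    sink: whatever forwards the message on (e.g. a Kafka producer send).
    """
    # Every replica races to claim the id; only the winner forwards.
    claimed = client.set(f"dedup:{message_id}", b"1", nx=True, ex=ttl_seconds)
    if claimed:
        sink(payload)
        return True
    # Some other replica already claimed this id; drop the duplicate.
    return False
```

With `nx=True` exactly one replica wins the SET for a given id, so only that replica forwards the message. The TTL bounds Redis memory use, but it also means a message re-delivered after the TTL expires would be forwarded again, so the TTL has to comfortably exceed the redelivery window.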
Idea 2: for the replicas that read the n message types, write a master server that distributes the n types evenly across the replicas. Each replica writes to the topic only the types the master assigned to it. At the same time, every replica still reads all the remaining messages, but only compares them against the topic; if after a timeout some messages are missing from the topic, the replica sends the missing ones itself. The problem here is that if one of the replicas crashes before it has sent all of its messages, the other replicas will send many copies of the same messages, and then the master will need to rebalance.
These are my thoughts. Perhaps you have an idea for how to do this more simply?