Apache Kafka topic per application


I'm trying to build a PaaS like Ably that provides users with an easy-to-use pub/sub system. I'm planning to use Kafka, but I don't know if it's the right fit for this. Each user can have any number of apps in the PaaS, and each app will receive different messages. My idea was to give each app in the PaaS its own Kafka topic, but the number of apps could grow to millions or even billions if I get a lot of users, and Kafka isn't designed for that many topics.

Should I use Kafka for this or look into something else? Maybe there's some other way of separating messages between apps that I don't know of. I can't just put everything into a single topic, because then every node would receive trillions of messages it doesn't need.

On the Kafka part of your question:

Update March 2021: With Kafka's new KRaft mode (short for "Kafka Raft Metadata mode"; in Early Access as of Kafka v2.8), which entirely removes ZooKeeper from Kafka's architecture, a Kafka cluster can handle millions of topics/partitions. See https://www.confluent.io/blog/kafka-without-zookeeper-a-sneak-peek/ for details.

Because that feature is not yet recommended for production use, the current limit for a Kafka cluster backed by ZooKeeper is in the thousands of topics/partitions.
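To make the scale mismatch concrete, here is a rough back-of-the-envelope sketch of a topic-per-app design. Every input is an assumption for illustration: one partition per app topic, replication factor 3, and a hypothetical budget of 4,000 partition replicas per broker (a figure in the "thousands" range mentioned above, not a measured limit).

```python
# Rough sizing of a topic-per-app design.
# All numbers used below are illustrative assumptions, not measured Kafka limits.

def total_partition_replicas(apps: int, partitions_per_topic: int, replication_factor: int) -> int:
    """Each app gets its own topic; every partition is stored replication_factor times."""
    return apps * partitions_per_topic * replication_factor


def brokers_needed(replicas: int, replicas_per_broker: int) -> int:
    """Ceiling division: brokers required to hold all partition replicas."""
    return -(-replicas // replicas_per_broker)


if __name__ == "__main__":
    apps = 1_000_000  # assumed number of customer apps
    replicas = total_partition_replicas(apps, partitions_per_topic=1, replication_factor=3)
    print(replicas)
    print(brokers_needed(replicas, replicas_per_broker=4_000))  # hypothetical per-broker budget
```

Under these assumptions a million apps already means 3,000,000 partition replicas and 750 brokers, before any of them carry meaningful traffic; billions of apps scale that by another factor of a thousand.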

If you want to provide a service to other applications and customers, it is better to give each of them a separate topic, so you can use Kafka's authentication and authorization mechanisms to prevent users from accessing other users' data.
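A topic-per-customer layout pays off when topic names follow a predictable convention, because access rules can then be written per prefix. Kafka's own ACLs support this (since Kafka 2.0, `kafka-acls.sh` accepts `--resource-pattern-type prefixed`). The sketch below is only a toy model of that idea in plain Python, not Kafka's authorizer; the `tenant-<id>.app-<id>` naming scheme and the `PrefixAcl` type are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefixAcl:
    principal: str     # e.g. "User:alice"
    topic_prefix: str  # e.g. "tenant-alice." (trailing separator avoids matching "tenant-alice2.")
    operation: str     # e.g. "Read" or "Write"


def topic_for(tenant: str, app: str) -> str:
    # Hypothetical naming convention: one topic per app, grouped under a tenant prefix.
    return f"tenant-{tenant}.app-{app}"


def is_allowed(acls: list[PrefixAcl], principal: str, topic: str, operation: str) -> bool:
    # Deny by default; allow only if an ACL for this principal and operation covers the topic prefix.
    return any(
        acl.principal == principal
        and acl.operation == operation
        and topic.startswith(acl.topic_prefix)
        for acl in acls
    )


acls = [PrefixAcl("User:alice", "tenant-alice.", "Read")]
print(is_allowed(acls, "User:alice", topic_for("alice", "chat"), "Read"))
print(is_allowed(acls, "User:alice", topic_for("bob", "chat"), "Read"))
```

The trailing `.` in the prefix is a deliberate design choice: without it, a rule for tenant `alice` would also match topics belonging to a tenant named `alice2`.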

Disclaimer: I work at Ably and lead some of our work around Kafka.

First thing: Ably is not built using Kafka, and Kafka is very much unsuited to the task of a service like Ably, in the same way that Ably does not do what Kafka does. Kafka is a wonderfully powerful tool with a rich ecosystem, but elastic scalability is very much not its thing. Scaling a topic/partition is a slow process, and adding nodes to a running, active cluster is not something you can just "do". They do, however, work great together.
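One concrete reason resizing a topic is not free: keyed messages are assigned to a partition by hashing the key modulo the partition count, so increasing the count moves most keys to a different partition, and Kafka does not repartition data that is already stored. The sketch below uses `zlib.crc32` as a stand-in hash (Kafka's default partitioner uses murmur2); the effect shown depends only on the "hash mod partition count" structure, and the key names are hypothetical.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in for Kafka's default partitioner (which uses murmur2, not CRC32).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def fraction_remapped(keys: list[str], before: int, after: int) -> float:
    # Share of keys whose partition changes when the partition count goes from `before` to `after`.
    moved = sum(1 for k in keys if partition_for(k, before) != partition_for(k, after))
    return moved / len(keys)


keys = [f"app-{i}" for i in range(10_000)]  # hypothetical per-app message keys
print(f"{fraction_remapped(keys, 6, 8):.0%} of keys change partition going from 6 to 8 partitions")
```

For a uniform hash, going from 6 to 8 partitions leaves a key in place only when its hash is congruent to 0 through 5 modulo 24, so roughly three quarters of the keys move; any consumer relying on per-key ordering within a partition is disturbed during the transition.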

There are streaming solutions better suited to this, like Apache Pulsar or Redis (Pub/Sub or Streams), but once again it's back to tradeoffs. Pulsar is better with push subscriptions, has functions and can do a lot more. Redis clusters can be scaled elastically and quickly. The tradeoffs are that Pulsar is VERY complex to run, manage and scale, and Redis is ephemeral by default. There are other solutions too, like NATS.
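The "ephemeral" tradeoff is easiest to see in Redis Pub/Sub, which is fire-and-forget: a message is delivered only to subscribers connected at that moment (`PUBLISH` returns how many received it), whereas a retained log such as Redis Streams or a Kafka topic lets a reader catch up from an offset. The classes below are a toy in-memory model of those two semantics, not the redis-py API; `FireAndForgetBus` and `RetainedLog` are hypothetical names.

```python
from collections import defaultdict


class FireAndForgetBus:
    """Toy model of Pub/Sub semantics: deliver only to subscribers connected right now."""

    def __init__(self) -> None:
        self.subscribers: dict[str, list[list[str]]] = defaultdict(list)  # channel -> inboxes

    def subscribe(self, channel: str) -> list[str]:
        inbox: list[str] = []
        self.subscribers[channel].append(inbox)
        return inbox

    def publish(self, channel: str, message: str) -> int:
        # Nothing is stored: with no subscribers the message is simply dropped.
        for inbox in self.subscribers[channel]:
            inbox.append(message)
        return len(self.subscribers[channel])  # number of receivers, like PUBLISH's reply


class RetainedLog:
    """Toy model of a retained log: entries persist and readers choose where to start."""

    def __init__(self) -> None:
        self.entries: dict[str, list[str]] = defaultdict(list)

    def append(self, channel: str, message: str) -> int:
        self.entries[channel].append(message)
        return len(self.entries[channel]) - 1  # offset of the new entry

    def read_from(self, channel: str, offset: int = 0) -> list[str]:
        return self.entries[channel][offset:]


bus = FireAndForgetBus()
missed = bus.publish("app-1", "hello")  # no subscriber yet, so this message is lost
inbox = bus.subscribe("app-1")
bus.publish("app-1", "world")

log = RetainedLog()
log.append("app-1", "hello")
log.append("app-1", "world")
print(missed, inbox, log.read_from("app-1"))
```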

There is a LOT of tech in Ably to allow the various clusters to scale to tens of millions of connections and channels while maintaining strong guarantees, and none of it is available out of the box from a single open source vendor.

If Kafka is what you want to use, Redpanda (which is compatible with the Kafka API) is likely where you should start. As you are trying to act on each message in a relatively simple fashion, their in-line WASM could be very useful. Or you could use Ably ;)