Is this a use case for a database?
I have a distributed service that streams data from Kafka; each pod aggregates that data and stores it locally in a state store, so each pod ends up with a subset of the processed data. The processed data consists of one entry per hour recording the number of Kafka messages with certain characteristics received in that hour. Whenever a REST request arrives, the pod that receives it queries all the other pods (also via REST) and merges their information to build the response.
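For context, a minimal sketch of that scatter-gather read path, assuming a hypothetical `/local-counts` endpoint on each pod and a placeholder for whatever JSON deserialization is in use (neither is our actual API):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ScatterGather {
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    // The pod that received the external request fans out to every peer
    // and sums the per-hour counters into one merged view.
    static Map<String, Long> fetchAndMerge(List<String> peerUrls) throws Exception {
        Map<String, Long> merged = new HashMap<>();
        for (String peer : peerUrls) { // peer list would come from service discovery in practice
            HttpRequest req = HttpRequest.newBuilder(URI.create(peer + "/local-counts")).GET().build();
            String body = CLIENT.send(req, HttpResponse.BodyHandlers.ofString()).body();
            parseCounts(body).forEach((hour, count) -> merged.merge(hour, count, Long::sum));
        }
        return merged;
    }

    static Map<String, Long> parseCounts(String json) {
        // Placeholder: deserialize e.g. {"2024-05-01T13:00": 42, ...} with your JSON library.
        return Map.of();
    }
}
```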
- Max foreseen number of updates per second: 100. Each update increments or decrements a counter (create the entry if it doesn't exist, otherwise adjust the counter field by 1). A new row is created once per hour.
- Max foreseen number of external REST requests per second: 10, but often less than 1. Since each pod holds only a subset of the information and the pod that received the external request queries all the others for the full data, every pod effectively serves up to 10 internal requests per second.
Extra info: the data is aggregated using Kafka Streams and the result is stored in Kafka Streams' default state store (RocksDB).
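Concretely, the topology looks roughly like this. It's a sketch, not the real code: the topic name, serdes, and the `hourBucketKey`/`delta` helpers (which map a message to its "characteristic@hour" bucket and to +1/-1) are assumptions, but the aggregate is materialized in the default RocksDB-backed store as described above:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class HourlyCountTopology {
    public static StreamsBuilder build() {
        StreamsBuilder builder = new StreamsBuilder();
        KTable<String, Long> hourlyCounts = builder
            .stream("events", Consumed.with(Serdes.String(), Serdes.String()))
            .groupBy((k, v) -> hourBucketKey(v), Grouped.with(Serdes.String(), Serdes.String()))
            .aggregate(
                () -> 0L,                                    // create the entry if it doesn't exist
                (key, value, total) -> total + delta(value), // +1 or -1 depending on the message
                Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("hourly-counts")
                    .withValueSerde(Serdes.Long()));         // RocksDB-backed by default
        return builder;
    }

    // Hypothetical stand-ins for our actual bucketing and increment/decrement logic.
    private static String hourBucketKey(String message) { return "typeA@2024-05-01T13"; }
    private static long delta(String message) { return 1L; }
}
```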
With all this said, I have been considering using a database instead. The trade-offs as I see them:
- Using local state stores: each pod processes its share of the information and writes locally, which gives better write performance, and writes are by far our most frequent operation.
- Using a SQL database: a central database would avoid stressing every pod on every single read request and move that load to the database instead (see the sketch after this list). But while I can easily scale up the number of pods, I can't easily scale a SQL database.
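For comparison, in the database version the write path could be a single atomic upsert per update, so at ~100 updates/s the load is modest. This assumes PostgreSQL's `ON CONFLICT` syntax, and the table and column names are hypothetical:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class CounterUpsert {
    // Atomic "create the row if missing, else adjust the counter by delta"
    // in one statement; no read-modify-write race between pods.
    static void applyUpdate(Connection conn, String characteristic,
                            String hourBucket, int delta) throws SQLException {
        String sql = """
            INSERT INTO hourly_counts (characteristic, hour_bucket, cnt)
            VALUES (?, ?, ?)
            ON CONFLICT (characteristic, hour_bucket)
            DO UPDATE SET cnt = hourly_counts.cnt + EXCLUDED.cnt
            """;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, characteristic);
            ps.setString(2, hourBucket);
            ps.setInt(3, delta);
            ps.executeUpdate();
        }
    }
}
```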
It would then seem that state stores provide more benefit, since write operations are far more frequent, but the idea of a pod querying all the others instinctively feels wrong to me because of the complexity. Does it make any sense to use a database here, given that we already have one set up for other functionality? Note: we don't need any complex queries; fetching the records needed to fulfill a request is O(1).