We're evaluating possible approaches to persist streaming events(user click events in a web browser from many different users) so that it allows us to build custom user dashboards to later analyse those click events. We're planning to use Kafka to serve as the intermediate layer to ingest the vast amounts of streaming data coming from various user browsers. However I am curious to know whether Kafka can also serve as a persistent database to store these events so that we can later build the dashboarding application and have it query the events via some backend web APIs that we design.
Essentially, this is what we're thinking as of now:
Dashboarding frontend --- API ---> backend service ----queries ----> Kafka(stores user click events)
This article mentions that Kafka can be used as a persistent DB that apps can query but it cannot "replace" the traditional databases. I can imagine the huge cost overhead if Kafka is used as a persistent DB but then Kafka tiered storage might be a possible solution to bring the storage costs down?
Overall, to be able to design a custom dashboard to query the ingested event streams, is it advisable to use Kafka as a DB replacement or should we consider integrating Kafka with a traditional SQL/noSQL database or some other type of database? Any recommendations on which persistent DBs go well with Kafka for these types of use-cases?
In general, your approach of
frontend -> backend -> kafka -> db
is good. Assuming the throughput is at a point that warrants bringing in kafka.No
Yes
This depends more on the context, constraints, and requirements of your work place. Expected throughput? What DBs already exist? What programming language is preferred?
You can run olap style dashboard and analytics queries on oltp databases such as postgres. Many teams run their analytics on the read replicas.
The blue chip DBs for this would be elastic search, redash, or big query. The rocket ships are snowflake and clickhouse.
Another option is to allow the data science team [if there is a data science team] to ingest the kafka stream directly into spark or some other system and do their processing directly on the hose to provide the dashboards required