Recommendations to store streaming events

450 Views Asked by knightcool At 30 March 2022 at 17:08

We're evaluating approaches for persisting streaming events (click events from many different users' web browsers) so that we can later build custom user dashboards to analyse them. We plan to use Kafka as the intermediate layer that ingests the large volume of streaming data coming from the browsers. However, I'd like to know whether Kafka can also serve as the persistent database for these events, so that the dashboarding application can query them through backend web APIs that we design.

Essentially, this is what we're thinking as of now:

Dashboarding frontend --- API ---> backend service --- queries ---> Kafka (stores user click events)

This article mentions that Kafka can be used as a persistent DB that apps can query, but that it cannot "replace" traditional databases. I can imagine a huge cost overhead if Kafka is used as a persistent DB, but might Kafka tiered storage be a way to bring the storage costs down?

Overall, to design a custom dashboard that queries the ingested event streams, is it advisable to use Kafka as a DB replacement, or should we consider integrating Kafka with a traditional SQL/noSQL database or some other type of database? Any recommendations on which persistent DBs go well with Kafka for these types of use-cases?


There are 2 best solutions below

OneCricketeer On 30 March 2022 at 21:23

Yes and no.

RocksDB (or a custom state store) lets you "query" Kafka data via KSQL or Kafka Streams, but you wouldn't get a direct replacement for a database query API against Kafka itself. There is also a recent podcast from Confluent discussing GraphQL queries against Kafka and/or a database layer.
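To make the state-store idea concrete, here is a minimal Python sketch of what such a store does conceptually: it folds the event stream into a keyed table that a backend API can then look up. It does not use the Kafka Streams or KSQL APIs, and the event fields (`user_id`, `element`) and the `ClickCountStore` class are hypothetical.

```python
import json
from collections import defaultdict


class ClickCountStore:
    """Toy stand-in for a state store: a keyed table built by folding over events."""

    def __init__(self):
        # (user_id, element) -> number of clicks seen so far
        self._counts = defaultdict(int)

    def apply(self, raw_event: bytes) -> None:
        """Update the table with one serialized click event."""
        event = json.loads(raw_event)
        self._counts[(event["user_id"], event["element"])] += 1

    def clicks_for_user(self, user_id: str) -> dict:
        """Point lookup, the kind of query a backend API could serve."""
        return {
            element: count
            for (uid, element), count in self._counts.items()
            if uid == user_id
        }


# Hypothetical events, as they might arrive from a topic.
events = [
    b'{"user_id": "u1", "element": "buy-button"}',
    b'{"user_id": "u2", "element": "search-box"}',
    b'{"user_id": "u1", "element": "buy-button"}',
]

store = ClickCountStore()
for raw in events:
    store.apply(raw)

print(store.clicks_for_user("u1"))
```

The table only answers the questions it was built for (here, clicks per user and element), which is why this is not a general replacement for a database query API.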

For analysis, it would be far better to use tools like Elasticsearch (with Kibana), Apache Pinot, or Druid (along with Apache Superset) for such click-stream analytics and dashboarding, and to use Kafka as the channel that gets data into those systems.
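As a rough illustration of Kafka acting as a channel into a queryable store, the sketch below batches serialized click events into a table. It is only a sketch under stated assumptions: SQLite stands in for the real analytics store, `records` stands in for whatever a Kafka consumer would yield, and the function names and event fields (`user_id`, `page`, `ts`) are hypothetical.

```python
import json
import sqlite3
from typing import Iterable


def sink_click_events(records: Iterable[bytes], conn: sqlite3.Connection,
                      batch_size: int = 500) -> int:
    """Write serialized click events into a table in batches; return rows written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clicks (user_id TEXT, page TEXT, ts TEXT)"
    )
    batch, written = [], 0
    for raw in records:
        event = json.loads(raw)
        batch.append((event["user_id"], event["page"], event["ts"]))
        if len(batch) >= batch_size:
            written += _flush(conn, batch)
    written += _flush(conn, batch)  # write the final partial batch
    return written


def _flush(conn: sqlite3.Connection, batch: list) -> int:
    """Insert the buffered rows in one transaction and clear the buffer."""
    if not batch:
        return 0
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO clicks VALUES (?, ?, ?)", batch)
    n = len(batch)
    batch.clear()
    return n


# Hypothetical sample; in practice these bytes would come from a Kafka topic.
conn = sqlite3.connect(":memory:")
sample = [
    b'{"user_id": "u1", "page": "/home", "ts": "2022-03-30T17:08:00"}',
    b'{"user_id": "u2", "page": "/home", "ts": "2022-03-30T17:09:00"}',
    b'{"user_id": "u1", "page": "/pricing", "ts": "2022-03-30T17:10:00"}',
]
written = sink_click_events(sample, conn, batch_size=2)
print(written)
```

In many setups this step is not hand-written at all; a Kafka Connect sink connector for the chosen store typically does the same job.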

Asad Awadia On 31 March 2022 at 01:40

In general, your approach of frontend -> backend -> Kafka -> DB is good, assuming the throughput is at a point that warrants bringing in Kafka.

is it advisable to use Kafka as a DB replacement

should we consider integrating Kafka with a traditional SQL/noSQL database or some other type of database?

Yes

Any recommendations on which persistent DBs go well with Kafka for these types of use-cases?

This depends more on the context, constraints, and requirements of your workplace. What is the expected throughput? Which DBs already exist? Which programming language is preferred?

You can run OLAP-style dashboard and analytics queries on OLTP databases such as Postgres. Many teams run their analytics on read replicas.
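For a sense of what such a dashboard query looks like, here is a small sketch of an aggregate over click events. SQLite is used only so the example runs on its own; the `clicks` table, its columns, and the sample rows are hypothetical, and on Postgres the same kind of `GROUP BY` query would be pointed at a read replica.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (user_id TEXT, page TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?, ?)",
    [
        ("u1", "/home", "2022-03-30T10:00:00"),
        ("u2", "/home", "2022-03-30T11:00:00"),
        ("u1", "/home", "2022-03-30T12:00:00"),
        ("u1", "/pricing", "2022-03-30T12:05:00"),
        ("u2", "/home", "2022-03-31T09:00:00"),
    ],
)

# Clicks and distinct users per page per day: a typical dashboard aggregate.
daily_clicks = conn.execute(
    """
    SELECT substr(ts, 1, 10) AS day,
           page,
           COUNT(*) AS click_count,
           COUNT(DISTINCT user_id) AS unique_users
    FROM clicks
    GROUP BY day, page
    ORDER BY day, click_count DESC
    """
).fetchall()

for row in daily_clicks:
    print(row)
```

Queries like this scan many rows, which is one reason to keep them off the primary that serves transactional traffic.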

The blue-chip DBs for this would be Elasticsearch, Redash, or BigQuery. The rocket ships are Snowflake and ClickHouse.

Another option is to let the data science team (if there is one) ingest the Kafka stream directly into Spark or some other system and do their processing directly on the stream to provide the required dashboards.
