I don't have much experience with Kafka/Spark Streaming, but I have read many articles on how great the combination is for building real-time systems for analysis and dashboards. Can someone explain why Spark Streaming can't do it alone? In other words, why does Kafka sit between the data source and Spark Streaming?
Thanks
To process data with Spark, we need to feed it through one of the data sources that Spark supports (or write our own custom data source).
If it is static data, Spark provides built-in sources such as text files on the local file system, HDFS, or Amazon S3, and databases via JDBC.
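A minimal batch sketch of that static case (the file path and app name are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setMaster("local[*]").setAppName("StaticExample")
    val sc = new SparkContext(conf)

    // Read static data from a built-in file source; the path is hypothetical.
    val logs = sc.textFile("hdfs:///data/logs/*.log")
    println(s"record count: ${logs.count()}")
    sc.stop()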
In the streaming case, Spark supports data from different sources such as Kafka, Flume, Kinesis, Twitter, ZeroMQ, and MQTT; a Kafka example is sketched below.
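For instance, with the spark-streaming-kafka-0-10 integration a Kafka-backed DStream can be created like this (broker address, topic name, and group id are placeholders):

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val conf = new SparkConf().setMaster("local[2]").setAppName("KafkaExample")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Consumer configuration; broker address and group id are placeholders.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "spark-example",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Subscribe to a hypothetical "events" topic and print the message values.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Array("events"), kafkaParams))
    stream.map(_.value).print()

    ssc.start()
    ssc.awaitTermination()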
Spark also supports simple socket streaming:
val lines = ssc.socketTextStream("localhost", 9999)
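A fuller sketch around that snippet, assuming something like netcat (nc -lk 9999) is writing text to port 9999; the app name and batch interval are arbitrary:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setMaster("local[2]").setAppName("SocketExample")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Receive lines of text over a TCP socket and count words per batch.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()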
For more details, see the Spark Streaming programming guide (https://spark.apache.org/docs/latest/streaming-programming-guide.html).
Kafka is a high-throughput distributed messaging system. Its distributed design, scalability, and fault tolerance give it an advantage over other messaging systems such as MQTT and ZeroMQ. Because Kafka persists messages durably, it also decouples the data source from Spark Streaming: if the Spark job is down or falls behind, data keeps accumulating in Kafka and can be replayed later instead of being lost.
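To make the decoupling concrete, here is a minimal producer sketch: the source publishes to Kafka on its own schedule, and Spark consumes whenever it is ready (broker address, topic name, key, and value are placeholders):

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    // The producer keeps publishing even if the Spark job is down;
    // Kafka retains the messages until consumers catch up.
    val producer = new KafkaProducer[String, String](props)
    producer.send(new ProducerRecord[String, String]("events", "sensor-1", "42"))
    producer.close()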
So the question is: which of these data sources is yours? You are not forced to use Kafka; you can replace it with whichever source fits your setup. We use MQTT as our default source, for example.