I want to implement the lambda architecture with a simple example, but I am not able to map my technology stack onto its layers.
I want to find the top 10 most popular hashtags in Twitter tweets in real time. Here is the purpose of each layer, as listed on http://lambda-architecture.net/:
- All data entering the system is dispatched to both the batch layer and the speed layer for processing.
- The batch layer has two functions: (i) managing the master dataset (an immutable, append-only set of raw data), and (ii) to pre-compute the batch views.
- The serving layer indexes the batch views so that they can be queried in low-latency, ad-hoc way.
- The speed layer compensates for the high latency of updates to the serving layer and deals with recent data only.
- Any incoming query can be answered by merging results from batch views and real-time views.
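To make the five points concrete, here is a toy in-memory sketch in plain Python (no Spark or HDFS involved). The class and method names (`ToyLambda`, `ingest`, `run_batch`, `top`) are hypothetical and only illustrate the data flow: every record goes to both layers, the batch step recomputes a view from the full master dataset, and a query merges the two views.

```python
from collections import Counter


class ToyLambda:
    """Toy model of the lambda layers; all names here are hypothetical."""

    def __init__(self):
        self.master = []                 # batch layer: append-only raw data
        self.batch_view = Counter()      # serving layer: precomputed counts
        self.realtime_view = Counter()   # speed layer: recent data only

    def ingest(self, hashtag):
        # All incoming data is dispatched to both the batch and speed layers.
        self.master.append(hashtag)
        self.realtime_view[hashtag] += 1

    def run_batch(self):
        # Recompute the batch view from the whole master dataset, then drop
        # the real-time counts that the fresh batch view now covers.
        self.batch_view = Counter(self.master)
        self.realtime_view.clear()

    def top(self, n=10):
        # A query is answered by merging the batch and real-time views.
        merged = self.batch_view + self.realtime_view
        return merged.most_common(n)
```

In a real system the batch recomputation takes a long time, so the speed layer has to keep serving counts for records that arrived while the batch job was running; this sketch ignores that overlap for simplicity.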
Which part of my problem can I solve with each layer? I am working with an Apache Spark and Hadoop HDFS technology stack.
I believe this link can be helpful to you.
http://blog.cloudera.com/blog/2012/09/analyzing-twitter-data-with-hadoop/
The only change you should consider is using Spark Core instead of Hive once the Twitter data has been read and moved to HDFS with Flume.
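As a rough illustration of what the batch computation does once the tweets are on HDFS, here is a minimal plain-Python sketch of the hashtag count. It is not Spark code: the function names and the lowercase normalization are assumptions made for this example. In Spark Core the same steps would roughly correspond to a `flatMap` over the tweet text, a `map` to `(hashtag, 1)` pairs, and a `reduceByKey` to sum them.

```python
import re
from collections import Counter

# A hashtag is assumed to be '#' followed by word characters.
HASHTAG = re.compile(r"#\w+")


def extract_hashtags(text):
    # Lowercase so "#Spark" and "#spark" count as one tag (an assumption).
    return [tag.lower() for tag in HASHTAG.findall(text)]


def batch_hashtag_counts(tweets):
    # Equivalent in spirit to flatMap + map to (tag, 1) + reduceByKey.
    counts = Counter()
    for text in tweets:
        counts.update(extract_hashtags(text))
    return counts


def top_hashtags(counts, n=10):
    # The n most frequent hashtags, highest count first.
    return counts.most_common(n)
```

For example, calling `top_hashtags(batch_hashtag_counts(tweets))` on a list of tweet strings gives the batch view for the top 10; the real-time counts from the speed layer would be added on top of it at query time.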