Implementing lambda architecture using Hadoop, Apache Spark, HBase

I want to implement the Lambda Architecture with a simple example, but I am not able to map my technology stack onto each of its layers.

I want to find the top 10 most popular hashtags in Twitter tweets in real time. I am listing the purpose of each layer here, taken from http://lambda-architecture.net/:

  • All data entering the system is dispatched to both the batch layer and the speed layer for processing.
  • The batch layer has two functions: (i) managing the master dataset (an immutable, append-only set of raw data), and (ii) pre-computing the batch views.
  • The serving layer indexes the batch views so that they can be queried in a low-latency, ad-hoc way.
  • The speed layer compensates for the high latency of updates to the serving layer and deals with recent data only.
  • Any incoming query can be answered by merging results from batch views and real-time views.
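
For concreteness, the query-time merge I have in mind looks roughly like this (plain Scala; the view names and types are just placeholders for whatever the serving layer would actually return):

    // Hypothetical merge of a batch view and a real-time view of hashtag counts.
    // Both maps go from hashtag -> count; the names are illustrative only.
    def mergeViews(batchView: Map[String, Long],
                   realtimeView: Map[String, Long]): Seq[(String, Long)] = {
      val merged = (batchView.keySet ++ realtimeView.keySet).map { tag =>
        tag -> (batchView.getOrElse(tag, 0L) + realtimeView.getOrElse(tag, 0L))
      }
      // Top 10 hashtags by combined count
      merged.toSeq.sortBy(-_._2).take(10)
    }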

Which part of my problem can I solve with each layer? I am working with the Apache Spark and Hadoop HDFS technology stack.

1 Answer

I believe this link can be helpful to you.

http://blog.cloudera.com/blog/2012/09/analyzing-twitter-data-with-hadoop/

The only change is that you should consider Spark Core instead of Hive after reading the Twitter data and moving it into HDFS with Flume.
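
To make the mapping onto the layers concrete, here is a rough sketch in Scala. The HDFS paths, object names, and whitespace tokenization are my own illustrative assumptions, not taken from the blog post. The batch layer can be a plain Spark Core job over the master dataset on HDFS that pre-computes hashtag counts as a batch view:

    import org.apache.spark.{SparkConf, SparkContext}

    object BatchHashtagCounts {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("batch-hashtag-counts"))

        // Master dataset: raw tweet text landed on HDFS by Flume (path is illustrative;
        // in practice you would parse the JSON Flume writes rather than split on whitespace).
        val tweets = sc.textFile("hdfs:///data/tweets/raw/*")

        // Pre-compute the batch view: hashtag counts over the full history.
        val counts = tweets
          .flatMap(_.split("\\s+"))
          .filter(token => token.startsWith("#") && token.length > 1)
          .map(tag => (tag.toLowerCase, 1L))
          .reduceByKey(_ + _)

        // Persist the batch view; a serving layer (e.g. HBase) would index this for queries.
        counts.map { case (tag, n) => s"$tag\t$n" }
          .saveAsTextFile("hdfs:///views/hashtag-counts-batch")

        sc.stop()
      }
    }

The speed layer can be a Spark Streaming job that deals with recent data only, over a sliding window. Here a socket source stands in for whatever actually delivers the live stream (Flume or Kafka in practice), and a production job would write the real-time view to HBase instead of printing it:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object SpeedHashtagCounts {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("speed-hashtag-counts")
        val ssc = new StreamingContext(conf, Seconds(10))

        // Illustrative source: a socket feeding tweet text line by line.
        val tweets = ssc.socketTextStream("localhost", 9999)

        // Real-time view: hashtag counts over the last 10 minutes, updated every 10 seconds.
        val recentCounts = tweets
          .flatMap(_.split("\\s+"))
          .filter(token => token.startsWith("#") && token.length > 1)
          .map(tag => (tag.toLowerCase, 1L))
          .reduceByKeyAndWindow(_ + _, Seconds(600), Seconds(10))

        // Print the top 10 per batch; a real job would write this view to the serving layer.
        recentCounts.foreachRDD { rdd =>
          rdd.sortBy(_._2, ascending = false).take(10).foreach(println)
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }

At query time you then merge this real-time view with the batch view, as in the merge sketch in the question.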