How to realize merge operation in Lambda-architecture?

522 Views Asked by Eric Zheng At 20 June 2025 at 22:26

I am implementing a Lambda architecture, using Spark and Spark Streaming for the batch layer and the speed layer respectively. So far, I store both batch views and real-time views in HBase, but in different tables.

I am stuck on how to merge the batch views generated by the batch layer with the real-time views generated by the speed layer so that they can be queried. What is the right way to do it? Should I just dump them into the same HBase table and have the client query HBase directly?


There is 1 best solution below

Taras Matyashovskyy On 28 May 2016 at 06:46

First of all, I don't think HBase is the best option for real-time views, because heavily loaded random reads and random writes are not HBase's strong suit.

Anyway, one approach is the following:

cache the batch view in Spark, for instance as a DataFrame/Dataset
fetch the real-time view via Spark and represent it as a DataFrame/Dataset too
create an appropriate pipeline to merge those structures when needed, e.g. upon a request from the UI
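
The steps above can be sketched without Spark to show what the merge itself does. This is only a minimal illustration, not the flow from the answer: it assumes both views are additive aggregates (for example, counts per key) and that the real-time view covers only events that arrived after the last batch run, so the two never overlap. The names `merge_views`, `batch_view` and `realtime_view` are hypothetical.

```python
from collections import Counter


def merge_views(batch_view, realtime_view):
    """Merge two {key: count} views by summing the counts per key.

    Assumption: the views hold additive aggregates and cover disjoint
    time ranges, so adding them does not double-count any event.
    """
    merged = Counter(batch_view)   # start from the (cached) batch view
    merged.update(realtime_view)   # Counter.update adds counts per key
    return dict(merged)


# Hypothetical data: page-view counts per page.
batch_view = {"page_a": 100, "page_b": 40}
realtime_view = {"page_b": 3, "page_c": 1}

merged = merge_views(batch_view, realtime_view)
print(merged)  # {'page_a': 100, 'page_b': 43, 'page_c': 1}
```

With Spark DataFrames, the same idea would roughly correspond to a union of the two views followed by a group-by on the key and a sum. If the aggregate is not additive (for example, distinct counts), the merge step needs a different combining rule.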

A very simplified flow for doing this can be found on my GitHub.
