How to implement the merge operation in a Lambda architecture?

I am implementing a Lambda architecture, using Spark for the batch layer and Spark Streaming for the speed layer. So far, I store both the batch views and the real-time views in HBase, but in different tables.

I am stuck on how to merge the batch views generated by the batch layer with the real-time views generated by the speed layer so that queries can be served from both. What is the right way to do it? Should I just write them into the same HBase table and have the client query HBase directly?

There is 1 best solution below

First of all, I think that HBase is not the best option for real-time views, since heavily loaded random reads and random writes are not HBase's strongest side.

Anyway, one way to do it is the following:

  • cache the batch view in Spark, for instance as a DataFrame/Dataset
  • fetch the real-time view via Spark and represent it as a DataFrame/Dataset too
  • create an appropriate pipeline to merge those structures when needed, e.g. upon a request from the UI
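To make the merge step concrete, here is a minimal sketch in plain Python rather than Spark. It assumes that both views hold additive counters keyed by some identifier, and that the real-time view contains only the increments accumulated since the last batch run. The function name `merge_views` and the sample keys are hypothetical and only serve the illustration. With Spark DataFrames, the same idea would roughly correspond to a union of the two views followed by a group-by on the key and a sum.

```python
from collections import Counter
from typing import Dict, Mapping


def merge_views(batch_view: Mapping[str, int],
                realtime_view: Mapping[str, int]) -> Dict[str, int]:
    """Merge a batch view with real-time increments at query time.

    Assumption: values are additive counts, so merging means summing per key.
    """
    merged = Counter(batch_view)           # copy, so the cached batch view stays untouched
    for key, delta in realtime_view.items():
        merged[key] += delta               # keys missing from the batch view start at 0
    return dict(merged)


# Hypothetical sample data: page-view counts per page.
batch_view = {"page_a": 100, "page_b": 40}   # produced by the batch layer
realtime_view = {"page_b": 3, "page_c": 7}   # produced by the speed layer

merged = merge_views(batch_view, realtime_view)
print(merged)
```

If the views are not additive (for example, distinct counts or latest-value lookups), the merge function has to change accordingly, e.g. letting the real-time value override the batch value for the same key.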

A very simplified flow for doing that can be found on my GitHub.