Create dynamic query with multiple joins in HDFS


I want to build a reporting tool on top of around 200 tables, each with millions of rows and hundreds of columns. Producing a report will require multiple joins between these tables. The user selects the fields they want in the report, so the query has to be generated at runtime. I would like to understand which Big Data technology is best suited for this purpose, since our current RDBMS may not scale to this volume of data. We could dump all the data onto HDFS, but how would we implement the joins on it so that the performance of the reporting application doesn't suffer too much? Any real implementations, links, or papers covering a similar use case would help a great deal.
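To make the "query generated at runtime" part concrete, here is a minimal sketch of what I have in mind, assuming the generated SQL text would be handed to whatever SQL engine ends up reading the data. The table names, field names, and join conditions below are all made up for illustration; the real catalog would cover the ~200 tables. The idea is to map each selectable field to its table, walk a join graph to find which tables must be joined, and emit only those joins.

```python
from collections import deque

# Hypothetical catalog: which table owns each user-selectable field.
FIELD_TABLE = {
    "customer_name": "customers",
    "order_total": "orders",
    "product_name": "products",
}

# Hypothetical join graph: a pair of tables -> the condition that joins them.
JOINS = {
    ("customers", "orders"): "customers.id = orders.customer_id",
    ("orders", "products"): "orders.product_id = products.id",
}


def _neighbors(table):
    """Yield (other_table, join_condition) for every edge touching `table`."""
    for (a, b), cond in JOINS.items():
        if a == table:
            yield b, cond
        elif b == table:
            yield a, cond


def build_query(fields):
    """Build a SELECT with only the joins needed for the chosen fields."""
    if not fields:
        raise ValueError("at least one field is required")
    unknown = [f for f in fields if f not in FIELD_TABLE]
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")

    needed = {FIELD_TABLE[f] for f in fields}
    root = FIELD_TABLE[fields[0]]

    # Breadth-first search from the root table, remembering for each table
    # the parent it was reached from and the join condition used.
    parent = {root: None}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for other, cond in _neighbors(current):
            if other not in parent:
                parent[other] = (current, cond)
                queue.append(other)

    unreachable = needed - parent.keys()
    if unreachable:
        raise ValueError(f"no join path to: {sorted(unreachable)}")

    # Walk back from each needed table to an already-joined table, then
    # emit the joins in root-to-leaf order so every ON clause refers to
    # tables that already appear in the query.
    joined = {root}
    join_clauses = []
    for table in sorted(needed):
        path = []
        current = table
        while current not in joined:
            path.append(current)
            current = parent[current][0]
        for t in reversed(path):
            join_clauses.append(f"JOIN {t} ON {parent[t][1]}")
            joined.add(t)

    select = ", ".join(f"{FIELD_TABLE[f]}.{f}" for f in fields)
    return " ".join([f"SELECT {select}", f"FROM {root}", *join_clauses])


if __name__ == "__main__":
    print(build_query(["customer_name", "product_name"]))
```

Because the field names are looked up in a fixed catalog rather than pasted in from user input, the generated SQL only ever contains known identifiers. What I don't know is which engine can run such generated multi-join queries over HDFS with acceptable latency.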

