Create dynamic query with multiple joins in HDFS
My use case is a reporting tool over roughly 200 tables, each with millions of rows and hundreds of columns. Building a report will require multiple joins between these tables. The user selects the fields to include in the report, so the query is generated at runtime. I want to understand which Big Data technology would be best suited to this. Our current RDBMS may not scale to this volume of data. We could dump all the data onto HDFS, but how would we implement the joins on it so that the performance of the reporting application doesn't suffer too much? Any real-world implementations, links, or papers covering a similar use case would help a great deal.
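To make the runtime query generation described in the question concrete, here is a minimal sketch of how the user's selected fields might be turned into a join query. Everything in it is an assumption for illustration: the table names, the `JOIN_KEYS` map, and the `build_query` helper are hypothetical and do not come from the actual schema. It also assumes the join relationships are known up front and that inner joins are wanted.

```python
from collections import deque

# Hypothetical join graph: each pair of tables maps to its join condition.
JOIN_KEYS = {
    ("orders", "customers"): "orders.customer_id = customers.id",
    ("orders", "products"): "orders.product_id = products.id",
    ("products", "suppliers"): "products.supplier_id = suppliers.id",
}


def _neighbors(table):
    """Yield (other_table, condition) for every join touching `table`."""
    for (a, b), cond in JOIN_KEYS.items():
        if a == table:
            yield b, cond
        elif b == table:
            yield a, cond


def build_query(fields):
    """Build a SELECT with the joins needed to reach every selected field.

    `fields` is a list of "table.column" strings chosen by the user.
    """
    for field in fields:
        parts = field.split(".")
        # Reject anything that is not a plain table.column identifier,
        # since the names are interpolated into the SQL text.
        if len(parts) != 2 or not all(p.isidentifier() for p in parts):
            raise ValueError(f"invalid field: {field!r}")

    # Tables needed, in first-seen order, without duplicates.
    needed = list(dict.fromkeys(f.split(".")[0] for f in fields))
    root = needed[0]

    # Breadth-first search from the root table records how each
    # reachable table is joined to the one it was discovered from.
    parent = {root: None}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for other, cond in _neighbors(current):
            if other not in parent:
                parent[other] = (current, cond)
                queue.append(other)

    joined = [root]
    clauses = []

    def add(table):
        # Join intermediate tables first so each ON clause refers
        # only to tables that are already part of the query.
        if table in joined:
            return
        if table not in parent:
            raise ValueError(f"no join path from {root} to {table}")
        previous, cond = parent[table]
        add(previous)
        joined.append(table)
        clauses.append(f"JOIN {table} ON {cond}")

    for table in needed[1:]:
        add(table)

    return " ".join([f"SELECT {', '.join(fields)} FROM {root}", *clauses])


if __name__ == "__main__":
    print(build_query(["customers.name", "suppliers.country"]))
```

The output is plain SQL text, so the sketch only covers query generation. Which engine executes that SQL over data in HDFS, and how well the joins perform there, remains the open part of the question.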
Asked by Aditya (79 views)