How to use a whole Hive database in Spark and read SQL queries from external files?

I am using the Hortonworks sandbox in Azure with Spark 1.6, and I have a Hive database populated with TPC-DS sample data. I want to read SQL queries from external files and run them against the Hive dataset in Spark. I followed the topic Using hive database in spark, but it only queries a single table from my dataset and writes the SQL inline in the Spark code. I need to define the whole database as my source and query against it; I think I should use DataFrames, but I am not sure how. I also want to import the SQL queries from external .sql files rather than writing them out again in code. Could you please guide me on how to do this? Thank you very much!
5.9k views, asked by Fardin Behboudi

There is 1 answer below.
Spark can read data directly from Hive tables. You can create and drop Hive tables from Spark, and you can perform all Hive HQL operations through Spark. For this you need to use Spark's HiveContext.

From the Spark documentation:

Spark's HiveContext provides a superset of the functionality provided by the basic SQLContext. Additional features include the ability to write queries using the more complete HiveQL parser, access to Hive UDFs, and the ability to read data from Hive tables. To use a HiveContext, you do not need to have an existing Hive setup.

For more information, see the Spark documentation.
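As a quick illustration, here is a minimal sketch (Spark 1.6, Scala) of creating a HiveContext and querying a Hive database; the database name tpcds and the table store_sales are assumptions based on the question's TPC-DS dataset:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("HiveExample"))
    val hiveContext = new HiveContext(sc)

    // Switch to the TPC-DS database so every table in it
    // can be referenced directly in SQL, without qualification.
    hiveContext.sql("USE tpcds")
    hiveContext.sql("SELECT COUNT(*) FROM store_sales").show()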
To avoid writing SQL in your code, you can use a properties file: put all your Hive queries in it, and reference each query by its key in your code.
Please see below an implementation of Spark HiveContext together with the use of a properties file, in Spark Scala.
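The original code block was not preserved, so the following is a sketch of how such a job could look. The object name HiveQueryRunner, the key hive.query.1, the database name tpcds, and passing the properties-file path as the first program argument are all illustrative assumptions:

    import java.util.Properties

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object HiveQueryRunner {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("HiveQueryRunner"))
        val hiveContext = new HiveContext(sc)

        // Load the properties file from HDFS; its path is passed
        // as the first argument to spark-submit (an assumption here).
        val fs = FileSystem.get(sc.hadoopConfiguration)
        val props = new Properties()
        val in = fs.open(new Path(args(0)))
        try props.load(in) finally in.close()

        // Make the TPC-DS database current so queries can reference
        // any of its tables directly.
        hiveContext.sql("USE tpcds")

        // Look up the query text by its key and run it against Hive.
        val result = hiveContext.sql(props.getProperty("hive.query.1"))
        result.show()

        sc.stop()
      }
    }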
Entry in the properties file:
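The original entry was lost; a plausible example, using a made-up TPC-DS aggregation as the query text (the key hive.query.1 matches the sketch above):

    hive.query.1=SELECT i_category, SUM(ss_net_paid) AS total_paid FROM store_sales JOIN item ON ss_item_sk = i_item_sk GROUP BY i_category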
Spark-submit command to run this job:
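The original command was also lost; a representative spark-submit invocation for Spark 1.6 on the sandbox, with the jar name and the HDFS path of the properties file as placeholders:

    spark-submit \
      --class HiveQueryRunner \
      --master yarn-client \
      hive-query-runner.jar \
      hdfs:///user/spark/queries.properties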
Note: the properties file location should be an HDFS location.