Schema on read in Hive for a TSV format file

1.3k Views Asked by Priyanka Shekhawat At 02 August 2018 at 20:06

I am new to Hadoop. I have data in TSV format with 50 columns, and I need to store it in Hive. How can I create the table and load the data on the fly using schema on read, without manually writing a CREATE TABLE statement?


There are 2 best solutions below

OneCricketeer On 04 August 2018 at 17:32 BEST ANSWER

Hive requires you to run a CREATE TABLE statement because the Hive metastore must be updated with a description of the data you're going to query later, including where it is located.

Schema-on-read doesn't mean that you can query every possible file without knowing metadata beforehand such as storage location and storage format.
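Since the CREATE TABLE statement can't be skipped, one way to avoid typing 50 column definitions by hand is to generate the statement from the file's header line. The following is only a sketch in plain Scala (no Hive or Spark dependency). The object and function names, the table name `mydb.events`, and the location `/data/events` are made up for illustration, and it assumes the first line of the file holds tab-separated column names. Every column is declared as STRING, so types would still need to be adjusted or cast afterwards.

```scala
object HiveDdlFromHeader {

  /** Builds a CREATE EXTERNAL TABLE statement with one STRING column per header field. */
  def createTableDdl(headerLine: String, table: String, location: String): String = {
    // One backtick-quoted STRING column per tab-separated header field.
    val columns = headerLine
      .split("\t", -1)
      .map(_.trim)
      .map(name => s"  `$name` STRING")
      .mkString(",\n")

    Seq(
      s"CREATE EXTERNAL TABLE IF NOT EXISTS $table (",
      columns,
      ")",
      // The doubled backslash emits a literal \t into the generated HiveQL.
      "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'",
      "STORED AS TEXTFILE",
      s"LOCATION '$location'",
      // Tells Hive to skip the header row when reading the file.
      "TBLPROPERTIES ('skip.header.line.count'='1')"
    ).mkString("\n")
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical header; in practice, read the first line of the TSV file.
    val header = "id\tname\tamount"
    println(createTableDdl(header, "mydb.events", "/data/events"))
  }
}
```

The printed statement can then be run through the Hive CLI or Beeline. An external table is used here so that Hive only records the metadata and leaves the file where it already is.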

SparkSQL or Apache Drill, on the other hand, will let you infer the schema from a file, but you must still define the column types for a TSV yourself if you don't want every column to be a string (or coerced to an unexpected type). Both of these tools can interact with a Hive metastore for "decoupled" storage of schema information.
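To illustrate the point about defining column types, here is a rough sketch of reading a TSV in Spark with an explicit schema instead of inference. The three columns (`id`, `name`, `amount`) and the object name are assumptions for the example; a real 50-column file would list all of its fields. It assumes Spark 2.x or later on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{DoubleType, IntegerType, StringType, StructField, StructType}

object TsvWithExplicitSchema {

  // Hypothetical layout: replace with the actual column names and types.
  val schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = true),
    StructField("name", StringType, nullable = true),
    StructField("amount", DoubleType, nullable = true)
  ))

  /** Reads a tab-separated file with a header row, applying the schema above. */
  def readTsv(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("delimiter", "\t")  // fields are separated by tabs
      .option("header", "true")   // first line is a header, not data
      .schema(schema)             // explicit types, so no inference pass is needed
      .csv(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("tsv-explicit-schema")
      .master("local[*]")         // local mode, for trying this out only
      .getOrCreate()

    readTsv(spark, args(0)).printSchema()
    spark.stop()
  }
}
```

With an explicit schema, values that don't parse as the declared type come back as null under the default permissive mode, so it is worth checking a few rows after loading.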

phaneendra kumar On 03 August 2018 at 08:04

You can use Hue:

http://gethue.com/hadoop-tutorial-create-hive-tables-with-headers-and/

Or, with Spark, you can infer the schema of the CSV file and save it as a Hive table.

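val df = spark.read
  .option("delimiter", "\t")      // TSV: fields are tab-separated
  .option("header", "true")       // first line holds the column names
  .option("inferSchema", "true")  // <-- HERE: extra pass over the data to infer column types
  .csv("/home/cloudera/Book1.csv")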
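The snippet above stops at reading the file. A possible way to finish the "save it as a Hive table" step is sketched below. The object name and the table name `default.book1` are assumptions, and `enableHiveSupport()` only works when Spark is built with Hive support and can reach the metastore.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object SaveTsvAsHiveTable {

  /** Writes the DataFrame as a managed table, replacing any existing table of that name. */
  def saveAsTable(df: DataFrame, tableName: String): Unit =
    df.write.mode(SaveMode.Overwrite).saveAsTable(tableName)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("tsv-to-hive")
      .enableHiveSupport()  // register tables in the Hive metastore
      .getOrCreate()

    // Same read as in the answer: tab-delimited, header row, inferred types.
    val df = spark.read
      .option("delimiter", "\t")
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/home/cloudera/Book1.csv")

    saveAsTable(df, "default.book1")  // hypothetical table name
    spark.stop()
  }
}
```

Note that `saveAsTable` writes the data in Spark's default data source format (Parquet unless configured otherwise), so the resulting table is a copy of the data rather than a view over the original TSV file.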
