HIVE Query Deleting source data blob

173 Views Asked by Chris Seeling At 13 June 2025 at 14:56

I am using a Azure HdInsight(3.1.3.577).

running the HIVEQL statement: LOAD DATA INPATH '/myData/employee.txt' INTO TABLE employee;

loads the data correctly but also has the side effect of removing the source text file. This behavior is puzzling to me.

In the documentation (https://cwiki.apache.org/confluence/display/Hive/GettingStarted) there is the following "loading data from HDFS will result in moving the file/directory. As a result, the operation is almost instantaneous."

My confusion is why this would be efficient, given that the HDFS (Azure blobstore) has to be loaded afresh with the source data for each run.

Original Q&A

There are 2 best solutions below

sras On 05 June 2015 at 07:26

Hive uses HDFS to store it's tables data under default location "/user/hive/warehouse".

When the data is already exists in HDFS, we create a External table and provides the HDFS path using LOCATION keyword. This will not result in moving the file / directory to default location.

By doing this Hive assumes that the data exists in the given path and it doesn't owns the data.

Even if you drop the table, the data specified in the path still exists.

Kishore On 05 June 2015 at 07:31

try this,

create external table myTable (Userid string, name string)
row format delimited
fields terminated by '\t'
LOCATION '/myData/employee.txt' ;

Location should be in hdfs;

HIVE Query Deleting source data blob

There are 2 best solutions below

Related Questions in AZURE

Related Questions in HADOOP

Related Questions in HIVE

Trending Questions

Popular # Hahtags

Popular Questions