I am trying to teach myself Hive and familiarise myself with HDInsight and HDFS on Microsoft Azure using the tutorial From Raw Data to Insight using HDP and Microsoft Business Intelligence.
I have managed to stage the data on HDFS and am now using Azure PowerShell and the Microsoft Azure HDInsight Query Console to create a Hive schema on the raw data.
I am trying to use the DDL statement below to create the table 'price_data':
create external table price_data (stock_exchange string, symbol string, trade_date string, open float, high float, low float, close float, volume int, adj_close float)
row format delimited
fields terminated by ','
stored as textfile
location '/nyse/nyse_prices';
The blob files are located in the container "nyse" and each blob file within the container is named 'nyse_prices/NYSE_daily_prices_.csv'.
I have made sure that the format conforms to the Processing data with Hive documentation on MSDN.
When I run the above query, it executes successfully and creates the table.
The external table should point to the underlying files and should therefore be populated with the data in each CSV file.
However when I run the query:
select count(*) from price_data
It returns 0, which is not correct. Can someone let me know what I am doing wrong here?
Cheers
I think the location you are specifying may be incorrect.
You have a default container, which is the container you specify or create when you create an HDInsight cluster, for example 'mycontainer'. If I put all the CSV files in that container as nyse_prices/filename.csv, then my location would be just '/nyse_prices': just the directory that contains the files. The default container is treated as the root in this case, '/'.
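One quick way to confirm where Hive resolved your table to is to describe it (a sketch, assuming the price_data table from your question already exists):

-- Look at the Location field in the output; it shows the fully
-- resolved wasb:// URI the table is actually reading from.
describe formatted price_data;

With your DDL, the location '/nyse/nyse_prices' most likely resolves to a nyse/nyse_prices directory inside the default container, which is not where your files actually live, so the table reads an empty directory and count(*) returns 0.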
If the files are not in the default container, or are on a different storage account, you can use a fully qualified location such as 'wasb://containername@storageaccountname.blob.core.windows.net/nyse_prices'.
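Put together, a sketch of the statement for that case might look like the following; 'containername' and 'storageaccountname' are placeholders you would replace with your own values:

-- dropping an external table only removes the metadata, not the blobs,
-- so it is safe to drop and recreate while you fix the location
drop table if exists price_data;
create external table price_data (stock_exchange string, symbol string, trade_date string, open float, high float, low float, close float, volume int, adj_close float)
row format delimited
fields terminated by ','
stored as textfile
location 'wasb://containername@storageaccountname.blob.core.windows.net/nyse_prices';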
As a test, I just created nyse_prices/ in my default container and uploaded some of the CSVs to it. Then I modified your query to use
location '/nyse_prices';
and was able to run selects against the data after that.
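Once the table is recreated with the corrected location, the same check from your post should return data, for example:

-- should now return the total row count across the csv files
select count(*) from price_data;
-- and a quick peek at a few rows to confirm the columns line up
select * from price_data limit 5;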