HIVE partitioned by column becomes all 0 after inserting data from another table

634 Views Asked by At

I am using Hortonworks to create partitioned table in HIVE and insert data into it using another table in HIVE. The problem is, after I inserted data into the table I created, all values in the partitioned column (passenger_count) in the resulting table shows 0 even though none of the values in the original table are 0.

Below are the steps I have taken to create the partitioned table and insert data into it:

  1. Run the following query to create table called 'date_partitioned':

    create table date_partitioned
    (tpep_dropoff_datetime string, trip_distance double)
    partitioned by (passenger_count int);
    
  2. Run the following query to insert data into 'date_partitioned' table, from another existing table:

    INSERT INTO TABLE date_partitioned
    PARTITION (passenger_count)
    SELECT tpep_dropoff_datetime, trip_distance, passenger_count
    FROM trips_raw;
    

The column types and sample values of the 'trips_raw' are shown in the screenshots below: enter image description here enter image description here

As you can see, the 'passenger_count' column is int type and contains non-zero values. But when I look at the results from the 'date_partitioned' table, the values from the 'passenger_count' column all show 0. The table also created a duplicate 'passenger_count' (so it has 2 'passenger_count' columns, one of which is empty). You can see from the screenshot below:

enter image description here enter image description here

Any advise would be greatly appreciated. I am curious as to why the 'passenger_count' show 0 in the resulted table when the original column has no 0, and why there's an additional 'passenger_count' column in the resulted table.

1

There are 1 best solutions below

0
On BEST ANSWER

Are you sure that all rows loaded for passenger_count is 0? Can you do a COUNT and GROUP BY passenger_count on both tables? Maybe you're just sampling all zeroes?