why there is a data mismatch in hive and bigSQL by 1 record?

168 Views Asked by At

I have created a hive table and integrated it to bigSQL. in the hive my count is proper, but in bigSQL, the record count is extra by 1. Below is the table properties that I have used to create the hive table.

create table test(name string,age int,sal float,city string,country string,emp_id int,increment int)
ROW FORMAT DELIMITED FIELDS TERMINATED  BY '|'
STORED AS TEXTFILE
LOCATION '/user/test'
tblproperties ("skip.header.line.count"="1");

The textfile that I am loading has column names in the very first row. So I have to use the

tblproperties ("skip.header.line.count"="1");

When I do a count query in hive, I get below output

Total MapReduce CPU Time Spent: 7 seconds 440 msec
OK
48203

However, when I synced the table in bigSQL, I am getting below count

+-------+
|     1 |
+-------+
| 48204 |

Any idea, where am I committing the mistake?

thanks

1

There are 1 best solutions below

0
learner On

I found the workaround for this problem.

1) We need to create a temp hive table with tblproperties ("skip.header.line.count"="1");.
2) Load the file on this temp table.
3) create another table without tblproperties ("skip.header.line.count"="1");.
4) insert into tbl select * from temo_tbl;.