Is it possible to do an incremental load with Sqoop from Teradata based on a load-time column whose datatype is timestamp(6)? How does Sqoop handle the format while loading into HDFS?
SQL exception: expecting something between '(' and ')'.
So you have one column, loadtime [type timestamp(6)].
While pulling the data to HDFS you have to create a 3-step process, and you also have to create partitions on the HDFS side.
I prefer this way because you don't have to mess with older data.
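The answer does not spell the 3-step process out; one common shape for it is sketched below. All names here (td-host, sales.orders, etl_user, /staging/...) are illustrative assumptions, not from the original answer:

```shell
# Step 1: import only rows newer than the Hive high-water mark into a staging dir.
# (Hypothetical connection string, table, and paths throughout.)
sqoop import \
  --connect jdbc:teradata://td-host/DATABASE=sales \
  --username etl_user -P \
  --query "SELECT * FROM orders WHERE loadtime > '2020-12-31 23:59:59.999999' AND \$CONDITIONS" \
  --target-dir /staging/orders/load_dt=2021-01-01 \
  --num-mappers 4

# Step 2: attach the staged files as a new partition of the Hive table
hive -e "ALTER TABLE sales.orders ADD IF NOT EXISTS
         PARTITION (load_dt='2021-01-01')
         LOCATION '/staging/orders/load_dt=2021-01-01';"

# Step 3: verify the new partition before downstream jobs read it
hive -e "SELECT COUNT(*) FROM sales.orders WHERE load_dt='2021-01-01';"
```

Because each run lands in its own partition, a failed run can be re-staged and re-attached without touching older data.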
But you could use the --incremental option in Sqoop, with a sentinel such as 0000-00-00 00:00:00.000000 for the first run and a condition like WHERE loadtime > ${max_of_loadtime_in_hdfs}. You will be passing max_of_loadtime_in_hdfs as an argument. Now, the timestamp format will be different between Teradata and Hive.
It's better to cast it to a string while saving it to Hive.
So when comparing against your Hive loadtime (string type), you have to cast it back to timestamp(6).
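One detail worth noting about storing loadtime as a string: if the format is zero-padded (yyyy-MM-dd HH:mm:ss.ffffff), lexicographic order matches chronological order, so taking MAX() of the string column in Hive still yields a correct high-water mark. A quick sanity check in plain shell (the two values are made up):

```shell
# Zero-padded timestamp strings sort chronologically, so
# MAX(loadtime) on the string column is a valid high-water mark.
earlier="2020-12-31 23:59:59.999999"
later="2021-01-01 00:00:00.000001"
if [[ "$later" > "$earlier" ]]; then
  echo "lexicographic order matches chronological order"
fi
```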
So the condition on the Sqoop end will be one of these (whichever works for you):

WHERE loadtime > CAST('${max_of_loadtime_in_hdfs}' AS TIMESTAMP(6))
WHERE loadtime > TO_TIMESTAMP('${max_of_loadtime_in_hdfs}', 'yyyy-mm-dd hh24:mi:ss.ff6')

max_of_loadtime_in_hdfs will be something like this:
2020-12-31 23:59:59.999999

So in your final Sqoop query:
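The answer's final Sqoop command did not survive extraction. A sketch of what it might look like follows; the connection string, column list, table, and target directory are assumptions, not from the original answer:

```shell
# Hypothetical names throughout (td-host, sales.orders, etl_user, target dir).
# The high-water mark would normally be read from the existing Hive data, e.g.:
#   MAX_LOADTIME=$(hive -S -e "SELECT MAX(loadtime) FROM sales.orders")
MAX_LOADTIME="2020-12-31 23:59:59.999999"

sqoop import \
  --connect jdbc:teradata://td-host/DATABASE=sales \
  --username etl_user -P \
  --query "SELECT order_id, amount, CAST(loadtime AS VARCHAR(26)) AS loadtime \
           FROM orders \
           WHERE loadtime > CAST('${MAX_LOADTIME}' AS TIMESTAMP(6)) \
           AND \$CONDITIONS" \
  --target-dir /staging/orders/load_dt=2021-01-01 \
  --num-mappers 4
```

The CAST to VARCHAR in the SELECT list is what lands the value as a plain string in HDFS, matching the cast-to-string advice above.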
Hive table definition:
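The Hive DDL itself is missing from the extracted page; a plausible definition consistent with the advice above (table, column, and partition names are assumptions) would be:

```shell
# loadtime is kept as STRING on the Hive side; it is cast back to
# TIMESTAMP(6) only in the Teradata-side WHERE clause.
hive -e "
CREATE TABLE IF NOT EXISTS sales.orders (
  order_id BIGINT,
  amount   DECIMAL(18,2),
  loadtime STRING
)
PARTITIONED BY (load_dt STRING)
STORED AS ORC;"
```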