I have source data that is close to 1 TB and want to load it into Synapse SQL. However, a full load will take time and is not efficient for a larger dataset. If I go with an incremental approach, what should the initial timestamp in the watermark table be? Should I set it to the source data's start date?

I'm trying to create logic to incrementally load one month of data at a time, but I'm failing to come up with the logic that produces the date range.
Loading 1 TB does not necessarily take a long time; it depends on your cost goals and how much scale you want to use (in short, on the compute available on both the source and the sink side). A full load is not inherently inefficient for a dataset of 1 TB.
What you describe is also not exactly a delta load; a delta load is used for periodically updating the target after the full load is done. In your case, what you want is to partition your full load into several steps.
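As for the initial value in the watermark table: a common pattern is to seed it with the source data's earliest timestamp (or any date just before it), so the first run picks up all historical rows. A minimal T-SQL sketch, where the table, column, and date values are all illustrative assumptions, not taken from your setup:

```sql
-- Hypothetical watermark table; names and dates are illustrative.
CREATE TABLE dbo.Watermark
(
    TableName      nvarchar(128) NOT NULL,
    WatermarkValue datetime2(3)  NOT NULL
);

-- Seed with the source data's start date (or just before it) so the
-- first load captures every historical row.
INSERT INTO dbo.Watermark (TableName, WatermarkValue)
VALUES ('dbo.SourceTable', '2016-01-01');

-- After each successful load, advance the watermark to the maximum
-- timestamp that was copied.
DECLARE @maxTs datetime2(3) = (SELECT MAX(LastModified) FROM dbo.SourceTable);

UPDATE dbo.Watermark
SET    WatermarkValue = @maxTs
WHERE  TableName = 'dbo.SourceTable';
```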
To do this, analyze the timestamps in the data; let's say it spans 2016 to 2019. You can then break it up into four yearly slices and load them in four separate runs: first the rows with 2016 timestamps, then 2017, and so on.
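The same idea works at month granularity, which is what your date-range logic needs. One way to sketch it in T-SQL is a loop over half-open intervals, so boundary rows are neither missed nor duplicated; the table names, column name, and date bounds below are assumptions for illustration:

```sql
-- Walk month boundaries; each [@rangeStart, @rangeEnd) pair drives one
-- partitioned load. Names and dates are illustrative, not prescriptive.
DECLARE @rangeStart date = '2016-01-01';   -- assumed source start date
DECLARE @loadUntil  date = '2020-01-01';   -- exclusive upper bound
DECLARE @rangeEnd   date;

WHILE @rangeStart < @loadUntil
BEGIN
    SET @rangeEnd = DATEADD(MONTH, 1, @rangeStart);

    -- One monthly slice: the half-open interval [@rangeStart, @rangeEnd)
    -- keeps consecutive runs from overlapping.
    INSERT INTO dbo.TargetTable
    SELECT *
    FROM   dbo.SourceTable
    WHERE  LastModified >= @rangeStart
      AND  LastModified <  @rangeEnd;

    SET @rangeStart = @rangeEnd;
END;
```

In practice you would typically run each slice as a parameterized copy activity in a pipeline rather than a single T-SQL loop, but the date-range arithmetic is the same.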
You need to provide more information about how your date-range logic is failing. What exact method are you using to load the dataset?