I am creating a clinical data warehouse, so I am testing different scenarios. I am loading the below tables from oracle DB (Attunity connector) to SQL DB (OLE DB):
Table1 1.2 GB(3 million rows) Table2 20 GB(200 million rows) Table3 100 GB(250 million rows) Table4 25 GB(60 million rows)
For my initial load I am planning to use SSIS and just select * from TABLE1/TABLE2/TABLE3/TABLE4
Questions :
Is it ok to have multiple data flow tasks for loading each table in one package. So that they are all running together. i just wanted to improve the speed with that. But somehow it is slower than if I run it individually.
Also for loading complete tables is "select * from table" a good way? It seems pretty slow!!
You can have as many parallel data flow tasks executing as the number of processor cores you have minus one. That is, if you are using an octacore processor, the ideal number of parallel tasks is 7 (8 -1 ). Just put in it different sequence containers(not compulsory,but for the sake of readability) and execute.
You can speed up the data load by adjusting several things like the setting
DelayValidation=true
and usingOPTION ( FAST 10000(or any value,just do some trials))
and also play around with theDefaultBufferSize
andDefaultBufferMaxRows
until you get the right one. Also, check if theMAXDOP
value is not set to1
int the settings, if you intend to run parallel DFTs.And, NEVER use
SELECT * from table_name
. List out the column names,*
adds additional overhead and can slow down your query considerably.