I posted this question: INSERT Statement Expensive Queries on Activity Monitor.
As you will see there, the XML structure has different levels, so I have created a table for each:
Organisation = organisation_id (PRIMARY_KEY)
Contacts = organisation_id (FOREIGN_KEY)
Roles = organisation_id (FOREIGN_KEY)
Rels = organisation_id (FOREIGN_KEY)
Succs = organisation_id (FOREIGN_KEY)
What I want is to generate the organisation_id and do the insert on each table in a cascading manner. At the moment the process takes almost 2 hours for 300k records. I have 3 approaches:
1. Convert the XML to a list object, batch it (1000 rows), send each batch as JSON text to a stored procedure that uses OPENJSON.
2. Convert the XML to a list object, batch it (1000 rows), save each batch as a JSON file that SQL Server can read, and pass the file path to a stored procedure, which then opens the JSON file using OPENROWSET and OPENJSON.
3. Send the path of the XML file to a stored procedure, then use OPENROWSET and OPENXML.
All processes (1-3) insert the data into a flat temp table and then iterate over each row, calling a different INSERT stored procedure for each table. Approach #3 seems to fail with errors on 300k records but works on 4 records.
The other question is: would it be much faster to use a physical table rather than a temp table?
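For reference, the core of approach #1 looks roughly like this (a sketch only; the JSON property names and the staging columns here are placeholders, not the real schema):

-- Sketch of approach #1: the client sends one batch (~1000 organisations) as
-- JSON text in @json; OPENJSON shreds it into the flat staging table.
CREATE TABLE #temporganisations (
    name          nvarchar(20),
    contact_type  nvarchar(50),
    contact_value nvarchar(50)
);

DECLARE @json nvarchar(max) = N'[
  {"name": "Org A", "contacts": [{"type": "tel", "value": "01234 567890"}]}
]';

INSERT INTO #temporganisations (name, contact_type, contact_value)
SELECT  o.name,
        c.contact_type,
        c.contact_value
FROM OPENJSON(@json)
        WITH (
            name     nvarchar(20)  '$.name',
            contacts nvarchar(max) '$.contacts' AS JSON
        ) AS o
OUTER APPLY OPENJSON(o.contacts)
        WITH (
            contact_type  nvarchar(50) '$.type',
            contact_value nvarchar(50) '$.value'
        ) AS c;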
-------UPDATE------- As explained in the linked question, I was doing a WHILE loop. Someone suggested in a comment to do a batch insert into each of the tables. The problem is that, for Contacts for example, I can only do this if I know the organisation_id:
select
    organisation_id = IDENTITY(bigint) -- IF I CAN GENERATE THE ACTUAL ORGANISATION ID
    ,name           = Col.value('.', 'nvarchar(20)')
    ,contact_type   = c.value('(./@type)[1]', 'nvarchar(50)')
    ,contact_value  = c.value('(./@value)[1]', 'nvarchar(50)')
into #temporganisations
from @xml.nodes('ns1:OrgRefData/Organisations/Organisation') as Orgs(Col)
outer apply Orgs.Col.nodes('Contacts/Contact') as Cs(c)
Then when I do the batch insert:
insert into contacts
(
    organisation_id, type, value
)
select
    torg.organisation_id -- if this is the actual id then perfect
    ,torg.contact_type
    ,torg.contact_value
from #temporganisations torg
I would suggest that you shred the XML client-side and switch over to doing some kind of Bulk Copy; this will generally perform much better.
At the moment, you cannot do a normal bcp or SqlBulkCopy, because you also need the foreign key. You need a way to uniquely identify an Organisation within the batch, and you say that is difficult owing to the number of columns needed for that.
Instead, you need to generate some kind of unique ID client-side; an incrementing integer will do. You then assign this ID to the child objects as you are shredding the XML into DataTables / IEnumerables / CSV files.
You have two options:
Option 1: Bypass the IDENTITY on OrganisationId and just directly insert your generated ID. This means you can leverage standard SqlBulkCopy procedures.
The downside is that you lose the benefit of automatic IDENTITY assignment, but you could instead just use the SqlBulkCopyOptions.KeepIdentity option, which only applies to this insert, and carry on with IDENTITY for other inserts. You would need to estimate a correct batch of IDs that won't clash.
A variation on this is to use GUIDs; these are always unique. I don't really recommend this option.
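If you take the KeepIdentity route, one way to estimate a starting value that won't clash is to peek at the current identity before building a batch (a sketch only; it assumes the table is dbo.Organisation and that any concurrent writers are dealt with separately):

-- Last identity value handed out for Organisation; start this batch's
-- client-generated IDs safely above it.
DECLARE @firstIdForBatch bigint =
    COALESCE(CONVERT(bigint, IDENT_CURRENT('dbo.Organisation')), 0) + 1;
SELECT @firstIdForBatch AS FirstIdForBatch;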
Option 2: Define an equivalent Table Type for each of the tables. Each has a column for the temporary primary key of the Organisation.
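For example, something like this (a sketch; the non-key columns are placeholders for whatever each table actually needs):

CREATE TYPE dbo.OrganisationType AS TABLE (
    TempOrganisationID int          NOT NULL PRIMARY KEY,  -- client-generated key
    name               nvarchar(20) NOT NULL
    -- ...remaining Organisation columns
);

CREATE TYPE dbo.ContactType AS TABLE (
    TempOrganisationID int          NOT NULL,   -- points at the parent row in @Organisations
    contact_type       nvarchar(50) NULL,
    contact_value      nvarchar(50) NULL
);

-- ...similar types for Roles, Rels and Succs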
Pass the shredded XML through as Table-Valued Parameters; you would have @Organisations, @Contacts, etc. Then you would have SQL along the following lines:
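(A sketch only; names such as name, contact_type and contact_value are carried over from the question and may not match your real schema.)

-- Mapping table: the real identity value for each client-generated temporary ID
DECLARE @OrganisationIDs TABLE (
    organisation_id    bigint NOT NULL PRIMARY KEY,
    TempOrganisationID int    NOT NULL UNIQUE
);

-- A plain INSERT ... OUTPUT cannot reference source columns, so use the
-- never-matching MERGE trick: ON 1 = 0 forces every source row down the
-- NOT MATCHED branch, and OUTPUT can capture both the new identity value
-- and the temporary client-side ID.
MERGE INTO dbo.Organisation AS t
USING @Organisations AS s
    ON 1 = 0
WHEN NOT MATCHED THEN
    INSERT (name)          -- plus the other Organisation columns
    VALUES (s.name)
OUTPUT inserted.organisation_id, s.TempOrganisationID
    INTO @OrganisationIDs (organisation_id, TempOrganisationID);

-- Child tables: swap the temporary ID for the real one and insert as a set
INSERT INTO contacts (organisation_id, type, value)
SELECT  o.organisation_id,
        c.contact_type,
        c.contact_value
FROM    @Contacts AS c
JOIN    @OrganisationIDs AS o
    ON  o.TempOrganisationID = c.TempOrganisationID;

-- ...repeat the same join-and-insert for Roles, Rels and Succs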
A variation on this is to return the OUTPUT to the client, have the client join the IDs to the child tables, and then BulkCopy them back again as part of the child tables. This makes the SQL simpler; however, you still need the MERGE, and you risk complicating the client code significantly.
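In that variation the MERGE simply drops the INTO clause, so the ID pairs come back to the caller as an ordinary result set (same assumed columns as above):

-- OUTPUT without INTO streams the (real ID, temporary ID) pairs back to the
-- client, which then joins them onto the child rows before bulk copying.
MERGE INTO dbo.Organisation AS t
USING @Organisations AS s
    ON 1 = 0
WHEN NOT MATCHED THEN
    INSERT (name)
    VALUES (s.name)
OUTPUT inserted.organisation_id, s.TempOrganisationID;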