Best way to automate processing of daily XML files into SQL database

164 Views Asked by At

I need to process a frequent flow of XML files into a SQL Server database on an automated basis. I have a sample XML file and have gotten as far as being able to process the data via SQL query using OPENROWSET.

However, this only works on a manual basis as I'm told that OPENROWSET will not allow use of a variable to define the name / path of the file to be imported.

This is the code:

WITH rs (xmlData) AS
(
   SELECT TRY_CAST(BulkColumn AS XML) 
   FROM OPENROWSET(BULK N'C:\Users\SFLYNN01\Desktop\jobStatusUpdate-C158788-20231017161702.xml', SINGLE_BLOB) AS x
)
SELECT 
      c.value('(JobRef/text())[1]', 'VARCHAR(20)')AS JobRef
    , c.value('(Stages[Type/text()="1"]/Address/AddressCode/text())[1]', 'VARCHAR(20)')AS ColCode
    , c.value('(Stages[Type/text()="1"]/DateTimeFrom/text())[1]', 'DATE') AS ColDate
    , c.value('(Stages[Type/text()="1"]/DateTimeFrom/text())[1]', 'TIME') AS ColTime
    , c.value('(Stages[Type/text()="2"]/Address/AddressCode/text())[1]', 'VARCHAR(20)')AS DelCode
    , c.value('(Stages[Type/text()="2"]/DateTimeFrom/text())[1]', 'DATE') AS DelDate
    , c.value('(Stages[Type/text()="2"]/DateTimeFrom/text())[1]', 'TIME') AS DelTime
    , c.value('(Response/JobTrackingURL/text())[1]', 'VARCHAR(20)')AS TrackingURL
    , c.value('(statusId/text())[1]', 'VARCHAR(20)')AS Status
FROM 
    rs
CROSS APPLY 
    xmldata.nodes('/job') AS t(c)

Can anyone suggest a method where I can automatically process the XML files in my database as they drop in to a waiting SFTP folder? The amount of files will be too much for a manual process. Thanks.

2

There are 2 best solutions below

2
On

What tool do you use?

$filePath = "`'C:\Temp\print4.pdf`'"
$select = "
WITH rs (xmlData) AS
    (SELECT TRY_CAST(BulkColumn AS XML) 
   FROM OPENROWSET(BULK Nxpathx, SINGLE_BLOB) AS x)
SELECT 
     c.value('(JobRef/text())[1]', 'VARCHAR(20)')AS JobRef
   , c.value('(Stages[Type/text()=`"1`"]/Address/AddressCode/text())[1]', 'VARCHAR(20)')AS ColCode
   , c.value('(Stages[Type/text()=`"1`"]/DateTimeFrom/text())[1]', 'DATE') AS ColDate
   , c.value('(Stages[Type/text()=`"1`"]/DateTimeFrom/text())[1]', 'TIME') AS ColTime
   , c.value('(Stages[Type/text()=`"2`"]/Address/AddressCode/text())[1]', 'VARCHAR(20)')AS DelCode
   , c.value('(Stages[Type/text()=`"2`"]/DateTimeFrom/text())[1]', 'DATE') AS DelDate
   , c.value('(Stages[Type/text()=`"2`"]/DateTimeFrom/text())[1]', 'TIME') AS DelTime
   , c.value('(Response/JobTrackingURL/text())[1]', 'VARCHAR(20)')AS TrackingURL
   , c.value('(statusId/text())[1]', 'VARCHAR(20)')AS Status
FROM rs
CROSS APPLY xmldata.nodes('/job') AS t(c)"

$select = $select -replace 'xpathx', $filePath

You can then use send $select. As you did not write what database you use, I cannot say how to go on from above example. But PowerShell has addons for many different databases.

0
On

You can use dynamic SQL to run your OPENROWSET statement, and grab the XML as an output variable:

declare 
    @SQL nvarchar(max),
    @XML xml,
    @FileName nvarchar(1000) ='C:\Users\SFLYNN01\Desktop\jobStatusUpdate-C158788-20231017161702.xml'

select @SQL = N'
   SELECT @XML = TRY_CAST(BulkColumn AS XML) 
   FROM OPENROWSET(BULK ', quotename(@FileName, ''''), ' SINGLE_BLOB) AS x'

exec sp_executesql
    @SQL,
    N'@XML output',
    @Xml output

SELECT 
...
FROM @Xml.nodes('/job') AS t(c)

After that, you can do your parsing as normal, just against the variable. Note, you could also do all the parsing in dynamic sql as well, using the CTE like you originally had, but thats a heck of a lot of escaping single quotes that aint nobody got time for.

As far as a full blown process to ingest XML files, you'll need some way to get files from the SFTP. That's either done by polling the SFTP, or if you're lucky enough to have some notification come from it, acting upon that. It's going to depend heavily on what tools and expertise you have at your disposal as to what you build the process with. Airflow would be a good general option, but you could probably just as easily script an agent job running a powershell script to do the same.