I would like to import about 9,000,000 rows from an XML file into a MySQL server. Currently I'm inserting the data row by row, which is very slow: about 50 rows/sec, which means it will take days to complete. For another project I loaded similar data into a DataTable 5,000 rows at a time and then bulk-inserted all 5,000 rows at once, which got me to about 7,500 rows/sec. The problem is that was with SQL Server, and this is MySQL. I can't find anyone using the MySqlBulkLoader class with a DataTable. Is this possible, and how would I go about doing it?
Example of the first 750 elements in the XML file: http://view.qrdetector.dk/test.xml
These are the columns I need in my database from the XML file:
'Create datatable to hold the information from the XML file
Dim ReadXML_DT As New DataTable
ReadXML_DT.Columns.Add("KoeretoejIdent", GetType(String))
ReadXML_DT.Columns.Add("KoeretoejArtNavn", GetType(String))
ReadXML_DT.Columns.Add("KoeretoejAnvendelseNavn", GetType(String))
ReadXML_DT.Columns.Add("RegistreringNummerNummer", GetType(String))
ReadXML_DT.Columns.Add("KoeretoejOplysningStatus", GetType(String))
ReadXML_DT.Columns.Add("KoeretoejOplysningFoersteRegistreringDato", GetType(String))
ReadXML_DT.Columns.Add("KoeretoejOplysningStelNummer", GetType(String))
ReadXML_DT.Columns.Add("KoeretoejMaerkeTypeNavn", GetType(String))
ReadXML_DT.Columns.Add("KoeretoejModelTypeNavn", GetType(String))
ReadXML_DT.Columns.Add("KoeretoejVariantTypeNavn", GetType(String))
ReadXML_DT.Columns.Add("DrivkraftTypeNavn", GetType(String))
ReadXML_DT.Columns.Add("SynResultatSynsType", GetType(String))
ReadXML_DT.Columns.Add("SynResultatSynsDato", GetType(String))
ReadXML_DT.Columns.Add("SynResultatSynStatusDato", GetType(String))
ReadXML_DT.Columns.Add("SidsteSynTjek", GetType(String))
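MySqlBulkLoader reads from a file on disk rather than from an in-memory DataTable, so the usual approach is to dump the table to a delimited file first and point the loader at that. A minimal sketch, assuming ReadXML_DT is already filled and that no field value contains the "^" delimiter or a line break:

```vbnet
' Write the DataTable to a ^-delimited file that MySqlBulkLoader can read.
' The file path is illustrative; adjust to your environment.
Dim csvPath As String = "C:/Users/Synsbasen/Desktop/abc.csv"
Using writer As New IO.StreamWriter(csvPath)
    ' Header row (skipped later via NumberOfLinesToSkip = 1)
    Dim headers = ReadXML_DT.Columns.Cast(Of DataColumn)().
                      Select(Function(c) c.ColumnName)
    writer.WriteLine(String.Join("^", headers))
    ' One line per DataRow, fields joined with the same terminator
    For Each row As DataRow In ReadXML_DT.Rows
        writer.WriteLine(String.Join("^", row.ItemArray))
    Next
End Using
```

A proper CSV writer (such as CSVHelper, used in the answer below) is safer if your data might contain the delimiter.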
I've manually made a CSV file with 130,000 rows and the 15 columns I need. Then I used the bulk-insert code from Plutonix's reply. I'm now able to load the 130,000 rows in about 215 sec, an average of about 600 rows/sec. That's pretty much the same result as before. Is this because of my connection to the MySQL server?
Dim sw As Stopwatch = New Stopwatch
sw.Start()
' Finally, BulkLoad
Dim cols As String() = {"KoeretoejIdent", "KoeretoejArtNavn", "KoeretoejAnvendelseNavn", "RegistreringNummerNummer", "KoeretoejOplysningStatus", "KoeretoejOplysningFoersteRegistreringDato", "KoeretoejOplysningStelNummer", "KoeretoejMaerkeTypeNavn", "KoeretoejModelTypeNavn", "KoeretoejVariantTypeNavn", "DrivkraftTypeNavn", "SynResultatSynsType", "SynResultatSynsDato", "SynResultatSynStatusDato", "SidsteSynTjek"}
Dim rows As Integer = 0
Using dbcon As New MySqlConnection(connectionString)
Dim bulk = New MySqlBulkLoader(dbcon)
bulk.TableName = "synsbasen_testLoad"
bulk.FieldTerminator = "^"
bulk.LineTerminator = "\r\n" ' == CR/LF
bulk.FileName = "C:/Users/Synsbasen/Desktop/abc.csv" ' full file path name to CSV
bulk.NumberOfLinesToSkip = 1 ' skip the header row (the default is 0)
bulk.Columns.Clear()
For Each s In cols
bulk.Columns.Add(s) ' specify col order in file
Next
rows = bulk.Load()
End Using
sw.Stop()
' SW is a stopwatch
MsgBox(rows & " rows converted and loaded in " & sw.Elapsed.TotalSeconds & " secs")
This will read in one million rows from XML, extract a subset of the data, export it to CSV (using CSVHelper), then load it into MySQL using MySqlBulkLoader in about 30 seconds. Yours will take longer, naturally, but 1,000,000 rows is already fairly large, so it should scale decently enough. If you want to load them in batches, use something like 500k; it would be easy to write the CSV in parts using Skip() and Take(). The interim steps took 9 secs to select the parts from the XML, 15 secs to write the CSV, and 7 secs for MySQL to load the data.
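A sketch of that pipeline, under stated assumptions: the element and property names (SmSample, "Statistik", the file paths) are illustrative, not taken from your actual XML, and the CsvWriter constructor shown is the newer CSVHelper form that takes a culture.

```vbnet
' 1) Select just the needed fields from the XML with LINQ to XML.
'    "Statistik" and the element names are placeholders for your schema.
Dim xdoc = XDocument.Load("C:/data/test.xml")
Dim samples = From e In xdoc.Descendants("Statistik")
              Select New SmSample With {
                  .KoeretoejIdent = e.Element("KoeretoejIdent").Value,
                  .RegistreringNummerNummer = e.Element("RegistreringNummerNummer").Value
              }

' 2) Let CSVHelper consume the query and write the CSV;
'    the query is only enumerated here, never held as a full collection.
Using sw As New IO.StreamWriter("C:/data/subset.csv"),
      csv As New CsvHelper.CsvWriter(sw, Globalization.CultureInfo.InvariantCulture)
    csv.WriteRecords(samples)
End Using

' 3) Point MySqlBulkLoader at "C:/data/subset.csv" as in the code above.
```

For batching, you would wrap step 2 in a loop over samples.Skip(n).Take(500000).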
The specifics of your XML file are a mystery (the link was added to the question well after this was posted). Since you can load it into a DataTable, the test just used the result of myDT.WriteXml(...), so you may have to change that part. Leave it as a LINQ query and let CSVHelper consume it, since the applet itself doesn't need the collection data at all. The only "trick" was to format the DateTime field so MySQL could parse it. CSVHelper is very cool, very powerful and integral to the results.
SmSample is just a class with the properties shown, which are the desired columns from the larger XML data. Its roles are to a) shed the extraneous columns you don't need/want as soon as possible, and b) provide the 'holder' for the data for CSVHelper to act upon. The answer also uses the built-in MySqlBulkLoader tool, which I find easier to use than the SQL LOAD DATA LOCAL INFILE form.
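An illustrative shape for SmSample and the DateTime "trick" mentioned above. The property names are assumptions based on the question's columns, and the mapping uses CSVHelper's ClassMap/TypeConverterOption API to emit dates in a format MySQL's DATETIME parser accepts:

```vbnet
' Holder class: only the columns you want survive into the CSV.
Public Class SmSample
    Public Property KoeretoejIdent As String
    Public Property RegistreringNummerNummer As String
    Public Property SynResultatSynsDato As DateTime
End Class

' CSVHelper mapping: force the date into "yyyy-MM-dd HH:mm:ss",
' which MySQL parses without any SET/STR_TO_DATE tricks.
Public Class SmSampleMap
    Inherits CsvHelper.Configuration.ClassMap(Of SmSample)

    Public Sub New()
        AutoMap(Globalization.CultureInfo.InvariantCulture)
        Map(Function(s) s.SynResultatSynsDato).
            TypeConverterOption.Format("yyyy-MM-dd HH:mm:ss")
    End Sub
End Class
```

Register the map with csv.Context.RegisterClassMap(Of SmSampleMap)() before calling WriteRecords (exact registration call varies by CSVHelper version).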