Error loading MySQL/MariaDB Inline Data With Unescaped Double Quotes In Fields

121 Views Asked by At

I am having essentially the same problem as described here but the issue was left unresolved in that question.

I am trying to import a series of data files totaling about 100 million records into a MariaDB database. I've run into issues with some lines in the import file that look like:

"GAYATRI INC DBA "WHIPIN"","1950","S I","","AUSTIN","TX","78704","5124425337","B","93"

which I was trying to load with a statement like:

LOAD DATA INFILE 'testline.txt'
INTO TABLE data
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
(@name,@housenum,@street,@aptnum,@city,@state,@zip,@phone,@business,@year)
SET name=@name, housenum=@housenum, street=@street, aptnum=@aptnum, city=@city, state=@state, zip=@zip, phone=@phone, business=@business, year=@year;

but am receiving errors because the first field contains unescaped double quotes in the text of the field. That seems to be OK in and of itself as the database seems smart enough to handle that in most situations. However, because the field ends with a double quote in the text plus a double quote to close the field it assumes the first double quote is escaping the second double quote following RFC4180 and thus is not terminating the field even though the next character is a comma.

The source files can't be created any differently as they are exports from old software which I do not control. Obviously searching through 100 million records and changing entries like this by hand is not feasible. I'm unsure of whether any fields might contain commas though it's probably safe to assume they do in this quantity of records so programmatically forcing fields to break at commas is probably out too.

Any ideas on how to get them to import correctly?

0

There are 0 best solutions below