Regex for Pipe Delimited with Quoted Identifiers

Possible Duplicate:
Parsing CSV files in C#

I have a C# application that parses a pipe delimited file. It uses the Regex.Split method:

Regex.Split(line, @"(?<!(?<!\\)*\\)\|")

However, a data file recently came across with a pipe included in one of the data fields. The data field in question uses quoted identifiers, so the file opens correctly in Excel.

For example I have a file that looks like:

Field1|Field2|"Field 3 has a | inside the quotes"|Field4

When I use the above regex it parses to:

Field1
Field2
Field 3 has a
inside the quotes
Field4

when I would like:

Field1
Field2
Field 3 has a | inside the quotes
Field4

I've done a fair amount of research and can't seem to get the Regex.Split to split the file on pipes but respect the quoted identifiers. Any help is greatly appreciated!

There are 1 best solutions below

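Here is a quick expression I've thrown together that seems to do the trick. Note that it matches the fields themselves rather than the delimiters, so it is meant for Regex.Matches, not Regex.Split: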

"([^"]+)"|([^\|]+)

Your expression also appears to be doing something with backslashes, so you may need to extend this one to cover any other needs you have. I've left them out because the question doesn't explain them, and I can't account for them without knowing why they are there; they may not need to be there at all.
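As a rough sketch of how the expression above could be applied, the snippet below runs it through Regex.Matches and takes group 1 for quoted fields and group 2 for unquoted ones. The class and method names (PipeFieldMatcher, Parse) are made up for illustration, and backslash escapes are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class PipeFieldMatcher
{
    // The expression from the answer: group 1 captures a quoted field
    // (without its quotes), group 2 captures an unquoted field.
    private static readonly Regex FieldPattern =
        new Regex("\"([^\"]+)\"|([^\\|]+)");

    public static List<string> Parse(string line)
    {
        var fields = new List<string>();
        foreach (Match m in FieldPattern.Matches(line))
        {
            // Exactly one of the two groups participates in each match.
            fields.Add(m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value);
        }
        return fields;
    }
}
```

Calling PipeFieldMatcher.Parse on the sample line from the question should give four fields, the third being "Field 3 has a | inside the quotes" with the surrounding quotes removed.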

Also, my expression skips empty fields (e.g. 1||2|3 would come out as 1, 2 and 3 only). I don't know whether that is what you need; if it isn't, let me know and I can change the expression to something that caters for that too.
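If empty fields do need to be kept, one possible alternative (a different technique from the expression above, not a modification of it) is to stay with Regex.Split and only split on pipes that are followed by an even number of double quotes, which means the pipe is outside any quoted field. This is only a sketch: it assumes quotes are balanced and never escaped inside a field, it ignores the backslash handling in the original expression, and the names PipeLineSplitter, Split and Unquote are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class PipeLineSplitter
{
    // A pipe counts as a delimiter only if the rest of the line contains
    // an even number of double quotes, i.e. the pipe is not inside quotes.
    private static readonly Regex Delimiter =
        new Regex("\\|(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    public static string[] Split(string line)
    {
        // Regex.Split keeps empty strings between adjacent delimiters.
        return Delimiter.Split(line).Select(Unquote).ToArray();
    }

    // Removes one pair of surrounding double quotes, if present.
    private static string Unquote(string field)
    {
        bool quoted = field.Length >= 2
            && field[0] == '"'
            && field[field.Length - 1] == '"';
        return quoted ? field.Substring(1, field.Length - 2) : field;
    }
}
```

With this sketch, the sample line from the question should split into four fields with the embedded pipe preserved, and 1||2|3 should yield four entries, the second being an empty string.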

Hope this helps anyway.