Regex: Remove line breaks in double quote field?

Question

Regex: Remove line breaks in double quote field?

1.3k Views Asked by Dan At 04 September 2019 at 13:10

I have a .csv file with this error that I want to correct with regular expression, some fields contain line breaks, example:

"abc

de
f 123",123,456

Should be in one line, I want to remove line-breaks only but to keep the text

"abcdef 123",123,456

I tried to isolate within double quotes, but this removes only first line-break:

^(?:"[0-9a-zA-Z])\r?\n(?=",)

(Text can include numbers and other characters, so I tried to include them, I only want to remove line-breaks and keep all other characters, hope it is clear)

Original Q&A

There are 1 best solutions below

**Nick Reed** · Accepted Answer · 2019-09-04T18:05:01.903000

It's not possible to capture multiple newlines in a single query if you're using a repeated capture group; the regex engine can only grab the last match. That being said, if you're using powergrep (or some other search-and-replace that can selectively replace capture groups, not the whole match), do you really need a one-liner?

^"(?:[^"\n]|(\n+))*", will look between " and ", and match any amount of text that doesn't include a ", keeping it within your quoted statement - or, alternatively, will capture the last group of newlines it finds. If your tool can remove/replace text only in the capture group, why not use this regex a few times in a row? It'll leave your typo-free rows alone, but will remove one chunk of whitespace from your erroneous rows each time it's run. (Try it here! Note that this matches all lines, but only captures whitespace for the badly formatted ones)

How will you know when you're done? Try using ^(?=.*\n.*)"[^"]*", - it'll match any lines in your csv file that still have newlines, but will ignore properly formatted lines. When this regex returns no matches, you can be confident your file is typo-free. (Try it here!)

It's not a very elegant solution, but if you run it enough times, you'll get rid of all the whitespace.

Regex: Remove line breaks in double quote field?

There are 1 best solutions below

Related Questions in REGEX

Related Questions in POWERGREP

Trending Questions

Popular # Hahtags

Popular Questions