I have a delimited file exported with pipe as the delimiter by someone who didn't use a text qualifier and the text contains pipe characters. There should be 13 columns.
Additional columns have been created in several lines. Some 3, some 4. So, it's not the same number of additional columns created.
The delimiters are fine up till the text/column after the 10th pipe and the text/columns after the last two pipes. I want to replace the extra pipe characters after the 10th pipe and before the last two pipes with spaces regardless of whether it is 1 or more pipes in one go using a Regex pattern.
I have tried this:
^([^|]*)(?!(?:\|[^|\r\n]*){12}$)((?:\|[^|\r\n]*){10})(?:(\|)([^|\r\n]*))((?:\|[^|\r\n]*){2})$
replace with $1$2$4$5.
This only worked for where there were 14 columns. Lines with a larger number of columns are not matched.
Here's a line with more than 14 columns:
5070599047|SA|03/31/2023 00:00:00|04/03/2023 00:00:00|23121|152|65642.950000|0|0|XP-OFS-222761665|ZIB|BRH|NPAU|31-03-2023|749542|30271246.390000|342
The result I need:
5070599047|SA|03/31/2023 00:00:00|04/03/2023 00:00:00|23121|152|65642.950000|0|0|XP-OFS-222761665|ZIB BRH NPAU 31-03-2023 749542|30271246.390000|342
I need to remove all pipes after ZIB and before |30271246.390000|342 and keep the text in that space. I also need a pattern that will work for lines with more than 13 columns regardless of the additional number of columns to reduce them all to 13 columns.
The first 10 and last two pipes in each line are to be preserved. Each of the four others is to be replaced with a space. Each match of the following regular expression (with
gandmflags1 set) can be replaced with a space.If there are 12 or fewer pipes no substitutions are made.
Demo
Notepad++ uses the Boost regex engine, which differs only slightly from the PCRE engine, which is the one used at the "Demo" link. In particular, both support
\Kand\G.\Kcauses previously-matched tokens to be discarded from the match that is returned and resets the start of match to the current location in the string.\Gsets the current position to be at the end of the previous match.The expression can be broken down as follows.2
1.
gcauses the regex engine to make multiple matches, if possible.mcauses^and$to represent end-of-line (rather than end-of-string) anchors, respectively.2. Also, by hovering over each part of the expression at the link (the cursor, not you, yourself) one can obtain an explanation of its function.