I have the following string, received as part of a file record:
1234567890|ABCDE|""|"01|02|03|"|453625|New Account|05736372828|NA|||AT|899
The record uses the pipe symbol | as a delimiter. However, if a | appears within data enclosed in double quotes ", it should not be treated as a delimiter, and the quoted text should be kept as a single field, e.g. "01|02|03"
I am trying to use a regex to convert the "01|02|03|" data into "01,02,03," before splitting the string on the | delimiter, but the regex is not working as expected.
Below is the code snippet I wrote for this requirement, based on another SO question, "Regular expression, replace all commas between double quotes":
    public static void main(String[] args) {
        String orig = "1234567890|ABCDE|\"\"|\"01|02|03|\"|453625|New Account|05736372828|NA|||AT|899";
        String regex = "(?<=\")([^\"]+?)\\|([^\"]+?)(?=\")";
        String old = orig;
        // First pass: each match replaces a single pipe inside a quoted section
        String result = orig.replaceAll(regex, "$1,$2");
        // Repeat until a pass leaves the string unchanged
        while (!result.equalsIgnoreCase(old)) {
            old = result;
            result = result.replaceAll(regex, "$1,$2");
        }
        System.out.println(result);
    }
The output of the above code is 1234567890|ABCDE|""|"01,02,03|"|453625|New Account|05736372828|NA|||AT|899, which is not what I expected: the | after 03 in "01|02|03|" is not replaced with ,.
I would appreciate it if someone could help correct the regex, or share an altogether new regex that splits the string while retaining the | within the double quotes.
You can use a positive lookahead pattern to match only pipes that are followed by an odd number of double quotes, provided that the double quotes are properly paired:
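A minimal sketch of that idea, assuming every double quote in the record is properly paired and that quotes are never escaped inside a field. The class name `QuotedPipes` and both method names are hypothetical, chosen only for illustration. `replaceQuotedPipes` applies the odd-number-of-quotes lookahead described above. `splitOutsideQuotes` is a variation that is not part of the description above: it flips the lookahead to an even number of quotes so that the record can be split directly, without the intermediate comma replacement.

```java
import java.util.Arrays;

class QuotedPipes {
    // A pipe followed by an ODD number of double quotes up to the end of the
    // input lies inside a quoted section (assumes quotes are properly paired).
    static final String PIPE_INSIDE_QUOTES =
            "\\|(?=[^\"]*\"(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    // Variation (an assumption, not from the answer text): a pipe followed by
    // an EVEN number of double quotes lies outside any quoted section.
    static final String PIPE_OUTSIDE_QUOTES =
            "\\|(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    // Replace every pipe inside double quotes with a comma.
    static String replaceQuotedPipes(String record) {
        return record.replaceAll(PIPE_INSIDE_QUOTES, ",");
    }

    // Split on unquoted pipes only; the limit -1 keeps trailing empty fields.
    static String[] splitOutsideQuotes(String record) {
        return record.split(PIPE_OUTSIDE_QUOTES, -1);
    }

    public static void main(String[] args) {
        String orig = "1234567890|ABCDE|\"\"|\"01|02|03|\"|453625|New Account|05736372828|NA|||AT|899";

        // prints 1234567890|ABCDE|""|"01,02,03,"|453625|New Account|05736372828|NA|||AT|899
        System.out.println(replaceQuotedPipes(orig));

        // prints the 12 fields, with "01|02|03|" kept as a single field
        System.out.println(Arrays.toString(splitOutsideQuotes(orig)));
    }
}
```

Because the lookahead scans to the end of the input for every pipe, this approach is likely fine for short records like the one in the question, but a dedicated CSV-style parser may be a better fit for very long lines or for data with escaped quotes.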
Demo: https://ideone.com/s0TOq0