Read csv file into R with commas between single and double quotes


I am trying to read a CSV file into R that looks like this:

"""V1"",""V2"",""V3"",""V4"", ""V5"""
"245, ""Ab"", """",""amp, phen +, len"", 0"
"247, ""Af"", NA,""amp, len"", 0"
"248, ""Ac"", """",""opi"", 0"

The desired output is a data frame like this:

tibble(
  V1 = c(245,247,248),
  V2 = c("Ab","Af","Ac"),
  V3 = c(NA, NA, NA),
  V4 = c("amp, phen +, len", "amp, len", "opi"),
  V5 = c(0,0,0)
)

I am struggling with the mix of single and doubled quote characters. Annoyingly, every row (including the header) starts and ends with a single quote character ("). Strings, on the other hand, are wrapped in doubled quote characters (""string""). One of the main problems is that some of the strings contain commas (for instance ""amp, phen +, len"").

I have tried several things, for instance read_csv with quote = "" or quote = "\"", but none of them produced the desired output.

If I ignore the quotes with quote = "", the commas that lie within quotes (in V4) are treated as delimiters rather than as part of the string, so everything after the first comma is shifted into the next column. I get a data frame that looks like this:

tibble(
  V1 = c(245,247,248),
  V2 = c("Ab","Af","Ac"),
  V3 = c(NA, NA, NA),
  V4 = c("amp", "amp", "opi"),
  V5 = c("phen +, len", "len, 0", 0)
)

If I use quote = "\"", only one column is formed: because every row starts and ends with a quote character, the whole row is treated as one quoted field and no comma is recognized as a delimiter anymore.

Is there a way to tell read_csv (or a similar function) to ignore the single quote characters around each row and use the doubled quotes to delimit strings? Or is there an efficient way to delete the quote character at the beginning and end of each row?

Thanks in advance for your help!
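
For reference, the "delete the outer quotes" idea from the question could be sketched roughly as follows in base R. This is only a sketch under assumptions: it uses the four sample lines shown above, the file name mentioned in the comment is hypothetical, and the whitespace clean-up in step 3 assumes that no string value itself contains a comma followed by whitespace and a quote.

```r
# Sample lines copied from the question. In practice they would come from
# readLines("data.csv"), where "data.csv" is a hypothetical file name.
raw <- c(
  '"""V1"",""V2"",""V3"",""V4"", ""V5"""',
  '"245, ""Ab"", """",""amp, phen +, len"", 0"',
  '"247, ""Af"", NA,""amp, len"", 0"',
  '"248, ""Ac"", """",""opi"", 0"'
)

# 1. Drop the quote character that opens and closes each row.
inner <- sub('^"(.*)"$', "\\1", raw)

# 2. Collapse every doubled quote ("") into an ordinary quote (").
inner <- gsub('""', '"', inner, fixed = TRUE)

# 3. Remove whitespace between a comma and an opening quote so that every
#    quoted field starts directly after its delimiter.
#    Assumption: no string value contains a comma, whitespace, then a quote.
inner <- gsub(',\\s+"', ',"', inner)

# 4. Parse the cleaned lines as ordinary CSV. Empty strings and NA both
#    become missing values; leading blanks of unquoted fields are stripped.
df <- read.csv(
  text = inner,
  quote = "\"",
  strip.white = TRUE,
  na.strings = c("", "NA"),
  stringsAsFactors = FALSE
)

print(df)
```

If a tibble is preferred, the result could presumably be converted afterwards with tibble::as_tibble(df). The same three clean-up steps should also work before handing the lines to another CSV reader, but that has not been checked here.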
