R: read_xlsx() appends the column position to the end of column names

I'm trying to import some data from Excel into R using read_xlsx(). I normally like to use janitor::clean_names() to make column names uniform and to tidy the data. This is the code I used:

library(magrittr)  # provides the %>% pipe

file_two <- "./data-raw/mass_spec_data_anon.xlsx"
mass_spec_two <- readxl::read_xlsx(file_two, skip = 2) %>%
  janitor::clean_names()

The output gives me the following warning:

New names:
* gfpA_1 -> gfpA_1...7
* gfpA_2 -> gfpA_2...8
* gfpA_3 -> gfpA_3...9
* GFP1 -> GFP1...10
* GFP2 -> GFP2...11
* ...

I would like the column names not to include their column positions (i.e. "gfpA_1", not "gfpA_1_7"). Any help with this problem would be appreciated.


You can control how read_xlsx() repairs column names with its .name_repair argument. The default, "unique", de-duplicates by appending each duplicated column's position to its name (e.g. gfpA_1...7), but you can override that to request only minimal repair:

readxl::read_xlsx(file_two, skip = 2, .name_repair = "minimal")

janitor::clean_names() will then de-duplicate any names that are still duplicated at that point, using a numeric suffix (e.g. _2) rather than the column index.
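To see the difference without touching the Excel file, here is a minimal sketch on a plain character vector. The header names in raw_names are made up for illustration, and vctrs::vec_as_names() is used as a stand-in on the assumption that read_xlsx() applies the same "unique" and "minimal" repair strategies:

```r
# Hypothetical header row with duplicated names (not the asker's real sheet)
raw_names <- c("sample", "gfpA_1", "gfpA_2", "gfpA_1", "gfpA_2")

# "unique" repair: every duplicated name gets a ...<position> suffix
unique_names <- vctrs::vec_as_names(raw_names, repair = "unique", quiet = TRUE)
print(unique_names)

# "minimal" repair: names are left as they are, duplicates included
minimal_names <- vctrs::vec_as_names(raw_names, repair = "minimal")
print(minimal_names)

# make_clean_names() is the helper behind clean_names(); it de-duplicates
# on its own, without referring to the column position
cleaned <- janitor::make_clean_names(minimal_names)
print(cleaned)
```

With "minimal" repair the duplicates reach clean_names() untouched, so the final names carry no "..." position suffix and are still unique.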