How can I make readr's name_repair number duplicate column names by occurrence instead of by column position?

Suppose I have this csv file:

asdf,qwer,asdf,qwer,qwer
1,2,3,4,5

If I read it with readr::read_csv("some.csv"), the duplicated column names are repaired with suffixes based on each column's position:

# A tibble: 1 × 5
  asdf...1 qwer...2 asdf...3 qwer...4 qwer...5
     <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
1        1        2        3        4        5

What can I do if I would rather have suffixes based on how many times a name has already appeared, with the first occurrence left unmodified, like this:

# A tibble: 1 × 5
   asdf  qwer asdf_1 qwer_1 qwer_2
  <dbl> <dbl>  <dbl>  <dbl>  <dbl>
1     1     2      3      4      5

Hint

It seems possible to pass a function to the name_repair argument of read_csv.

Best answer, by r2evans

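Since name_repair= can be a function, we can handle this programmatically. Fortunately, base::make.unique does most of the work: it leaves the first occurrence of each name untouched and appends a counter to later duplicates. Setting sep = "_" (the default is ".") gives your exact output.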

namefun <- function(nm) make.unique(nm, sep = "_")
txt <- 'asdf,qwer,asdf,qwer,qwer
1,2,3,4,5'
readr::read_csv(txt, name_repair = namefun)
# Rows: 1 Columns: 5
# ── Column specification ───────────────────────────────────────────────────────────────────────────────────────────
# Delimiter: ","
# dbl (5): asdf, qwer, asdf_1, qwer_1, qwer_2
# ℹ Use `spec()` to retrieve the full column specification for this data.
# ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# # A tibble: 1 × 5
#    asdf  qwer asdf_1 qwer_1 qwer_2
#   <dbl> <dbl>  <dbl>  <dbl>  <dbl>
# 1     1     2      3      4      5
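
To see what the repair function does on its own, it can help to call it on a plain character vector, without readr involved. The sketch below uses base R only. It also includes suffix_dupes, a hypothetical hand-rolled helper (not part of readr or base R) that numbers duplicates per name with ave(); it is only an illustration of how one might adapt the scheme, for example to start the counter at a different value.

```r
# Column names as they appear in the CSV header
nm <- c("asdf", "qwer", "asdf", "qwer", "qwer")

# make.unique keeps the first occurrence and suffixes later duplicates
make.unique(nm, sep = "_")
# [1] "asdf"   "qwer"   "asdf_1" "qwer_1" "qwer_2"

# Hypothetical alternative: count occurrences within each name by hand.
# `start` is an assumed parameter controlling the first suffix number.
suffix_dupes <- function(nm, sep = "_", start = 1L) {
  # k[i] is 1 for the first occurrence of nm[i], 2 for the second, and so on
  k <- ave(seq_along(nm), nm, FUN = seq_along)
  ifelse(k == 1L, nm, paste0(nm, sep, k - 2L + start))
}

suffix_dupes(nm)
# [1] "asdf"   "qwer"   "asdf_1" "qwer_1" "qwer_2"

suffix_dupes(nm, start = 2L)
# [1] "asdf"   "qwer"   "asdf_2" "qwer_2" "qwer_3"
```

Either function can be passed as name_repair = in the same way as namefun above. One difference to keep in mind: make.unique always returns unique names, whereas the hand-rolled helper does not check whether a generated name such as "asdf_1" already exists in the header.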