How to replace one underscore and delete another in the same string


Let's suppose I have this situation

data = data.frame('A' = c('A_A_', 'B_B_'))

In a value such as A_A_, I would like to remove the final underscore and replace the central one with ->. How can I do this in one step instead of the following two?

data %>% 
  mutate(A = sub("_$","", A)) %>% 
  mutate(A = sub("_","->", A))

Thanks


0
Chris On BEST ANSWER

An unwieldy approach using base::strsplit:

# Split every string on '_' and flatten into one vector of terms
parts <- unlist(strsplit(data$A, split = '_'))
# Odd positions hold the first terms, even positions the second terms
odd <- seq_along(parts) %% 2 == 1
paste0(parts[odd], '->', parts[!odd])
[1] "A->A" "B->B"

that likely wouldn't be called a 'one-liner'...
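
A shorter variant of the same strsplit idea is sketched below. It is not the answerer's code; it assumes every value has the form term_term_ and relies on strsplit() not returning a trailing empty string after the final separator. The names parts and result are only for illustration.

```r
# Sample data from the question
data <- data.frame(A = c("A_A_", "B_B_"))

# Split each string on '_'; the trailing underscore produces no extra piece
parts <- strsplit(data$A, split = "_", fixed = TRUE)

# Re-join the pieces of each string with "->"
result <- vapply(parts, paste, character(1), collapse = "->")
print(result)
```

Because each string is split and re-joined on its own, this does not depend on the terms alternating in a flattened vector.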

5
Tim Biegeleisen On

You could use sub() with capture groups:

data$A <- sub("([^_]+)_([^_]+)_", "\\1->\\2", data$A)

The regex pattern used here says to match:

  • ([^_]+) match and capture in \1 the first term
  • _ match the first underscore
  • ([^_]+) match and capture in \2 the second term
  • _ match the final underscore

Then we splice together the two segments separated by ->.
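
As a quick check of the pattern described above, the following sketch applies it to the sample data from the question plus one extra row. The row 'foo_bar_' is a hypothetical addition, included only to show that multi-character terms are handled the same way.

```r
# Sample data from the question, plus one hypothetical extra row
data <- data.frame(A = c("A_A_", "B_B_", "foo_bar_"))

# Capture both terms, consume both underscores, and join the terms with "->"
result <- sub("([^_]+)_([^_]+)_", "\\1->\\2", data$A)
print(result)
# [1] "A->A"     "B->B"     "foo->bar"
```

Since the whole transformation is a single sub() call, it can replace the two mutate() steps in the question with one.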