How to extract multiple numbers between a repeating pattern using stringr?

I have a column of strings that look like the string below, where the numbers following the double colons "::" are ages. In this example, 51, 40, 9, 5, 2, and 15 are the ages. The numbers before each "::" just say this is the first person, second person, etc. I'd like to extract just the ages.

library(tidyverse)

ex_str = "0::51||1::40||2::9||3::5||4::2||5::15"

I've tried things like,

ex_str |>
  str_extract_all("::[0-9]+")

only to get the output below.

[[1]]
[1] "::51" "::40" "::9"  "::5"  "::2"  "::15"

I apologize for the simple question. I've watched a few videos and read some guides online, but I just can't figure it out.

Sash Sinha On BEST ANSWER

You can use str_extract_all with a regex that includes a positive look-behind for '::':

library(tidyverse)

ex_str <- "0::51||1::40||2::9||3::5||4::2||5::15"
ages <- str_extract_all(ex_str, "(?<=::)\\d+") %>% unlist()
ages_numeric <- as.numeric(ages)
print(ages_numeric)

Output:

[1] 51 40  9  5  2 15
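
Since the question mentions a whole column of such strings, here is a minimal sketch of the same look-behind applied to a character vector. It uses base R's gregexpr/regmatches with perl = TRUE as a stand-in for str_extract_all, so it runs without extra packages. The vector name age_strings and its second element are made up for illustration.

```r
# Hypothetical column of strings in the same "index::age||index::age" layout
age_strings <- c(
  "0::51||1::40||2::9||3::5||4::2||5::15",
  "0::33||1::7"
)

# perl = TRUE is needed for the look-behind (?<=::)
matches <- regmatches(
  age_strings,
  gregexpr("(?<=::)\\d+", age_strings, perl = TRUE)
)

# One numeric vector of ages per input string
ages_list <- lapply(matches, as.numeric)
print(ages_list)
```

With stringr loaded, str_extract_all(age_strings, "(?<=::)\\d+") should give the same list of character vectors, since it is vectorised over its first argument.
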
jpsmith On

You can try using strsplit on the "||" and then gsub out the first digit(s) and the "::":

gsub("\\d+::", "",strsplit(ex_str, "\\|\\|")[[1]])

# or (thanks to @r2evans):
gsub("\\d+::", "",strsplit(ex_str, "||", fixed=TRUE)[[1]])

You could also do the reverse - strsplit after the gsub:

strsplit(gsub("\\d+::", "",ex_str), "\\|\\|")[[1]]

# or (thanks to @r2evans):
strsplit(gsub("\\d+::", "",ex_str), "||", fixed = TRUE)[[1]]

All return the same:

#[1] "51" "40" "9"  "5"  "2"  "15"
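
All of these variants return character vectors, so a numeric conversion is still needed before doing arithmetic on the ages. A small sketch, reusing ex_str from the question; the names ages_chr and ages_int are only illustrative:

```r
ex_str <- "0::51||1::40||2::9||3::5||4::2||5::15"

# Split on the literal "||", then drop the leading "index::" from each piece
ages_chr <- gsub("\\d+::", "", strsplit(ex_str, "||", fixed = TRUE)[[1]])

# Convert the character results to integers
ages_int <- as.integer(ages_chr)
print(ages_int)
```
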
ThomasIsCoding On

You can use strsplit to split on runs of non-digits and then keep every second element:

> strsplit(ex_str, "\\D+")[[1]][c(FALSE, TRUE)]
[1] "51" "40" "9"  "5"  "2"  "15"
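
To see why this works, it may help to look at the intermediate result. Splitting on runs of non-digits yields the index and the age of each person in alternation, and the logical index c(FALSE, TRUE) is recycled along the vector so that only every second element (the ages) is kept. The sketch below assumes every record has exactly one index and one age and that the string starts with a digit; the variable names are illustrative.

```r
ex_str <- "0::51||1::40||2::9||3::5||4::2||5::15"

# Split on any run of non-digit characters (both "::" and "||")
parts <- strsplit(ex_str, "\\D+")[[1]]
print(parts)  # indices and ages alternate: index, age, index, age, ...

# c(FALSE, TRUE) is recycled: odd positions are dropped, even ones kept
ages <- as.numeric(parts[c(FALSE, TRUE)])
print(ages)
```

If a string could start with a separator or contain other digits, the alternation would shift, so one of the pattern-based approaches above is likely the safer choice in that case.
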