I have these tibbles:
library(tidyverse)
data <- tribble(
~code, ~v1, ~v2, ~v3,
'ENSG00000141510.14', 10, 20, 30,
'ENSG00000133703.10', 15, 90, 50,
'ENSG00000187848.11', 18, 17, 16,
'ENSG00000133703.10', 55, 47, 22
)
code_info <- tribble(
~code, ~name,
'ENSG00000141510', 'p53',
'ENSG00000133703', 'KRAS',
'ENSG00000187848', 'P2XR2'
)
I want to create a new column data$name
by comparing the columns data$code
and code_info$code
. In this sense, the code_info
tibble kinda works as a key:value pair for the strings in data$code
.
Each row in data
will have its name
column equal to the name
column in code_info
whose string in code_info$code
is a partial string of the one in data$code
.
In this example, the expected result would be:
# A tibble: 4 x 5
code v1 v2 v3 name
<chr> <dbl> <dbl> <dbl> <chr>
1 ENSG00000141510.14 10 20 30 p53
2 ENSG00000133703.10 15 90 50 KRAS
3 ENSG00000187848.11 18 17 16 P2XR2
4 ENSG00000133703.10 55 47 22 KRAS
So far I've been doing this by manually hardcoding a bunch of sequential str_replace
, in a mutate
, like:
data %>%
mutate(
name = str_replace(code, "ENSG00000141510.14", "p53"),
name = str_replace(name, "ENSG00000133703.10", "KRAS"),
name = str_replace(name, "ENSG00000187848.11", "P2XR2")
)
# output:
# # A tibble: 4 x 5
# code v1 v2 v3 name
# <chr> <dbl> <dbl> <dbl> <chr>
# 1 ENSG00000141510.14 10 20 30 p53
# 2 ENSG00000133703.10 15 90 50 KRAS
# 3 ENSG00000187848.11 18 17 16 P2XR2
# 4 ENSG00000133703.10 55 47 22 KRAS
Which does the trick but obviously isn't very scalable. Is there a way to incorporate the information in code_info
, so that this is done automatically, without hard-coding the replacement values?
Try this sloution with
left_join()
andseparate()
in order to create a common variable between datasets:Output: