In R, use if loop with agrep to assign value


The pattern list looks like:

pattern <- c('aaa','bbb','ccc','ddd')

The column X in df looks like:

df$X <- c('aaa-053','aaa-001','aab','bbb')

What I am trying to do: use agrep to find, for each value of df$X, the matching name in pattern, then write that name to an existing column 'column2'. For example, if 'aaa-053' matches 'aaa', then 'aaa' should be the value in 'column2'; if nothing matches, the column should hold NA.

for (i in 1:length(pattern)) {
 match <- agrep(pattern, df$X, ignore.case=TRUE, max=0)
 if agrep = TRUE {
   df$column2 <- pattern
 } else {df$column2 <- na
 }
}

Ideal column2 in df looks like:

'aaa','aaa',NA,'bbb'

r2evans

agrep by itself doesn't tell you which pattern to use when more than one could match. For instance,

agrep(pattern[1], df$X)
# [1] 1 2 3

That makes sense for the first two, but the third ('aab') is not among your expected matches. Similarly, a single string could match more than one pattern.
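
To illustrate that last point, here is a small sketch. The patterns and the string in it are made up for the example (they are not from the question): with agrep's default max.distance, a three-character pattern tolerates one edit, so 'aab' is within reach of both 'aaa' and 'abb'.

```r
# Hypothetical patterns and string, chosen only to show the ambiguity.
pats <- c("aaa", "abb")
s <- "aab"

# agrepl() is the logical counterpart of agrep(); test each pattern in turn.
hits <- vapply(pats, function(p) agrepl(p, s), logical(1))
hits
```

Both entries should come back TRUE, so agrep alone gives no basis for picking one pattern over the other.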

Here's an alternative:

D <- adist(pattern, df$X, fixed = FALSE)
D
#      [,1] [,2] [,3] [,4]
# [1,]    0    0    1    3
# [2,]    3    3    2    0
# [3,]    3    3    3    3
# [4,]    3    3    3    3
D[D > 0] <- NA
D
#      [,1] [,2] [,3] [,4]
# [1,]    0    0   NA   NA
# [2,]   NA   NA   NA    0
# [3,]   NA   NA   NA   NA
# [4,]   NA   NA   NA   NA
apply(D, 2, function(z) which.min(z)[1])
# [1]  1  1 NA  2
pattern[apply(D, 2, function(z) which.min(z)[1])]
# [1] "aaa" "aaa" NA    "bbb"
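
Putting the steps together, a minimal sketch that writes the result into column2. It assumes df is an ordinary data frame with a character column X as in the question; the name idx is only for illustration.

```r
pattern <- c('aaa', 'bbb', 'ccc', 'ddd')
# Assumed shape of the question's data frame.
df <- data.frame(X = c('aaa-053', 'aaa-001', 'aab', 'bbb'),
                 stringsAsFactors = FALSE)

# Rows are patterns, columns are the strings in df$X.
D <- adist(pattern, df$X, fixed = FALSE)
# Keep only zero-distance (exact partial) matches.
D[D > 0] <- NA
# First matching pattern per string; NA when none matched.
idx <- apply(D, 2, function(z) which.min(z)[1])
df$column2 <- pattern[idx]
df
```

If a string matched several patterns at distance 0, which.min would keep the first one in pattern order, so the order of pattern acts as the tie-breaker here.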