I've seen snippets of this floating around, but no full answers yet, so I thought I would ask.
I'm working on a function that assigns a value based on the presence or absence of some keywords ranked by severity, like so:

```r
severity <- c("kw1", "kw2", "kw3", "kw4", "kw5", "kw6")
```

It goes through a single column in a dataset and, for each row, assigns the first (most severe) entry of the severity list that appears in that row's text.
Based on the following question, I realized you could detect multiple strings with `str_detect()`:
How can I check if multiple strings exist in another string?
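For a single string, `str_detect()` is vectorised over `pattern`, so one call tests every keyword and returns the results in the same order as `severity`. A minimal sketch, reusing the `severity` vector from above; the example string comes from the sample data further down:

```r
library(stringr)

severity <- c("kw1", "kw2", "kw3", "kw4", "kw5", "kw6")

# A length-1 string is recycled against every pattern, giving one
# TRUE/FALSE per keyword, in severity order.
hits <- str_detect(tolower("KW6 with KW1 on the side"), severity)
print(hits)
#> [1]  TRUE FALSE FALSE FALSE FALSE  TRUE

# The first TRUE is therefore the most severe keyword present.
most_severe <- severity[which(hits)[1]]
print(most_severe)
#> [1] "kw1"
```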
```r
severity_rankings <- severity_df |>
  dplyr::mutate(
    # Classify severity based on strings
    severity_kw = dplyr::case_when(
      if (any(stringr::str_detect(tolower(severity_string), severity))) ~
        severity[min(which(str_detect(tolower(severity_string), severity) == TRUE))],
      .default = NA
    )
  )
```
But this keeps throwing an error, as if it's trying to process the whole column at once:
```
Error in `dplyr::mutate()`:
ℹ In argument: `severity_kw = dplyr::case_when(...)`.
Caused by error in `stringr::str_detect()`:
! Can't recycle `string` (size 20) to match `pattern` (size 6).
Run `rlang::last_trace()` to see where the error occurred.
```
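The message comes from how `str_detect()` pairs its arguments: it compares `string[i]` with `pattern[i]` and only recycles an input of length 1. A small sketch of that behaviour, assuming stringr 1.5.0 or later (the releases that apply the tidyverse recycling rules behind this message); `column` is a hypothetical stand-in for the 20-row `severity_string` column that `mutate()` passes in:

```r
library(stringr)

severity <- c("kw1", "kw2", "kw3", "kw4", "kw5", "kw6")

# Hypothetical stand-in for the 20-row severity_string column.
column <- rep("kw1 with KW2 and kw6", 20)

# 20 strings vs 6 patterns: neither has length 1 and the lengths differ,
# so stringr refuses to recycle them and raises the error shown above.
msg <- tryCatch(
  str_detect(tolower(column), severity),
  error = function(e) conditionMessage(e)
)
print(msg)

# Even when lengths match, the comparison is element by element, not every
# string against every pattern: string 1 vs "kw1", string 2 vs "kw2", ...
pairwise <- str_detect(c("kw1", "kw1", "kw3", "kw3", "kw5", "kw5"), severity)
print(pairwise)
#> [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE
```

So even a column that happened to have six rows would not give "most severe keyword per row"; each row would only be checked against one keyword.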
Ultimately, what I would like is an output along these lines:
```
ID  severity_string           severity_kw
 1  kw1 with KW2 and kw6      kw1
 2  kw6                       kw6
 3  kw6 with kW5, kw2 also    kw2
 4  KW3                       kw3
 5  KW5                       kw5
 6  KW4 with kw2, kw1 also    kw1
 7  KW1                       kw1
 8  KW2                       kw2
 9  KW4 with KW5              kw4
10  KW6                       kw6
11  KW6 with KW1 on the side  kw1
12  KW2 with KW4 and KW1      kw1
13  kw5 with kw6              kw5
14  kw7                       <NA>
15  KW3 and KW2               kw2
16  KW2                       kw2
17  KW1 and KW6               kw1
18  KW3                       kw3
19  KW3 and KW1               kw1
20  kw1                       kw1
```
I suspect it's bad syntax or the wrong dplyr call on my part, but I'm not sure where to start.
Any and all advice would be appreciated.
For generating the initial dataframe:
```r
severity_df <- data.frame(
  ID = 1:20,
  severity_string = c(
    "kw1 with KW2 and kw6", "kw6", "kw6 with kW5, kw2 also", "KW3", "KW5",
    "KW4 with kw2, kw1 also", "KW1", "KW2", "KW4 with KW5", "KW6",
    "KW6 with KW1 on the side", "KW2 with KW4 and KW1", "kw5 with kw6", "kw7", "KW3 and KW2",
    "KW2", "KW1 and KW6", "KW3", "KW3 and KW1", "kw1"
  ),
  stringsAsFactors = FALSE
)
```
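The issue is that you're trying to use `str_detect()` with values for `string` and `pattern` that have incompatible lengths: inside `mutate()`, `severity_string` is the whole 20-row column, while `severity` has 6 elements. You can reproduce the error by calling `str_detect()` directly on those two vectors, outside of `mutate()`.

I think you've got a misplaced `if` in there too, but that seems to be beside the point of the question. For your use case, I would change tack and use a tool like `map_chr()` with a bespoke function: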
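A minimal sketch of such a function, assuming dplyr, purrr and stringr are installed. The helper name `first_severity()` is hypothetical and not necessarily what the original answer used:

```r
library(dplyr)
library(purrr)
library(stringr)

# Keywords in order of severity, most severe first.
severity <- c("kw1", "kw2", "kw3", "kw4", "kw5", "kw6")

# Hypothetical helper: returns the most severe keyword found in a single
# string, or NA when none of the keywords appear.
first_severity <- function(x, keywords = severity) {
  # x has length 1, so it is recycled against every keyword in one call.
  hits <- str_detect(tolower(x), keywords)
  # na.rm = TRUE makes an NA input fall through to NA instead of erroring.
  if (any(hits, na.rm = TRUE)) keywords[which(hits)[1]] else NA_character_
}

severity_df <- data.frame(
  ID = 1:20,
  severity_string = c(
    "kw1 with KW2 and kw6", "kw6", "kw6 with kW5, kw2 also", "KW3", "KW5",
    "KW4 with kw2, kw1 also", "KW1", "KW2", "KW4 with KW5", "KW6",
    "KW6 with KW1 on the side", "KW2 with KW4 and KW1", "kw5 with kw6", "kw7", "KW3 and KW2",
    "KW2", "KW1 and KW6", "KW3", "KW3 and KW1", "kw1"
  ),
  stringsAsFactors = FALSE
)

# map_chr() calls first_severity() once per row and returns a character vector.
severity_rankings <- severity_df |>
  mutate(severity_kw = map_chr(severity_string, first_severity))

print(severity_rankings)
```

Because `map_chr()` hands the helper one string at a time, `str_detect()` only ever sees a length-1 `string` against the six patterns, which is the recycling case that works. Returning `NA_character_` rather than plain `NA` keeps every result a character value, as `map_chr()` expects.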