How to create a new column in R based on what another column starts with

My df contains demographic information on 50 people. It has a column called "Ethnicity" with many ethnicity categories, including "White British", "White Other", and "White Irish". I want to create a new column where all observations with one of these three values are classified as "White", and all observations which don't start with "White" are classified as "POC".

df %>% mutate(Status = case_when(startsWith(Ethnicity, "White") ~ "White"))

I get the following error

Error in `mutate()`:
! Problem while computing `Status = case_when(startsWith(Ethnicity,
  "White") ~ "White")`.
Caused by error in `startsWith()`:
! non-character object(s)
Run `rlang::last_error()` to see where the error occurred.

jkatam On

Here is an example using the adsl data frame from the tidyCDISC package.
Its ETHNIC column holds either HISPANIC OR LATINO or NOT HISPANIC OR LATINO. The code creates a new column in which every value starting with HISPA is labelled 'HIP' and everything else 'NON'; this is just an example, and the same pattern carries over to your Ethnicity column.

library(tidyCDISC)
library(tidyverse)

data(adsl,package='tidyCDISC')

adsl %>%
  select(USUBJID, ETHNIC) %>%
  # '^HISPA' anchors the match to the start of the string
  mutate(new_column = case_when(str_detect(ETHNIC, '^HISPA') ~ 'HIP',
                                TRUE ~ 'NON'))

Created on 2023-02-04 with reprex v2.0.2

# A tibble: 254 × 3
   USUBJID     ETHNIC                 new_column
   <chr>       <chr>                  <chr>     
 1 01-701-1015 HISPANIC OR LATINO     HIP       
 2 01-701-1023 HISPANIC OR LATINO     HIP       
 3 01-701-1028 NOT HISPANIC OR LATINO NON       
 4 01-701-1033 NOT HISPANIC OR LATINO NON       
 5 01-701-1034 NOT HISPANIC OR LATINO NON       
 6 01-701-1047 NOT HISPANIC OR LATINO NON       
 7 01-701-1097 NOT HISPANIC OR LATINO NON       
 8 01-701-1111 NOT HISPANIC OR LATINO NON       
 9 01-701-1115 NOT HISPANIC OR LATINO NON       
10 01-701-1118 NOT HISPANIC OR LATINO NON       
# … with 244 more rows
# ℹ Use `print(n = ...)` to see more rows
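
As for the error in the question: startsWith() only accepts character vectors, so "non-character object(s)" suggests that Ethnicity is stored as something else, most likely a factor. That is a guess, since the question does not show the column type (class(df$Ethnicity) would confirm it). Under that assumption, a minimal sketch of the fix is to convert the column with as.character() first and add a fallback branch for "POC". The df below is a made-up stand-in for the real data, and Ethnicity_chr is a temporary helper column introduced only for this illustration.

```r
library(dplyr)

# Hypothetical stand-in for the asker's data. Storing Ethnicity as a factor
# is an assumption about why startsWith() failed in the question.
df <- data.frame(
  Ethnicity = factor(c("White British", "White Other", "White Irish",
                       "Black Caribbean", "Asian Indian", NA))
)

result <- df %>%
  mutate(
    # startsWith() needs a character vector, so convert the factor first
    Ethnicity_chr = as.character(Ethnicity),
    Status = case_when(
      # keep missing ethnicities missing instead of labelling them "POC"
      is.na(Ethnicity_chr) ~ NA_character_,
      startsWith(Ethnicity_chr, "White") ~ "White",
      # everything that does not start with "White"
      TRUE ~ "POC"
    )
  ) %>%
  select(-Ethnicity_chr)

print(result)
```

The explicit is.na() branch is there because startsWith() returns NA for missing values, and case_when() treats an NA condition as not matched, so without it those rows would fall through to "POC". The str_detect(Ethnicity, "^White") form from the answer above should also work here, because stringr converts factor input to character itself.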