How do I create a column in R with a "yes" output if any of the selected columns are "yes"

78 Views Asked by At

Edit - I broke down the steps using advice I received below and managed to complete this on my own. I will briefly go over the steps in the comments. Thank you!

I am working with a public dataset that has many sources of designation of a community being disadvantaged (DAC). I want to create another column that will indicate "Yes" if ANY of the designations are "yes" (even if other sources indicated "no).

I'm also wondering if one column name is a problem because it contains parenthesis - DAC(CARB)_LIC(CARB) I found somewhere online where it said that backticks ` help in this situation.

As for the first problem, I tried a number of different solutions I found searching online but nothing has worked so far. I can't find many examples of multiple OR conditions (using |) to create a column.

I am trying name the new column DAC. Columns which designate whether a community is disadvantaged are:

  • Disadvantaged_Community_OEHHA - "YES" or "NO" in all caps
  • Disadvantaged_Community_SB535 - "Yes" or "No" with only the first letter capitalized
  • DAC(CARB)_LIC(CARB) - "Low Income and Disadvantaged" or "NA"
  • Disadvantaged_Community_CE53 - "YES" or "NO" in all caps
  • Disadvantaged_Community_CE54 - "YES" or "NO" in all caps

This is the last thing I tried:

clean_transport_2022 <-transform(clean_transport_2022, DAC = ifelse(Disadvantaged_Community_OEHHA == "YES" | Disadvantaged_Community_SB535 == "Yes", `DAC(CARB)_LIC(CARB)` == "Low Income and Disadvantaged" | Disadvantaged_Community_CE53 = "YES" | Disadvantaged_Community_CE54 = "YES", "Yes", "No"))

Edit - Here is the head() for the dataset -

> head(clean_transport_2022)
# A tibble: 6 × 22
  Recipient_Contractor    Project_Title Project_Amount Project_Type
  <chr>                   <chr>                  <dbl> <chr>       
1 ABAG                    Bay Area EV …         14533. Electric Ve…
2 CA EV Alliance          Bay Area Cha…         12474. Electric Ve…
3 RTC Fuels, LLC dba Pea… Pearson Fuel…         71053  E85 Fueling…
4 Blink Acquisition       Nissan Elect…          6849. Electric Ve…
5 Redwood Coast Energy A… North Coast …         70000  Electric Ve…
6 ABAG                    Bay Area EV …          7266. Electric Ve…
# ℹ 18 more variables: Fuel_Type <chr>, Supply_Chain_Phase <chr>,
#   RDD_D_Phase <chr>, Status <chr>, Project_City <chr>,
#   Project_Zip <chr>, County <chr>, Air_District <chr>,
#   Electric_Service_Area <chr>,
#   Disadvantaged_Community_OEHHA <chr>,
#   Disadvantaged_Community_SB535 <chr>, Low_Income_SB535 <chr>,
#   Project_Location <chr>, LowIncome_SB1000 <chr>, …
1

There are 1 best solutions below

6
Josep Pueyo On

As you want to operate accross columns, this is not easy, since R is designed to work column by column. Here is a solution using base R (without importing libraries). I used apply to operate rowwise.

You are right about the backtiks, but since the solution does not need to use column names, you don't have to use backticks here. If you don't need to operate on all your columns, the easiest option is to create a new data.frame with only the columns where do you want to check your conditions. Another options would be, using the subset inside the apply.

As you did not provide any sample of your data, I created a synthetic dataframe that I believe is similar to yours:

prob <- c(0.2, 0.8)

df <- data.frame(
  "Disadvantaged_Community_OEHHA" = sample(c("YES", "NO"), 10, replace = TRUE, prob = prob),
  "Disadvantaged_Community_SB535" = sample(c("Yes", "No"), 10, replace = TRUE, prob = prob),
  "DAC(CARB)_LIC(CARB)" = sample(c("Low Income and Disadvantaged" , NA), 10, replace = TRUE, prob = prob),
  "Disadvantaged_Community_CE53" = sample(c("YES", "NO"), 10, replace = TRUE, prob = prob),
  "Disadvantaged_Community_CE54" = sample(c("YES", "NO"), 10, replace = TRUE, prob = prob)
)

df$any_yes <- apply(df, 1, \(x) ifelse(any(grepl("[yY]|Low", x)), "yes", "no"))

df
#>    Disadvantaged_Community_OEHHA Disadvantaged_Community_SB535
#> 1                            YES                            No
#> 2                             NO                            No
#> 3                            YES                            No
#> 4                             NO                            No
#> 5                            YES                            No
#> 6                             NO                            No
#> 7                             NO                           Yes
#> 8                            YES                            No
#> 9                             NO                           Yes
#> 10                            NO                            No
#>    DAC.CARB._LIC.CARB. Disadvantaged_Community_CE53
#> 1                 <NA>                           NO
#> 2                 <NA>                           NO
#> 3                 <NA>                           NO
#> 4                 <NA>                           NO
#> 5                 <NA>                          YES
#> 6                 <NA>                           NO
#> 7                 <NA>                           NO
#> 8                 <NA>                          YES
#> 9                 <NA>                           NO
#> 10                <NA>                           NO
#>    Disadvantaged_Community_CE54 any_yes
#> 1                            NO     yes
#> 2                            NO      no
#> 3                            NO     yes
#> 4                            NO      no
#> 5                            NO     yes
#> 6                            NO      no
#> 7                            NO     yes
#> 8                            NO     yes
#> 9                            NO     yes
#> 10                           NO      no

Created on 2024-02-25 with reprex v2.1.0

I would recommend to read this to post future questions.