How do I create a column in R with a "yes" output if any of the selected columns are "yes"

Question

How do I create a column in R with a "yes" output if any of the selected columns are "yes"

78 Views Asked by Julia Caron At 25 February 2024 at 06:20

Edit - I broke down the steps using advice I received below and managed to complete this on my own. I will briefly go over the steps in the comments. Thank you!

I am working with a public dataset that has many sources of designation of a community being disadvantaged (DAC). I want to create another column that will indicate "Yes" if ANY of the designations are "yes" (even if other sources indicated "no).

I'm also wondering if one column name is a problem because it contains parenthesis - DAC(CARB)_LIC(CARB) I found somewhere online where it said that backticks ` help in this situation.

As for the first problem, I tried a number of different solutions I found searching online but nothing has worked so far. I can't find many examples of multiple OR conditions (using |) to create a column.

I am trying name the new column DAC. Columns which designate whether a community is disadvantaged are:

Disadvantaged_Community_OEHHA - "YES" or "NO" in all caps
Disadvantaged_Community_SB535 - "Yes" or "No" with only the first letter capitalized
DAC(CARB)_LIC(CARB) - "Low Income and Disadvantaged" or "NA"
Disadvantaged_Community_CE53 - "YES" or "NO" in all caps
Disadvantaged_Community_CE54 - "YES" or "NO" in all caps

This is the last thing I tried:

clean_transport_2022 <-transform(clean_transport_2022, DAC = ifelse(Disadvantaged_Community_OEHHA == "YES" | Disadvantaged_Community_SB535 == "Yes", `DAC(CARB)_LIC(CARB)` == "Low Income and Disadvantaged" | Disadvantaged_Community_CE53 = "YES" | Disadvantaged_Community_CE54 = "YES", "Yes", "No"))

Edit - Here is the head() for the dataset -

> head(clean_transport_2022)
# A tibble: 6 × 22
  Recipient_Contractor    Project_Title Project_Amount Project_Type
  <chr>                   <chr>                  <dbl> <chr>       
1 ABAG                    Bay Area EV …         14533. Electric Ve…
2 CA EV Alliance          Bay Area Cha…         12474. Electric Ve…
3 RTC Fuels, LLC dba Pea… Pearson Fuel…         71053  E85 Fueling…
4 Blink Acquisition       Nissan Elect…          6849. Electric Ve…
5 Redwood Coast Energy A… North Coast …         70000  Electric Ve…
6 ABAG                    Bay Area EV …          7266. Electric Ve…
# ℹ 18 more variables: Fuel_Type <chr>, Supply_Chain_Phase <chr>,
#   RDD_D_Phase <chr>, Status <chr>, Project_City <chr>,
#   Project_Zip <chr>, County <chr>, Air_District <chr>,
#   Electric_Service_Area <chr>,
#   Disadvantaged_Community_OEHHA <chr>,
#   Disadvantaged_Community_SB535 <chr>, Low_Income_SB535 <chr>,
#   Project_Location <chr>, LowIncome_SB1000 <chr>, …

Original Q&A

There are 1 best solutions below

**Josep Pueyo** · Answer 1 · 2024-02-25T07:09:26.400000

As you want to operate accross columns, this is not easy, since R is designed to work column by column. Here is a solution using base R (without importing libraries). I used apply to operate rowwise.

You are right about the backtiks, but since the solution does not need to use column names, you don't have to use backticks here. If you don't need to operate on all your columns, the easiest option is to create a new data.frame with only the columns where do you want to check your conditions. Another options would be, using the subset inside the apply.

As you did not provide any sample of your data, I created a synthetic dataframe that I believe is similar to yours:

prob <- c(0.2, 0.8)

df <- data.frame(
  "Disadvantaged_Community_OEHHA" = sample(c("YES", "NO"), 10, replace = TRUE, prob = prob),
  "Disadvantaged_Community_SB535" = sample(c("Yes", "No"), 10, replace = TRUE, prob = prob),
  "DAC(CARB)_LIC(CARB)" = sample(c("Low Income and Disadvantaged" , NA), 10, replace = TRUE, prob = prob),
  "Disadvantaged_Community_CE53" = sample(c("YES", "NO"), 10, replace = TRUE, prob = prob),
  "Disadvantaged_Community_CE54" = sample(c("YES", "NO"), 10, replace = TRUE, prob = prob)
)

df$any_yes <- apply(df, 1, \(x) ifelse(any(grepl("[yY]|Low", x)), "yes", "no"))

df
#>    Disadvantaged_Community_OEHHA Disadvantaged_Community_SB535
#> 1                            YES                            No
#> 2                             NO                            No
#> 3                            YES                            No
#> 4                             NO                            No
#> 5                            YES                            No
#> 6                             NO                            No
#> 7                             NO                           Yes
#> 8                            YES                            No
#> 9                             NO                           Yes
#> 10                            NO                            No
#>    DAC.CARB._LIC.CARB. Disadvantaged_Community_CE53
#> 1                 <NA>                           NO
#> 2                 <NA>                           NO
#> 3                 <NA>                           NO
#> 4                 <NA>                           NO
#> 5                 <NA>                          YES
#> 6                 <NA>                           NO
#> 7                 <NA>                           NO
#> 8                 <NA>                          YES
#> 9                 <NA>                           NO
#> 10                <NA>                           NO
#>    Disadvantaged_Community_CE54 any_yes
#> 1                            NO     yes
#> 2                            NO      no
#> 3                            NO     yes
#> 4                            NO      no
#> 5                            NO     yes
#> 6                            NO      no
#> 7                            NO     yes
#> 8                            NO     yes
#> 9                            NO     yes
#> 10                           NO      no

^{Created on 2024-02-25 with reprex v2.1.0}

I would recommend to read this to post future questions.

How do I create a column in R with a "yes" output if any of the selected columns are "yes"

There are 1 best solutions below

Related Questions in R

Related Questions in CONDITIONAL-STATEMENTS

Related Questions in PARENTHESES

Trending Questions

Popular # Hahtags

Popular Questions