Look for multiple strings in multiple columns for a particular item and return the colnames in R

68 Views Asked by At

I want to find two strings in multiple columns for a particular item. I need to get the colnames which contain "0/1" or "1/1" for each item in Item_Name.

Example dataframe:

Item_Name Sample_A Sample_B Sample_C
Nino1 0/1 0/0 1/1
Nino2 0/0 1/1 0/1
Nino3 1/1 0/1 0/0

Expected Output:

Nino1 - SampleA, Sample_C

Nino2 - Sample_B, Sample_C

Nino3 - Sample_A, Sample_B

Till now I know to use grepl to find one pattern at time. I am not sure how to get colnames part.

data_frame_mod  <- dataframe[Reduce(`|`, lapply(dataframe, grepl, pattern = "0/1"))]

Thanks

3

There are 3 best solutions below

0
Mark On

Here's a fairly rudimentary way of doing it:

library(tidyverse)

desired <- c("0/1", "1/1")

dataframe |>
  rowwise() |> 
  mutate(samples = names(dataframe)[c_across(everything()) %in% desired] |> paste(collapse = ", ")) |>
  ungroup()

Update: a one-liner:

mutate(df, goal = apply(df, 1, \(x) names(df)[x %in% desired] |> paste(collapse = ", ")))

Output:

# A tibble: 3 × 5
  Item_Name Sample_A Sample_B Sample_C samples           
  <chr>     <chr>    <chr>    <chr>    <chr>             
1 Nino1     0/1      0/0      1/1      Sample_A, Sample_C
2 Nino2     0/0      1/1      0/1      Sample_B, Sample_C
3 Nino3     1/1      0/1      0/0      Sample_A, Sample_B
0
Onyambu On

You do not need reduce. Use %in%:

val <- as.matrix(df[-1]) %in% c('0/1', '1/1')
nms <- names(df[-1])[col(df[-1])][val]
transform(df, output = tapply(nms,row(df[-1])[val], toString))

 Item_Name Sample_A Sample_B Sample_C             output
1     Nino1      0/1      0/0      1/1 Sample_A, Sample_C
2     Nino2      0/0      1/1      0/1 Sample_B, Sample_C
3     Nino3      1/1      0/1      0/0 Sample_A, Sample_B

Notice that if you want to use grepl:

 grepl("[01]/1", as.matrix(df[-1]))

In tidyverse:

library(tidyverse)
df %>%
    mutate(across(starts_with('Sample'),
                                ~ifelse(.x %in% c('0/1', '1/1'), cur_column(),NA),
                                .names = 'a_{col}')) %>%
    unite(output, starts_with('a'), sep = ', ', na.rm = TRUE)

  Item_Name Sample_A Sample_B Sample_C             output
1     Nino1      0/1      0/0      1/1 Sample_A, Sample_C
2     Nino2      0/0      1/1      0/1 Sample_B, Sample_C
3     Nino3      1/1      0/1      0/0 Sample_A, Sample_B
0
r2evans On

If you don't have a very big frame, you can pivot and grep:

library(dplyr)
library(tidyr) # pivot_longer
quux |>
  pivot_longer(cols = -Item_Name) |>
  filter(value %in% c("0/1", "1/1")) |>
  summarize(res = toString(name), .by = Item_Name)
# # A tibble: 3 × 2
#   Item_Name res               
#   <chr>     <chr>             
# 1 Nino1     Sample_A, Sample_C
# 2 Nino2     Sample_B, Sample_C
# 3 Nino3     Sample_A, Sample_B

Or we can use sapply per group:

quux |>
  summarize(
    res = toString(names(which(sapply(pick(starts_with("Sample_")), 
                                      `%in%`, c("0/1", "1/1"))))),
    .by = Item_Name)
#   Item_Name                res
# 1     Nino1 Sample_A, Sample_C
# 2     Nino2 Sample_B, Sample_C
# 3     Nino3 Sample_A, Sample_B

This latter one assumes one row per Item_Name.