Creating function to group data with %like% pipeline giving errors


I am trying to create a function that will take data with a description column, match a phrase in that column (using %like% from the data.table package), and add a grouping variable with a group name.

For example, after running my code on the iris dataset, I expect to get something like this:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Function
1          5.1         3.5          1.4         0.2  setosa   Setosa
2          4.9         3.0          1.4         0.2  setosa   Setosa
3          4.7         3.2          1.3         0.2  setosa   Setosa
4          4.6         3.1          1.5         0.2  setosa   Setosa
5          5.0         3.6          1.4         0.2  setosa   Setosa
6          5.4         3.9          1.7         0.4  setosa   Setosa

This is for genomic data with descriptors of protein function, so the %like% part is vital. I also want the function to work across genomic sets with varying numbers of groupings. The goal is for any grouping I do not need to return a placeholder that can easily be filtered out, so that only the real groupings are kept; hence the if...else portions. A grouping with no matches would therefore create a row like this:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Function
1            1           1            1           1       1     NULL

and a line filtering out Function=="NULL" would remove the extraneous groupings.
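
The filtering step described above can be sketched as follows, assuming the placeholder rows carry the literal string "NULL" in the Function column. The data frame `combined` is a made-up stand-in for the bound result, not real output from the function.

```r
# Made-up stand-in for the combined result: two real rows, one placeholder.
combined <- data.frame(
  Species  = c("setosa", "versicolor", "1"),
  Function = c("Setosa", "Versicolor", "NULL"),
  stringsAsFactors = FALSE
)

# Keep only the rows that belong to a real grouping.
kept <- combined[combined$Function != "NULL", , drop = FALSE]
kept
```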

However, no matter what I try I cannot seem to get the code to work. Currently each group returns only the label of the new variable (e.g. running it with the final line being data=group1 returns [1] "Setosa") instead of the desired data frame, and as a result the full code returns an error on the rbind.

How can I fix this so that the function works?

library(dplyr)       # filter()
library(data.table)  # %like%

Function <- function(data, grouping=T, groupingcategoryname="Function", wide_for_heatmap=F,
                     group1_filter_list="set", group1="Setosa", 
                     group2_filter_list="vers", group2="Versicolor", 
                     group3_filter_list="NULL", group3="NULL", 
                     filtercol1="Species", filtercol2="Species", filtercol3="Species"){
  groups<-data
  groups <- groups[!duplicated(groups$sampleID), ]
  
  group_1 <- groups |> filter((!!as.name(filtercol1)) %like% group1_filter_list)
  if(nrow(group_1)>=0){
    group_1 <- as.data.frame(group_1)
    group_1$Function <- group1
  }else {
    group_1 <- as.data.frame(group_1)
    group_1[1,] <- as.numeric(1)
    group1$Function<- "NULL"
  }
  
  group_2 <- groups |> filter((!!as.name(filtercol1)) %like% group2_filter_list)
  if(nrow(group_2)>=0){
    group_2 <- as.data.frame(group_2)
    group_2$Function <- "group2"
  }else {
    group_2 <- as.data.frame(group_2)
    group_2[1,] <- as.numeric(1)
    group2$Function<- "NULL"
  }
  
  group_3 <- groups |> filter((!!as.name(filtercol1)) %like% group3_filter_list)
  if(nrow(group_3)>=0){
    group_3 <- as.data.frame(group_3)
    group_3$Function <- group3
  }else {
    group_3 <- as.data.frame(group_3)
    group_3[1,] <- as.numeric(1)
    group3$Function<- "NULL"
  }
  
  data <- rbind(group_1, group_2, group_3)
  
}

test <- Function(iris)
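
One possible rewrite, offered as a sketch rather than a definitive fix. Reading the function above, several things look likely to contribute to the problem: iris has no sampleID column, so the de-duplication step keeps zero rows; `nrow(x) >= 0` is always true, so the else branches never run; those branches assign to group1, group2 and group3 (the label strings) rather than to group_1, group_2 and group_3; "group2" is quoted, so the literal text is used instead of the argument; and filtercol1 is used for all three filters. The helper name `add_group` and its arguments `patterns`, `filter_col`, `group_col` and `id_col` are hypothetical. The sketch uses `%like%` from data.table with base R subsetting instead of `filter()`, and it lets an unmatched grouping contribute zero rows instead of a placeholder row, which is a design change from the original.

```r
library(data.table)  # provides %like%

# Hypothetical helper (not part of the original code).
# patterns: named character vector; the names are the group labels and the
#           values are the patterns passed to %like%.
add_group <- function(data, patterns, filter_col = "Species",
                      group_col = "Function", id_col = NULL) {
  data <- as.data.frame(data)

  # De-duplicate only when an ID column is supplied and actually exists.
  if (!is.null(id_col) && id_col %in% names(data)) {
    data <- data[!duplicated(data[[id_col]]), , drop = FALSE]
  }

  pieces <- lapply(names(patterns), function(label) {
    hits <- data[data[[filter_col]] %like% patterns[[label]], , drop = FALSE]
    # rep() yields a zero-length vector when nothing matched, so adding the
    # column is safe for empty groupings too.
    hits[[group_col]] <- rep(label, nrow(hits))
    hits
  })

  # Return the combined data frame explicitly.
  do.call(rbind, pieces)
}

test <- add_group(iris, c(Setosa = "set", Versicolor = "vers", Unused = "NULL"))
head(test)
table(test$Function)
```

Because the groupings are passed as one named vector, the same call works for any number of groupings, and a pattern that matches nothing (here "NULL") simply adds no rows, so no later filtering on Function is needed.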