Prevent conversion to factor when number of columns in a data.frame can be reduced to one


I have a procedure that extracts items from a data frame based on a list of conditions on its columns (see Extracting items from an R data frame using criteria given as a (column_name = value) list).

Here are the data frame and condition list:

> experimental_plan_1
  lib genotype treatment replicate
1   A       WT    normal         1
2   B       WT       hot         1
3   C      mut    normal         1
4   D      mut       hot         1
5   E       WT    normal         2
6   F       WT       hot         2
7   G      mut    normal         2
8   H      mut       hot         2
> condition_1 <- list(genotype="WT", treatment="normal")

My goal is to extract the values in the lib column for the rows that match the criteria given in the list.

I can use the following function to extract the wanted values:

> get_libs <- function(experimental_plan, condition) {experimental_plan[apply((experimental_plan[, names(condition)] == condition), 1, all), "lib"]}

This works well with the above data frame:

> get_libs(experimental_plan_1, condition_1)
[1] A E
Levels: A B C D E F G H

However, I would like this to be more general: my experimental_plan and condition could have different columns:

> experimental_plan_2
  lib genotype replicate
1   A       WT         1
2   B       WT         2
3   C       WT         3
4   D      mut         1
5   E      mut         2
6   F      mut         3
> condition_2 <- list(genotype="WT")

This time it fails:

> get_libs(experimental_plan_2, condition_2)
Error in apply((experimental_plan[, names(condition)] == condition), 1,  : 
  dim(X) must have a positive length

In this case, the expected output should be:

[1] A B C
Levels: A B C D E F

How can I write a function that performs the same thing in a more robust manner?


Comment

I find it quite frustrating that the function does not work despite both cases being highly similar: both data frames have a lib column, and in both cases the names in the condition list correspond to column names in the data frame.

R apparently simplifies the result automatically when only one column is extracted from a data frame: instead of a one-column data.frame, I get the column itself (here a factor), which has no dimensions for apply to work on:

> class(experimental_plan_1)
[1] "data.frame"
> class(experimental_plan_2)
[1] "data.frame"
> class(names(condition_1))
[1] "character"
> class(names(condition_2))
[1] "character"
> class(experimental_plan_1[, names(condition_1)])
[1] "data.frame"
> class(experimental_plan_2[, names(condition_2)])
[1] "factor"

This goes against the principle of least surprise. I would expect a computation to return the same type of output when it is given the same types of input.
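One possible way around this, sketched here under some assumptions, is to pass drop = FALSE when subsetting, which asks `[` to keep a data.frame even when a single column is selected. The data frames below are rebuilt by hand for illustration, and stringsAsFactors = TRUE is an assumption made to reproduce the factor columns shown above (recent R versions default to character columns).

```r
# Rebuild the two example plans (stringsAsFactors = TRUE is an assumption
# made here so that the columns are factors, as in the output above).
experimental_plan_1 <- data.frame(
  lib = LETTERS[1:8],
  genotype = rep(c("WT", "WT", "mut", "mut"), times = 2),
  treatment = rep(c("normal", "hot"), times = 4),
  replicate = rep(1:2, each = 4),
  stringsAsFactors = TRUE
)
experimental_plan_2 <- data.frame(
  lib = LETTERS[1:6],
  genotype = rep(c("WT", "mut"), each = 3),
  replicate = rep(1:3, times = 2),
  stringsAsFactors = TRUE
)
condition_1 <- list(genotype = "WT", treatment = "normal")
condition_2 <- list(genotype = "WT")

get_libs <- function(experimental_plan, condition) {
  # drop = FALSE keeps the selection a data.frame, even with one column
  selected <- experimental_plan[, names(condition), drop = FALSE]
  # Comparing against the list yields a logical matrix with one column per
  # criterion; a row is kept only if it satisfies every criterion
  keep <- apply(selected == condition, 1, all)
  experimental_plan[keep, "lib"]
}

class(experimental_plan_2[, names(condition_2), drop = FALSE])  # "data.frame"
get_libs(experimental_plan_1, condition_1)  # libs A and E
get_libs(experimental_plan_2, condition_2)  # libs A, B and C
```

The final subsetting by "lib" is left with the default drop behaviour on purpose, so the function still returns a plain factor rather than a one-column data.frame.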
