Creating a dataset (data frame, df) in R consisting of all possible combinations of 10 different variables


Thanks in advance for any advice. As part of a study, I need to:

Part 1:

I need to create a .csv dataset (or R data frame?) that contains all possible combinations of 10 different variables. Each of the 10 variables has either 2 levels (i.e., binary 0, 1) or 4 levels. I think this should be easy in both Excel and R, but R would be preferable. The variables are provided in the table below: [image not available]

For example, the first set of combinations would keep "DrugA_LIFE" at 0.5 and cycle through all combinations of the other variables; it would then fix "DrugA_LIFE" at 1 and cycle through all combinations of the other variables again. Eventually, it would move on to fix "DrugA_NEED" at 0 while changing the other variables, then at 1, and so on.

The dataset should be a full set of combinations with no repeat combinations.

I understand there is a large number of possible combinations - this is as expected, but I don't think this should be too difficult to compute.

Part 2:

I then need to go through this dataset, selecting only the possible combinations where:

  1. "DrugA_LIFE" is the same as "DrugB_LIFE"

AND

  2. "DrugA_NEED" is the same as "DrugB_NEED"

I think this should be simple with the dplyr package in R.

I have created the df in R, but do not know how to begin cycling through it to produce all possible combinations.

#DATASET OF ALL POSSIBLE CHOICE SETS#

#Creating the Vectors of choices

DrugA_LIFE <- c(0.5, 1, 2, 5)
DrugA_NEED <- c(0,1)
DrugA_CERT <- c(0, 0.2, 0.4, 0.6)
DrugA_RISK <- c(0.1, 0.2, 0.4, 0.6)
DrugA_WAIT <- c(0, 0.5, 1, 2)

DrugB_LIFE <- c(0.5, 1, 2, 5)
DrugB_NEED <- c(0,1)
DrugB_CERT <- c(0, 0.2, 0.4, 0.6)
DrugB_RISK <- c(0.1, 0.2, 0.4, 0.6)
DrugB_WAIT <- c(0, 0.5, 1, 2)

#Create data frame
#Note: data.frame() recycles the 2-level NEED vectors to length 4 (0, 1, 0, 1)

df <- data.frame(DrugA_LIFE, DrugA_NEED, DrugA_CERT, DrugA_RISK, DrugA_WAIT, DrugB_LIFE, DrugB_NEED, DrugB_CERT, DrugB_RISK, DrugB_WAIT)

Answer by r2evans:

  1. All possible combinations? Use expand.grid or tidyr::expand_grid. Either function can be applied to an already-made frame using do.call.

  2. Unique? Use R's unique or dplyr::distinct.

  3. Filtering? Use ... dplyr::filter (or base R subset).

library(dplyr)
# library(tidyr) # expand_grid
do.call(tidyr::expand_grid, df) %>%
  distinct() %>%
  filter(DrugA_LIFE == DrugB_LIFE, DrugA_NEED == DrugB_NEED)
# # A tibble: 32,768 × 10
#    DrugA_LIFE DrugA_NEED DrugA_CERT DrugA_RISK DrugA_WAIT DrugB_LIFE DrugB_NEED DrugB_CERT DrugB_RISK DrugB_WAIT
#         <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
#  1        0.5          0          0        0.1          0        0.5          0          0        0.1        0  
#  2        0.5          0          0        0.1          0        0.5          0          0        0.1        0.5
#  3        0.5          0          0        0.1          0        0.5          0          0        0.1        1  
#  4        0.5          0          0        0.1          0        0.5          0          0        0.1        2  
#  5        0.5          0          0        0.1          0        0.5          0          0        0.2        0  
#  6        0.5          0          0        0.1          0        0.5          0          0        0.2        0.5
#  7        0.5          0          0        0.1          0        0.5          0          0        0.2        1  
#  8        0.5          0          0        0.1          0        0.5          0          0        0.2        2  
#  9        0.5          0          0        0.1          0        0.5          0          0        0.4        0  
# 10        0.5          0          0        0.1          0        0.5          0          0        0.4        0.5
# # … with 32,758 more rows
# # ℹ Use `print(n = ...)` to see more rows
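
For readers who prefer to avoid the tidyverse, a base R sketch of the same three steps might look like the following. It assumes the same df as in the Data section (rebuilt inline so the block runs on its own); the names all_combos and matched are only illustrative. De-duplicating each column before expanding is swapped in here for the distinct() step above: it should give the same unique rows while building a smaller grid.

```r
# Input frame copied from the Data section; the NEED columns hold 0, 1 recycled to length 4
df <- data.frame(
  DrugA_LIFE = c(0.5, 1, 2, 5),
  DrugA_NEED = c(0, 1, 0, 1),
  DrugA_CERT = c(0, 0.2, 0.4, 0.6),
  DrugA_RISK = c(0.1, 0.2, 0.4, 0.6),
  DrugA_WAIT = c(0, 0.5, 1, 2),
  DrugB_LIFE = c(0.5, 1, 2, 5),
  DrugB_NEED = c(0, 1, 0, 1),
  DrugB_CERT = c(0, 0.2, 0.4, 0.6),
  DrugB_RISK = c(0.1, 0.2, 0.4, 0.6),
  DrugB_WAIT = c(0, 0.5, 1, 2)
)

# Steps 1 and 2: drop repeated levels within each column, then take the full cross product
all_combos <- do.call(expand.grid, lapply(df, unique))

# Step 3: keep only rows where both drugs agree on LIFE and on NEED
matched <- subset(all_combos, DrugA_LIFE == DrugB_LIFE & DrugA_NEED == DrugB_NEED)

nrow(all_combos)  # 4^8 * 2^2 = 262144
nrow(matched)     # 4 * 2 * 4^6 = 32768
```

One difference to be aware of: expand.grid varies the first column fastest, whereas tidyr::expand_grid varies the last column fastest. If the row order described in the question matters (first variable held fixed while the others cycle), the expand_grid version matches it more closely.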

Data:

df <- structure(list(DrugA_LIFE = c(0.5, 1, 2, 5), DrugA_NEED = c(0, 1, 0, 1), DrugA_CERT = c(0, 0.2, 0.4, 0.6), DrugA_RISK = c(0.1, 0.2, 0.4, 0.6), DrugA_WAIT = c(0, 0.5, 1, 2), DrugB_LIFE = c(0.5, 1, 2, 5), DrugB_NEED = c(0, 1, 0, 1), DrugB_CERT = c(0, 0.2, 0.4, 0.6), DrugB_RISK = c(0.1, 0.2, 0.4, 0.6), DrugB_WAIT = c(0, 0.5, 1, 2)), class = "data.frame", row.names = c(NA, -4L))
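
Because the filter forces LIFE and NEED to agree between the two drugs, the matched rows can also be generated directly, without building the full grid first. The sketch below is one possible way to do that, not part of the approach above: it expands over the two shared attributes plus the six free ones, then copies the shared columns to both drugs. The list names shared and free and the temporary output file are assumptions made for illustration; the CSV step is included only because the question mentions a .csv.

```r
# Attributes that must match across drugs: expanded once, then copied to both drugs
shared <- list(
  LIFE = c(0.5, 1, 2, 5),
  NEED = c(0, 1)
)

# Attributes that vary independently for each drug
free <- list(
  DrugA_CERT = c(0, 0.2, 0.4, 0.6),
  DrugA_RISK = c(0.1, 0.2, 0.4, 0.6),
  DrugA_WAIT = c(0, 0.5, 1, 2),
  DrugB_CERT = c(0, 0.2, 0.4, 0.6),
  DrugB_RISK = c(0.1, 0.2, 0.4, 0.6),
  DrugB_WAIT = c(0, 0.5, 1, 2)
)

# Cross product over 8 variables: 4 * 2 * 4^6 rows
grid <- do.call(expand.grid, c(shared, free))

# Rebuild the 10-column layout used in the question
matched <- data.frame(
  DrugA_LIFE = grid$LIFE,
  DrugA_NEED = grid$NEED,
  grid[c("DrugA_CERT", "DrugA_RISK", "DrugA_WAIT")],
  DrugB_LIFE = grid$LIFE,
  DrugB_NEED = grid$NEED,
  grid[c("DrugB_CERT", "DrugB_RISK", "DrugB_WAIT")]
)

# Write the result to a CSV file (a temporary path here; substitute a real one as needed)
out_file <- tempfile(fileext = ".csv")
write.csv(matched, out_file, row.names = FALSE)
```

The trade-off is that this version bakes the two equality conditions into the construction, so it is less flexible than filtering if the selection rules change later.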