How to create subsets of a dataframe based on columns using a for loop in R


I have a dataframe which looks like this:

  id age1 sex1 age2 sex2 age3 sex3 age4   sex4
1  5   20 <NA>   NA <NA>   NA <NA>   27 Female
2 25   NA <NA>   NA <NA>   NA <NA>   35 Female
3 65   NA <NA>   NA <NA>   NA <NA>   NA   <NA>

This is the code to reproduce the data:

temp <- structure(list(id = c(5L, 25L, 65L, 25L, 65L, 5L, 5L, 85L, 285L, 
541L), age1 = c(20L, NA, NA, NA, NA, NA, NA, NA, NA, NA), sex1 = structure(c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_), .Label = c("missing", 
"inapplicable", "refusal", "don't know", "inconsistent", "Male", 
"Female"), class = "factor"), age2 = c(NA, NA, NA, NA, 31L, 
NA, NA, NA, NA, NA), sex2 = structure(c(NA, NA, NA, NA, 7L, 
NA, NA, NA, NA, NA), .Label = c("missing", "inapplicable", "refusal", 
"don't know", "inconsistent", "Male", "Female"), class = "factor"), 
    age3 = c(NA, NA, NA, NA, 32L, NA, NA, NA, 25L, 23L), sex3 = structure(c(NA, 
    NA, NA, NA, 7L, NA, NA, NA, 6L, 7L), .Label = c("missing", 
    "inapplicable", "refusal", "don't know", "inconsistent", 
    "Male", "Female"), class = "factor"), age4 = c(27L, 35L, 
    NA, NA, 33L, NA, 24L, NA, 26L, NA), sex4 = structure(c(7L, 
    7L, NA, NA, 7L, NA, 7L, NA, 6L, NA), .Label = c("missing", 
    "inapplicable", "refusal", "don't know", "inconsistent", 
    "Male", "Female"), class = "factor")), row.names = c(NA, 
10L), class = "data.frame")

I would like to know how to make multiple subsets of the data based on the columns, each containing id plus one age/sex pair.

I know I could do this with the following code:

Subset1 <- temp[, 1:3]
Subset2 <- temp[, c(1, 4:5)]
Subset3 <- temp[, c(1, 6:7)]

But there must be a more concise way to do this. I've tried a for loop, but I'm new to R and don't know how to do this while keeping the names of the new subsets consistent.


There are 2 best solutions below

1
Ronak Shah On BEST ANSWER

We can use split.default to split the columns by the number in their names, then bind the first column (id) to the front of each piece.

new_list <- lapply(split.default(temp[-1], gsub("\\D", "", names(temp)[-1])), 
                   function(x) cbind(temp[1], x))
new_list

#$`1`
#    id age1 sex1
#1    5   20 <NA>
#2   25   NA <NA>
#3   65   NA <NA>
#4   25   NA <NA>
#5   65   NA <NA>
#6    5   NA <NA>
#7    5   NA <NA>
#8   85   NA <NA>
#9  285   NA <NA>
#10 541   NA <NA>

#$`2`
#    id age2   sex2
#1    5   NA   <NA>
#...

This returns a list of dataframes. If you want the data in separate dataframes, name the list elements and copy them into the global environment:

names(new_list) <- paste0('Subset', seq_along(new_list))
list2env(new_list, .GlobalEnv)
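
To see why the split works, it can help to look at the grouping key on its own. The following is only a minimal sketch on a small made-up data frame (`toy` and `toy_list` are hypothetical names, and the values are invented; only the `id`, `ageN`, `sexN` column layout mirrors `temp`):

```r
# Toy data with the same column layout as `temp` (hypothetical values)
toy <- data.frame(id   = c(5L, 25L, 65L),
                  age1 = c(20L, NA, NA), sex1 = factor(c(NA, NA, "Male")),
                  age2 = c(NA, 31L, NA), sex2 = factor(c(NA, "Female", NA)))

# Strip every non-digit from the column names: one key per column
key <- gsub("\\D", "", names(toy)[-1])
key

# split.default splits the columns (not the rows) by that key;
# cbind then puts the id column back in front of each piece
toy_list <- lapply(split.default(toy[-1], key), function(x) cbind(toy[1], x))
names(toy_list)
names(toy_list[["2"]])
```

One caveat: the keys are sorted as text, so with ten or more groups the order would be "1", "10", "2", and so on. In that case naming with paste0('Subset', names(new_list)) instead of seq_along(new_list) keeps each name matched to its column suffix.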
0
ThomasIsCoding On

Here is another base R solution:

ind <- 1:4
list2env(setNames(lapply(ind, function(k) subset(temp,select = c(1,2*k+(0:1)))),
                  paste0("Subset",ind)),
         envir = .GlobalEnv)

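Here lapply loops over the four column pairs, and for each k subset selects the id column (1) plus columns 2*k and 2*k+1. This assumes the age/sex columns sit in consecutive pairs right after id.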
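
Since the question asked about a for loop, the same idea can also be written as an explicit loop. This is only a sketch on a made-up data frame (`toy`, `groups`, and `subsets` are hypothetical names, and the values are invented). It selects columns by name rather than by position, and it keeps the results in a named list instead of writing them to the global environment:

```r
# Toy data with the same column layout as `temp` (hypothetical values)
toy <- data.frame(id   = c(5L, 25L, 65L),
                  age1 = c(20L, NA, NA), sex1 = factor(c(NA, NA, "Male")),
                  age2 = c(NA, 31L, NA), sex2 = factor(c(NA, "Female", NA)))

# Group numbers found in the column names, here "1" and "2"
groups <- unique(gsub("\\D", "", names(toy)[-1]))

subsets <- list()
for (g in groups) {
  # Columns whose name ends in a non-digit followed by this group number,
  # so that group "1" matches age1 but would not match age11
  cols <- grep(paste0("\\D", g, "$"), names(toy), value = TRUE)
  subsets[[paste0("Subset", g)]] <- toy[c("id", cols)]
}

names(subsets)
```

Each piece is then available as subsets$Subset1, subsets$Subset2, and so on. If separate objects are really needed, list2env(subsets, .GlobalEnv) works on this list in the same way as above.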