Sample rows from a dataframe by id when some ids have more rows than others


This is very basic, but I couldn't find an answer online. I use R and have a dataset like this (but much larger):

set.seed(123)
id <- c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 4, 5, 5, 6, 6, 6)
week <- c(1, 2, 3, 1, 2, 1, 2, 3, 4, 5, 1, 1, 2, 1, 2, 3)
value <- rnorm(16, mean = 5, sd = 1)
mydf <- data.frame(id, week, value)

id refers to a particular person, so some individuals have more observations than others. I'd like to take a sample of individuals from the dataframe such that, for each sampled individual, all of that individual's rows are included in the sample. If I do

mydf[sample(nrow(mydf),3),]

I obviously just get three random rows, whereas I'd like to get, for instance

 id  week  value
  1    1  4.439524
  1    2  4.769823
  1    3  6.558708
  4    1  6.224082
  6    1  5.110683
  6    2  4.444159
  6    3  6.786913

How can I sample rows with this constraint? Thank you in advance!

Best answer (by s_baldur)

One option:

# set seed for reproducibility
set.seed(958) 

# Sample size
n <- 3
# Take a simple random sample of the ids present
sampled_ids <- sample(unique(mydf$id), n)

# Keep only rows of the sampled IDs
mydf[mydf$id %in% sampled_ids, ]

#    id week    value
# 4   2    1 5.070508
# 5   2    2 5.129288
# 12  5    1 5.359814
# 13  5    2 5.400771
# 14  6    1 5.110683
# 15  6    2 4.444159
# 16  6    3 6.786913
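
If you need this more than once, the same idea can be wrapped in a small function. This is only a sketch: the name sample_by_id and its arguments are made up for illustration and are not part of the answer above. It samples positions with sample.int() rather than values, because sample(x, n) treats a single number x >= 1 as 1:x, which would give wrong ids if the data happened to contain only one id.

```r
# Hypothetical helper (not from the original answer): draw n ids at random
# and return every row that belongs to one of them.
sample_by_id <- function(df, id_col, n) {
  ids <- unique(df[[id_col]])
  if (n > length(ids)) {
    stop("n exceeds the number of distinct ids")
  }
  # Sample positions, not values, to avoid the sample(x, n) behaviour
  # where a length-one numeric x is interpreted as 1:x.
  sampled_ids <- ids[sample.int(length(ids), n)]
  df[df[[id_col]] %in% sampled_ids, , drop = FALSE]
}

# Same example data as in the question
set.seed(123)
mydf <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 4, 5, 5, 6, 6, 6),
  week  = c(1, 2, 3, 1, 2, 1, 2, 3, 4, 5, 1, 1, 2, 1, 2, 3),
  value = rnorm(16, mean = 5, sd = 1)
)

res <- sample_by_id(mydf, "id", 3)
res
```

Which three ids come back depends on the seed and the R version, but the result always contains all rows of each sampled id and nothing else.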