I need a code or idea for the case that we have a dataset of 1000 rows. I want to subsample from rows with the size of 800 for multiple times (I dont know how many times should I repeat). How should I control that all members would be picked at least in one run? I need the code in r.
To make the question more clear, lets define the row names as:
rownames(dataset) = A,B,C,D,E,F,G,H,J,I
if I subsample 3 times:
A,B,C,D,E,F,G,H
D,E,A,B,H,J,F,C
F,H,E,A,B,C,D,J
The I is not in any of the subsample sets. I would like to do subsampling for 90 or 80 percent of the data for many times but I expect all the rows would be chosen at least in one of the subsample sets. In the above sample the element I should be picked in at least one of the subsamples.
One way to do this is random sampling without replacement to designate a set of "forced" random picks, in other words have a single guaranteed appearance of each row, and decide ahead of time which subsample that guaranteed appearance will be in. Then, randomly sample the rest of the subsample.