I have a dataset where I have a known sample_size
and percent
"yes" for each year/location/ID combination. I am trying to use sample_size
and percent
to back-calculate the full dataset, which would be a binary Y_N column (where 1 is yes and 0 is no), corresponding to the percent in the percent
column.
Here is an example of the dataset:
table <- "year location ID sample_size percent
1 2000 A 1a 10 40
2 2001 A 1a 10 30
3 2000 B 2a 10 70
4 2001 B 2a 10 90
5 2005 C 1a 20 10
6 2006 C 1a 20 50"
#Create a dataframe with the above table
df <- read.table(text=table, header = TRUE)
df
And here is a subset of what that would look like expanded for the first two year/location/ID combinations:
table <- "year location ID Y_N
1 2000 A 1a 1
2 2000 A 1a 1
3 2000 A 1a 1
4 2000 A 1a 1
5 2000 A 1a 0
6 2000 A 1a 0
7 2000 A 1a 0
8 2000 A 1a 0
9 2000 A 1a 0
10 2000 A 1a 0
11 2001 A 1a 1
12 2001 A 1a 1
13 2001 A 1a 1
14 2001 A 1a 0
15 2001 A 1a 0
16 2001 A 1a 0
17 2001 A 1a 0
18 2001 A 1a 0
19 2001 A 1a 0
20 2001 A 1a 0"
Is there a way to do this, say with dplyr mutate()
, so that the expanded dataset can be grouped by each unique year/location/ID combination?
You could first compute the number of yes and no, then reshape to long and afterwards use
tidyr::uncount
like so: