Creating a percentage column based on the sums of a column grouped by a different column?

46 Views Asked by At

I am wrangling with a huge dataset and my R skills are very new. I am really trying to understand the terminology and processes but finding it a struggle as the R-documentation often makes no sense to me. So apologies if this is a dumb question.

I have data for plant species at different sites with different percentages of ground-cover. I want to create a new column PROP-COVER which gives the proportion of each species' cover as a percentage of the total cover of all species in a particular site. This is slightly different to calculating percentage cover by site area as it is disregards bare ground with no vegetation. This is an easy calculation with just one site, but I have over a hundred sites and need to perform the calculation on species ground-cover grouped by site. The desired column output is PROP-COVER.

SPECIES   SITE  COVER   PROP-COVER(%)
1           1   10         7.7
2           1   20         15.4
3           1   10         7.7
4           1   20         15.4
5           1   30         23.1
6           1   40         30.8
2           2   20         22.2
3           2   50
5           2   10
6           2   10
1           3   5
2           3   25
3           3   40
5           3   10

I have looked at for loops and repeat but I can't see where the arguments should go. Every attempt I make returns a NULL.

Below is an example of something I tried which I am sure is totally wide of the mark, but I just can't work out where to begin with or know if it is even possible.

a<- for (i in data1$COVER) {
  sum(data1$COVER[data1$SITE=="i"],na.rm = TRUE)
}
a
NULL

I have a major brain-blockage when it comes to how 'for' loops etc work, no amount of reading about it seems to help, but perhaps what I am trying to do isn't possible? :(

Many thanks for looking.

1

There are 1 best solutions below

0
On

In Base R:

merge(df, prop.table(xtabs(COVER~SPECIES+SITE, df), 2)*100)

   SPECIES SITE COVER      Freq
1        1    1    10  7.692308
2        1    3     5  6.250000
3        2    1    20 15.384615
4        2    2    20 22.222222
5        2    3    25 31.250000
6        3    1    10  7.692308
7        3    2    50 55.555556
8        3    3    40 50.000000
9        4    1    20 15.384615
10       5    1    30 23.076923
11       5    2    10 11.111111
12       5    3    10 12.500000
13       6    1    40 30.769231
14       6    2    10 11.111111

In tidyverse you can do:

df %>%
   group_by(SITE) %>%
   mutate(n = proportions(COVER) * 100)

# A tibble: 14 x 4
# Groups:   SITE [3]
   SPECIES  SITE COVER     n
     <int> <int> <int> <dbl>
 1       1     1    10  7.69
 2       2     1    20 15.4 
 3       3     1    10  7.69
 4       4     1    20 15.4 
 5       5     1    30 23.1 
 6       6     1    40 30.8 
 7       2     2    20 22.2 
 8       3     2    50 55.6 
 9       5     2    10 11.1 
10       6     2    10 11.1 
11       1     3     5  6.25
12       2     3    25 31.2 
13       3     3    40 50   
14       5     3    10 12.5 

The code could also be written as n = COVER/sum(COVER) or even n = prop.table(COVER)