I have a dataset of project responses in which each row is a project, and each project can be related to up to three sectors, similar to this:
project = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
sect1 = c("wat", "wat", "wat", "wat", "lan", "lan", "lan", "tra", "tra", "air"),
sect2 = c("lan", "lan", "lan", "tra", "tra", "air", "wat", "wat", "wat", "wat"),
sect3 = c(NA, NA, "tra", "air", NA, NA, "pol", NA, "pol", NA),
)
So every project can be related to more than one sectors. I want to estimate the percentage of projects that are related with "wat" from the total number of projects (considering when "wat" appear in either sect1, sect2, sect3 ). I know the total proportion or percentage will be >100. Now I am doing it like this, calculating proportion of each sector per each column and adding the proportions.
#Calculate % for sect1
df <- df %>%
group_by(sect1)%>%
summarise(Freq_sect1 = n())%>% mutate(Prop_sect1 = Freq_sect1 /sum(Freq_sect1))
arrange(desc(Prop_cc1))
names(df)[1] = "sector"
#Calculate % for sect2
df2 <- data1 %>%
group_by(sect2)%>%
summarise(Freq_sect2 = n())%>% mutate(Prop_sect2 = Freq_sect2 /sum(Freq_sect2))
arrange(desc(Prop_cc2))
names(df)[1] = "sector"
#(repeated for sect3)
#Adding proportions to calculate sector_total_prop
df_combined<- left_join(df, df2, df3, by="sector")
df_combined$sector_total_prop<-df_combined$Prop_sect2+df_combined$Prop_sect2+ df_combined$Prop_sect3
I wonder if there is a more efficient way to do this calculation when you have one observation (in my case, project) that can have more than one value (in my case, more than one sector) and you want to estimate the total proportion of that value (in my case, which percentage of projects are related with a specific sector. Thanks