I am using Mclust to estimate probability of component membership, but "density" is not included in the output from me.weighted(). Consequently, I am unable to plot probability density. The following code is lengthy because I want to clearly illustrate my purpose and problem, but I clearly indicate where my problem/question arises. My last chunk of code is my attempt at a solution, but it probably only highlights my ignorance of probability densities.
In this research project, my first objective is to compute an index of age-1 fish abundance for subsequent analysis. For that, I want to estimate the proportion of age-1 fish at specific lengths (i.e. an age-length key). It is reasonable to assume that the smaller mode is mostly age-1 fish and the larger mode is of age-2+ fish. My data are fish body length (fork length, cm) and abundance as a proportion of total (i.e., weighted univariate). Note, some outlying large lengths with small proportions were omitted; thus, sum(dat.df$proportions) < 1.
My specific aim here is to illustrate the probability densities superimposed on fish size composition, which reflects two age groups. Basically, in the last chunk of ggplot code, I want to swap out the estimated probability of membership to each (red) or either (green) component with probability densities becauses it would make a nice, informative figure in my manuscript.
I have read relevant articles (Murphy; Scrucca et al.; Mignan; R-Bloggers, etc), but found no answer.
So, I would greatly appreciate any help on how to compute the probability densities for each component and also the component-combined probability density.
Packages
library(ggplot2)
library(mclust)
Data
dat.df <- data.frame(flcm = 15:33, proportion = c(0.0043, 0.0114, 0.0296, 0.0519, 0.0540, 0.0403, 0.0294, 0.0152, 0.0257, 0.0793, 0.1458, 0.1505, 0.1277, 0.0909, 0.0389, 0.0308, 0.0121, 0.0101, 0.0085), z1 = c(rep(1,9), rep(0,10)), z2 = c(rep(0,9), rep(1,10)))
Plot data
ggplot()+
geom_bar(aes(x=dat.df$flcm, y=dat.df$proportion),
fill = "gray", position="dodge", stat="identity")+
xlab("Fork length (cm)")+
ylab("Probability density")+
theme_bw()
WITHOUT WEIGHTS (i.e., ignore dat.df$proportion)
Fit mixture model without weights
mod1 <- densityMclust(dat.df[, "flcm"], modelName = "V")
Plot probability density
plot(mod1, what = "density", data = dat.df$flcm, breaks = 5)
WITH WEIGHTS (i.e., include dat.df$proportion)
Refit model with weights
mod1_w <- me.weighted(modelName = "V",
data = dat.df$flcm,
z = cbind(dat.df$z1, dat.df$z2),
weights = dat.df$proportion)
Plot data with estimated fractional membership (updated z)
ggplot()+
geom_bar(aes(x=dat.df$flcm, y=dat.df$proportion),
fill = "gray", position="dodge", stat="identity")+
geom_line(aes(x = dat.df$flcm,
y = (mod1_w$z[,1] * dat.df$proportion)),
color = "red") +
geom_line(aes(x = dat.df$flcm,
y = (mod1_w$z[,2] * dat.df$proportion)),
color = "red") +
geom_line(aes(x = dat.df$flcm,
y = (mod1_w$z[,1] * dat.df$proportion) +
mod1_w$z[,2] * dat.df$proportion),
color = "green") +
xlab("Fork length (cm)")+
ylab("Probability density")+
theme_bw()
Plot probability density - Here's where my problem/question arises
plot(mod1_w, what = "density", data = dat.df$flcm, breaks = 5)`
Here's my attempted solution. Basically, for each component (age1, age2), multiply probabilities and scale to proportional abundance:
#age1 probability density
age1 <- mod1_w$z[,1]* #probability of age1 membership multiplied by
dnorm(dat.df$flcm, mod1_w$parameters$mean[1], #probability of flcm given age1
mod1_w$parameters$variance$sigmasq[1])*
sum(mod1_w$z[,1]*mod1_w$weights) #and scaled to proportional abundance of age1
#age2 probability density
age2 <- mod1_w$z[,2]* #probability of age2 membership multiplied by
dnorm(dat.df$flcm, mod1_w$parameters$mean[2],
mod1_w$parameters$variance$sigmasq[2])* #probability of flcm given age2
sum(mod1_w$z[,2]*mod1_w$weights) #and scaled to proportional abundance of age2
#combined ages probability density
age_all <- age1 + age2
#looks bad - the probability densities don't correspond well with proportional abundance
ggplot()+
geom_bar(aes(x=dat.df$flcm, y=dat.df$proportion),
fill = "gray", position="dodge", stat="identity")+
geom_line(aes(x = dat.df$flcm,
y = age1),
color = "red") +
geom_line(aes(x = dat.df$flcm,
y = age2),
color = "red") +
geom_line(aes(x = dat.df$flcm,
y = age_all),
color = "green") +
xlab("Fork length (cm)")+
ylab("Probability density")+
theme_bw()
I am posting a solution to my own question; hopefully, this will help others. Basically, I switched to the package mxdist, which gives me the desired output as indicated in the following code.