How to calculate probability density for individual and for combined components from me.weighted()

159 Views Asked by At

I am using Mclust to estimate probability of component membership, but "density" is not included in the output from me.weighted(). Consequently, I am unable to plot probability density. The following code is lengthy because I want to clearly illustrate my purpose and problem, but I clearly indicate where my problem/question arises. My last chunk of code is my attempt at a solution, but it probably only highlights my ignorance of probability densities.

In this research project, my first objective is to compute an index of age-1 fish abundance for subsequent analysis. For that, I want to estimate the proportion of age-1 fish at specific lengths (i.e. an age-length key). It is reasonable to assume that the smaller mode is mostly age-1 fish and the larger mode is of age-2+ fish. My data are fish body length (fork length, cm) and abundance as a proportion of total (i.e., weighted univariate). Note, some outlying large lengths with small proportions were omitted; thus, sum(dat.df$proportions) < 1.

My specific aim here is to illustrate the probability densities superimposed on fish size composition, which reflects two age groups. Basically, in the last chunk of ggplot code, I want to swap out the estimated probability of membership to each (red) or either (green) component with probability densities becauses it would make a nice, informative figure in my manuscript.

I have read relevant articles (Murphy; Scrucca et al.; Mignan; R-Bloggers, etc), but found no answer.

So, I would greatly appreciate any help on how to compute the probability densities for each component and also the component-combined probability density.

Packages

library(ggplot2)
library(mclust) 

Data

dat.df <- data.frame(flcm = 15:33, proportion = c(0.0043, 0.0114, 0.0296, 0.0519, 0.0540, 0.0403, 0.0294, 0.0152, 0.0257, 0.0793, 0.1458, 0.1505, 0.1277, 0.0909, 0.0389, 0.0308, 0.0121, 0.0101, 0.0085), z1 = c(rep(1,9), rep(0,10)), z2 = c(rep(0,9), rep(1,10)))

Plot data

ggplot()+
  geom_bar(aes(x=dat.df$flcm, y=dat.df$proportion),
           fill = "gray", position="dodge", stat="identity")+
  xlab("Fork length (cm)")+
  ylab("Probability density")+
  theme_bw()

WITHOUT WEIGHTS (i.e., ignore dat.df$proportion)

Fit mixture model without weights

mod1 <- densityMclust(dat.df[, "flcm"], modelName = "V")

Plot probability density

plot(mod1, what = "density", data = dat.df$flcm, breaks = 5)

WITH WEIGHTS (i.e., include dat.df$proportion)

Refit model with weights

mod1_w <- me.weighted(modelName = "V", 
            data = dat.df$flcm,
            z = cbind(dat.df$z1, dat.df$z2),
            weights = dat.df$proportion)

Plot data with estimated fractional membership (updated z)

ggplot()+
  geom_bar(aes(x=dat.df$flcm, y=dat.df$proportion),
           fill = "gray", position="dodge", stat="identity")+
  geom_line(aes(x = dat.df$flcm, 
                y = (mod1_w$z[,1] * dat.df$proportion)), 
            color = "red") +
  geom_line(aes(x = dat.df$flcm, 
                y = (mod1_w$z[,2] * dat.df$proportion)), 
            color = "red") +
    geom_line(aes(x = dat.df$flcm, 
                y = (mod1_w$z[,1] * dat.df$proportion) +
                  mod1_w$z[,2] * dat.df$proportion), 
              color = "green") +
  xlab("Fork length (cm)")+
  ylab("Probability density")+
  theme_bw()

Plot probability density - Here's where my problem/question arises

plot(mod1_w, what = "density", data = dat.df$flcm, breaks = 5)`

Here's my attempted solution. Basically, for each component (age1, age2), multiply probabilities and scale to proportional abundance:

#age1 probability density
age1 <- mod1_w$z[,1]* #probability of age1 membership multiplied by
  dnorm(dat.df$flcm, mod1_w$parameters$mean[1], #probability of flcm given age1
        mod1_w$parameters$variance$sigmasq[1])* 
  sum(mod1_w$z[,1]*mod1_w$weights) #and scaled to proportional abundance of age1

#age2 probability density
age2 <- mod1_w$z[,2]* #probability of age2 membership multiplied by
  dnorm(dat.df$flcm, mod1_w$parameters$mean[2],
        mod1_w$parameters$variance$sigmasq[2])* #probability of flcm given age2
  sum(mod1_w$z[,2]*mod1_w$weights) #and scaled to proportional abundance of age2

#combined ages probability density
age_all <- age1 + age2

#looks bad - the probability densities don't correspond well with proportional abundance
ggplot()+
  geom_bar(aes(x=dat.df$flcm, y=dat.df$proportion),
           fill = "gray", position="dodge", stat="identity")+
  geom_line(aes(x = dat.df$flcm, 
                y = age1), 
            color = "red") +
  geom_line(aes(x = dat.df$flcm, 
                y = age2), 
            color = "red") +
    geom_line(aes(x = dat.df$flcm, 
                y = age_all), 
              color = "green") +
  xlab("Fork length (cm)")+
  ylab("Probability density")+
  theme_bw()
1

There are 1 best solutions below

0
On

I am posting a solution to my own question; hopefully, this will help others. Basically, I switched to the package mxdist, which gives me the desired output as indicated in the following code.

library(mxdist)

#input data (dat.df is created by code in my original question)
dat.mx <- as.mixdata(dat.df[, 1:2])
#preliminary plot
plot(dat.mx)
#define initial parameters
dat_parms <- data.frame(pi=c(0.3, 0.7), mu=c(18, 26), sigma=c(2, 3))
#fit the model
fit1 <- mix(dat.mx, dat_parms, "gamma", constr=mixconstr(consigma="CCV"))
#plot default
plot(fit1)
#replot using ggplot for greater flexibility over appearance
z <- fitted(fit1)
dat.mx[dat.mx$flcm == "Inf", "flcm"] <- 34
ggplot()+
  geom_bar(aes(x=dat.mx$flcm, y=dat.mx$proportion),
           fill = "gray", position="dodge", stat="identity")+
  geom_line(aes(x = dat.mx$flcm, 
                y = z$joint[,1]), 
            color = "red") +
  geom_line(aes(x = dat.mx$flcm, 
                y = z$joint[,2]), 
            color = "red") +
    geom_line(aes(x = dat.mx$flcm, 
                y = z$mixed), 
              color = "green") +
  xlab("Fork length (cm)")+
  ylab("Probability density")+
  theme_bw()
#conditional probabilities are output
z$conditprob