I keep hitting the same error when trying to fit a distribution to data using fitdistrplus. An MWE is below. In short, I want to fit a Poisson binomial distribution to some data. I'm using the poibin R package for the Poisson binomial p, d, q, r functions (I've also tried poisbinom, with the same error). In the MWE I create dd, the vector of successes. I then try to use fitdist to fit the distribution, given the starting values in the start list. The error says (I think) that I'm giving it start values whose names aren't arguments of the dpoibin function, which is where I'm stuck.
library(fitdistrplus)
library(poibin)
set.seed(123)
dd <- rpoibin(10, pp=seq(0.1, 0.5, length.out=10))
ppp <- runif(10)
ret <- try(fitdistrplus::fitdist(dd, distr=dpoibin,
start=list(pp = ppp)))
Error message:
Error in checkparamlist(arg_startfix$start.arg, arg_startfix$fix.arg, : 'start' must specify names which are arguments to 'distr'.
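For what it's worth, the name mismatch can be seen without calling fitdist at all. The sketch below uses only base R and assumes the start list gets flattened the way unlist() flattens a named list; toy_density is a hypothetical stand-in with a vector argument called pp, not the real dpoibin.

```r
# Sketch only: assumes the start list is flattened the way base R's unlist() does it.
set.seed(123)
ppp <- runif(10)
start <- list(pp = ppp)

# Flattening a named list that holds a length-10 vector yields ten scalar
# entries named pp1, pp2, ..., pp10.
flat_names <- names(unlist(start))
print(flat_names)

# Hypothetical stand-in for a density with a vector argument `pp`;
# it is not the poibin implementation.
toy_density <- function(kk, pp, wts = NULL) NULL
arg_names <- names(formals(toy_density))

# TRUE wherever a flattened name is also an argument of the density.
print(flat_names %in% arg_names)
```

If the assumption holds, none of the flattened names is an argument of the density, which is the condition the error message describes.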
The error comes from the function fitdistrplus:::checkparamlist, which is called by fitdist to ensure that the names in the list passed to start match the parameter names of the function passed to distr. When you pass a vector like ppp as a parameter in start, checkparamlist renames each element of the vector by appending an integer. This means the argument names become "pp1", "pp2", "pp3" and so on up to "pp10". Since none of these names is an argument of dpoibin (its argument is called pp), an error is thrown.

I'm not sure if there is a way to estimate vectorized parameters in fitdist due to this problem, but fortunately in this case we can easily just fit the distribution ourselves.

Writing $p_1, \dots, p_n$ for the elements of pp, we know that the mean of the distribution is

$$\mu = \sum_{i=1}^{n} p_i$$

and the variance is

$$\sigma^2 = \sum_{i=1}^{n} p_i (1 - p_i)$$

(Reference).
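As a quick sanity check of these two formulas, here is a small simulation sketch. It uses only base R and relies on the fact that a Poisson binomial draw is a sum of independent Bernoulli trials with different success probabilities; the sample size of 1e5 is an arbitrary choice.

```r
# Sketch: check the mean and variance formulas by simulation, using base R only.
set.seed(123)
pp <- seq(0.1, 0.5, length.out = 10)
n  <- 1e5

# Column j holds n Bernoulli draws with success probability pp[j];
# each row sum is one Poisson binomial draw (successes across the ten trials).
draws <- matrix(rbinom(n * length(pp), size = 1, prob = rep(pp, each = n)), nrow = n)
x <- rowSums(draws)

theory    <- c(mean = sum(pp), var = sum(pp * (1 - pp)))
simulated <- c(mean = mean(x), var = var(x))
print(rbind(theory, simulated))
```

The two rows should agree to within sampling noise.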
Then we know that if we have a sample dd, a function that measures the gap between the sample's mean and variance and the mean and variance implied by pp will return 0 if pp fits the distribution perfectly.

To demonstrate this works, let's take a much larger sample from rpoibin. We can then find the set of values that optimizes our objective function.

We can confirm this is a good fit by plotting a histogram of the sample and overlaying the output of dpoibin with our calculated values for the pp parameter.

Note that there could be many solutions to the optimal value of pp, and we should not expect to get seq(0.1, 0.5, length.out = 10). For a start, order does not make a difference. The fitted vector, pp_opt, gives a very similar mean and variance to seq(0.1, 0.5, length.out = 10), which is all that this objective takes into account.

In general, it is not possible to recover pp exactly from a given sample this way, both because of the ordering and because an infinite number of sets have the same mean and variance.

Created on 2023-07-18 with reprex v2.0.2
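The moment-matching steps described in this answer can be sketched as follows. This is an illustration under stated assumptions, not necessarily the exact code behind the answer: the objective uses squared differences, the optimizer is optim with L-BFGS-B and a random start, and sample_pb and pmf_pb are hypothetical base R stand-ins for rpoibin and dpoibin so that the sketch runs without extra packages.

```r
# Sketch only: base R stand-ins are used so this runs without poibin.
set.seed(123)
true_pp <- seq(0.1, 0.5, length.out = 10)

# Hypothetical stand-in for rpoibin(): each draw is a sum of independent Bernoulli trials.
sample_pb <- function(n, pp) {
  rowSums(matrix(rbinom(n * length(pp), 1, rep(pp, each = n)), nrow = n))
}

# Hypothetical stand-in for dpoibin(): P(K = 0), ..., P(K = length(pp)),
# built by convolving in one Bernoulli trial at a time.
pmf_pb <- function(pp) {
  pmf <- 1
  for (p in pp) pmf <- c(pmf * (1 - p), 0) + c(0, pmf * p)
  pmf
}

dd <- sample_pb(10000, true_pp)

# Squared distance between the sample moments and the moments implied by pp;
# zero when both the mean and the variance match exactly.
objective <- function(pp, x) {
  (sum(pp) - mean(x))^2 + (sum(pp * (1 - pp)) - var(x))^2
}

# Box-constrained search from a random start, keeping every probability in [0, 1].
fit <- optim(par = runif(10), fn = objective, x = dd,
             method = "L-BFGS-B", lower = 0, upper = 1)
pp_opt <- fit$par

# Moments implied by the fitted and true parameters, next to the sample moments.
moments <- rbind(
  fitted = c(mean = sum(pp_opt), var = sum(pp_opt * (1 - pp_opt))),
  true   = c(mean = sum(true_pp), var = sum(true_pp * (1 - true_pp))),
  sample = c(mean = mean(dd), var = var(dd))
)
print(round(moments, 3))

# Histogram of the sample with the fitted probability mass function overlaid.
hist(dd, breaks = seq(-0.5, 10.5, by = 1), freq = FALSE, main = "", xlab = "successes")
points(0:10, pmf_pb(pp_opt), pch = 19)
```

A random start is used on purpose: if every starting value is identical, the gradient is the same in every coordinate and the search can stay on the line where all probabilities are equal. With poibin loaded, rpoibin(10000, pp = true_pp) and dpoibin(0:10, pp = pp_opt) could take the place of the two stand-ins.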