I would like to write a function that takes a string as input. The string represents a probability distribution, and the output should be the 5th percentile, the mean, and the 95th percentile as a vector or list.
For example, if I input: "beta(0.2,0.3)" I would get: 2.793073e-06 0.4 0.9992341
The first thing to do is read the distribution name and its parameters from the string, which can be done with regular expressions:
dist <- gsub("[^A-Za-z]","","beta(0.2,0.3)")
params <- gsub("[^0-9.,]","","beta(0.2,0.3)")
The parameters can then be put in a numeric vector using strsplit, e.g.:
params <- as.numeric(unlist(strsplit(gsub("[^0-9.,]","","beta(0.2,0.3)"),split=",")))
Now I need to declare a function for the mean (since, to my understanding, R has no built-in functions for the means of its distributions). For the beta distribution this would be:
beta_mean <- function(alpha, beta) {
  return(alpha / (alpha + beta))
}
The percentiles I can get from the qbeta function, i.e.:
qbeta(c(0.05,0.95), params[1], params[2])
Since I want to deal with many different distributions, is there a more elegant way than this?
meanvalue <- NA
percentiles <- NA
if (dist == "beta") {
  meanvalue <- beta_mean(params[1], params[2])
  percentiles <- qbeta(c(0.05,0.95), params[1], params[2])
} else if (dist == "gamma") {
  meanvalue <- gamma_mean(params[1], params[2])
  percentiles <- qgamma(c(0.05,0.95), params[1], params[2])
} # and so on and so on and so on...
return(c(percentiles[1], meanvalue, percentiles[2]))
So, what I want to do is link the distribution name string (e.g. "beta" or "gamma" or whatever) to the corresponding functions (DISTRIBUTIONNAME_mean and qDISTRIBUTIONNAME), so that I don't have to use that extremely long if-else structure, which contains too much (unnecessary?) repetition.
How can I accomplish this?
I can only give a partial answer that covers the quantiles:
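A minimal sketch of one way to do this, assuming every distribution name maps to a base R quantile function named q<name> (qbeta, qgamma, qnorm, ...). The helper name dist_quantiles is made up for illustration. It builds the function name with paste0, looks it up with match.fun, and passes the parsed parameters positionally with do.call:

```r
# Hypothetical helper (not from the original post): parse "name(p1,p2,...)"
# and return the requested quantiles of that distribution.
dist_quantiles <- function(x, probs = c(0.05, 0.95)) {
  # Keep only letters for the name, only digits/dots/commas for the parameters.
  # Assumption: parameters are plain non-negative decimals (no signs, no 1e-3).
  dist   <- gsub("[^A-Za-z]", "", x)
  params <- as.numeric(strsplit(gsub("[^0-9.,]", "", x), ",")[[1]])

  # Look up e.g. "qbeta" or "qgamma"; match.fun() errors if no such function exists.
  qfun <- match.fun(paste0("q", dist))

  # Call qfun(probs, params[1], params[2], ...) with the parameters in order.
  do.call(qfun, c(list(probs), as.list(params)))
}

dist_quantiles("beta(0.2,0.3)")
dist_quantiles("norm(0,1)")
```

Because the parameters are passed by position, they have to appear in the string in the same order as in the arguments of the q-function.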
To get the mean, you would need to write out the expectation of every distribution as a function of its parameters, which is painful.
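If only a handful of distributions are needed, the pain can be limited to one named list. The following is a rough sketch rather than a complete solution: dist_means and summarise_dist are hypothetical names, the list covers only five distributions, and each mean formula assumes the same parameter order as the matching q-function (for example, gamma is taken as shape and rate).

```r
# Assumed lookup table: one closed-form mean per distribution, keyed by the
# name that follows "q" in R's quantile functions. Extend as needed.
dist_means <- list(
  beta  = function(shape1, shape2) shape1 / (shape1 + shape2),
  gamma = function(shape, rate) shape / rate,
  norm  = function(mean, sd) mean,
  exp   = function(rate) 1 / rate,
  unif  = function(min, max) (min + max) / 2
)

# Hypothetical wrapper: returns c(5th percentile, mean, 95th percentile).
summarise_dist <- function(x) {
  dist   <- gsub("[^A-Za-z]", "", x)
  params <- as.list(as.numeric(strsplit(gsub("[^0-9.,]", "", x), ",")[[1]]))

  mean_fun <- dist_means[[dist]]
  if (is.null(mean_fun)) stop("no mean defined for distribution: ", dist)

  # Quantiles come from the q<name> function, the mean from the lookup table.
  q <- do.call(match.fun(paste0("q", dist)), c(list(c(0.05, 0.95)), params))
  m <- do.call(mean_fun, params)
  c(q[1], m, q[2])
}

summarise_dist("beta(0.2,0.3)")
```

Adding a distribution then means adding one entry to dist_means instead of another else-if branch.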