I am fitting my data to a lognormal distribution, and when I run the KS test in Python and in R I get very different results.
The data are:
series
341 291 283 155 271 270 250 272 209 236 295 214 443 632 310 334 376 305 216 339
In R the code is:

library(MASS)  # fitdistr comes from the MASS package
series <- c(341, 291, 283, 155, 271, 270, 250, 272, 209, 236,
            295, 214, 443, 632, 310, 334, 376, 305, 216, 339)
fit <- fitdistr(series, "lognormal")$estimate
fit
          meanlog             sdlog
 5.66611754205579 0.290617205700481
ks.test(series, "plnorm", meanlog=fit[1], sdlog=fit[2], exact=TRUE)
One-sample Kolmogorov-Smirnov test
data: series
D = 0.13421, p-value = 0.8181
alternative hypothesis: two-sided
In Python the code is:

from scipy import stats

series = [341, 291, 283, 155, 271, 270, 250, 272, 209, 236,
          295, 214, 443, 632, 310, 334, 376, 305, 216, 339]
distribution = stats.lognorm
args = distribution.fit(series)
args
(4.2221814852591635, 154.99999999212395, 0.45374242945626875)
stats.kstest(series, distribution.cdf, args, alternative='two-sided')
KstestResult(statistic=0.8211678552361514, pvalue=2.6645352591003757e-15)
The SciPy implementation of the lognormal distribution is not parameterized in the same way as it is in the R code. Search for
[scipy] lognorm
here on Stack Overflow for many similar questions, and see the note about the parameterization in the lognorm docstring. Also note that to match the R result, the location parameter loc must be fixed at the value 0 using the argument floc=0. The R implementation does not include a location parameter. Here's a script that shows how to get the same values that are reported by R:
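A minimal sketch of such a script, assuming the same series as above (the key point being floc=0, so that SciPy's shape parameter corresponds to R's sdlog and the log of the scale parameter to meanlog):

import numpy as np
from scipy import stats

series = [341, 291, 283, 155, 271, 270, 250, 272, 209, 236,
          295, 214, 443, 632, 310, 334, 376, 305, 216, 339]

# Fix the location parameter at 0; R's lognormal has no location parameter.
shape, loc, scale = stats.lognorm.fit(series, floc=0)

print("meanlog =", np.log(scale))  # matches R: 5.66611754205579
print("sdlog   =", shape)          # matches R: 0.290617205700481

# KS test against the fitted distribution (asymptotic p-value).
result = stats.kstest(series, 'lognorm', args=(shape, loc, scale))
print("D =", result.statistic)     # matches R: D = 0.13421
print("p =", result.pvalue)        # approximate, not the exact p-value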
The kstest function in SciPy does not have an option to compute the exact p-value. To compare its value to R, you can use exact=FALSE in ks.test:
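For example, the same call as in the question with only the exact argument changed:

ks.test(series, "plnorm", meanlog=fit[1], sdlog=fit[2], exact=FALSE)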