It would be great if someone could check whether my approach is correct. In short, the question is whether my error calculation is the right way to do it. Let's assume I have the following data:
data = c(23.7,25.47,25.16,23.08,24.86,27.89,25.9,25.08,25.08,24.16,20.89)
Furthermore, I want to check whether my data follow a normal distribution.
Edit: I know that there are formal tests, but I want to concentrate on constructing the Q-Q plot with confidence lines. I know that the car package has a method for this (qqPlot), but I want to understand how these lines are built.
So I calculate the percentiles of my sample data as well as of my theoretical distribution (a normal distribution with estimated mu = 24.6609 and sigma = 1.6828), and end up with these two vectors containing the 10th through 90th percentiles (in steps of 10):
percentileReal = c(23.08,23.7,24.16,24.86,25.08,25.08,25.16,25.47,25.90)
percentileTheo = c(22.50,23.24,23.78,24.23,24.66,25.09,25.54,26.08,26.82)
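A sketch that reproduces both vectors in R, assuming they are the 10th to 90th percentiles: quantile() with its default type = 7 gives percentileReal, and qnorm() gives percentileTheo. Note that sigma = 1.6828 matches the maximum-likelihood estimate (dividing by n); R's sd() divides by n - 1 and gives about 1.765 for these data.

```r
data <- c(23.7, 25.47, 25.16, 23.08, 24.86, 27.89, 25.9, 25.08, 25.08, 24.16, 20.89)

# Plug-in estimates of the normal parameters
mu    <- mean(data)                  # about 24.6609
sigma <- sqrt(mean((data - mu)^2))   # about 1.6828 (n denominator, not sd())

# The nine percentiles used in the question: 10%, 20%, ..., 90%
p <- seq(0.1, 0.9, by = 0.1)

# Empirical percentiles; quantile()'s default type = 7 reproduces percentileReal
percentileReal <- unname(quantile(data, probs = p))

# Percentiles of the fitted normal distribution
percentileTheo <- qnorm(p, mean = mu, sd = sigma)

round(percentileReal, 2)
round(percentileTheo, 2)
```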
Now I want to calculate the confidence interval, with alpha = 0.05, for the theoretical percentiles. If I remember correctly, the formula is
error = z*sigma/sqrt(n),
interval = value +- error
with n = length(data) and z = the quantile of the normal distribution for the given p.
So, to get the confidence interval for the 2nd percentile (i.e. the 20th percentile, p = 0.20), I do the following:
error = (qnorm(0.20+alpha/2,mu,sigma)-qnorm(0.20-alpha/2,mu,sigma))*sigma/sqrt(n)
Inserting the values:
error = (qnorm(0.225,24.6609,1.6828)-qnorm(0.175,24.6609,1.6828)) * 1.6828/sqrt(11)
error = 0.152985
confidenceInterval(for 2nd percentile) = [23.24-0.152985, 23.24+0.152985]
confidenceInterval(for 2nd percentile) = [23.0870, 23.3929]
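The same calculation in R, as a sketch that simply re-runs the formula from the question for p = 0.20 (it reproduces the numbers; it does not check whether the formula is statistically appropriate):

```r
mu    <- 24.6609
sigma <- 1.6828
n     <- 11        # length(data)
alpha <- 0.05
p     <- 0.20      # the 2nd theoretical percentile is the 20th percentile

# Formula from the question: spread of the fitted normal between p - alpha/2
# and p + alpha/2, scaled by sigma / sqrt(n)
error <- (qnorm(p + alpha / 2, mu, sigma) - qnorm(p - alpha / 2, mu, sigma)) *
  sigma / sqrt(n)
error              # about 0.153

center <- 23.24    # percentileTheo[2], rounded as in the question
ci <- c(lower = center - error, upper = center + error)
ci
```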
Finally, I have
percentileTheoLower = c(...,23.0870,.....)
percentileTheoUpper = c(...,23.3929,.....)
and the same for the rest.
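"The same for the rest" can be done in one vectorized step. A sketch that applies the question's formula to all nine percentiles at once, assuming the maximum-likelihood estimates of mu and sigma; as above, it only automates the formula and does not validate it.

```r
data  <- c(23.7, 25.47, 25.16, 23.08, 24.86, 27.89, 25.9, 25.08, 25.08, 24.16, 20.89)
n     <- length(data)
mu    <- mean(data)
sigma <- sqrt(mean((data - mu)^2))   # n denominator, matching sigma = 1.6828
alpha <- 0.05
p     <- seq(0.1, 0.9, by = 0.1)

percentileTheo <- qnorm(p, mu, sigma)

# qnorm() is vectorized over p, so this yields one error per percentile
error <- (qnorm(p + alpha / 2, mu, sigma) - qnorm(p - alpha / 2, mu, sigma)) *
  sigma / sqrt(n)

percentileTheoLower <- percentileTheo - error
percentileTheoUpper <- percentileTheo + error
round(cbind(percentileTheoLower, percentileTheo, percentileTheoUpper), 4)
```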
So what do you think, can I go with this?
If your goal is to test whether the data follow a normal distribution, use the Shapiro-Wilk test (shapiro.test in R):
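A minimal sketch of that call with base R's shapiro.test(), assuming the data vector from the question (the printed result is not reproduced here):

```r
data <- c(23.7, 25.47, 25.16, 23.08, 24.86, 27.89, 25.9, 25.08, 25.08, 24.16, 20.89)

# Shapiro-Wilk test of normality (stats package, needs 3 to 5000 observations)
sw <- shapiro.test(data)
sw            # prints the W statistic and the p-value
sw$p.value    # the p-value on its own
```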
Since p > 0.05, we cannot reject the hypothesis that the data are normally distributed. Note that p is not the probability that the distribution is normal (nor is 1-p the probability that it is non-normal), so a statement like "there is a 53% chance that the distribution is normal" is only a crude reading and should not be taken literally. You can also use qqnorm(...) to draw a normal Q-Q plot (qqplot() compares two samples against each other). The more nearly linear this plot is, the more consistent your data are with a normal distribution. Finally, the nortest package in R offers, among other things, the Pearson chi-square test for normality (pearson.test):
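A minimal sketch of that call, assuming the data vector from the question; the guard simply skips the test if nortest is not installed.

```r
data <- c(23.7, 25.47, 25.16, 23.08, 24.86, 27.89, 25.9, 25.08, 25.08, 24.16, 20.89)

if (requireNamespace("nortest", quietly = TRUE)) {
  # Pearson chi-square test for the composite hypothesis of normality
  pt <- nortest::pearson.test(data)
  print(pt)
} else {
  message("Package 'nortest' is not installed; run install.packages(\"nortest\") first.")
}
```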
This (more conservative) test gives p ≈ 0.29, which is lower but still above 0.05, so it does not reject normality either. All these tests are fully explained in the documentation.
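On the confidence lines asked about in the question: z*sigma/sqrt(n) is the half-width of a confidence interval for the mean, not for an individual quantile. The standard large-sample result is that the p-th sample quantile has standard error sqrt(p*(1-p)/n)/f(q_p), where f is the density (here the fitted normal) and q_p the corresponding quantile. A minimal sketch of a pointwise 95% envelope built that way, assuming the maximum-likelihood fit from the question and ppoints() plotting positions. To my knowledge car::qqPlot constructs its envelope along similar lines, but check its source for the exact choices (its default reference line may be fitted differently).

```r
data  <- c(23.7, 25.47, 25.16, 23.08, 24.86, 27.89, 25.9, 25.08, 25.08, 24.16, 20.89)
n     <- length(data)
mu    <- mean(data)
sigma <- sqrt(mean((data - mu)^2))   # as in the question (n denominator)
conf  <- 0.95
zz    <- qnorm(1 - (1 - conf) / 2)   # about 1.96

x <- sort(data)                      # sample quantiles (order statistics)
P <- ppoints(n)                      # plotting positions for the order statistics
z <- qnorm(P)                        # standard normal quantiles

fit <- mu + sigma * z                # theoretical quantiles of the fitted normal
# Asymptotic standard error of the P-th sample quantile under the fitted normal
se    <- sqrt(P * (1 - P) / n) / dnorm(fit, mu, sigma)
lower <- fit - zz * se
upper <- fit + zz * se

plot(z, x, xlab = "Normal quantiles", ylab = "Sample quantiles")
lines(z, fit)                        # reference line
lines(z, lower, lty = 2)             # pointwise 95% band
lines(z, upper, lty = 2)
```

Points outside the dashed lines suggest a departure from normality at those quantiles. The band is pointwise and widens in the tails, where the density is small; with many points, a few excursions are expected even for truly normal data.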