I'm using the last column from the following data, Data
And I'm trying to apply the idea of a kernel density estimator to this dataset which is represented by
where k is some kernal, normally a normal distribution though not necessarily., h is the bandwidth, n is the length of the data set, X_i is each data point and x is a fitted value. So using this equation I have the following code,
AstroData=read.table(paste0("http://www.stat.cmu.edu/%7Elarry",
"/all-of-nonpar/=data/galaxy.dat"),
header=FALSE)
x=AstroData$V3
xsorted=sort(x)
x_i=xsorted[1:1266]
hist(x_i, nclass=308)
n=length(x_i)
h1=.002
t=seq(min(x_i),max(x_i),0.01)
M=length(t)
fhat1=rep(0,M)
for (i in 1:M){
fhat1[i]=sum(dnorm((t[i]-x_i)/h1))/(n*h1)}
lines(t, fhat1, lwd=2, col="red")
Which produces a the following plot,
which is actually close to what I want as the final result should appear as this once I remove the histograms,
Which if you noticed is finer tuned and the red lines which should represent the density are rather rough and are not scaled as high. The final plot that you see is run using the density function in R,
plot(density(x=y, bw=.002))
Which is what I want to get to without having to use any additional packages.
Thank you
After some talk with my roommate he gave me the idea to go ahead and decrease the interval of the t-values (x). In doing some I changed it from 0.01 to 0.001. So the final code for this plot is as appears,
Which in terms gives the following plot, which is the one that I wanted,