I am trying to get a curve to fit to this scatter data that gives me a Gaussian curve:
library(tidyverse)
MK20 <- tribble(~X.Intensity, ~Average,
0.400, 0.0000000,
0.463, 0.0000000,
0.536, 0.000000,
0.621, 0.0000000,
0.719, 0.0000000,
0.833, 0.0000000,
0.965, 0.0000000,
1.120, 0.0000000,
1.290, 0.0000000,
1.500, 0.0000000,
1.740, 0.0000000,
2.010, 0.0000000,
2.330, 0.0000000,
2.700, 0.0000000,
3.120, 0.0000000,
3.620, 0.0000000,
4.190, 0.0000000,
4.850, 0.0000000,
5.610, 0.0000000,
6.500, 0.0000000,
7.530, 0.0000000,
8.720, 0.0000000,
10.100, 0.0000000,
11.700, 0.0000000,
13.500, 0.0000000,
15.700, 0.0000000,
18.200, 0.0000000,
21.000, 0.0000000,
24.400, 0.0000000,
28.200, 0.0000000,
32.700, 0.0000000,
37.800, 0.0000000,
43.800, 0.7023333,
50.700, 3.3700000,
58.800, 7.3933333,
68.100, 11.4666667,
78.800, 14.3666667,
91.300, 15.4000000,
106.000, 14.5000000,
122.000, 12.0000000,
142.000, 8.6433333,
164.000, 5.2200000,
190.000, 2.4500000,
220.000, 0.7580000,
255.000, 0.1306667,
295.000, 0.0000000,
342.000, 0.0000000,
396.000, 0.0000000,
459.000, 0.0000000,
531.000, 0.0000000,
615.000, 0.0000000,
712.000, 0.0000000,
825.000, 0.0000000,
955.000, 0.0000000,
1110.000, 0.0000000,
1280.000, 0.0000000,
1480.000, 0.0000000,
1720.000, 0.0000000,
1990.000, 0.0000000,
2300.000, 0.0000000,
2670.000, 0.0000000,
3090.000, 0.0000000,
3580.000, 0.0000000,
4150.000, 0.0000000,
4800.000, 0.0000000,
5560.000, 0.0000000,
6440.000, 0.0000000,
7460.000, 0.0000000,
8630.000, 0.0000000)
The code I'm using to plot is:
plot(log10(MK20$X.Intensity), MK20$Average, col=1, xlim=c(-0.5,4),
ylim=c(0,20), xlab="Log(Average diameter)", ylab="Intensity", xaxt='n')
I'm using the minor.tick.axis function to add minor ticks on the logarithmic x axis. I want to add a Gaussian curve (which fits best) to this data. I tried to add a type='l'
on the plot but the curve wasn't smooth and I don't want a curve that necessarily touches every data point but one that fits best.
We can't use the usual
fitdistr
approach to fit a normal distribution in this case because we don't have the original data. It looks like the 'Average' column is some type of density estimate. If it was a pdf then it should integrated to 1 but it doesn't.So we could turn this into a pdf by scaling it down by about 6.14 and then try finding the mean and standard deviation to match that pdf.
Here is a first attempt at a simple Gaussian fit. First I chose a mean of 2 (by looking at where the density was the greatest), a scaling factor of k = 6.14 (the value of the integral) and then played with the sd until there was a reasonable fit.
Next I used optimx to fit the 3 parameters (k = scaling factor, m = mean, s = standard deviation) by minimising the sum of squares between the fit and the data.
Objective function (sum of squares of differences between fit and data)
Use optimx to find parameters (minimise sum of squares) The initial values for the parameters are taken from the fit by eye.
Lets replot with the fitted parameters