Using the correlation matrix after a fit in Gnuplot

429 Views Asked by At

Say I need to fit some data to a parabola, and then perform some calculations involving the correlation matrix elements of the fit parameters: is there a way to use these parameters directly in gnuplot after the fit converges? Are they stored in some variable like the error estimates?.

I quote the explicit problem I'm having. All of this is written to a plot.gp text file and ran with gnuplot plot.gp.

I include set fit errorbariables at the beginning, and then proceed with:

f(x)=a+b*x+c*x*x
fit f(x) 'file.dat' u 1:2:3 yerrors via a,b,c

Once the fit is done, I can use the values of a,b,c and their errors a_err, b_err and c_err directly in the plot.gp script; my question is: can I do the same with the correlation matrix of the parameters?

The problem is that the matrix is printed to terminal once the script finishes to run:

correlation matrix of the fit parameters:
                a      b      e      
a               1.000 
b               0.910  1.000 
c              -0.956 -0.987  1.000 

Are the entries of the matrix stores in some variable (like a_err, b_err) that I can access after the fit is done but before the script ends?

2

There are 2 best solutions below

0
On BEST ANSWER

I think the command you are looking for is

set fit covariancevariables

 If the `covariancevariables` option is turned on, the covariances between
 final parameters will be saved to user-defined variables. The variable name
 for a certain parameter combination is formed by prepending "FIT_COV_" to
 the name of the first parameter and combining the two parameter names by
 "_". For example given the parameters "a" and "b" the covariance variable is
 named "FIT_COV_a_b".
2
On

Edit: I certainly missed gnuplot's intended way via option covariancevariables (apparently available since gnuplot 5.0). Ethan's answer is the way to go. I nevertheless leave my answer, with some modifications it might maybe be useful to extract something else from the fit output.

Maybe I missed it, but I am not aware that you can directly store the elements of the correlation matrix into variables, however, you can do it with some workaround.

You can set the output file for your fit results (check help set fit). The shortest output will be created with the option results. The results will be written to this file (actually, appended if the file already exists).

Example:

After 5 iterations the fit converged.
final sum of squares of residuals : 0.45
rel. change during last iteration : -3.96255e-10

degrees of freedom    (FIT_NDF)                        : 1
rms of residuals      (FIT_STDFIT) = sqrt(WSSR/ndf)    : 0.67082
variance of residuals (reduced chisquare) = WSSR/ndf   : 0.45

Final set of parameters            Asymptotic Standard Error
=======================            ==========================
a               = 1.75             +/- 0.3354       (19.17%)
b               = -2.65            +/- 1.704        (64.29%)
c               = 1.75             +/- 1.867        (106.7%)

correlation matrix of the fit parameters:
                a      b      c      
a               1.000 
b              -0.984  1.000 
c               0.898 -0.955  1.000

Now, you can read this file back into a datablock (check gnuplot: load datafile 1:1 into datablock) and extract the values from the last lines (here: 3), check help word and check real.

Script:

### get fit correlation matrix into variables
reset session

$Data <<EOD
1   1
2   3
3   10
4   19
EOD

f(x) = a*x**2 + b*x + c

myFitFILE = "SO71788523_fit.dat"
set fit results logfile myFitFILE
fit f(x) $Data u 1:2 via a,b,c 

set key top left
set grid x,y

# load file 1:1 into datablock
FileToDatablock(f,d) = GPVAL_SYSNAME[1:7] eq "Windows" ? \
                       sprintf('< echo   %s ^<^<EOD  & type "%s"',d,f) : \
                       sprintf('< echo "\%s   <<EOD" & cat  "%s"',d,f)     # Linux/MacOS
load FileToDatablock(myFitFILE,'$FIT')

# extract parameters into variables
N = 3           # number of parameters
getValue(p1,p2) = real(word($FIT[|$FIT|-N+p1],p2+1))   # extract value as floating point number
aa = getValue(1,1)
ba = getValue(2,1)
bb = getValue(2,2)
ca = getValue(3,1)
cb = getValue(3,2)
cc = getValue(3,3)

set label 1 at graph 0.1,graph 0.8 \
    sprintf("Correlation matrix:\naa: %g\nba: %g\nbb: %g\nca: %g\ncb: %g\ncc: %g",aa,ba,bb,ca,cb,cc) 

plot $Data u 1:2 w lp pt 7 lc "red", \
     f(x) w l lc "blue" title sprintf("fit: a=%g, b=%g, c=%g",a,b,c)

### end of script

Result:

enter image description here