How to do a weighted fit for a with a set of data containing f(x) +- df(x)?

4.5k Views Asked by At

I have a function f(x) = a/x and I have a set of data containing values for f(x) +- df(x) and x +- dx. How do I tell gnuplot to do a weighted fit for a with that?

I know that fitaccepts the using term and this works for df(x), but it does not work for dx. It seems gnuplot treats the error I have for x as the error for the whole RHS of my function (a/x +- dx).

How do I do a weighted fit that fits f(x) +- df(x) = a/(x +- dx) to find the optimal a?

3

There are 3 best solutions below

0
On BEST ANSWER

Since version 5.0, gnuplot has an explicit provision for taking uncertainty in the independent variable into account

fit f(x) datafile using 1:2:3:4 xyerror

using "Orear's effective variance method".

(Above command expects data in the form x y dx dy.)

3
On

You're fitting an equation like:

 z = a/(x +- dx)

This can be equivalently written as:

 z = a/x +- dz

for an appropriate dz.

I think (if my calculus and statistics serve correctly), that you can calculate dz from x and dx by:

dz = partial_z/partial_x*dx

provided that dx is small.

For this case, that yields:

dz = -a/x**2*dx

So now you have a function of 2 variables (z = a/x - (a/x**2)*dx) that you want to fit for 1 constant. Of course, I could be wrong about this ... It's been a while since I've played with this stuff.

3
On

Here a simple example will suffice to prove that gnuplot is doing what you want:

Construct a flat text file data.dat with the following toy model data:

#f  df  x  dx
1  0.1  1  0.1
2  0.1  2  0.1
3  0.1  3  0.1
4  0.1  4  0.1
10 1.0  5  0.1  

Just by eying the data, we expect a good model would be a direct proportionality f = x, with one obvious outlier at x = 5, f = 10. We may tell gnuplot to fit this data in two very different ways. The following script weightedFit.gp demonstrates the difference:



    # This file is called weightedFit.gp
    #
    # Gnuplot script file for demonstrating the difference between a 
    #  weighted least-squares fit and an unweighted fit, using mock data in "data.dat"
    #
    # columns in the .dat are 
    # 1:f, 2:d_f, 3: x, 4: d_x
    # x is the independent variable and f is the dependent variable

    # you have to play with the terminal settings based on your system
    # set term wxt
    #set term xterm

     set   autoscale                        # scale axes automatically
     unset log                              # remove any log-scaling
     unset label                            # remove any previous labels
     set xtic auto                          # set xtics automatically
     set ytic auto                          # set ytics automatically
     set key top left

    # change plot labels!
     set title "Weighted and Un-weighted fits"
     set xlabel "x"
     set ylabel "f(x)"
     #set key 0.01,100

    # start with these commented for auto-ranges, then zoom where you want!
    set xr [-0.5:5.5]
    #set yr [-50:550]

    #this allows you to access ASE values of var using var_err
     set fit errorvariables

    ## fit syntax is x:y:Delta_y column numbers from data.dat

    #Fit data as linear, allowing intercept to float
    f(x)=m*x+b
    fW(x)=mW*x+bW

    # Here's the important difference.  First fit with no uncertainty weights:
    fit f(x) 'data.dat' using 3:1 via m, b
    chi = sprintf("chiSq = %.3f", FIT_WSSR/FIT_NDF)
    t = sprintf("f = %.5f x + %.5f", m, b)
    errors = sprintf("Delta_m = %.5f, Delta_b = %.5f", m_err, b_err)

    # Now, weighted fit by properly accounting for uncertainty on each data point:
    fit fW(x) 'data.dat' using 3:1:2 via mW, bW
    chiW = sprintf("chiSqW = %.3f", FIT_WSSR/FIT_NDF)
    tW = sprintf("fW = %.5f x + %.5f", mW, bW)
    errorsW = sprintf("Delta_mW = %.5f, Delta_bW = %.5f", mW_err, bW_err)

    # Pretty up the plot
    set label 1 errors at 0,8
    set label 2 chi at 0,7
    set label 3 errorsW at 0,5
    set label 4 chiW at 0,4

    # Save fit results to disk
    save var 'fit_params'

    ## plot using x:y:Delta_x:Delta_y column numbers from data.dat

    plot "data.dat" using 3:1:4:2 with xyerrorbars title 'Measured f vs. x', \
         f(x) title t, \
         fW(x) title tW

    set term jpeg
    set output 'weightedFit.jpg'
    replot
    set term wxt

The generated plot weightedFit.jpg tells the story: the green fit does not take data point uncertainties into account and is a bad model for the data. The blue fit accounts for the large uncertainty in the outlier and properly fits the proportionality model, obtaining slope 1.02 +/- 0.13 and intercept -0.05 +/- 0.35.

As I just joined today, I lack the '10 reputation' needed to post images so you'll just have to run the script yourself to see the fit. Once you have the script and data file in your working directory, do:

gnuplot> load 'weightedFit.gp'

Your fit.log should look like this:

*******************************************************************************
Thu Aug 20 14:09:57 2015


FIT:    data read from 'data.dat' using 3:1
        format = x:z
        x range restricted to [-0.500000 : 5.50000]
        #datapoints = 5
        residuals are weighted equally (unit weight)

function used for fitting: f(x)
    f(x)=m*x+b
fitted parameters initialized with current variable values

iter      chisq       delta/lim  lambda   m             b            
   0 1.0000000000e+01  0.00e+00 4.90e+00  2.000000e+00 -2.000000e+00
   1 1.0000000000e+01  0.00e+00 4.90e+02  2.000000e+00 -2.000000e+00

After 1 iterations the fit converged.
final sum of squares of residuals : 10
rel. change during last iteration : 0

degrees of freedom    (FIT_NDF)                        : 3
rms of residuals      (FIT_STDFIT) = sqrt(WSSR/ndf)    : 1.82574
variance of residuals (reduced chisquare) = WSSR/ndf   : 3.33333

Final set of parameters            Asymptotic Standard Error
=======================            ==========================
m               = 2                +/- 0.5774       (28.87%)
b               = -2               +/- 1.915        (95.74%)

correlation matrix of the fit parameters:
                m      b      
m               1.000 
b              -0.905  1.000 


*******************************************************************************
Thu Aug 20 14:09:57 2015


FIT:    data read from 'data.dat' using 3:1:2
        format = x:z:s
        x range restricted to [-0.500000 : 5.50000]
        #datapoints = 5
function used for fitting: fW(x)
    fW(x)=mW*x+bW
fitted parameters initialized with current variable values

iter      chisq       delta/lim  lambda   mW            bW           
   0 2.4630541872e+01  0.00e+00 1.78e+01  1.024631e+00 -4.926108e-02
   1 2.4630541872e+01  0.00e+00 1.78e+02  1.024631e+00 -4.926108e-02

After 1 iterations the fit converged.
final sum of squares of residuals : 24.6305
rel. change during last iteration : 0

degrees of freedom    (FIT_NDF)                        : 3
rms of residuals      (FIT_STDFIT) = sqrt(WSSR/ndf)    : 2.86534
variance of residuals (reduced chisquare) = WSSR/ndf   : 8.21018
p-value of the Chisq distribution (FIT_P)              : 1.84454e-005

Final set of parameters            Asymptotic Standard Error
=======================            ==========================
mW              = 1.02463          +/- 0.1274       (12.43%)
bW              = -0.0492611       +/- 0.3498       (710%)

correlation matrix of the fit parameters:
                mW     bW     
mW              1.000 
bW             -0.912  1.000 

See http://gnuplot.info/ for documentation. Cheers!