Suspicious correlation matrix and errors after fitting weighted data

I am using zfit to perform an extended unbinned maximum likelihood fit of an sWeighted data sample, with zfit.minimize.Minuit as the minimizer. If I allow the yield parameters to take negative values, the correlation matrix is filled with 1's and the fit errors are compatible with 0. If instead I set the lower bound of the yield parameters to 0, the correlation matrix and errors look reasonable. Moreover, if I fit a sample of similar composition that is not sWeighted, I have no problems at all in either configuration.
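
To tell this pathological pattern apart from genuinely strong correlations, one can check the covariance matrix directly. Below is a minimal NumPy sketch, assuming the covariance is available as a plain array (for example converted from the fit result's covariance). The helpers `correlation_from_covariance` and `looks_degenerate` and the tolerance are hypothetical choices, not part of zfit:

```python
import numpy as np


# Hypothetical helpers for illustration, not zfit API
def correlation_from_covariance(cov):
    """Turn a covariance matrix into a correlation matrix."""
    cov = np.asarray(cov, dtype=float)
    sigma = np.sqrt(np.diag(cov))
    return cov / np.outer(sigma, sigma)


def looks_degenerate(cov, tol=1e-6):
    """Flag zero/negative variances, all off-diagonal correlations at +-1,
    or a numerically singular covariance matrix."""
    cov = np.asarray(cov, dtype=float)
    if np.any(np.diag(cov) <= 0):
        return True
    corr = correlation_from_covariance(cov)
    off_diag = corr[~np.eye(len(corr), dtype=bool)]
    # Guard against 1x1 matrices, which have no off-diagonal entries
    all_unit = off_diag.size > 0 and np.allclose(np.abs(off_diag), 1.0, atol=tol)
    # A (nearly) rank-deficient covariance has a huge condition number
    singular = np.linalg.cond(cov) > 1.0 / tol
    return bool(all_unit or singular)


# Hypothetical 2x2 examples: a healthy matrix and a rank-one one
healthy = np.array([[4.0, 1.0], [1.0, 9.0]])
rank_one = np.array([[1.0, 2.0], [2.0, 4.0]])
print(looks_degenerate(healthy), looks_degenerate(rank_one))  # → False True
```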

If I understand correctly, the documentation says that the weights are taken into account in the error estimation:

"Weighted likelihoods are a special class of likelihoods as they are not an actual likelihood. However, the minimum is still valid, however the profile is not a proper likelihood. Therefore, corrections will be automatically applied to the Hessian uncertainty estimation in order to correct for the effects in the weights. The corrections used are "asymptotically correct" and are described in Parameter uncertainties in weighted unbinned maximum likelihood fits (https://doi.org/10.1140/epjc/s10052-022-10254-8) by Christoph Langenbruch. Since this method uses the jacobian matrix, it takes significantly longer to calculate than without weights."
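
The correction referred to in the quote is a sandwich-type estimator. In our own notation, for a weighted but non-extended likelihood and evaluated at the fitted parameters (the extended case adds terms for the yields, see the paper), it reads approximately:

```latex
\begin{aligned}
-\ln L(\theta) &= -\sum_i w_i \ln P(x_i;\theta) \\
H_{kl} &= -\sum_i w_i \,\frac{\partial^2 \ln P(x_i;\theta)}{\partial\theta_k\,\partial\theta_l} \\
D_{kl} &= \sum_i w_i^2 \,\frac{\partial \ln P(x_i;\theta)}{\partial\theta_k}\,\frac{\partial \ln P(x_i;\theta)}{\partial\theta_l} \\
C &= H^{-1} D \, H^{-1}
\end{aligned}
```

D is built from the per-event gradients, presumably the Jacobian the documentation mentions. If D is close to rank one, so is C, and every correlation of a rank-one covariance matrix is ±1. This is one way the pattern above can arise, not a diagnosis of this particular fit.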

Do you know what could be the reason for this issue?

Here is a simplified example of what I am doing, assuming the observables and the 1D PDFs are already defined.

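import zfit

# obs, df and the 1D PDFs (signal_x_pdf, ..., bkg_z_pdf) are defined elsewhere
signal_pdf = zfit.pdf.ProductPDF(pdfs=[signal_x_pdf, signal_y_pdf, signal_z_pdf])
bkg_pdf = zfit.pdf.ProductPDF(pdfs=[bkg_x_pdf, bkg_y_pdf, bkg_z_pdf])

# Yields are allowed to go negative (lower limit -100); with a lower limit of 0 the problem disappears
signal_yield = zfit.Parameter('signal_yield', 100, -100, 10000, step_size=1)
bkg_yield = zfit.Parameter('bkg_yield', 100, -100, 10000, step_size=1)

extended_signal = signal_pdf.create_extended(signal_yield)
extended_bkg = bkg_pdf.create_extended(bkg_yield)

# Sum of extended PDFs: each component keeps its own yield
model = zfit.pdf.SumPDF([extended_signal, extended_bkg])

# Data weighted with the signal sWeights
data_sw = zfit.Data.from_pandas(obs=obs, df=df[['x', 'y', 'z']], weights=df['signal'])
nll_sw = zfit.loss.ExtendedUnbinnedNLL(model, data_sw)
minimizer = zfit.minimize.Minuit(tol=1e-6)
result_sw = minimizer.minimize(nll_sw)
result_sw.hesse()  # weighted data: the documented asymptotic correction applies here
result_sw.correlation()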
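
For orientation, the quantity minimised here is, up to constant terms, the standard weighted extended unbinned NLL. Below is a NumPy sketch, assuming the per-event values of the normalised signal and background densities are already available as arrays; `weighted_extended_nll` is a hypothetical helper, not zfit's implementation. With negative yields the per-event mixture density can become zero or negative, at which point the logarithm is undefined:

```python
import numpy as np


# Hypothetical helper for illustration, not zfit's implementation
def weighted_extended_nll(yields, pdf_values, weights):
    """Weighted extended unbinned NLL, up to parameter-independent constants.

    yields:     shape (n_components,), e.g. [signal_yield, bkg_yield]
    pdf_values: shape (n_components, n_events), normalised densities f_j(x_i)
    weights:    shape (n_events,), per-event weights such as sWeights
    """
    yields = np.asarray(yields, dtype=float)
    pdf_values = np.asarray(pdf_values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Unnormalised mixture density sum_j N_j f_j(x_i), one value per event
    mixture = yields @ pdf_values
    if np.any(mixture <= 0):
        # Negative yields can push the mixture to zero or below: log undefined
        return np.inf
    # -sum_i w_i ln(sum_j N_j f_j(x_i)) + sum_j N_j
    return float(-np.sum(weights * np.log(mixture)) + np.sum(yields))


# Hypothetical toy: two events, both components flat with density 1
f = np.ones((2, 2))
w = np.array([1.0, 1.0])
print(weighted_extended_nll([1.0, 1.0], f, w))   # finite value
print(weighted_extended_nll([-3.0, 1.0], f, w))  # inf: mixture is negative
```

With a single flat component the minimum sits at N equal to the sum of the weights, i.e. the fitted yield reproduces the weighted event count.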

Update: the issue persists even if I use weights equal to one.
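
A toy check of what unit weights should do: with all weights equal to one, the asymptotically correct (sandwich) error sqrt(H⁻¹DH⁻¹) from the paper cited above should approximately reproduce the ordinary inverse-Hessian error, and it is invariant under a global rescaling of the weights, whereas the naive inverse Hessian of the weighted NLL is not. The sketch uses a hypothetical one-parameter exponential model in plain NumPy, not zfit, so it only illustrates the expected behaviour:

```python
import numpy as np


def exponential_fit_errors(t, w):
    """Weighted ML fit of f(t) = lam * exp(-lam * t); returns (lam, naive, sandwich)."""
    t = np.asarray(t, dtype=float)
    w = np.asarray(w, dtype=float)
    # Weighted MLE solves sum_i w_i (1/lam - t_i) = 0
    lam = w.sum() / np.sum(w * t)
    # Per-event score d ln f / d lam, and H = -d^2 lnL / d lam^2 = sum_i w_i / lam^2
    score = 1.0 / lam - t
    hess = w.sum() / lam**2
    # Naive error: inverse Hessian of the weighted NLL
    naive = np.sqrt(1.0 / hess)
    # Sandwich error: sqrt(H^-1 D H^-1) with D = sum_i w_i^2 score_i^2
    d = np.sum(w**2 * score**2)
    sandwich = np.sqrt(d) / hess
    return lam, naive, sandwich


rng = np.random.default_rng(seed=1)
t = rng.exponential(scale=0.5, size=20_000)  # hypothetical toy sample, true lam = 2
for scale in (1.0, 3.0):
    lam, naive, sandwich = exponential_fit_errors(t, np.full_like(t, scale))
    print(f"w = {scale}: lam = {lam:.3f}, naive = {naive:.4f}, sandwich = {sandwich:.4f}")
```

With unit weights the two errors agree within statistical fluctuations; multiplying all weights by 3 leaves the sandwich error unchanged while the naive error shrinks by a factor of √3.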
