Use of data with negative weights in unbinned maximum likelihood fit in zfit


I am trying to perform an unbinned 3D angular fit in zfit, where the input data is a sample with per-event sWeights assigned from a separate fit to an invariant mass peak. I think I'm running into problems with negatively weighted events in some regions of the angular phase space, as zfit gives the error:

Traceback (most recent call last):
  File "unbinned_angular_fit.py", line 282, in <module>
    main()
  File "unbinned_angular_fit.py", line 217, in main
    result = minimizer.minimize(nll)
  File "/home/dhill/miniconda/envs/ana_env/lib/python3.7/site-packages/zfit/minimizers/baseminimizer.py", line 265, in minimize
    return self._hook_minimize(loss=loss, params=params)
  File "/home/dhill/miniconda/envs/ana_env/lib/python3.7/site-packages/zfit/minimizers/baseminimizer.py", line 274, in _hook_minimize
    return self._call_minimize(loss=loss, params=params)
  File "/home/dhill/miniconda/envs/ana_env/lib/python3.7/site-packages/zfit/minimizers/baseminimizer.py", line 278, in _call_minimize
    return self._minimize(loss=loss, params=params)
  File "/home/dhill/miniconda/envs/ana_env/lib/python3.7/site-packages/zfit/minimizers/minimizer_minuit.py", line 179, in _minimize
    result = minimizer.migrad(**minimize_options)
  File "src/iminuit/_libiminuit.pyx", line 859, in iminuit._libiminuit.Minuit.migrad
RuntimeError: exception was raised in user function
User function arguments:
    Hm_amp = +nan
    Hm_phi = +0.000000
    Hp_phi = +0.000000
Original python exception in user function:
RuntimeError: Loss starts already with NaN, cannot minimize.
  File "/home/dhill/miniconda/envs/ana_env/lib/python3.7/site-packages/zfit/minimizers/minimizer_minuit.py", line 121, in func
    values=info_values)
  File "/home/dhill/miniconda/envs/ana_env/lib/python3.7/site-packages/zfit/minimizers/baseminimizer.py", line 47, in minimize_nan
    return self._minimize_nan(loss=loss, params=params, minimizer=minimizer, values=values)
  File "/home/dhill/miniconda/envs/ana_env/lib/python3.7/site-packages/zfit/minimizers/baseminimizer.py", line 107, in _minimize_nan
    raise RuntimeError("Loss starts already with NaN, cannot minimize.")

I can avoid this error by restricting one of the fit observable ranges slightly, to avoid the region with small numbers of data events where some data is weighted negatively (signal is being slightly over-subtracted by the sWeights). But I wondered if there is another way around this in zfit?

Perhaps the UnbinnedNLL method in zfit explicitly requires positive events, but the negatively weighted data points could be set to zero or a small positive value instead? I should say that the level of negative weighting appears to be small compared to the total sum of weights, and occurs at the edge of one of the angular distributions where there are only a small number of data events. The low rate of data in this region is due to experimental acceptance effects.

Code that runs on a test file to reproduce the error is here: https://github.com/donalrinho/zfit_3D_unbinned_angular_fit_test

The error is not encountered when the range for the costheta_X_VV_reco variable is restricted to (-0.9, 1.0) instead of the full range (-1.0, 1.0). I believe this is because it removes a region of phase space where the weighted data is negative.
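A quick way to test that hypothesis is to sum the weights in the excluded edge region and compare with the rest of the range. The sketch below uses plain Python and made-up toy numbers; `weight_sum_in_range`, `costheta` and `sweights` are hypothetical names, not taken from the linked repository.

```python
def weight_sum_in_range(values, weights, low, high):
    """Sum the per-event weights of events with low <= value < high."""
    return sum(w for v, w in zip(values, weights) if low <= v < high)


# Hypothetical toy events: one costheta value and one sWeight per event.
costheta = [-0.97, -0.95, -0.92, -0.5, 0.0, 0.4, 0.8]
sweights = [-0.30, 0.10, -0.05, 0.90, 1.10, 0.95, 1.05]

# Weighted yield in the edge region that the restricted fit leaves out.
edge_sum = weight_sum_in_range(costheta, sweights, -1.0, -0.9)
# Weighted yield in the region that is kept.
rest_sum = weight_sum_in_range(costheta, sweights, -0.9, 1.0)

print(f"edge region: {edge_sum:+.2f}, rest: {rest_sum:+.2f}")
```

If the edge sum is negative while the rest is comfortably positive, the weighted data really does go negative there; if it is not, the NaN must come from somewhere else.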


There are 2 best solutions below


Just to close this thread: it is the case that the PDF is negative in some places, which I think is due to the acceptance PDF. It's also true that h_pst isn't used, but removing it didn't change anything. In the end I have just fitted the data in the region where there are no negative PDF values, which doesn't seem to impact the results (it just ignores a small region in costheta_X where the density is close to zero).


As far as can be seen in the definition of the NLL in zfit, the weights simply multiply the log probabilities, so negative weights should not be an issue.
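Written out, and assuming the usual weighted form of the unbinned negative log-likelihood (with $f$ the normalised PDF, $x_i$ the events, $w_i$ the per-event weights and $\theta$ the fit parameters; this notation is mine, not taken from the zfit source), that is:

```latex
-\ln \mathcal{L}(\theta) = -\sum_{i} w_i \, \ln f(x_i; \theta)
```

A negative $w_i$ only flips the sign of one term, which stays finite as long as $f(x_i; \theta) > 0$. The sum becomes undefined only when $f(x_i; \theta) < 0$ for some event, because the logarithm of a negative number is not a real number.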

However, it seems that the PDF returns negative probabilities for some of the values, which can be seen by simply inspecting the array returned by

custom_pdf.pdf(data)

These negative probabilities will turn into NaNs once the log is taken.
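To illustrate the difference between a negative weight and a negative PDF value, here is a minimal stdlib-only sketch with made-up numbers. It does not call zfit; `safe_log`, `weighted_nll` and `bad_indices` are hypothetical helpers, and `safe_log` only mimics how array libraries treat the log of a non-positive number.

```python
import math


def safe_log(p):
    """Mimic array-library behaviour: log(p<0) -> NaN, log(0) -> -inf."""
    if p > 0:
        return math.log(p)
    return -math.inf if p == 0 else math.nan


def weighted_nll(pdf_values, weights):
    """Weighted unbinned NLL: -sum_i w_i * log(p_i)."""
    return -sum(w * safe_log(p) for p, w in zip(pdf_values, weights))


def bad_indices(pdf_values):
    """Indices of events where the PDF value is not strictly positive."""
    return [i for i, p in enumerate(pdf_values) if not p > 0]


weights = [1.1, -0.2, 0.9]       # toy sWeights, one of them negative

ok_pdf = [0.5, 0.2, 0.3]         # all PDF values positive
nll_ok = weighted_nll(ok_pdf, weights)    # finite despite the negative weight

bad_pdf = [0.5, -0.01, 0.3]      # one negative PDF value
nll_bad = weighted_nll(bad_pdf, weights)  # NaN

print(nll_ok, nll_bad, bad_indices(bad_pdf))
```

Applying the same kind of check to the values returned by the PDF should show which events, and therefore which part of the angular range, produce the NaN.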

Maybe there is a typo in the definition of the PDF, as the variable h_pst seems to be unused.