Trees written with uproot cannot be read properly


I have created a script that reads some ROOT files, does some processing, and writes slimmed versions of these files using uproot3 (3.14.4). I then use the slimmed files to create the histograms needed for my analysis. The problem I noticed is that some of the produced files cannot be read properly.

An example of the traceback that I get is:

Traceback (most recent call last):
  File "Create_histos.py", line 1690, in <module>
    Frame = calculate_yields(process,name,region,Type,paths)
  File "Create_histos.py", line 1486, in calculate_yields
    FF=reading_BIG_and_CUTS(ff,region)
  File "Create_histos.py", line 42, in reading_BIG_and_CUTS
    foo=FF.pandas.df(entrystart=i_min,entrystop=i_max,flatten = False)
  File "/afs/cern.ch/user/a/atsiamis/test2/venv_3_8/lib64/python3.8/site-packages/uproot3/_connect/_pandas.py", line 32, in df
    return self._tree.arrays(branches=branches, outputtype=pandas.DataFrame, namedecode=namedecode, entrystart=entrystart, entrystop=entrystop, flatten=flatten, flatname=flatname, awkwardlib=awkwardlib, cache=cache, basketcache=basketcache, keycache=keycache, executor=executor, blocking=blocking)
  File "/afs/cern.ch/user/a/atsiamis/test2/venv_3_8/lib64/python3.8/site-packages/uproot3/tree.py", line 563, in arrays
    futures = [(branch.name if namedecode is None else branch.name.decode(namedecode), interpretation, branch.array(interpretation=interpretation, entrystart=entrystart, entrystop=entrystop, flatten=(flatten and not ispandas), awkwardlib=awkward0, cache=cache, basketcache=basketcache, keycache=keycache, executor=executor, blocking=False)) for branch, interpretation in branches]
  File "/afs/cern.ch/user/a/atsiamis/test2/venv_3_8/lib64/python3.8/site-packages/uproot3/tree.py", line 563, in <listcomp>
    futures = [(branch.name if namedecode is None else branch.name.decode(namedecode), interpretation, branch.array(interpretation=interpretation, entrystart=entrystart, entrystop=entrystop, flatten=(flatten and not ispandas), awkwardlib=awkward0, cache=cache, basketcache=basketcache, keycache=keycache, executor=executor, blocking=False)) for branch, interpretation in branches]
  File "/afs/cern.ch/user/a/atsiamis/test2/venv_3_8/lib64/python3.8/site-packages/uproot3/tree.py", line 1475, in array
    _delayedraise(fill(j))
  File "/afs/cern.ch/user/a/atsiamis/test2/venv_3_8/lib64/python3.8/site-packages/uproot3/tree.py", line 59, in _delayedraise
    raise err.with_traceback(trc)
  File "/afs/cern.ch/user/a/atsiamis/test2/venv_3_8/lib64/python3.8/site-packages/uproot3/tree.py", line 1443, in fill
    source = self._basket(i, interpretation, local_entrystart, local_entrystop, awkward0, basketcache, keycache)
  File "/afs/cern.ch/user/a/atsiamis/test2/venv_3_8/lib64/python3.8/site-packages/uproot3/tree.py", line 1247, in _basket
    byteoffsets = awkward0.numpy.empty((key._fObjlen - key.border - 4) // 4, dtype=awkward0.numpy.int32)  # native endian
ValueError: negative dimensions are not allowed

From my understanding, it seems that some of the variables are not filled, so when my script tries to access them (i.e. to create a pandas DataFrame), it produces this error. The same error occurs with the array() method. I cross-checked the original files, and they seem to be fine. The weirdest part is that the error is not consistent: it does not occur in all produced files, but happens "randomly" every now and then. Also, I used to run this script on a subset of the files, which seemed to work fine, but when I moved to all of them the problem started occurring, even in files that were seemingly OK before. Is there a known bug in uproot.recreate for this version of uproot3?
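To make the failing line in the traceback concrete, here is a minimal sketch of the arithmetic it performs. The helper name byteoffsets_length and the header values are hypothetical, chosen only for illustration; the expression itself is copied from the traceback (uproot3/tree.py, line 1247). Whenever the basket key's fObjlen is smaller than border + 4, the computed length is negative, and numpy.empty rejects a negative dimension with exactly this ValueError.

```python
def byteoffsets_length(fObjlen, border):
    """Length uproot3 requests for the byte-offset array of a basket.

    Same expression as in the traceback:
    (key._fObjlen - key.border - 4) // 4
    """
    return (fObjlen - border - 4) // 4


# Hypothetical header values, for illustration only.
healthy = byteoffsets_length(fObjlen=8404, border=8000)  # 100 offsets
broken = byteoffsets_length(fObjlen=0, border=8000)      # -2001, a negative length

print(healthy, broken)
```

This does not explain why the key would contain such values; it only shows that a basket header with a zero or too-small fObjlen is enough to trigger the error at read time.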

Here is an example of the code that I use to create my ROOT files:

with uproot.recreate(main_path+REGION+tag+'/'+'*'+DIR+'*'+'_'+FILE+'.root') as f:
    tree = uproot.newtree({br:'float64' for br in cols})
    f["tree"] = tree
    f['tree'].extend({br:Data[br] for br in cols})

cols are the variables that I store in the ROOT file, and Data is a pandas DataFrame. After further checking, I see that basket_compressedbytes() and basket_uncompressedbytes() return 0 for the problematic variables, which is not the case for the non-problematic ones. This indicates that something goes wrong while the trees are being written. Could the issue occur because I do not set a specific compression scheme in the recreate call?
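The basket check described above can be automated so that every produced file is scanned right after writing. The following is a sketch, not a definitive diagnostic: the function names find_empty_baskets and scan_file are hypothetical, the tree name "tree" is taken from the writing code, and the sketch assumes the uproot3 branch attributes numbaskets, basket_compressedbytes(i) and basket_uncompressedbytes(i).

```python
def find_empty_baskets(branches):
    """Return (branch name, basket index) pairs whose byte counts are zero."""
    suspicious = []
    for branch in branches:
        for i in range(branch.numbaskets):
            if (branch.basket_uncompressedbytes(i) == 0
                    or branch.basket_compressedbytes(i) == 0):
                suspicious.append((branch.name, i))
    return suspicious


def scan_file(path, treename="tree"):
    """Open one produced file with uproot3 and list its zero-sized baskets."""
    import uproot3  # imported here so find_empty_baskets works without it

    tree = uproot3.open(path)[treename]
    return find_empty_baskets(tree.values())
```

Calling scan_file on each output path and re-running the slimming step for any file that returns a non-empty list would at least separate the bad files from the good ones before the histogram step.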
