Huge memory usage in pyROOT

My pyROOT analysis code is using huge amounts of memory. I have reduced the problem to the example code below:

from ROOT import TChain, TH1D

# Load file, chain
chain = TChain("someChain")
inFile = "someFile.root"
chain.Add(inFile)

nentries = chain.GetEntries()

# Declare histograms
h_nTracks = TH1D("h_nTracks", "h_nTracks", 16, -0.5, 15.5)
h_E = TH1D("h_E","h_E",100,-0.1,6.0)
h_p = TH1D("h_p", "h_p", 100, -0.1, 6.0)
h_ECLEnergy = TH1D("h_ECLEnergy","h_ECLEnergy",100,-0.1,14.0)

# Loop over entries
for jentry in range(nentries):
   # Load entry
   entry = chain.GetEntry(jentry)

   # Define variables
   cands = chain.__ncandidates__
   nTracks = chain.nTracks
   E = chain.useCMSFrame__boE__bc
   p = chain.useCMSFrame__bop__bc
   ECLEnergy = chain.useCMSFrame__boECLEnergy__bc

   # Fill histos
   h_nTracks.Fill(nTracks)
   h_ECLEnergy.Fill(ECLEnergy)

   for cand in range(cands):
      h_E.Fill(E[cand])
      h_p.Fill(p[cand])

where someFile.root is a root file with 700,000 entries and multiple particle candidates per entry.

When I run this script it uses ~600 MB of memory. If I remove the line

h_p.Fill(p[cand])

it uses ~400 MB.

If I also remove the line

h_E.Fill(E[cand])

it uses ~150 MB.

If I also remove the lines

h_nTracks.Fill(nTracks)
h_ECLEnergy.Fill(ECLEnergy)

there is no further reduction in memory usage.

It seems that for every extra histogram that I fill of the form

h_variable.Fill(variable[cand])

(i.e. histograms that are filled once per candidate per entry, as opposed to histograms that are just filled once per entry) I use an extra ~200 MB of memory. This becomes a serious problem when I have 10 or more histograms because I am using GBs of memory and I am exceeding the limits of my computing system. Does anybody have a solution?
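
For anyone trying to reproduce numbers like these, here is a minimal sketch of one way to sample peak memory from inside an event loop. It uses only the standard-library `resource` module (Unix only); the helper names `peak_rss_mb` and `run_with_memory_log` are hypothetical, and the lambda is a stand-in for the real `GetEntry` and `Fill` calls, not the analysis itself.

```python
import resource
import sys


def peak_rss_mb():
    """Return the peak resident set size of this process in MB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024.0 * 1024.0 if sys.platform == "darwin" else 1024.0
    return peak / divisor


def run_with_memory_log(n_entries, process_entry, report_every=100000):
    """Call process_entry(i) for each entry and sample peak memory at intervals."""
    samples = []
    for i in range(n_entries):
        process_entry(i)
        if i % report_every == 0:
            samples.append((i, peak_rss_mb()))
    return samples


# Stand-in for the body of the event loop (assumption: replace the lambda
# with the real chain.GetEntry(...) and histogram Fill(...) calls).
kept = []
samples = run_with_memory_log(300000, lambda i: kept.append(float(i)))
for entry, mb in samples:
    print("entry %d: peak RSS %.1f MB" % (entry, mb))
```

Because this reads the process-wide peak, it also counts memory allocated on the C++ side by ROOT, which a Python-only tool such as `tracemalloc` would not see.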

Update: I think this is a python3 problem.

If I take the script in my original post (above) and run it using python2 the memory usage is ~200 MB, compared to ~600 MB with python3. Even if I try to replicate Problem 2 by using the long variable names, the job still only uses ~200 MB of memory with python2, compared to ~1.3 GB with python3.

During my Googling I came across a few other accounts of people encountering memory leaks when using pyROOT with python3. It seems this is still an issue as of Python 3.6.2 and ROOT 6.08/06, and that for the moment you must use python2 if you want to use pyROOT.

So, using python2 appears to be my "solution" for now, but it's not ideal. If anybody has any further information or suggestions I'd be grateful to hear from you!

There is 1 solution below

I'm glad you figured out that Python 3 was the problem. But if you (or anyone else) continue to have memory usage issues when working with histograms in the future, here are some potential solutions that I hope you'll find helpful!

THnSparse

Use THnSparse. It is an efficient multidimensional histogram that shows its strengths when only a small fraction of the total bins are filled. You can read more about it in the ROOT class reference for THnSparse.
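
To illustrate the idea only, here is a minimal pure-Python sketch of sparse binning, where memory is spent only on bins that have actually been filled. This is not THnSparse's API or implementation; the class `SparseHistogram` and its methods are hypothetical names made up for this example.

```python
from collections import defaultdict


class SparseHistogram(object):
    """Toy N-dimensional histogram that stores only the bins that were filled."""

    def __init__(self, axes):
        # axes: one (nbins, low, high) tuple per dimension
        self.axes = list(axes)
        self.counts = defaultdict(float)

    @staticmethod
    def _bin_index(value, nbins, low, high):
        # ROOT-style numbering: 0 is underflow, 1..nbins are regular bins,
        # nbins + 1 is overflow.
        if value < low:
            return 0
        if value >= high:
            return nbins + 1
        return min(nbins, 1 + int((value - low) / float(high - low) * nbins))

    def fill(self, point, weight=1.0):
        # The key is the tuple of per-axis bin indices for this point.
        key = tuple(self._bin_index(v, *axis) for v, axis in zip(point, self.axes))
        self.counts[key] += weight

    def n_filled_bins(self):
        return len(self.counts)

    def n_total_bins(self):
        total = 1
        for nbins, _low, _high in self.axes:
            total *= nbins + 2  # include underflow and overflow
        return total


hist = SparseHistogram([(100, 0.0, 1.0)] * 3)
hist.fill((0.5, 0.5, 0.5))
hist.fill((0.5, 0.5, 0.5))
hist.fill((0.1, 0.2, 0.3))
print("filled %d of %d bins" % (hist.n_filled_bins(), hist.n_total_bins()))
```

A dense 3D histogram with these axes would allocate over a million bins up front, while the sketch keeps two entries. For small 1D histograms like the ones in the question, the saving would be negligible.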

TTree

TTrees are data structures in ROOT that are, quite frankly, glorified tables. However, they are highly optimized. A TTree is composed of branches and leaves whose data can be accessed quickly and efficiently through ROOT. If you put your data into a TTree first and then read it into a histogram, you should see lower memory usage and shorter run times.

Here is some example TTree code.

import ROOT

root_file_path = "../hadd_www.root"

# Open the input file and fetch the tree
muon_ps = ROOT.TFile(root_file_path)
muon_ps_tree = muon_ps.Get("WWWNtuple")
canv = ROOT.TCanvas()

num_of_events = 5000

ttvhist = ROOT.TH1F('Statistics2', 'Jet eta for ttV (aqua) vs WWW (white); Pseudorapidity', 100, -3, 3)
i = 0

# Quick check: number of jets in the first entry
muon_ps_tree.GetEntry(i)
print(len(muon_ps_tree.jet_eta))

# GetEntry returns the number of bytes read, so the loop ends after the last entry
while i < num_of_events and muon_ps_tree.GetEntry(i) > 0:
    # Fill once per jet in this entry
    for k in range(len(muon_ps_tree.jet_eta)):
        ttvhist.Fill(float(muon_ps_tree.jet_eta[k]), 1)
    i += 1

# Set the fill color before drawing so it takes effect
ttvhist.SetFillColor(70)
ttvhist.Draw("hist")
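
As a rough sketch, the per-entry loop above can also be factored into a small helper that works on any tree-like and histogram-like object, which makes it possible to try out without ROOT installed. The helper name `fill_from_vector_branch` and the `FakeTree` and `FakeHist` stand-ins are hypothetical; with real ROOT objects you would pass the TTree and TH1F instead.

```python
def fill_from_vector_branch(tree, hist, branch_name, max_entries=None):
    """Fill hist with every element of a vector-like branch, entry by entry.

    Returns the number of entries read.
    """
    i = 0
    # TTree.GetEntry returns the number of bytes read; 0 means no such entry.
    while (max_entries is None or i < max_entries) and tree.GetEntry(i) > 0:
        values = getattr(tree, branch_name)
        for k in range(len(values)):
            hist.Fill(float(values[k]))
        i += 1
    return i


class FakeTree(object):
    """Stand-in for a TTree with a single vector branch called jet_eta."""

    def __init__(self, rows):
        self._rows = rows
        self.jet_eta = []

    def GetEntry(self, i):
        if 0 <= i < len(self._rows):
            self.jet_eta = self._rows[i]
            return 1  # pretend some bytes were read
        return 0


class FakeHist(object):
    """Stand-in for a histogram; it only records the filled values."""

    def __init__(self):
        self.values = []

    def Fill(self, x):
        self.values.append(x)


tree = FakeTree([[0.1, -1.2], [], [2.5]])
hist = FakeHist()
n_read = fill_from_vector_branch(tree, hist, "jet_eta")
print("read %d entries, filled %d values" % (n_read, len(hist.values)))
```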

And here's a resource where you can learn about how fantastic TTrees are:

TTree ROOT documentation

For more reading, here is a discussion on speeding up ROOT histogram building on the CERN help forum:

Memory-conservative histograms for usage in DQ monitoring

Best of luck with your data analysis, and happy coding!