I'm trying to debug a memory leak in my Python ML app. When I run it locally, the process's memory usage gradually climbs by about 700 MB. So I set a breakpoint at a point where memory use is around its maximum and ran the following:
import sys
import pandas as pd
import numpy as np

# everything visible at the breakpoint
global_vars = list(globals().items())
local_vars = list(locals().items())
all_vars = global_vars + local_vars

total = 0
mem = []
vars = []
for var, obj in all_vars:
    if isinstance(obj, pd.DataFrame):
        gso = obj.memory_usage().sum()   # bytes per column (plus index), summed
    elif isinstance(obj, np.ndarray):
        gso = obj.size                   # number of elements
    else:
        gso = sys.getsizeof(obj)
    total += gso
    mem.append(gso)
    vars.append(var)
    print(var, gso)

df = pd.DataFrame({'obj': vars, 'size': mem}).sort_values('size', ascending=False)
print(df['size'].sum() / 1000000)
which returns just over 20 MB!
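For comparison, the ~700 MB figure is what the OS process monitor reports for the whole process; the same number can be read from inside Python, roughly like this (psutil is a third-party package I'm assuming here, any OS-level monitor shows the same trend):

import os
import psutil  # third-party; assumed available

# resident set size (RSS) of the current process, in MB
rss = psutil.Process(os.getpid()).memory_info().rss
print(rss / 1000000)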
So, based on the top response to this question, I tried using tracemalloc:
import tracemalloc  # tracing was already started earlier in the run via tracemalloc.start()

snapshot = tracemalloc.take_snapshot()
stats = snapshot.statistics('traceback')

sz = []
for stat in stats:
    sz.append(stat.size)
print(sum(sz) / 1000000)
which gives just under 100 MB, so there is clearly a large chunk of memory that neither approach accounts for. Other methods I have tried give similar values; one of them is sketched below. How can I find out where the rest of the memory usage is coming from?
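The kind of "other method" I mean is a brute-force sweep over everything the garbage collector tracks, along these lines (a rough sketch only; sys.getsizeof reports each object's own footprint, so memory held in buffers inside C extensions can still be missed):

import gc
import sys

# sum sys.getsizeof over every container object the GC currently tracks
total = 0
for obj in gc.get_objects():
    try:
        total += sys.getsizeof(obj)
    except TypeError:
        # some extension objects don't report a size
        pass
print(total / 1000000)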