The following code should do what I want, but it consumes 10 GB of RAM by the time the loop is only 20% complete.
# In [4]: type(pd)
# Out[4]: pandas.sparse.frame.SparseDataFrame
import numpy as np
import pandas

# Note: 'pd' here is my SparseDataFrame, not the pandas module.
memid = np.unique(pd.Member)

# Build one sub-frame per member, then assemble them into a Panel.
pan = {}
for mem in memid:
    pan[mem] = pd[pd.Member == mem]
goal = pandas.Panel(pan)
I created a GitHub issue here: https://github.com/wesm/pandas/issues/663
I'm pretty sure I identified a circular reference between NumPy ndarray views causing a memory leak. Just committed a fix:
https://github.com/wesm/pandas/commit/4c3916310a86c3e4dab6d30858a984a6f4a64103
Can you install from source and let me know if that fixes your problem?
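For what it's worth, something like the following can be used to watch peak memory while re-running the loop, to see whether the growth goes away. This is only a rough sketch: the resource module is Unix-only, ru_maxrss units vary by platform (kilobytes on Linux, bytes on macOS), and the print interval is arbitrary.

import resource

def peak_rss_mb():
    # ru_maxrss is in kilobytes on Linux (bytes on macOS), so this is approximate
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

pan = {}
for i, mem in enumerate(memid):
    pan[mem] = pd[pd.Member == mem]
    if i % 100 == 0:
        print("%d/%d  peak RSS %.1f MB" % (i, len(memid), peak_rss_mb()))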
BTW, you might try using SparsePanel instead of Panel, because Panel will convert all of the sub-DataFrames to dense form.
Lastly, you might consider using groupby as an alternative to the O(N * M) chopping-up of the SparseDataFrame. It's even shorter:

pan = dict(pd.groupby('Member'))
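Putting the two suggestions together, the construction collapses to a couple of lines. A sketch, assuming the dict of per-member sub-frames is accepted by the SparsePanel constructor the same way Panel accepts it:

import pandas

# groupby yields (member, sub-frame) pairs, so dict() rebuilds the same mapping
# the explicit loop produced, without scanning the frame once per member
pan = dict(pd.groupby('Member'))

# keep the sub-frames sparse instead of densifying them in a regular Panel
goal = pandas.SparsePanel(pan)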