I have what I think is a pretty general problem: recasting a bipartite adjacency matrix as a list of lists of nodes. In pandas, that means transforming a specific pd.DataFrame format into a specific pd.Series format.
For non discrete-math people, this looks like the following transformation:
From
import pandas as pd

df = pd.DataFrame(columns=['item1', 'item2', 'item3'],
                  index=['foo', 'bar', 'qux'],
                  data=[[1, 1, 0], [0, 1, 1], [0, 0, 0]])
which looks like
     item1  item2  item3
foo      1      1      0
bar      0      1      1
qux      0      0      0
To
srs = pd.Series([['item1','item2'],['item2','item3'],[]],index=['foo','bar','qux'])
that looks like
foo    [item1, item2]
bar    [item2, item3]
qux                []
dtype: object
I have partially achieved this goal with the following code:
df_1 = df.stack().reset_index()
srs = df_1.loc[df_1[0]==1].groupby('level_0')['level_1'].apply(list)
which, besides being slightly unreadable, has the issue of having dropped poor qux along the way.
Is there any shorter path to the desired result?
If you want to avoid reshaping with stack and groupby, you can use a list comprehension: convert the 0/1 values to booleans with DataFrame.astype, use each boolean row to filter the column names, and finally pass the result to the Series constructor:
If performance is also important, use:
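A minimal sketch of the list-comprehension approach described above, assuming the `df` from the question. The names `mask`, `cols` and `srs_fast` are illustrative. The second variant, which indexes a plain NumPy array of column labels instead of the pandas Index, is one plausible way to cut per-row overhead; that is an assumption rather than a measured result, so time it on your own data.

```python
import pandas as pd

df = pd.DataFrame(columns=['item1', 'item2', 'item3'],
                  index=['foo', 'bar', 'qux'],
                  data=[[1, 1, 0], [0, 1, 1], [0, 0, 0]])

# Convert 0/1 to booleans; each row then acts as a mask over the column labels.
mask = df.astype(bool).to_numpy()

# Filter the column names row by row and keep the original index,
# so rows without any 1 (like qux) become empty lists instead of disappearing.
srs = pd.Series([df.columns[row].tolist() for row in mask], index=df.index)

# Variant (assumption: lighter per-row indexing): use a NumPy array of labels.
cols = df.columns.to_numpy()
srs_fast = pd.Series([cols[row].tolist() for row in mask], index=df.index)

print(srs)
```

Because the Series is built from one list per row of `df`, no row is dropped along the way, unlike the stack and groupby version.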