Pandas: from adjacency matrix to series of node lists

140 Views Asked by HerrIvan At 23 November 2020 at 11:18

I have what I think is a pretty general problem: recasting a bipartite adjacency matrix as a list of node lists. In pandas, that means transforming a specific pd.DataFrame format into a specific pd.Series format.

For non-discrete-math people, this looks like the following transformation:

From

import pandas as pd

df = pd.DataFrame(columns=['item1','item2','item3'],
                  index=['foo','bar','qux'],
                  data=[[1,1,0],[0,1,1],[0,0,0]])

which looks like

    item1   item2   item3
foo     1       1       0
bar     0       1       1
qux     0       0       0

to

srs = pd.Series([['item1','item2'],['item2','item3'],[]],index=['foo','bar','qux'])

which looks like

foo    [item1, item2]
bar    [item2, item3]
qux                []
dtype: object

I have partially achieved this goal with the following code:

df_1 = df.stack().reset_index()

srs = df_1.loc[df_1[0]==1].groupby('level_0')['level_1'].apply(list)

which, besides being slightly unreadable, has the issue of dropping poor qux along the way.
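
To make the problem concrete, here is a minimal sketch that reproduces the drop on the example frame: the `== 1` filter leaves no rows for qux, so groupby never sees that label. The last line is only one possible patch (an assumption, not part of the code above): reindex on the original index and turn the resulting NaN into an empty list. The names `ones`, `partial` and `patched` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(columns=['item1', 'item2', 'item3'],
                  index=['foo', 'bar', 'qux'],
                  data=[[1, 1, 0], [0, 1, 1], [0, 0, 0]])

# Long format: one row per (index label, column label) pair.
df_1 = df.stack().reset_index()      # columns: level_0, level_1, 0

# Keep only the edges; every qux row has value 0, so none survive.
ones = df_1.loc[df_1[0] == 1]
partial = ones.groupby('level_0')['level_1'].apply(list)
print('qux' in partial.index)        # False: qux was dropped by the filter

# Assumed patch: restore the original index, then replace NaN with [].
patched = partial.reindex(df.index).apply(
    lambda v: v if isinstance(v, list) else [])
print(patched)
```

This restores qux with an empty list, but it makes the code longer still.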

Is there any shorter path to the desired result?


There are 2 best solutions below

jezrael On 23 November 2020 at 11:21 BEST ANSWER

To avoid reshaping with stack and groupby, you can use a list comprehension: convert the 0/1 values to booleans with DataFrame.astype, use each boolean row to filter the column names, and finally pass the result to the Series constructor:

print([list(df.columns[x]) for x in df.astype(bool).to_numpy()])
[['item1', 'item2'], ['item2', 'item3'], []]

s = pd.Series([list(df.columns[x]) for x in df.astype(bool).to_numpy()], index=df.index)
print(s)
foo    [item1, item2]
bar    [item2, item3]
qux                []
dtype: object

If performance also matters, convert the column names to a NumPy array first:

c = df.columns.to_numpy()
s = pd.Series([list(c[x]) for x in df.astype(bool).to_numpy()], index=df.index)
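
For convenience, the same idea can be wrapped in a small function. This is a sketch under the assumption that the frame holds only 0/1 (or otherwise truthy/falsy) values; the name `adjacency_to_lists` is hypothetical and not a pandas function.

```python
import pandas as pd

def adjacency_to_lists(df):
    """Hypothetical helper: 0/1 adjacency frame -> Series of column-label lists."""
    cols = df.columns.to_numpy()         # column labels as a NumPy array
    mask = df.astype(bool).to_numpy()    # one boolean row per index label
    # Boolean-index the labels row by row; an all-False row yields [].
    return pd.Series([list(cols[row]) for row in mask], index=df.index)

df = pd.DataFrame(columns=['item1', 'item2', 'item3'],
                  index=['foo', 'bar', 'qux'],
                  data=[[1, 1, 0], [0, 1, 1], [0, 0, 0]])
srs = adjacency_to_lists(df)
print(srs)
```

Because the index is passed through unchanged, rows without any edges (such as qux) are kept with an empty list.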

Bill Huang On 23 November 2020 at 11:24

Applying a straightforward list comprehension to each row (axis=1) also works. If a row has no non-zero elements, an empty list is produced.

df.apply(lambda row: [df.columns[i] for i, el in enumerate(row) if el], axis=1)

Result

foo    [item1, item2]
bar    [item2, item3]
qux                []
dtype: object
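
As a quick sanity check, the row-wise apply can be compared against the boolean-mask list comprehension on the example frame. This is only a sketch; the variable names `by_apply` and `by_mask` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(columns=['item1', 'item2', 'item3'],
                  index=['foo', 'bar', 'qux'],
                  data=[[1, 1, 0], [0, 1, 1], [0, 0, 0]])

# Row-wise apply: keep the column label wherever the cell is truthy.
by_apply = df.apply(
    lambda row: [df.columns[i] for i, el in enumerate(row) if el], axis=1)

# Boolean-mask list comprehension over the underlying array.
by_mask = pd.Series(
    [list(df.columns[x]) for x in df.astype(bool).to_numpy()], index=df.index)

# Both keep the all-zero row qux as an empty list.
print(by_apply.tolist() == by_mask.tolist())
```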
