How to use upsetplot in Python to find intersections in a pandas DataFrame


I'm trying to use upsetplot to find the intersections between the data in the columns of a DataFrame. I am adapting the example code provided by the library's developers:

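import pandas as pd
import matplotlib.pyplot as plt
from upsetplot import from_indicators, plot

# data is the DataFrame shown below; pd.notna marks each cell as present/absent,
# so every row is assigned to the combination of columns in which it is non-empty
plot(from_indicators(indicators=pd.notna, data=data), show_counts=True)
plt.show()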

This code gives me a graph with the number of non-empty (non-NaN) cells in each column of the DataFrame. Instead of counting non-NaN cells with pd.notna, I would like to count the "core" items, i.e. the values that the columns have in common.

From this DataFrame (I changed the numbers to letters for this example), my code above would give me:

        column_1  column_2  column_3  column_4  column_5
row_1   A         A         NaN       A         NaN
row_2   B         NaN       B         B         NaN
row_3   NaN       NaN       C         NaN       NaN
row_4   D         D         NaN       D         NaN
row_5   E         NaN       E         NaN       NaN
row_6   NaN       NaN       NaN       NaN       F

...a graph sort of like this:

column_1 :           **** (4 not_empty)
column_3, column_4 : *** (3 not_empty)
column_2 :           ** (2 not_empty)
column_5 :           * (1 not_empty)
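The counts in this first graph are just the number of non-empty cells per column, which is what UpSet draws as its per-column totals. A minimal pandas sketch to check them, assuming the blank cells in the table are stored as NaN (the DataFrame is rebuilt by hand from the table above):

```python
import numpy as np
import pandas as pd

# Example table rebuilt by hand; blank cells are assumed to be NaN.
nan = np.nan
data = pd.DataFrame(
    {
        "column_1": ["A", "B", nan, "D", "E", nan],
        "column_2": ["A", nan, nan, "D", nan, nan],
        "column_3": [nan, "B", "C", nan, "E", nan],
        "column_4": ["A", "B", nan, "D", nan, nan],
        "column_5": [nan, nan, nan, nan, nan, "F"],
    },
    index=[f"row_{i}" for i in range(1, 7)],
)

# Non-empty cells per column: the same numbers as the graph above.
totals = data.notna().sum()
print(totals.sort_values(ascending=False))
```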

But what I actually want is a graph showing something like this:

column_1, column_2, column_4 : ** (A, D in_common)
column_1, column_3, column_4 : * (B in_common)
column_1, column_3 :           * (E in_common)
column_5 :                     - (F not_in_common)
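The desired output amounts to assigning each row to the exact set of columns in which it is non-empty and then listing the values that land in each set. A plain-pandas sketch of that grouping (this is not an upsetplot feature; it assumes, as in the example, that a row repeats a single value across its non-empty columns, and `groups` is just an illustrative name):

```python
from collections import defaultdict

import numpy as np
import pandas as pd

# Example table rebuilt by hand; blank cells are assumed to be NaN.
nan = np.nan
data = pd.DataFrame(
    {
        "column_1": ["A", "B", nan, "D", "E", nan],
        "column_2": ["A", nan, nan, "D", nan, nan],
        "column_3": [nan, "B", "C", nan, "E", nan],
        "column_4": ["A", "B", nan, "D", nan, nan],
        "column_5": [nan, nan, nan, nan, nan, "F"],
    },
    index=[f"row_{i}" for i in range(1, 7)],
)

# Map "column_a, column_b, ..." -> values of the rows that are
# non-empty in exactly those columns.
groups = defaultdict(list)
for _, row in data.iterrows():
    present = row.dropna()
    if present.empty:  # skip rows with no values at all
        continue
    key = ", ".join(present.index)
    groups[key].append(present.iloc[0])  # assumes one value per row

# Largest groups first (ties keep their original row order).
for key, values in sorted(groups.items(), key=lambda kv: -len(kv[1])):
    print(f"{key} : {len(values)} ({', '.join(values)})")
```

On the example table this also yields a separate `column_3` group for C, which appears only in that column.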

Does anyone have an idea how to replace pd.notna with something that delivers what I'm looking for? Thanks in advance!

Answer:

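The UpSet plot already shows both of those graphs: the totals bar chart (one bar per column) is the former, and the intersection (subset) bar chart is the latter. Each intersection bar counts the rows whose non-empty cells fall in exactly that combination of columns.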
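To check that the intersection bars already encode the wanted grouping, one can count rows per exact pattern of non-empty columns in plain pandas. As far as I understand upsetplot, `from_indicators(pd.notna, data=data)` builds the same boolean matrix, so with `show_counts=True` these should be the numbers printed above the intersection bars. A sketch, again assuming blanks are NaN:

```python
import numpy as np
import pandas as pd

# Example table rebuilt by hand; blank cells are assumed to be NaN.
nan = np.nan
data = pd.DataFrame(
    {
        "column_1": ["A", "B", nan, "D", "E", nan],
        "column_2": ["A", nan, nan, "D", nan, nan],
        "column_3": [nan, "B", "C", nan, "E", nan],
        "column_4": ["A", "B", nan, "D", nan, nan],
        "column_5": [nan, nan, nan, nan, nan, "F"],
    },
    index=[f"row_{i}" for i in range(1, 7)],
)

# Boolean membership matrix: True where a cell is non-empty.
present = data.notna()

# Label each row with the columns it is non-empty in, then count rows per label.
labels = pd.Series(
    [", ".join(present.columns[mask]) for mask in present.to_numpy()],
    index=data.index,
)
intersection_counts = labels.value_counts()
print(intersection_counts)
```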

See https://gist.github.com/jnothman/0fc6daf3d9d75513dd3311e86e06cc8c