Python: Venn diagram from score data

I have the following data:

df =
id testA testB
1  3     NA
1  1     3
2  2     NA
2  NA    1
2  0     0
3  NA    NA
3  1     1

I would like to create a Venn diagram showing the number of rows in which both testA and testB have a value, testA has a value but testB does not, and testB has a value but testA does not.

The expected outcome would be the following groups:

Both tests: 3
A but not B: 2
B but not A: 1

There are 2 best solutions below

Lukas S On BEST ANSWER

I am not sure how you get the index in your dataframe, or whether you have another index, so I use the default row index here. I also assumed NA to be np.nan.

In any case, you can try something like the following (but start from the point where your df already exists). First, I recreate your DataFrame. Then I create two sets, setA and setB, which contain the row indices where the data is not NaN. Finally, a Venn diagram is drawn from these two sets.

from matplotlib import pyplot as plt
from matplotlib_venn import venn2
import pandas
import numpy as np

# Recreate the example data; NA is assumed to be np.nan.
df = pandas.DataFrame()
df["testA"] = [3, 1, 2, np.nan, 0, np.nan, 1]
df["testB"] = [np.nan, 3, np.nan, 1, 0, np.nan, 1]

# Row indices where each test has a non-NaN value.
setA = set(df.index[df["testA"].notna()])
setB = set(df.index[df["testB"].notna()])

# Draw the two-set Venn diagram and display it.
venn2([setA, setB], set_labels=("testA", "testB"))
plt.show()

You then get something like this.
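
As a quick check of the region sizes without plotting, the same two sets can be compared with ordinary set operations. This is only a sketch on the example data, assuming the default 0..6 row index and NA read as np.nan; the names set_a, set_b and region_sizes are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "testA": [3, 1, 2, np.nan, 0, np.nan, 1],
    "testB": [np.nan, 3, np.nan, 1, 0, np.nan, 1],
})

# Row labels where each test has a value.
set_a = set(df.index[df["testA"].notna()])
set_b = set(df.index[df["testB"].notna()])

# Sizes of the three regions of the diagram.
region_sizes = {
    "Both tests": len(set_a & set_b),   # intersection
    "A but not B": len(set_a - set_b),  # left-only region
    "B but not A": len(set_b - set_a),  # right-only region
}
print(region_sizes)
```

On the example data this gives the three counts listed in the question (3, 2 and 1).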

Shu ba On

You'd probably want to use the speed of pandas for that, rather than looping over Python objects. In pandas we can filter rows with boolean conditions, so copying and modifying @Lukas's answer:

import pandas
import numpy as np

df = pandas.DataFrame()
df["testA"] = [3,1,2,np.nan,0,np.nan,1]
df["testB"] = [np.nan,3,np.nan,1,0,np.nan,1]

has_a = df["testA"].notnull()
has_b = df["testB"].notnull()

both = df[has_a & has_b]        # rows where both tests have a value
either = df[has_a | has_b]      # rows where at least one test has a value
right_side = df[has_b]          # the whole testB circle (includes the overlap)
a_only = df[has_a & ~has_b]     # testA but not testB
b_only = df[has_b & ~has_a]     # testB but not testA

Each gives you one part of the Venn diagram. You can then call len() on each result to see how many rows it contains.
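
If only the counts are needed, the filtered frames can be skipped and the boolean masks summed directly. The following is a sketch on the example data, again assuming NA means np.nan; has_a, has_b and counts are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "testA": [3, 1, 2, np.nan, 0, np.nan, 1],
    "testB": [np.nan, 3, np.nan, 1, 0, np.nan, 1],
})

# Boolean masks: True where the test has a value.
has_a = df["testA"].notna()
has_b = df["testB"].notna()

# Summing a boolean mask counts its True entries.
counts = (
    int((has_a & ~has_b).sum()),  # testA but not testB
    int((~has_a & has_b).sum()),  # testB but not testA
    int((has_a & has_b).sum()),   # both tests
)
print(counts)
```

The tuple is ordered (A only, B only, both), which is the order matplotlib_venn's venn2 accepts for its subsets argument, so it should be possible to draw the diagram with venn2(subsets=counts, set_labels=("testA", "testB")).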