Python: Venn diagram from score data


I have the following data:

df =
id testA testB
1  3     NA
1  1     3
2  2     NA
2  NA    1
2  0     0
3  NA    NA
3  1     1

I would like to create a Venn diagram of the number of rows in which both testA and testB have a value, testA has a value but testB does not, and testB has a value but testA does not.

The expected outcome would be the following groups:

Both tests: 3
A but not B: 2
B but not A: 1

There are 2 best solutions below

Best answer

I am not sure how you arrive at the index in your DataFrame, or whether you have another index. I also assumed NA to be np.nan.

In any case, you can try something like the following (but start where your df exists). First, I recreate your DataFrame. Then I create two sets, setA and setB, which contain the row indices where the data is not NaN. Finally, I create a Venn diagram from these two sets.

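import matplotlib.pyplot as plt
import numpy as np
import pandas
from matplotlib_venn import venn2

# Recreate the DataFrame from the question (NA assumed to be np.nan)
df = pandas.DataFrame()
df["testA"] = [3, 1, 2, np.nan, 0, np.nan, 1]
df["testB"] = [np.nan, 3, np.nan, 1, 0, np.nan, 1]

# Row indices where each test has a value (is not NaN)
setA = {index_ for index_ in df.index if not np.isnan(df.loc[index_, "testA"])}
setB = {index_ for index_ in df.index if not np.isnan(df.loc[index_, "testB"])}

# Draw the two-set Venn diagram and display it
venn2([setA, setB], set_labels=("testA", "testB"))
plt.show()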

You then get a Venn diagram with 2 in the testA-only region, 3 in the overlap, and 1 in the testB-only region.
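The regions of the diagram correspond to ordinary set operations on the two index sets. As a minimal sketch using only the standard library (the names test_a, test_b, set_a and set_b are illustrative and mirror the question's data, with NA assumed to be NaN), the three counts can be checked like this:

```python
import math

nan = float("nan")

# Data from the question, with NA represented as NaN (an assumption)
test_a = [3, 1, 2, nan, 0, nan, 1]
test_b = [nan, 3, nan, 1, 0, nan, 1]

# Row positions where each test has a value
set_a = {i for i, value in enumerate(test_a) if not math.isnan(value)}
set_b = {i for i, value in enumerate(test_b) if not math.isnan(value)}

both = set_a & set_b      # intersection: rows with both tests
only_a = set_a - set_b    # difference: testA but not testB
only_b = set_b - set_a    # difference: testB but not testA

print(len(both), len(only_a), len(only_b))  # prints: 3 2 1
```

These match the expected groups from the question: 3 rows with both tests, 2 with only testA, and 1 with only testB.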

Another answer

You'd probably want to use the speed of pandas for that rather than looping over Python objects. pandas lets you filter rows with boolean conditions, so, copying and modifying @Lukas's answer:

import pandas
import numpy as np

df = pandas.DataFrame()
df["testA"] = [3,1,2,np.nan,0,np.nan,1]
df["testB"] = [np.nan,3,np.nan,1,0,np.nan,1]

# Boolean masks: True where the test has a value (is not NaN)
has_a = df["testA"].notnull()
has_b = df["testB"].notnull()

either = df[has_a | has_b]     # at least one test (union)
both = df[has_a & has_b]       # both tests (intersection)
only_a = df[has_a & ~has_b]    # testA but not testB (left side only)
only_b = df[has_b & ~has_a]    # testB but not testA (right side only)

Each of these gives you one part of the Venn diagram. You can then call len() on any of them to see how many rows it contains.
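As a rough sketch of that counting step, the boolean masks can also be summed directly, since True counts as 1. The names has_a, has_b and counts are illustrative, and NA is again assumed to be np.nan:

```python
import numpy as np
import pandas as pd

# Data from the question, with NA assumed to be np.nan
df = pd.DataFrame({
    "testA": [3, 1, 2, np.nan, 0, np.nan, 1],
    "testB": [np.nan, 3, np.nan, 1, 0, np.nan, 1],
})

# True where the test has a value
has_a = df["testA"].notna()
has_b = df["testB"].notna()

# Summing a boolean mask counts its True entries
counts = {
    "Both tests": int((has_a & has_b).sum()),
    "A but not B": int((has_a & ~has_b).sum()),
    "B but not A": int((has_b & ~has_a).sum()),
}
print(counts)  # {'Both tests': 3, 'A but not B': 2, 'B but not A': 1}
```

If you then want the picture, matplotlib_venn's venn2 also accepts the three region sizes as a tuple in the order (A only, B only, both), so subsets=(2, 1, 3) should draw the same diagram without building Python sets.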