I have two DataFrames, df1 and df2, as follows:
>>> import pandas as pd
>>> df1 = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]])
>>> df2 = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8])
>>> df1
   0  1
0  1  2
1  3  4
2  5  6
3  7  8
>>> df2
   0
0  1
1  2
2  3
3  4
4  5
5  6
6  7
7  8
When I check whether 1 is in df1, I get True, as expected:
>>> any(df1 == 1)
True
However, when I try the same on df2, I unexpectedly get False:
>>> any(df2 == 1)
False
This is despite the fact that the element-wise comparisons look correct in both cases:
>>> df1 == 1
       0      1
0   True  False
1  False  False
2  False  False
3  False  False
>>> df2 == 1
       0
0   True
1  False
2  False
3  False
4  False
5  False
6  False
7  False
Any ideas why that is?
PS: I am not asking about the built-in any method in pandas; I am just puzzled by the behavior of Python's built-in any.
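You need to use the pandas any method instead of any from base Python, for example (df2 == 1).any().any(), which first reduces each column to a single boolean and then reduces those booleans to one scalar.
When you use Python's any, it treats the DataFrame as an iterable (much like a dictionary) and therefore only checks the column labels, without ever looking at the values in the frame. If you simply loop through df1 and df2, you can see that iteration yields only the column labels, which is how a dictionary behaves. Since df1 has the column labels 0 and 1, any([0, 1]) returns True; df2, on the other hand, has only the single column label 0, and any([0]) returns False. Because df == 1 keeps the original column labels, any(df == 1) is effectively equivalent to any(df) or any(df.columns), and the result of the comparison never matters.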
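As a minimal sketch of both points, the snippet below rebuilds the two frames from the question, shows that iterating a DataFrame yields only its column labels, and then compares the built-in any with the pandas reductions. It assumes a reasonably recent pandas (DataFrame.to_numpy was added in 0.24), and the printed values in the comments rely on the default integer column labels.

```python
import pandas as pd

# The same frames as in the question (default RangeIndex column labels).
df1 = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]])
df2 = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8])

# Iterating a DataFrame yields its column labels, not its values,
# so Python's built-in any() only ever sees those labels.
print(list(df1 == 1))  # [0, 1]
print(list(df2 == 1))  # [0]

# Built-in any() is therefore just any() over the column labels:
# any([0, 1]) is True because label 1 is truthy; any([0]) is False.
print(any(df1 == 1), any(df1.columns))  # True True
print(any(df2 == 1), any(df2.columns))  # False False

# The pandas reduction looks at the values: the first .any() reduces
# each column to one bool, the second reduces that Series to a scalar.
print((df1 == 1).any().any())  # True
print((df2 == 1).any().any())  # True

# Alternative: convert to a NumPy array and reduce over every element.
print((df2 == 1).to_numpy().any())  # True
```

Note that the built-in any depends only on the column labels: with the default integer labels, any frame with two or more columns gives True regardless of its contents, and a single-column frame always gives False.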