I have two dataframes, df1 and df2, as follows:
>>> df1 = pd.DataFrame([[1,2],[3,4],[5,6],[7,8]])
>>> df2 = pd.DataFrame([1,2,3,4,5,6,7,8])
>>> df1
   0  1
0  1  2
1  3  4
2  5  6
3  7  8
>>> df2
   0
0  1
1  2
2  3
3  4
4  5
5  6
6  7
7  8
When I check whether 1 is in df1, it yields True, as expected:
>>> any(df1 == 1)
True
However, when I try the same on df2, I unexpectedly get False:
>>> any(df2 == 1)
False
This happens even though the element-wise comparisons look right:
>>> df1 == 1
       0      1
0   True  False
1  False  False
2  False  False
3  False  False
>>> df2 == 1
       0
0   True
1  False
2  False
3  False
4  False
5  False
6  False
7  False
>>>
Any ideas why that is?
PS: I am not asking about the any method built into pandas. I am just puzzled by the behavior of Python's built-in any here.
You need to use pandas' built-in any instead of any from base Python. When you use Python's any, it treats the data frame as an iterable, much like a dictionary, and so it only checks the column names without looking at the values in the data frame. If you simply loop through df1 and df2, you can see that iteration yields only the column names, which is how a dictionary behaves. Since df1 has columns named 0 and 1, any([0, 1]) returns True. df2, on the other hand, has a single column named 0, and any([0]) returns False. So any(df == 1) is effectively equivalent to any(df) or any(df.columns).
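To make this concrete, here is a minimal sketch that rebuilds the two frames from the question. It compares what Python's any sees when it iterates over the frame with what pandas' own any computes. The variable names (labels1, builtin_any_df1, and so on) are only illustrative, and chaining .any().any() is one of several ways to reduce a frame to a single boolean.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]])
df2 = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8])

# Iterating over a DataFrame yields its column labels, not its values.
labels1 = list(df1 == 1)  # column labels of df1
labels2 = list(df2 == 1)  # column labels of df2

# Python's built-in any() therefore only tests the truthiness of the labels.
builtin_any_df1 = any(df1 == 1)
builtin_any_df2 = any(df2 == 1)

# DataFrame.any() reduces over the values instead: the first call gives one
# boolean per column, the second collapses those into a single boolean.
pandas_any_df1 = bool((df1 == 1).any().any())
pandas_any_df2 = bool((df2 == 1).any().any())

print(labels1, labels2)
print(builtin_any_df1, builtin_any_df2)
print(pandas_any_df1, pandas_any_df2)
```

With the default integer column labels used here, the built-in any only returns False for df2 because its single label, 0, is falsy. A frame whose columns had non-empty string names would make the built-in any return True regardless of the data.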