Inconsistent behavior of any(df == value) on pandas dataframe


I have two DataFrames, df1 and df2, as follows:

>>> import pandas as pd
>>> df1 = pd.DataFrame([[1,2],[3,4],[5,6],[7,8]])
>>> df2 = pd.DataFrame([1,2,3,4,5,6,7,8])
>>> df1
   0  1
0  1  2
1  3  4
2  5  6
3  7  8
>>> df2 
   0
0  1
1  2
2  3
3  4
4  5
5  6
6  7
7  8

Checking whether 1 is in df1 yields True, as expected:

>>> any(df1 == 1) 
True

However, when I try the same on df2, I unexpectedly get False:

>>> any(df2 == 1)
False

This happens even though the element-wise comparisons themselves look right:

>>> df1 == 1
       0      1
0   True  False
1  False  False
2  False  False
3  False  False
>>> df2 == 1
       0
0   True
1  False
2  False
3  False
4  False
5  False
6  False
7  False

Any ideas why that is?

PS: I am not asking about the built-in any method in pandas. I am just puzzled by the behavior of Python's any.

There are 3 best solutions below

2
On BEST ANSWER

You need to use pandas' built-in any method instead of Python's built-in any:

df1.eq(1).any().any()
# True

df2.eq(1).any().any()
# True

Python's built-in any treats the DataFrame as an iterable. Iterating over a DataFrame yields its column labels, just as iterating over a dictionary yields its keys, so the values are never examined. df1 has the column labels 0 and 1, so any([0, 1]) returns True. df2 has only the column label 0, so any([0]) returns False. In other words, any(df == 1) gives the same result as any(df) or any(df.columns), which you can see by looping over df1 and df2:

[x for x in df1]
# [0, 1]

[x for x in df2]
# [0]
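
The iteration behavior described above can be checked directly. The following is a minimal sketch, assuming a default integer column index as in the question. The renamed column label "a" is purely illustrative and not part of the original post.

```python
import pandas as pd

df2 = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8])
mask = df2 == 1

# Iterating a DataFrame yields its column labels, not its values.
labels = list(mask)

# Python's any() therefore only sees the labels: any([0]) is False.
builtin_result = any(mask)

# Renaming the only column to a truthy label (illustrative name "a")
# flips the built-in result, even though the data is unchanged.
renamed = mask.rename(columns={0: "a"})
builtin_renamed = any(renamed)

# pandas' own reduction looks at the values instead.
pandas_result = bool(mask.any().any())

print(labels, builtin_result, builtin_renamed, pandas_result)
```

The built-in result depends only on whether some column label is truthy, which is why it differs between df1 and df2.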
1
On

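You need to use (df2 == 1).any() instead. Note that this returns one boolean per column; chain a second .any() if you want a single True/False.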
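
As a small sketch of what that expression returns, using df1 from the question: DataFrame.any reduces along the rows by default, so the result is a Series with one entry per column, and a further reduction is needed for a single boolean. The variable names here are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]])

# One boolean per column: does the column contain a 1?
per_column = (df1 == 1).any()

# Reduce the per-column Series to a single boolean.
overall = bool(per_column.any())

print(per_column.tolist(), overall)
```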

1
On

In pandas, it is better to use DataFrame.any.

NumPy solution:

print((df1 == 1).values.any())
# True
print((df2 == 1).values.any())
# True
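
The NumPy approach can be wrapped in a small function. This is only a sketch: contains_value is a hypothetical helper name, not a pandas API, and it uses DataFrame.to_numpy(), which plays the same role here as the .values attribute.

```python
import pandas as pd


def contains_value(df, value):
    """Hypothetical helper: True if `value` occurs anywhere in `df`."""
    # Compare element-wise, drop to a NumPy array, and reduce over all cells.
    return bool((df == value).to_numpy().any())


df2 = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8])
found = contains_value(df2, 1)
missing = contains_value(df2, 9)

print(found, missing)
```

Because the reduction runs over the underlying array, the result does not depend on the column labels.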