Using df.loc[] vs. the df[] shorthand with boolean masks in pandas


Both df[booleanMask] and df.loc[booleanMask] work for me, but I don't understand why. I thought the df[] shorthand (without .loc) selected columns, whereas I want to filter rows, so I assumed I needed .loc.

Here is the specific code:

# Boolean operators
# All the games where a team scored at least 4 goals and won to nil
hw_4_0 = (pl23['FTHG'] >= 4) & (pl23['FTAG'] == 0)
aw_0_4 = (pl23['FTHG'] == 0) & (pl23['FTAG'] >= 4)
pl23.loc[aw_0_4 | hw_4_0]
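To make the question reproducible, here is a minimal sketch with a tiny hypothetical stand-in for pl23 (the real data isn't shown; only the FTHG/FTAG column names come from the question, and the HomeTeam/AwayTeam columns and all values are made up). It shows that plain [] and .loc return the same rows for this mask.

```python
import pandas as pd

# Hypothetical stand-in for pl23: full-time home goals (FTHG) and away goals (FTAG)
pl23 = pd.DataFrame({
    'HomeTeam': ['A', 'B', 'C', 'D'],
    'AwayTeam': ['E', 'F', 'G', 'H'],
    'FTHG': [4, 0, 2, 5],
    'FTAG': [0, 4, 2, 1],
})

hw_4_0 = (pl23['FTHG'] >= 4) & (pl23['FTAG'] == 0)
aw_0_4 = (pl23['FTHG'] == 0) & (pl23['FTAG'] >= 4)
mask = aw_0_4 | hw_4_0  # boolean Series indexed by pl23's row labels

via_brackets = pl23[mask]   # [] with a boolean mask filters rows
via_loc = pl23.loc[mask]    # .loc with a single boolean mask also filters rows

print(via_brackets.index.tolist())  # [0, 1]
```

Rows 0 (4-0) and 1 (0-4) are kept; rows 2 (2-2) and 3 (5-1) are not.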

For example, pl23.loc[aw_0_4 | hw_4_0, :] also works, but pl23.loc[:, aw_0_4 | hw_4_0] doesn't. I thought df[boolean_mask] was shorthand for the latter (as it is when indexing with labels), so why does it work in this case?

I used pl23.loc[aw_0_4 | hw_4_0], which returned the DataFrame the query was designed for, whereas I was expecting IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
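A minimal sketch that reproduces that error, assuming a small hypothetical stand-in for pl23 (made-up values, real column names). The row mask works in the row slot of .loc but raises in the column slot, because pandas tries to match the mask's index against the column labels. It imports IndexingError from pandas.errors, which is where recent pandas versions expose it.

```python
import pandas as pd
from pandas.errors import IndexingError  # public location in recent pandas versions

# Hypothetical stand-in for pl23 (made-up values)
pl23 = pd.DataFrame({'FTHG': [4, 0, 2, 5], 'FTAG': [0, 4, 2, 1]})
mask = (((pl23['FTHG'] >= 4) & (pl23['FTAG'] == 0))
        | ((pl23['FTHG'] == 0) & (pl23['FTAG'] >= 4)))

# Row slot: the mask's index (0..3) matches pl23.index, so rows are filtered
rows = pl23.loc[mask, :]

# Column slot: pandas tries to align the mask's index (0..3) with the
# column labels ('FTHG', 'FTAG'); they don't match, so it raises
error_name = None
try:
    pl23.loc[:, mask]
except IndexingError as exc:
    error_name = type(exc).__name__
    print(exc)  # Unalignable boolean Series provided as indexer ...
```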

Best answer:

When you pass labels, df[…] looks them up in the columns, whereas df.loc[…] looks them up in the index (the rows).
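A small sketch of the label case, assuming a hypothetical DataFrame with string row labels ('a', 'b', 'c') so that row and column labels can't be confused. Labels inside [] are matched against the columns and labels inside .loc[] against the index, so a row label passed to [] fails.

```python
import pandas as pd

# Hypothetical frame with string row labels to keep the two axes distinct
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]}, index=['a', 'b', 'c'])

# Labels inside [] are looked up in the columns
one_col = df['col1']             # Series: the column "col1"
two_cols = df[['col1', 'col2']]  # DataFrame with both columns

# Labels inside .loc[] are looked up in the index (rows)
one_row = df.loc['a']            # Series: the row "a"
two_rows = df.loc[['a', 'c']]    # DataFrame with rows "a" and "c"

# "a" is a row label, not a column label, so [] cannot find it
row_label_in_brackets_failed = False
try:
    df['a']
except KeyError:
    row_label_in_brackets_failed = True
```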

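If you instead pass a boolean Series (or another boolean iterable, such as a list), both df[…] and df.loc[…] act on the index, i.e. they filter rows. A boolean Series is also aligned on its own index, so a row mask such as aw_0_4 | hw_4_0 only fits the row position; putting it in the column position (pl23.loc[:, …]) is what raises the "Unalignable boolean Series" IndexingError. To perform boolean indexing on the columns, you need df.loc[:, …] with one boolean per column.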
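A sketch of that alignment point, reusing the answer's df. The column mask is built from per-column maxima (df.max() is just an illustrative choice), so its index holds column labels, while the row mask's index holds row labels; each only fits its own axis.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})

# Row mask: a boolean Series indexed by row labels (0, 1, 2)
row_mask = df['col1'].ge(2)
print(row_mask.index.tolist())  # [0, 1, 2]

# Column mask: a boolean Series indexed by column labels
col_mask = df.max().ge(5)       # max of col1 is 3, max of col2 is 6
print(col_mask.index.tolist())  # ['col1', 'col2']

rows = df[row_mask]             # same rows as df.loc[row_mask]
cols = df.loc[:, col_mask]      # keeps only 'col2'
```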

Example:

import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})

# select "col1" in the columns
df['col1']

# select "0" in the index
df.loc[0]


# boolean indexing on the index (rows): each keeps rows 1 and 2
df[df['col1'].ge(2)]
# or
df.loc[df['col1'].ge(2)]
# or
df[[False, True, True]]
# or
df.loc[[False, True, True]]


# boolean indexing on the columns: each keeps only 'col2'
df.loc[:, df.loc[0].ge(2)]
# or
df.loc[:, [False, True]]
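A quick way to confirm the equivalences listed above is to compare the results with DataFrame.equals, which checks values, dtypes, and both axes. This is only a sanity check on the answer's small example, not a general proof.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})

row_forms = [
    df[df['col1'].ge(2)],
    df.loc[df['col1'].ge(2)],
    df[[False, True, True]],
    df.loc[[False, True, True]],
]
col_forms = [
    df.loc[:, df.loc[0].ge(2)],
    df.loc[:, [False, True]],
]

# Every row-filtering form keeps rows 1 and 2 with both columns
rows_match = all(r.equals(row_forms[0]) for r in row_forms)
# Both column-filtering forms keep only 'col2' with all rows
cols_match = all(c.equals(col_forms[0]) for c in col_forms)
print(rows_match, cols_match)  # True True
```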