Explanation of boolean indexing behaviors

Question

308 Views Asked by mon At 06 January 2021 at 04:49

For the 2D array y:

import numpy as np

y = np.arange(20).reshape(5, 4)
---
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]]

All of the following indexing expressions select the 1st, 3rd, and 5th rows. This is clear.

print(y[
    [0, 2, 4],
    ::
])
print(y[
    [0, 2, 4],
    ::
])
print(y[
    [True, False, True, False, True],
    ::
])
---
[[ 0  1  2  3]
 [ 8  9 10 11]
 [16 17 18 19]]

Questions

Please help me understand what rules or mechanisms produce the results below.

Replacing the list with a tuple produces an empty array with shape (0, 5, 4).

y[
    (True, False, True, False, True)
]
---
array([], shape=(0, 5, 4), dtype=int64)

Using a single True adds a new axis.

y[True]
---
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15],
        [16, 17, 18, 19]]])


y[True].shape
---
(1, 5, 4)

Adding an additional boolean True produces the same result.

y[True, True]
---
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15],
        [16, 17, 18, 19]]])

y[True, True].shape
---
(1, 5, 4)

However, adding a False boolean produces the empty array again.

y[True, False]
---
array([], shape=(0, 5, 4), dtype=int64)

I am not sure whether the documentation explains this behavior.

Boolean array indexing

In general if an index includes a Boolean array, the result will be identical to inserting obj.nonzero() into the same position and using the integer array indexing mechanism described above. x[ind_1, boolean_array, ind_2] is equivalent to x[(ind_1,) + boolean_array.nonzero() + (ind_2,)].

If there is only one Boolean array and no integer indexing array present, this is straight forward. Care must only be taken to make sure that the boolean index has exactly as many dimensions as it is supposed to work with.
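
As a minimal sketch of the equivalence the quoted documentation describes, applied to the row mask used earlier (the names mask, by_mask and by_nonzero are only illustrative):

```python
import numpy as np

y = np.arange(20).reshape(5, 4)
mask = np.array([True, False, True, False, True])

# Index with the boolean array directly.
by_mask = y[mask, :]

# Same selection spelled out: mask.nonzero() is a 1-tuple holding the
# integer positions of the True entries, here (array([0, 2, 4]),).
by_nonzero = y[mask.nonzero() + (slice(None),)]

print(np.array_equal(by_mask, by_nonzero))  # True
```

This covers the boolean array case only. The scalar True and False indices above are not arrays with one entry per row, so this rule alone does not explain them.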

**jakevdp** · Accepted Answer · 2021-01-06T15:09:47.290000

Boolean scalar indexing is not well-documented, but you can trace how it is handled in the source code. See for example this comment and associated code in the numpy source:

/*
* This can actually be well defined. A new axis is added,
* but at the same time no axis is "used". So if we have True,
* we add a new axis (a bit like with np.newaxis). If it is
* False, we add a new axis, but this axis has 0 entries.
*/

So if an index is a scalar boolean, a new axis is added. If the value is True the size of the axis is 1, and if the value is False, the size of the axis is zero.
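
A small check of that rule on the array from the question, with np.newaxis shown for comparison (the variable names are just for illustration):

```python
import numpy as np

y = np.arange(20).reshape(5, 4)

true_shape = y[True].shape           # new leading axis of length 1
false_shape = y[False].shape         # new leading axis of length 0
newaxis_shape = y[np.newaxis].shape  # the documented way to add an axis

print(true_shape, false_shape, newaxis_shape)
# (1, 5, 4) (0, 5, 4) (1, 5, 4)
```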

This behavior was introduced in numpy#3798, and the author outlines the motivation in this comment; roughly, the aim was to provide consistency in the output of filtering operations. For example:

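import numpy as np

# 2D input: the boolean mask flattens the selection to 1D
x = np.ones((2, 2))
assert x[x > 0].ndim == 1

# 1D input: the result is still 1D
x = np.ones(2)
assert x[x > 0].ndim == 1

# 0D input: x > 0 is a scalar boolean, and the result is 1D as well
x = np.ones(())
assert x[x > 0].ndim == 1  # scalar boolean here!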

The interesting thing is that any subsequent scalar booleans after the first do not add additional dimensions! From an implementation standpoint, this seems to be due to consecutive 0D boolean indices being treated as equivalent to consecutive fancy indices (i.e. HAS_0D_BOOL is treated as HAS_FANCY in some cases) and thus are combined in the same way as fancy indices. From a logical standpoint, this corner-case behavior does not appear to be intentional: for example, I can't find any discussion of it in numpy#3798.
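
To make this concrete, here is a short sketch contrasting repeated scalar booleans with repeated np.newaxis. It also covers the tuple from the question: a tuple index means the same as writing its elements separated by commas, so it is read as five scalar booleans rather than as a row mask. The variable names are only illustrative.

```python
import numpy as np

y = np.arange(20).reshape(5, 4)

# A tuple index is the same as writing its elements separated by commas,
# so the tuple from the question is five scalar boolean indices.
tuple_shape = y[(True, False, True, False, True)].shape

# Repeated scalar booleans share a single new axis ...
both_true = y[True, True].shape
one_false = y[True, False].shape

# ... whereas each np.newaxis adds an axis of its own.
two_newaxes = y[np.newaxis, np.newaxis].shape

print(tuple_shape, both_true, one_false, two_newaxes)
# (0, 5, 4) (1, 5, 4) (0, 5, 4) (1, 1, 5, 4)
```

The shared axis has length 1 only when every scalar boolean is True, which matches the outputs shown in the question.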

Given that, I would recommend treating this behavior as poorly defined and avoiding it in favor of well-documented indexing approaches.
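
As one possible set of documented alternatives (a sketch, not the only way to do it), np.newaxis or np.expand_dims can add the axis explicitly, and a boolean array can select the rows:

```python
import numpy as np

y = np.arange(20).reshape(5, 4)

# Add a leading axis explicitly instead of indexing with True.
with_axis = y[np.newaxis, ...]
also_with_axis = np.expand_dims(y, axis=0)

# Select rows with a boolean array (not a tuple of Python booleans).
row_mask = np.array([True, False, True, False, True])
selected_rows = y[row_mask]

print(with_axis.shape, also_with_axis.shape, selected_rows.shape)
# (1, 5, 4) (1, 5, 4) (3, 4)
```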
