For the 2D array y:
y = np.arange(20).reshape(5,4)
---
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]
[16 17 18 19]]
Each of the following index expressions selects the 1st, 3rd, and 5th rows. This is clear.
print(y[
[0, 2, 4],
::
])
print(y[
[True, False, True, False, True],
::
])
---
[[ 0 1 2 3]
[ 8 9 10 11]
[16 17 18 19]]
Questions
Please help me understand which rules or mechanisms produce the following results.
Replacing the list with a tuple produces an empty array with shape (0, 5, 4).
y[
(True, False, True, False, True)
]
---
array([], shape=(0, 5, 4), dtype=int64)
Using a single True adds a new axis.
y[True]
---
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19]]])
y[True].shape
---
(1, 5, 4)
Adding another boolean True produces the same result.
y[True, True]
---
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19]]])
y[True, True].shape
---
(1, 5, 4)
However, adding a boolean False produces the empty array again.
y[True, False]
---
array([], shape=(0, 5, 4), dtype=int64)
I am not sure the documentation explains this behavior. The relevant passage reads:
In general if an index includes a Boolean array, the result will be identical to inserting obj.nonzero() into the same position and using the integer array indexing mechanism described above. x[ind_1, boolean_array, ind_2] is equivalent to x[(ind_1,) + boolean_array.nonzero() + (ind_2,)].
If there is only one Boolean array and no integer indexing array present, this is straight forward. Care must only be taken to make sure that the boolean index has exactly as many dimensions as it is supposed to work with.
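The equivalence described in that passage can be checked directly. The following is a minimal sketch, assuming the same y as in the question; the names mask, rows, via_mask, via_nonzero, col_via_mask, and col_via_nonzero are illustrative and not from the documentation.

```python
import numpy as np

y = np.arange(20).reshape(5, 4)

# A 1-D boolean array with one entry per row of y.
mask = np.array([True, False, True, False, True])

# nonzero() returns a tuple of integer index arrays, here (array([0, 2, 4]),).
rows = mask.nonzero()

# Indexing with the mask and indexing with its nonzero() tuple give the same rows.
via_mask = y[mask]
via_nonzero = y[rows]
assert np.array_equal(via_mask, via_nonzero)

# The same holds when the mask is combined with another index:
# x[boolean_array, ind_2] behaves like x[boolean_array.nonzero() + (ind_2,)].
col_via_mask = y[mask, 1]
col_via_nonzero = y[mask.nonzero() + (1,)]
assert np.array_equal(col_via_mask, col_via_nonzero)
```

Note that this covers boolean arrays only. A Python tuple of booleans or a bare True/False is not a boolean array, which is why the cases in the question behave differently.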
Boolean scalar indexing is not well-documented, but you can trace how it is handled in the source code. See for example this comment and associated code in the numpy source:
So if an index is a scalar boolean, a new axis is added. If the value is True, the size of the axis is 1, and if the value is False, the size of the axis is zero.
This behavior was introduced in numpy#3798, and the author outlines the motivation in this comment; roughly, the aim was to provide consistency in the output of filtering operations. For example:
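A minimal sketch of the scalar-boolean rule and of the filtering-consistency argument, assuming a reasonably recent NumPy. The arrays x2, x1, x0 and the other variable names are illustrative and are not taken from the linked discussion.

```python
import numpy as np

y = np.arange(20).reshape(5, 4)

# A scalar boolean index adds a leading axis: length 1 for True, length 0 for False.
shape_true = y[True].shape     # (1, 5, 4)
shape_false = y[False].shape   # (0, 5, 4)

# Filtering with a same-shape boolean mask always yields a 1-D result,
# whatever the dimensionality of the input.
x2 = np.ones((2, 2))
x1 = np.ones(2)
x0 = np.ones(())               # 0-d array; x0 > 0 is a scalar boolean

filtered_shapes = [
    x2[x2 > 0].shape,          # (4,)
    x1[x1 > 0].shape,          # (2,)
    x0[x0 > 0].shape,          # (1,): the scalar-boolean rule keeps this 1-D too
]
```

Without the extra axis for a scalar boolean, the 0-d case would be the only one in which filtering did not return a 1-D array.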
The interesting thing is that any subsequent scalar booleans after the first do not add additional dimensions! From an implementation standpoint, this seems to be due to consecutive 0D boolean indices being treated as equivalent to consecutive fancy indices (i.e. HAS_0D_BOOL is treated as HAS_FANCY in some cases) and thus are combined in the same way as fancy indices. From a logical standpoint, this corner-case behavior does not appear to be intentional: for example, I can't find any discussion of it in numpy#3798.
Given that, I would recommend considering this behavior poorly-defined, and avoid it in favor of well-documented indexing approaches.
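To connect this back to the tuple case in the question: Python passes a tuple index as a sequence of separate indices, so a tuple of five booleans is five scalar boolean indices, not one boolean mask. The sketch below illustrates this and shows two well-documented alternatives. The names flags, from_tuple, from_scalars, rows, and with_axis are illustrative.

```python
import numpy as np

y = np.arange(20).reshape(5, 4)
flags = (True, False, True, False, True)

# These two expressions are the same indexing operation: a tuple index is
# unpacked into separate indices, each of them a scalar boolean here.
# The question reports the resulting shape as (0, 5, 4).
from_tuple = y[flags]
from_scalars = y[True, False, True, False, True]
assert from_tuple.shape == from_scalars.shape

# Well-documented alternative 1: convert the flags to a boolean array,
# which acts as a row mask and selects rows 0, 2, and 4.
rows = y[np.array(flags)]

# Well-documented alternative 2: use np.newaxis when a new leading axis
# is what is actually wanted.
with_axis = y[np.newaxis]      # shape (1, 5, 4)
```

Converting to an array (or using a list, as in the question's working examples) makes the intent explicit and does not rely on how consecutive scalar booleans are combined.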