How can I get the number of indexed elements of array of known shape without actually indexing the array?

I have an index IDX (which may be a list of indices, a boolean mask, a tuple of slices, etc.) that indexes some abstract numpy array whose shape, shape, is known (and possibly big).

I know I can create a dummy array, index it and count the elements:

A = np.zeros(shape)
print(A[IDX].size)

Is there any sensible way I can get the number of indexed elements without creating any (potentially big) array?

I need to tabulate a list of functions at certain points in 3D space. The points are a subset of a rectangular grid given as X, Y, Z lists, and IDX indexes their Cartesian product:

XX, YY, ZZ = [A[IDX] for A in np.meshgrid(X, Y, Z)]

The functions accept either X, Y, Z arguments (and return values for their Cartesian product which needs to be indexed) or XX, YY, ZZ. At the moment I create XX, YY and ZZ arrays whether they are used or not, then I allocate an array for function values:

self.TAB = np.full((len(functions), XX.size),
                   np.nan)

but I want to create XX, YY and ZZ only if they are necessary. I also want to separate TAB allocation from filling its rows, thus I need to know the number of columns in advance.
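One way to get the column count up front, sketched here under assumptions, is to index a zero-stride dummy instead of a real array. np.broadcast_to can present a single stored element as an array of the full grid shape, so only the indexed result is ever materialized. The grid values, the two functions, and IDX below are made up for illustration; grid_shape follows the default indexing="xy" of np.meshgrid, which puts len(Y) first.

```python
import numpy as np

# Hypothetical grid, functions and index, for illustration only.
X = np.linspace(0.0, 1.0, 4)
Y = np.linspace(0.0, 1.0, 3)
Z = np.linspace(0.0, 1.0, 2)
functions = [lambda x, y, z: x + y + z, lambda x, y, z: x * y * z]
IDX = (slice(None), [0, 2], slice(0, 1))

# np.meshgrid defaults to indexing="xy", so its outputs have this shape.
grid_shape = (len(Y), len(X), len(Z))

# One int8 element viewed as the whole grid (all strides are zero).
dummy = np.broadcast_to(np.zeros((), dtype=np.int8), grid_shape)
n_points = dummy[IDX].size

# Allocation no longer depends on XX, YY, ZZ.
TAB = np.full((len(functions), n_points), np.nan)

# Later, and only if a function needs them:
XX, YY, ZZ = [A[IDX] for A in np.meshgrid(X, Y, Z)]
for row, f in enumerate(functions):
    TAB[row] = f(XX, YY, ZZ).ravel()
```

Indexing the dummy still allocates the result for integer and boolean indices, but at one byte per selected element rather than one float per grid point.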

There is 1 best solution below

Just for fun, let's see if we can make a passable approximation here. Your input can be any of the following:

  • slice
  • array-like (including scalars)
    • integer arrays do fancy indexing
    • boolean arrays do masking
  • tuple

If the input isn't explicitly a tuple to begin with, make it one. Now you can iterate along the tuple and match each entry to the shape. You can't quite zip them together, because a boolean array eats up as many elements of the shape as it has dimensions, and any trailing axes not covered by the index are included wholesale.
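To make those rules concrete, here is a small check against a real (and deliberately tiny) array. The array A and the index values are arbitrary choices for illustration.

```python
import numpy as np

A = np.zeros((4, 5, 6))

# A slice consumes one axis; the two trailing axes are kept whole.
slice_shape = A[1:3].shape            # (2, 5, 6)

# A 2-D boolean mask consumes two axes and contributes one axis
# whose length is the number of True entries.
mask = np.zeros((4, 5), dtype=bool)
mask[0, 0] = mask[2, 3] = mask[3, 4] = True
mask_shape = A[mask].shape            # (3, 6)

# Integer arrays each consume one axis and are broadcast together.
rows = np.array([[0], [1]])           # shape (2, 1)
cols = np.array([0, 2, 4])            # shape (3,)
fancy_shape = A[rows, cols].shape     # (2, 3, 6)

# None adds a length-1 axis, so the element count is unchanged.
same_size = A[None, 1:3].size == A[1:3].size

print(slice_shape, mask_shape, fancy_shape, same_size)
```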

Something like this should do it:

import numpy as np


def pint(x):
    """ Mimic numpy errors """
    if isinstance(x, bool):
        raise TypeError('an integer is required')
    try:
        y = int(x)
    except TypeError:
        raise TypeError('an integer is required')
    else:
        if y < 0:
            raise ValueError('negative dimensions are not allowed')
    return y


def estimate_size(shape, index):
    # Ensure input is a tuple
    if not isinstance(index, tuple):
        index = (index,)

    # Clean out Nones: they don't change size
    index = tuple(i for i in index if i is not None)

    # Validate shape: coerce it to a tuple of non-negative integers
    try:
        shape = tuple(shape)
    except TypeError:
        shape = (shape,)
    shape = tuple(pint(s) for s in shape)

    size = 1

    # Check for scalars
    if not shape:
        if index:
            raise IndexError('too many indices for array')
        return size

    # Process index dimensions
    # you could probably use iter(shape) instead of shape[s]
    s = 0

    # fancy indices need to be gathered together and processed as one
    fancy = []

    def get(n):
        nonlocal s
        s += n
        if s > len(shape):
            raise IndexError('too many indices for array')
        return shape[s - n:s]

    for ind in index:
        if isinstance(ind, slice):
            ax, = get(1)
            size *= len(range(*ind.indices(ax)))
        else:
            # atleast_1d avoids copy=False, which raises in NumPy 2 when a copy is needed
            ind = np.atleast_1d(ind)
            if ind.dtype == np.bool_:
                # Boolean masking
                ax = get(ind.ndim)
                if ind.shape != ax:
                    # First mask dimension that disagrees with the array
                    k = np.not_equal(ind.shape, ax).argmax()
                    dim = s - ind.ndim + k
                    raise IndexError(f'boolean index did not match indexed array along dimension {dim}; dimension is {shape[dim]} but corresponding boolean dimension is {ind.shape[k]}')
                size *= np.count_nonzero(ind)
            elif np.issubdtype(ind.dtype, np.integer):
                # Fancy indexing
                ax, = get(1)
                # Empty index arrays have no min/max, so skip the bounds check for them
                if ind.size and (ind.min() < -ax or ind.max() >= ax):
                    k = ind.min() if ind.min() < -ax else ind.max()
                    raise IndexError(f'index {k} is out of bounds for axis {s - 1} with size {ax}')
                fancy.append(ind)
            else:
                raise IndexError('arrays used as indices must be of integer (or boolean) type')

    # Add in trailing dimensions
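    # (an integer dtype keeps the result from becoming a float when shape[s:] is empty)
    size *= int(np.prod(shape[s:], dtype=np.intp))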

    # Add fancy indices
    if fancy:
        size *= np.broadcast(*fancy).size

    return size

This is only an approximation. You will need to change it any time the indexing API changes, and it already has some incomplete features: Ellipsis, for example, is not handled. Testing, fixing, and expanding are left as an exercise for the reader.
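As a starting point for that exercise, here is a minimal sketch of a comparison harness. It uses the dummy-array method from the question as ground truth, which is fine for small test shapes. The names reference_size, find_mismatches, and naive are hypothetical, and naive is only a stand-in so the snippet runs on its own; pass the estimator under test in its place.

```python
import numpy as np


def reference_size(shape, index):
    """Ground truth: index a real array (only sensible for small shapes)."""
    return np.zeros(shape)[index].size


def find_mismatches(estimate, shape, indices):
    """Return (index, expected, got) for each index where estimate is wrong."""
    mismatches = []
    for index in indices:
        expected = reference_size(shape, index)
        got = estimate(shape, index)
        if got != expected:
            mismatches.append((index, expected, got))
    return mismatches


shape = (4, 5, 6)
mask = np.arange(20).reshape(4, 5) % 3 == 0
indices = [
    slice(1, 3),                                # plain slice
    (slice(None), [0, 2]),                      # slice plus integer list
    mask,                                       # 2-D boolean mask
    (None, 2),                                  # newaxis plus scalar
    ([[0], [1]], [0, 2, 4]),                    # broadcast integer arrays
    (slice(None, None, 2), slice(1, None), 0),  # strided slices plus scalar
]


def naive(shape, index):
    """Stand-in estimator that ignores the index entirely."""
    return int(np.prod(shape))


for index, expected, got in find_mismatches(naive, shape, indices):
    print(f'{index!r}: expected {expected}, got {got}')
```

An empty list from find_mismatches means the estimator agrees with NumPy on every case in indices; it says nothing about cases that are not listed.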