Define a NumPy signature for an output array of unknown length (for use in numba.guvectorize)


Is it possible to create a signature for a NumPy ufunc that returns a 1-D array of unknown length?

I have a function that takes an array x of length n and an array of labels y of length m, performs a reduction, and returns an array out whose length cannot be derived from the input dimensions.

The function itself will be wrapped with numba.guvectorize:

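from numba import guvectorize, int16, float64

# One type per argument (x, y, out). The output dimension l in the layout
# does not appear among the input dimensions, which triggers the error below.
@guvectorize([(int16[:], float64[:], int16[:])], "(n),(m) -> (l)", nopython=True)
def fun(x, y, out):
    # perform stuff
    # ...
    # no return in guvectorize
    pass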

This returns the following error:

NameError: undefined output symbols: l

My solution would be to pass in a template array of length l, but it wouldn't be used for any calculation, so I'd like to avoid it.

Any ways around this, or is passing in a template the best (and maybe not so bad) solution?

Edit:

I want to address some valid points made in the comments:

The function is supposed to be applied on an array with dimensions (x, y, z) along the z dimension, which has length n.

The intended purpose of the function is to take each 1d array along z

[z,z,z,z,z,z,z,z,...,z]

and expand it to length m

[z1,z1,z1,z2,z2,z2,z2,z3,...,zz]

and finally the corresponding values are averaged

[z1,z2,z3,z4,z5,z6,z7,z8,...,zz]

resulting in an array with length l.
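To make the three steps concrete, here is a minimal NumPy sketch for a single 1-D vector. It rests on assumptions that are not stated in the question: the expansion repeats each element m // n times (so m must be a multiple of n), the labels are integers in 0..l-1, and every label occurs at least once. The name `expand_and_average_1d` is hypothetical.

```python
import numpy as np

def expand_and_average_1d(z, labels, l):
    """Expand z (length n) to length m, then average per label (length l).

    Assumes m is a multiple of n, labels are ints in 0..l-1,
    and each label occurs at least once.
    """
    n = z.shape[0]
    m = labels.shape[0]
    expanded = np.repeat(z, m // n)                      # length m
    sums = np.bincount(labels, weights=expanded, minlength=l)
    counts = np.bincount(labels, minlength=l)
    return sums / counts                                 # length l

z = np.array([10, 20, 30, 40])                 # n = 4
labels = np.array([0, 0, 0, 1, 1, 1, 1, 2])    # m = 8, l = 3
averaged = expand_and_average_1d(z, labels, 3)
```

Here the output length l is fixed by the caller, not by the data, which is why no jagged output can occur.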

I know beforehand exactly what the sizes m, n and l will be - I just need a way to "tell" them to the function. This is why I also don't expect any jagged outputs.

The fastest way to apply this to a big 3d array using xarray is with guvectorize. But this results in the issue above.

A working solution is to pass in a template of length l.

For comparison, I've created an @njit-wrapped function that manually loops over the first two dimensions, applying the same functionality.

Unfortunately, this is still about 4 times slower than using guvectorize, so I'd like to use guvectorize for this application.
