Is it possible to create a signature for a numpy ufunc that returns a 1d array of unknown length?
I have a function that takes in one array x of length (n) and an array of labels y of length (m), performs a reduction and returns the array out of unknown size.
The function itself will be wrapped with numba.guvectorize:
from numba import guvectorize, int16, int32, float64

@guvectorize([(int16[:], float64[:], int32[:], int16[:])], "(n),(m) -> (l)", nopython=True)
def fun(x, y, out):
    # perform stuff
    # ...
    # no return in guvectorize; the result is written into out
This raises the following error:
NameError: undefined output symbols: l
My solution would be to pass in a template array of length l, but it wouldn't be used for any calculation, so I'd like to avoid it.
Is there a way around this, or is passing in a template the best (and maybe not so bad) solution?
Edit:
Some valid points were made in the comments that I want to address:
The function is supposed to be applied on an array with dimensions (x, y, z) along the z dimension, which has length n.
The intended purpose of the function is to take each 1d array along z
[z,z,z,z,z,z,z,z,...,z]
expand it to length m
[z1,z1,z1,z2,z2,z2,z2,z3,...,zz]
and finally average the corresponding values
[z1,z2,z3,z4,z5,z6,z7,z8,...,zz]
resulting in an array of length l.
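To make the expand-then-average step concrete, here is a minimal NumPy sketch for a single 1d array. The question does not say how the expansion from n to m is done, so the nearest-neighbour rule (i * n // m) used here is only a placeholder; the names expand_and_average and n_groups are hypothetical, and every label is assumed to be an integer in 0..l-1 that occurs at least once.

```python
import numpy as np

def expand_and_average(z, labels, n_groups):
    """Expand z (length n) to length m == len(labels), then average per label."""
    n, m = z.shape[0], labels.shape[0]
    # Placeholder expansion (assumption): repeat values so the result has length m.
    expanded = z[np.arange(m) * n // m]
    # Sum and count the expanded values per label, then divide to get the means.
    sums = np.bincount(labels, weights=expanded, minlength=n_groups)
    counts = np.bincount(labels, minlength=n_groups)
    return sums / counts

z = np.array([1.0, 2.0, 3.0])            # n = 3
labels = np.array([0, 0, 0, 1, 1, 1])    # m = 6, two groups, so l = 2
result = expand_and_average(z, labels, n_groups=2)
```

Here the output length l is passed in as n_groups, which is exactly the piece of information the guvectorize layout has no way to receive directly.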
I know beforehand exactly what the sizes m, n and l will be; I just need to "tell" them to the function. This is also why I don't expect any jagged outputs.
The fastest way to apply this to a big 3d array using xarray is with guvectorize. But this results in the issue above.
A working solution is to pass in a template of length l.
For comparison, I've created an @njit wrapped function that manually loops over the first two dimensions, applying the same functionality.
Unfortunately this is still about 4 times slower than using guvectorize, so I'd like to use guvectorize for this application.