Translate accumarray (Matlab) to python

200 Views Asked by At

I have to translate a Matlab script to python, it transforms some complicated data into an array. And I don't know how to translate this part of the code:

accumarray([j2,i2],iq,[],[],NaN)

That is in Matlab, the shapes of j2, i2 and iq are (1362730 x 1). But the shape of [j2, i2] would be (1362730 x 2).

I found this function in python to use accumarray:

def accum(accmap, a, func=None, size=None, fill_value=0, dtype=None):
"""
An accumulation function similar to Matlab's `accumarray` function.

Parameters
----------
accmap : ndarray
    This is the "accumulation map".  It maps input (i.e. indices into
    `a`) to their destination in the output array.  The first `a.ndim`
    dimensions of `accmap` must be the same as `a.shape`.  That is,
    `accmap.shape[:a.ndim]` must equal `a.shape`.  For example, if `a`
    has shape (15,4), then `accmap.shape[:2]` must equal (15,4).  In this
    case `accmap[i,j]` gives the index into the output array where
    element (i,j) of `a` is to be accumulated.  If the output is, say,
    a 2D, then `accmap` must have shape (15,4,2).  The value in the
    last dimension give indices into the output array. If the output is
    1D, then the shape of `accmap` can be either (15,4) or (15,4,1) 
a : ndarray
    The input data to be accumulated.
func : callable or None
    The accumulation function.  The function will be passed a list
    of values from `a` to be accumulated.
    If None, numpy.sum is assumed.
size : ndarray or None
    The size of the output array.  If None, the size will be determined
    from `accmap`.
fill_value : scalar
    The default value for elements of the output array. 
dtype : numpy data type, or None
    The data type of the output array.  If None, the data type of
    `a` is used.

Returns
-------
out : ndarray
    The accumulated results.

    The shape of `out` is `size` if `size` is given.  Otherwise the
    shape is determined by the (lexicographically) largest indices of
    the output found in `accmap`.


Examples
--------
>>> from numpy import array, prod
>>> a = array([[1,2,3],[4,-1,6],[-1,8,9]])
>>> a
array([[ 1,  2,  3],
       [ 4, -1,  6],
       [-1,  8,  9]])
>>> # Sum the diagonals.
>>> accmap = array([[0,1,2],[2,0,1],[1,2,0]])
>>> s = accum(accmap, a)
array([9, 7, 15])
>>> # A 2D output, from sub-arrays with shapes and positions like this:
>>> # [ (2,2) (2,1)]
>>> # [ (1,2) (1,1)]
>>> accmap = array([
        [[0,0],[0,0],[0,1]],
        [[0,0],[0,0],[0,1]],
        [[1,0],[1,0],[1,1]],
    ])
>>> # Accumulate using a product.
>>> accum(accmap, a, func=prod, dtype=float)
array([[ -8.,  18.],
       [ -8.,   9.]])
>>> # Same accmap, but create an array of lists of values.
>>> accum(accmap, a, func=lambda x: x, dtype='O')
array([[[1, 2, 4, -1], [3, 6]],
       [[-1, 8], [9]]], dtype=object)
"""

# Check for bad arguments and handle the defaults.
if accmap.shape[:a.ndim] != a.shape:
    raise ValueError("The initial dimensions of accmap must be the same as a.shape")
if func is None:
    func = np.sum
if dtype is None:
    dtype = a.dtype
if accmap.shape == a.shape:
    accmap = np.expand_dims(accmap, -1)
adims = tuple(range(a.ndim))
if size is None:
    size = 1 + np.squeeze(np.apply_over_axes(np.max, accmap, axes=adims))
size = np.atleast_1d(size)

# Create an array of python lists of values.
vals = np.empty(size, dtype='O')
for s in product(*[range(k) for k in size]):
    vals[s] = []
for s in product(*[range(k) for k in a.shape]):
    indx = tuple(accmap[s])
    val = a[s]
    vals[indx].append(val)

# Create the output array.
out = np.empty(size, dtype=dtype)
for s in product(*[range(k) for k in size]):
    if vals[s] == []:
        out[s] = fill_value
    else:
        out[s] = func(vals[s])

return out

But it doesnt work when the shapes of accmap and a are different, which is the case because my accmap would be [j2, i2] with shape (1362730 x 2) and a would be iq with shape (1362730 x 1). I don't quite understand what does Matlab do when the inputs are of different sizes. Is there a way to modify the python function to be able to do that, or just another way to translate that line to python?

1

There are 1 best solutions below

0
On

I had a project in Matlab where I used accumarray(). I recently ported it to Python using numpy.histogramdd() as its closest replacement.