numpy: creating recarray fast with different column types

Question

851 Views Asked by K.Doe At 09 October 2018 at 17:24

I am trying to create a recarray from a series of numpy arrays with column names and mixed variable types.

The following works but is slow:

    import numpy as np
    a = np.array([1,2,3,4], dtype=np.int64)
    b = np.array([6,6,6,6], dtype=np.int64)
    c = np.array([-1.,-2.,-1.,-1.], dtype=np.float32)
    d = np.array(list(zip(a,b,c)), dtype=[('a',np.int64),('b',np.int64),('c',np.float32)])
    d = d.view(np.recarray)

I think there should be a way to do this with np.stack((a,b,c), axis=-1), which is faster than the list(zip()) method. However, there does not seem to be a trivial way to do the stacking and preserve the column types. This link does seem to show how to do it, but it's pretty clunky and I hope there is a better way.
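For illustration, a short sketch of why plain stacking loses the column types. It assumes the three arrays above, with np.int64 standing in for the removed np.int alias; the variable name stacked is only for this example. An ordinary ndarray has a single dtype, so the integer and float32 columns are promoted to a common type:

```python
import numpy as np

a = np.array([1, 2, 3, 4], dtype=np.int64)
b = np.array([6, 6, 6, 6], dtype=np.int64)
c = np.array([-1., -2., -1., -1.], dtype=np.float32)

# Stacking puts all three columns into one 2-D array with one dtype.
stacked = np.stack((a, b, c), axis=-1)

print(stacked.shape)  # (4, 3)
print(stacked.dtype)  # float64: int64 and float32 promote to float64
```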

Thanks for the help!

There are 2 best solutions below

**Paul Panzer** · Answer 1 · 2018-10-09T17:40:16.770000

np.rec.fromarrays is probably what you want:

>>> np.rec.fromarrays([a, b, c], names=['a', 'b', 'c'])
rec.array([(1, 6, -1.), (2, 6, -2.), (3, 6, -1.), (4, 6, -1.)],
          dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<f4')])
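A minimal, self-contained sketch of using the result, assuming the arrays from the question (with np.int64 in place of the removed np.int alias). It is meant to show that each field keeps the dtype of its source array, that fields can be read as attributes, and that the data is copied rather than shared:

```python
import numpy as np

a = np.array([1, 2, 3, 4], dtype=np.int64)
b = np.array([6, 6, 6, 6], dtype=np.int64)
c = np.array([-1., -2., -1., -1.], dtype=np.float32)

rec = np.rec.fromarrays([a, b, c], names=['a', 'b', 'c'])

# Each column keeps the dtype of the array it came from.
print(rec.dtype)

# Fields are reachable as attributes and by name.
print(rec.a)
print(rec['c'])

# fromarrays copies the data, so later changes to a do not show up in rec.
a[0] = 100
print(rec.a[0])
```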

**hpaulj** · Answer 2 · 2018-10-09T18:14:52.940000

Here's the field by field approach that I commented on:

In [308]: a = np.array([1,2,3,4], dtype=np.int64)
     ...: b = np.array([6,6,6,6], dtype=np.int64)
     ...: c = np.array([-1.,-2.,-1.,-1.], dtype=np.float32)
     ...: dt = np.dtype([('a',np.int64),('b',np.int64),('c',np.float32)])

(I had to correct your copy-and-pasted c, which had -2.-1. where -2.,-1. was intended.)

In [309]: arr = np.zeros(a.shape, dtype=dt)
In [310]: for name, x in zip(dt.names, [a,b,c]):
     ...:     arr[name] = x
     ...:     
In [311]: arr
Out[311]: 
array([(1, 6, -1.), (2, 6, -2.), (3, 6, -1.), (4, 6, -1.)],
      dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<f4')])

Since the array will typically have many more records (rows) than fields, this should be faster than the list-of-tuples approach. In this small case it is probably comparable in speed.
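A rough way to check that claim on a larger input is sketched below. The row count n and the helper names by_field and by_tuples are made up for this illustration, and the timings will vary by machine and NumPy version, so no numbers are quoted:

```python
import timeit
import numpy as np

n = 20_000  # hypothetical row count, chosen only for illustration
a = np.arange(n, dtype=np.int64)
b = np.full(n, 6, dtype=np.int64)
c = np.linspace(-1, 1, n).astype(np.float32)
dt = np.dtype([('a', np.int64), ('b', np.int64), ('c', np.float32)])

def by_field():
    # One vectorized assignment per field.
    arr = np.zeros(a.shape, dtype=dt)
    for name, x in zip(dt.names, [a, b, c]):
        arr[name] = x
    return arr

def by_tuples():
    # Builds one Python tuple per row before converting.
    return np.array(list(zip(a, b, c)), dtype=dt)

# Both approaches should produce the same structured array.
print((by_field() == by_tuples()).all())

print("by field :", timeit.timeit(by_field, number=3))
print("by tuples:", timeit.timeit(by_tuples, number=3))
```

The field-by-field version loops once per column, while the tuple version loops once per row in Python, which is why the gap is expected to grow with the number of rows.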

In [312]: np.array(list(zip(a,b,c)), dtype=dt)
Out[312]: 
array([(1, 6, -1.), (2, 6, -2.), (3, 6, -1.), (4, 6, -1.)],
      dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<f4')])

rec.fromarrays, after some setup to determine the dtype, does:

_array = recarray(shape, descr)
# populate the record array (makes a copy)
for i in range(len(arrayList)):
    _array[_names[i]] = arrayList[i]

The only way to use stack is to create recarrays first:

In [315]: [np.rec.fromarrays((i,j,k), dtype=dt) for i,j,k in zip(a,b,c)]
Out[315]: 
[rec.array((1, 6, -1.),
           dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<f4')]),
 rec.array((2, 6, -2.),
           dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<f4')]),
 rec.array((3, 6, -1.),
           dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<f4')]),
 rec.array((4, 6, -1.),
           dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<f4')])]
In [316]: np.stack(_)
Out[316]: 
array([(1, 6, -1.), (2, 6, -2.), (3, 6, -1.), (4, 6, -1.)],
      dtype=(numpy.record, [('a', '<i8'), ('b', '<i8'), ('c', '<f4')]))
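The same steps as a standalone sketch, with one extra line that views the stacked result as a recarray so that attribute access works. The names rows, stacked and rec are only for this example, and np.int64 is assumed in place of np.int. Note that this builds one small recarray per row, so it is shown for completeness rather than as a fast option:

```python
import numpy as np

a = np.array([1, 2, 3, 4], dtype=np.int64)
b = np.array([6, 6, 6, 6], dtype=np.int64)
c = np.array([-1., -2., -1., -1.], dtype=np.float32)
dt = np.dtype([('a', np.int64), ('b', np.int64), ('c', np.float32)])

# One 0-d record array per row, then stack them into a 1-d array.
rows = [np.rec.fromarrays((i, j, k), dtype=dt) for i, j, k in zip(a, b, c)]
stacked = np.stack(rows)

# View as recarray to get attribute access to the fields.
rec = stacked.view(np.recarray)
print(rec.b)
print(rec.c.dtype)
```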
