python - Initialize an empty array with named columns (and data types)

I have three constraints, and, as usual, I can only figure out how to satisfy any two of them at the same time:

  1. Multi-dimensional array
  2. Named columns
  3. Different columns include different data types (so everything in col 1 is a string, but col 2 is all Decimal, etc.)

I'm currently using numpy ndarrays to store my data with different types in each column. I've initialized the array so it can store multiple data types:

norm = numpy.empty((79, len(header)), dtype=object)

I've been using a header (a list of string names) as a proxy for column names, and then looking up the index of each name in the header, but this seems really kludgy.

I've looked around, but as far as I can tell, when you initialize an array with column names (and types) you have to fill the array with values as you do so, as in the question "Store NumPy Row and Column Headers".

When I try something like this:

n=numpy.empty((5,2), dtype=[("sub", "str"), ("words", Decimal)])
n[0] = ['06', Decimal(10)]

I get this error:

Traceback (most recent call last):
File "<string>", line 1, in <fragment>
ValueError: Setting void-array with object members using buffer.

There is 1 solution below

Try this:

>>> import numpy
>>> from decimal import Decimal
>>> n = numpy.empty((5,2), dtype=[("sub", "U10"), ("words", Decimal)])
>>> n[0] = ('06', Decimal(10))
>>> print(n)
[[('06', Decimal('10')) ('06', Decimal('10'))]
 [('', None) ('', None)]
 [('', None) ('', None)]
 [('', None) ('', None)]
 [('', None) ('', None)]]

As @seberg mentioned, you want to set with a tuple, not a list. You also need to specify the length of the string to be stored in "sub": NumPy structured arrays do not store arbitrary-length strings, so you need to pick a maximum length. If you really cannot pick a maximum length, use object instead of str.
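
One more thing worth noting: once the fields are named, the field names already play the role of the columns, so a shape of (5,2) gives a 5x2 grid of records, and a single tuple assigned to n[0] gets copied into both entries of that row. If what you want is 5 rows with two named columns, a one-dimensional array is probably closer. Here is a minimal sketch of that, assuming a maximum length of 10 for "sub" and using a plain object field to hold the Decimal values. The names record_dtype, table, loose_dtype and loose are made up for the example.

```python
from decimal import Decimal

import numpy as np

# Each record has two named fields: a fixed-width string (at most 10
# characters, an assumed limit) and an object slot that can hold a Decimal.
record_dtype = np.dtype([("sub", "U10"), ("words", object)])

# One-dimensional: 5 rows, with the "columns" supplied by the field names.
table = np.empty(5, dtype=record_dtype)

# Rows are assigned as tuples, not lists.
table[0] = ("06", Decimal(10))
table[1] = ("07", Decimal("2.5"))

# Columns are read (and can be written) by name.
subs = table["sub"]
total = table["words"][0] + table["words"][1]

# Variant for strings with no sensible maximum length: store them as objects.
loose_dtype = np.dtype([("sub", object), ("words", object)])
loose = np.empty(3, dtype=loose_dtype)
loose[0] = ("a subject name of any length at all", Decimal(1))
```

Keep in mind that object fields hold ordinary Python references, so arithmetic on the "words" column runs at Python speed rather than NumPy speed.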