Selecting fields from a numpy structured array gives an itemsize as large as the entire array


Issue

I have a numpy structured array and I want to extract two small fields from it. When I index with a list of field names, the result has an itemsize as large as the original array's.

Example

>>> import numpy as np
>>> A = np.zeros(100, dtype=[('f',float),('x',float,2),('large',float,500000)])
>>> A.itemsize
4000024
>>> A['f'].itemsize
8
>>> A['x'].itemsize
8
>>> A[['x','f']].itemsize
4000024
>>> A[['x']].itemsize
4000024

Question

Why does selecting a list of fields from a numpy structured array produce an array whose itemsize is as large as the original's? (I'm using Python 3.8 and NumPy 1.18.3.)

Answer 1 (best answer)

The numpy function that is needed is repack_fields. The example then becomes:

>>> from numpy.lib.recfunctions import repack_fields
>>> A = np.zeros(100, dtype=[('f',float),('x',float,2),('large',float,500000)])
>>> A[['x']].itemsize
4000024
>>> repack_fields(A[['x']]).itemsize
16

Note that repack_fields returns a copy, so it necessarily uses additional memory on top of A (here only 16 bytes per element). This may be what you want, for example when using mpi4py to communicate A[['x']] between ranks because all of A is too large to communicate.
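As a minimal sketch of that trade-off (assuming NumPy 1.16 or later, and with the 'large' field shrunk to 5000 elements purely to keep the example light), the list-indexed result shares memory with A, while the repacked result is an independent and much smaller copy:

```python
import numpy as np
from numpy.lib.recfunctions import repack_fields

# 'large' is shrunk here only to keep the example light (an assumption).
A = np.zeros(100, dtype=[('f', float), ('x', float, 2), ('large', float, 5000)])

view = A[['x', 'f']]          # view into A; keeps A's itemsize
packed = repack_fields(view)  # new array holding only 'x' and 'f'

print(view.itemsize, packed.itemsize)  # 40024 24
print(np.shares_memory(A, view))       # True
print(np.shares_memory(A, packed))     # False
print(packed.nbytes)                   # 2400
```

The packed array is the one worth sending or saving: it occupies 24 bytes per element instead of 40024.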

Answer 2

Make an array that's small enough to actually display:

In [151]: A = np.zeros(3, dtype=[('f',float),('x',float,2),('large',float,10)])
In [152]: A
Out[152]: 
array([(0., [0., 0.], [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]),
       (0., [0., 0.], [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]),
       (0., [0., 0.], [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])],
      dtype=[('f', '<f8'), ('x', '<f8', (2,)), ('large', '<f8', (10,))])

Select one field:

In [153]: A['f']
Out[153]: array([0., 0., 0.])

Select a list of fields:

In [154]: A[['f']]
Out[154]: 
array([(0.,), (0.,), (0.,)],
      dtype={'names':['f'], 'formats':['<f8'], 'offsets':[0], 'itemsize':104})

Since NumPy 1.16, indexing with a list of fields returns a view rather than a copy. The view keeps the original's itemsize, with the unselected fields simply left out of the dtype.

In [155]: A.itemsize
Out[155]: 104
In [156]: A[['x']].itemsize
Out[156]: 104
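One way to confirm that the result is a view, sketched here on the assumption of NumPy 1.16 or later, is to write through it and check the original array:

```python
import numpy as np

A = np.zeros(3, dtype=[('f', float), ('x', float, 2), ('large', float, 10)])

sub = A[['f', 'x']]   # multi-field index: a view on NumPy >= 1.16
sub['f'][0] = 1.5     # write through the view

print(A['f'][0])      # the original array sees the change
print(sub.itemsize == A.itemsize)
```

Because no data is copied, the view has to keep stepping through memory with the original itemsize.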

The difference between indexing with a list and indexing with a single field name may be clearer when looking at the last field. Indexing with a list, A[['large']], still gives a structured array; indexing with the name, A['large'], gives a plain 2d array.

In [159]: A[['large']].dtype
Out[159]: dtype({'names':['large'], 'formats':[('<f8', (10,))], 'offsets':[24], 'itemsize':104})

In [160]: A[['large']].shape
Out[160]: (3,)
In [161]: A['large'].shape
Out[161]: (3, 10)

From the NumPy documentation on accessing multiple fields (https://numpy.org/doc/stable/user/basics.rec.html#accessing-multiple-fields):

Note that unlike for single-field indexing, the dtype of the view has the same itemsize as the original array, and has fields at the same offsets as in the original array, and unindexed fields are merely missing.
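The offsets and itemsize described there can be read directly from the dtype. The following is a small sketch using the same 3-element array as above; it also relies on repack_fields accepting a dtype as well as an array:

```python
import numpy as np
from numpy.lib.recfunctions import repack_fields

A = np.zeros(3, dtype=[('f', float), ('x', float, 2), ('large', float, 10)])

dt = A[['large']].dtype
offset = dt.fields['large'][1]   # byte offset of 'large' within each item
print(offset, dt.itemsize)       # 24 104

packed_dt = repack_fields(dt)    # same field, with the unused gaps removed
packed_offset = packed_dt.fields['large'][1]
print(packed_offset, packed_dt.itemsize)  # 0 80
```

The 24-byte offset is the space still reserved for the missing 'f' and 'x' fields; repacking drops it.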