Suppose I have the following numpy structured array:
In [250]: x
Out[250]:
array([(22, 2, -1000000000, 2000), (22, 2, 400, 2000),
(22, 2, 804846, 2000), (44, 2, 800, 4000), (55, 5, 900, 5000),
(55, 5, 1000, 5000), (55, 5, 8900, 5000), (55, 5, 11400, 5000),
(33, 3, 14500, 3000), (33, 3, 40550, 3000), (33, 3, 40990, 3000),
(33, 3, 44400, 3000)],
dtype=[('f1', '<i4'), ('f2', '<f4'), ('f3', '<f4'), ('f4', '<i4')])
I am trying to convert a subset of the above array to a regular numpy array. It is essential for my application that no copies are created (only views).
Fields are retrieved from the above structured array by using the following function:
def fields_view(array, fields):
    return array.getfield(numpy.dtype(
        {name: array.dtype.fields[name] for name in fields}
    ))
If I am interested in fields 'f2' and 'f3', I would do the following:
In [251]: y=fields_view(x,['f2','f3'])
In [252]: y
Out[252]:
array([(2.0, -1000000000.0), (2.0, 400.0), (2.0, 804846.0), (2.0, 800.0),
(5.0, 900.0), (5.0, 1000.0), (5.0, 8900.0), (5.0, 11400.0),
(3.0, 14500.0), (3.0, 40550.0), (3.0, 40990.0), (3.0, 44400.0)],
dtype={'names':['f2','f3'], 'formats':['<f4','<f4'], 'offsets':[4,8], 'itemsize':12})
There is a way to directly get an ndarray from the 'f2' and 'f3' fields of the original structured array. However, for my application, it is necessary to build this intermediary structured array as this data subset is an attribute of a class.
I can't convert the intermediary structured array to a regular numpy array without doing a copy.
In [253]: y.view(('<f4', len(y.dtype.names)))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-54-f8fc3a40fd1b> in <module>()
----> 1 y.view(('<f4', len(y.dtype.names)))
ValueError: new type not compatible with array.
This function can also be used to convert a record array to an ndarray:
def recarr_to_ndarr(x,typ):
    fields = x.dtype.names
    shape = x.shape + (len(fields),)
    offsets = [x.dtype.fields[name][1] for name in fields]
    assert not any(np.diff(offsets, n=2))
    strides = x.strides + (offsets[1] - offsets[0],)
    y = np.ndarray(shape=shape, dtype=typ, buffer=x,
                   offset=offsets[0], strides=strides)
    return y
However, I get the following error:
In [254]: recarr_to_ndarr(y,'<f4')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-65-2ebda2a39e9f> in <module>()
----> 1 recarr_to_ndarr(y,'<f4')
<ipython-input-62-8a9eea8e7512> in recarr_to_ndarr(x, typ)
8 strides = x.strides + (offsets[1] - offsets[0],)
9 y = np.ndarray(shape=shape, dtype=typ, buffer=x,
---> 10 offset=offsets[0], strides=strides)
11 return y
12
TypeError: expected a single-segment buffer object
The function works fine if I create a copy:
In [255]: recarr_to_ndarr(np.array(y),'<f4')
Out[255]:
array([[ 2.00000000e+00, -1.00000000e+09],
[ 2.00000000e+00, 4.00000000e+02],
[ 2.00000000e+00, 8.04846000e+05],
[ 2.00000000e+00, 8.00000000e+02],
[ 5.00000000e+00, 9.00000000e+02],
[ 5.00000000e+00, 1.00000000e+03],
[ 5.00000000e+00, 8.90000000e+03],
[ 5.00000000e+00, 1.14000000e+04],
[ 3.00000000e+00, 1.45000000e+04],
[ 3.00000000e+00, 4.05500000e+04],
[ 3.00000000e+00, 4.09900000e+04],
[ 3.00000000e+00, 4.44000000e+04]], dtype=float32)
There seems to be no difference between the two arrays:
In [66]: y
Out[66]:
array([(2.0, -1000000000.0), (2.0, 400.0), (2.0, 804846.0), (2.0, 800.0),
(5.0, 900.0), (5.0, 1000.0), (5.0, 8900.0), (5.0, 11400.0),
(3.0, 14500.0), (3.0, 40550.0), (3.0, 40990.0), (3.0, 44400.0)],
dtype={'names':['f2','f3'], 'formats':['<f4','<f4'], 'offsets':[4,8], 'itemsize':12})
In [67]: np.array(y)
Out[67]:
array([(2.0, -1000000000.0), (2.0, 400.0), (2.0, 804846.0), (2.0, 800.0),
(5.0, 900.0), (5.0, 1000.0), (5.0, 8900.0), (5.0, 11400.0),
(3.0, 14500.0), (3.0, 40550.0), (3.0, 40990.0), (3.0, 44400.0)],
dtype={'names':['f2','f3'], 'formats':['<f4','<f4'], 'offsets':[4,8], 'itemsize':12})
This answer is a bit long and rambling. I started with what I knew from previous work on taking array views, and then tried to relate that to your functions.
================
In your case, all fields are 4 bytes long, both floats and ints. I can then view it as all ints or all floats:
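As a rough sketch of that step (the names as_ints and as_floats are mine, and x is rebuilt from the question so the snippet runs on its own), viewing the whole record buffer with a single 4-byte dtype looks something like this:

```python
import numpy as np

# x rebuilt from the question so this snippet is self-contained
x = np.array(
    [(22, 2, -1000000000, 2000), (22, 2, 400, 2000), (22, 2, 804846, 2000),
     (44, 2, 800, 4000), (55, 5, 900, 5000), (55, 5, 1000, 5000),
     (55, 5, 8900, 5000), (55, 5, 11400, 5000), (33, 3, 14500, 3000),
     (33, 3, 40550, 3000), (33, 3, 40990, 3000), (33, 3, 44400, 3000)],
    dtype=[('f1', '<i4'), ('f2', '<f4'), ('f3', '<f4'), ('f4', '<i4')])

# Every field is 4 bytes wide, so the 16-byte records can be reinterpreted
# as four consecutive 4-byte values each (hypothetical names, not from the post).
as_ints = x.view('<i4')
as_floats = x.view('<f4')

print(as_ints.shape, as_floats.shape)   # (48,) (48,)
print(np.shares_memory(x, as_floats))   # True
```

Only the positions that really hold ints are meaningful in as_ints, and likewise for the float positions in as_floats; the other values are just reinterpreted bit patterns.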
This view is 1d. I can reshape and slice the 2 float columns.
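A minimal sketch of the reshape and slice, assuming the same x as in the question (rebuilt here so the snippet runs on its own):

```python
import numpy as np

# x rebuilt from the question so this snippet is self-contained
x = np.array(
    [(22, 2, -1000000000, 2000), (22, 2, 400, 2000), (22, 2, 804846, 2000),
     (44, 2, 800, 4000), (55, 5, 900, 5000), (55, 5, 1000, 5000),
     (55, 5, 8900, 5000), (55, 5, 11400, 5000), (33, 3, 14500, 3000),
     (33, 3, 40550, 3000), (33, 3, 40990, 3000), (33, 3, 44400, 3000)],
    dtype=[('f1', '<i4'), ('f2', '<f4'), ('f3', '<f4'), ('f4', '<i4')])

# One row per record, one column per 4-byte field, then a basic slice
# that keeps columns 1 and 2 (the 'f2' and 'f3' positions).
y = x.view('<f4').reshape(len(x), -1)[:, 1:3]

print(y.shape, y.strides)        # (12, 2) (16, 4)
print(np.shares_memory(x, y))    # True
```

Both the reshape of a contiguous array and the basic slice return views, which is why no data is copied here.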
That this is a view can be verified by doing a bit of in-place math, and seeing the results in x.
If the field sizes differed this might be impossible, for example if the floats were 8 bytes. The key is picturing how the structured data is stored, and imagining whether that can be viewed as a simple dtype of multiple columns. And field choice has to be equivalent to a basic slice. Working with ['f1','f4'] would be equivalent to advanced indexing with [:,[0,3]], which has to be a copy.
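Both points can be checked with a short sketch along these lines. The names y and picked are mine, and x is rebuilt from the question so the snippet runs on its own:

```python
import numpy as np

# x rebuilt from the question so this snippet is self-contained
x = np.array(
    [(22, 2, -1000000000, 2000), (22, 2, 400, 2000), (22, 2, 804846, 2000),
     (44, 2, 800, 4000), (55, 5, 900, 5000), (55, 5, 1000, 5000),
     (55, 5, 8900, 5000), (55, 5, 11400, 5000), (33, 3, 14500, 3000),
     (33, 3, 40550, 3000), (33, 3, 40990, 3000), (33, 3, 44400, 3000)],
    dtype=[('f1', '<i4'), ('f2', '<f4'), ('f3', '<f4'), ('f4', '<i4')])

# Basic slice of the float view: in-place math shows up in x.
y = x.view('<f4').reshape(len(x), -1)[:, 1:3]
y[:, 0] += 0.5
print(x['f2'][:3])                    # [2.5 2.5 2.5]

# Picking the 'f1' and 'f4' positions with an index list is advanced
# indexing, so the result is a copy that no longer shares memory with x.
picked = x.view('<i4').reshape(len(x), -1)[:, [0, 3]]
print(np.shares_memory(x, picked))    # False
```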
==========
The 'direct' field indexing is x[['f2','f3']]. Doing in-place math on the result, z, modifies z but with a FutureWarning. Also it does not modify x; z has become a copy. I can also see this by looking at z.__array_interface__['data'], the data buffer location (and comparing with that of x and y).
=================
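Whether the 'direct' multi-field indexing discussed above gives a copy or a view depends on the NumPy version. The behaviour described there (a copy, plus a FutureWarning) is that of older releases; as far as I know, NumPy 1.16 and later return a view instead. A small sketch to check what your own installation does, with x rebuilt from the question:

```python
import numpy as np

# x rebuilt from the question so this snippet is self-contained
x = np.array(
    [(22, 2, -1000000000, 2000), (22, 2, 400, 2000), (22, 2, 804846, 2000),
     (44, 2, 800, 4000), (55, 5, 900, 5000), (55, 5, 1000, 5000),
     (55, 5, 8900, 5000), (55, 5, 11400, 5000), (33, 3, 14500, 3000),
     (33, 3, 40550, 3000), (33, 3, 40990, 3000), (33, 3, 44400, 3000)],
    dtype=[('f1', '<i4'), ('f2', '<f4'), ('f3', '<f4'), ('f4', '<i4')])

z = x[['f2', 'f3']]

# The result is version dependent, so no expected output is given here.
print(np.__version__)
print(z.dtype)
print(np.shares_memory(x, z))

# Comparing buffer addresses is the check mentioned in the text.
print(x.__array_interface__['data'][0], z.__array_interface__['data'][0])
```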
Your fields_view does create a structured view, w, which can be used to modify x, e.g. w['f2'] -= .5. So it is more versatile than the 'direct' x[['f2','f3']].
The w dtype is the one shown in your question, with 'offsets':[4,8] and 'itemsize':12.
Adding print(shape, typ, offsets, strides) to your recarr_to_ndarr, I get (py3) an error complaining about contiguity.
That contiguity problem must be referring to the values shown in w.flags.
It's interesting that w.dtype.descr converts the 'offsets' into an unnamed field.
One way or another, w has a non-contiguous data buffer, which can't be used to create a new array. Flattened, the data buffer looks something like a run of 16-byte records in which only the middle 8 bytes of each (call them the o bytes) belong to w.
The y I constructed above accesses the o bytes with a 4 byte offset, and then (16,4) strides, and (12,2) shape; its __array_interface__ shows this.
If I modify your ndarray call to use the original x.data as the buffer, it works, with the same __array_interface__ as my y.
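Put together, the working variant might look like the sketch below. It reuses fields_view from the question (written with np instead of numpy) and inlines the body of recarr_to_ndarr with buffer=x.data; the names w, y_new and y_slice are mine.

```python
import numpy as np

# x rebuilt from the question so this snippet is self-contained
x = np.array(
    [(22, 2, -1000000000, 2000), (22, 2, 400, 2000), (22, 2, 804846, 2000),
     (44, 2, 800, 4000), (55, 5, 900, 5000), (55, 5, 1000, 5000),
     (55, 5, 8900, 5000), (55, 5, 11400, 5000), (33, 3, 14500, 3000),
     (33, 3, 40550, 3000), (33, 3, 40990, 3000), (33, 3, 44400, 3000)],
    dtype=[('f1', '<i4'), ('f2', '<f4'), ('f3', '<f4'), ('f4', '<i4')])

def fields_view(array, fields):
    # Same helper as in the question.
    return array.getfield(np.dtype(
        {name: array.dtype.fields[name] for name in fields}))

w = fields_view(x, ['f2', 'f3'])
print(w.flags['C_CONTIGUOUS'])   # False: 12-byte items stepped by 16 bytes

# Same arithmetic as recarr_to_ndarr, but the buffer is the contiguous
# memory of the original x rather than the non-contiguous w.
offsets = [w.dtype.fields[name][1] for name in w.dtype.names]
shape = w.shape + (len(offsets),)
strides = w.strides + (offsets[1] - offsets[0],)
y_new = np.ndarray(shape=shape, dtype='<f4', buffer=x.data,
                   offset=offsets[0], strides=strides)

# Compare with the view built by reshaping and slicing.
y_slice = x.view('<f4').reshape(len(x), -1)[:, 1:3]
print(y_new.__array_interface__['data'] == y_slice.__array_interface__['data'])
print(y_new.shape, y_new.strides)   # (12, 2) (16, 4)
```

Note that passing offsets[0] as the offset into x.data is only valid because w starts at the first byte of x; a view taken from a slice of x would need its own starting offset added.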