How to write to a single field in a structured array without getting a warning


I am trying to normalize the data in every field of my structured array that contains floats. However, even though I loop through the fields one at a time, I receive a warning.

for idt, dt in enumerate(data.dtype.names):
    if "float32" in data.dtype[idt].name:
        stds = np.std(data[dt])
        means = np.mean(data[dt])
        data[dt] = (data[dt] - means) / stds

After executing the last line this pops up:

FutureWarning: Numpy has detected that you (may be) writing to an array returned by numpy.diagonal or by selecting multiple fields in a structured array. This code will likely break in a future numpy release -- see numpy.diagonal or arrays.indexing reference docs for details. The quick fix is to make an explicit copy (e.g., do arr.diagonal().copy() or arr[['f0','f1']].copy()).
  data[dt] = (data[dt] - means) / stds

I can run it line by line in a debugger to make sure everything is as expected, e.g.:

In[]: data.dtype
Out[]: dtype([('a', '<f4'), ('b', '<f4'), ('c', '<f4'), ('d', '<i4')])
In[]: dt
Out[]: 'a'
In[]: data[dt].shape
Out[]: (2000, 8)

Following the suggestion in the warning message, copying the array works:

data2 = data.copy()
for idt, dt in enumerate(data2.dtype.names):
    if "float32" in data2.dtype[idt].name:
        stds = np.std(data2[dt])
        means = np.mean(data2[dt])
        data2[dt] = (data2[dt] - means) / stds
data = data2

What would be a more elegant way to get rid of the warning? And what did the copy change in this case?

Best answer:
def foo(data):
    for idt, dt in enumerate(data.dtype.names):
        if "float32" in data.dtype[idt].name:
            data[dt] = data[dt] + idt

In [23]: dt = np.dtype([('a', '<f4'), ('b', '<f4'), ('c', '<f4'), ('d', '<i4')])
In [24]: data = np.ones((3,), dtype=dt)
In [25]: foo(data)
In [26]: data
Out[26]: 
array([( 1.,  2.,  3., 1), ( 1.,  2.,  3., 1), ( 1.,  2.,  3., 1)],
      dtype=[('a', '<f4'), ('b', '<f4'), ('c', '<f4'), ('d', '<i4')])
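The same single-field loop can be applied to the normalization from the question. The following is only a sketch: the helper name normalize_float_fields, the sample values, and the zero-variance guard are illustrative assumptions, not part of the original code. It checks the dtype kind instead of matching the string "float32", so it would also pick up float64 fields.

```python
import numpy as np

def normalize_float_fields(arr):
    """Standardize each floating-point field of a structured array in place.

    Hypothetical helper for illustration only.
    """
    for name in arr.dtype.names:
        # .base unwraps sub-array fields; kind == "f" matches any float width
        if arr.dtype[name].base.kind == "f":
            field = arr[name]            # single-field access returns a view
            std = field.std()
            if std > 0:                  # skip constant fields to avoid dividing by zero
                arr[name] = (field - field.mean()) / std
    return arr

dt = np.dtype([("a", "<f4"), ("b", "<f4"), ("c", "<f4"), ("d", "<i4")])
data = np.zeros(5, dtype=dt)
data["a"] = np.arange(5)
data["b"] = np.arange(5) * 2.0
data["d"] = 7                            # integer field, must stay untouched

normalize_float_fields(data)
print(data["a"].mean(), data["a"].std())
print(data["d"])
```

Because each field is read and written by a single name, this never goes through a multi-field selection.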

Calling foo on the full array runs without a warning. But if I try to use a multi-field selection of data, I get the warning:

In [27]: data1 = data[['a','d']]
In [28]: foo(data1)
/usr/local/bin/ipython3:4: FutureWarning: Numpy has detected that you (may be) writing to an array returned
by numpy.diagonal or by selecting multiple fields in a structured
array. This code will likely break in a future numpy release --
see numpy.diagonal or arrays.indexing reference docs for details.
The quick fix is to make an explicit copy (e.g., do
arr.diagonal().copy() or arr[['f0','f1']].copy()).
  import re

Operating on the copy is ok:

In [38]: data1 = data[['d','a']].copy()
In [39]: foo(data1)
In [40]: data1
Out[40]: 
array([(1,  2.), (1,  2.), (1,  2.)],
      dtype=[('d', '<i4'), ('a', '<f4')])
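As for what the copy changes: a copied selection owns its memory, so writes to it cannot reach the parent array. A small sketch, assuming the same dtype as above (the variable name subset is illustrative):

```python
import numpy as np

dt = np.dtype([("a", "<f4"), ("b", "<f4"), ("c", "<f4"), ("d", "<i4")])
data = np.ones(3, dtype=dt)

# .copy() gives the multi-field selection its own buffer
subset = data[["a", "d"]].copy()
subset["a"] = subset["a"] * 10

print(subset["a"])                       # modified values in the copy
print(data["a"])                         # original array is unchanged
print(np.shares_memory(subset, data))    # the copy does not alias the original
```

This suggests, though the question does not say so, that the array in the question may itself have come from a multi-field selection earlier in the program.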

(Next I'll try saving and retrieving this array with h5py and see if that makes a difference.)

With h5py (assuming f is an open h5py File containing a dataset named 'data'),

d1 = f['data']
foo(d1)    # operate directly on the dataset
data1 = d1[:]; foo(data1)    # operate on a copy
data1 = d1[:,'a','b']          # also a copy

I can't reproduce the warning with h5py datasets.

It is also possible to suppress the warning, but first make sure you clearly understand what it means and what the consequences are.
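For completeness, here is a minimal sketch of suppressing the warning locally with the standard warnings module. Whether the warning fires at all, and whether the write reaches the parent array, depends on the NumPy version, so treat this as an illustration rather than a recommendation.

```python
import warnings
import numpy as np

dt = np.dtype([("a", "<f4"), ("d", "<i4")])
data = np.ones(3, dtype=dt)

# The filter applies only inside the with-block; other warnings elsewhere still show
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FutureWarning)
    sub = data[["a", "d"]]               # multi-field selection
    sub["a"] = sub["a"] + 1              # the write that triggered the warning above

print(sub["a"])
```

Silencing the message does not change what the write does, which is why an explicit copy is usually the clearer fix.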