Handling NaN / Inf in a numpy ndarray

Question

212 Views Asked by AudioBubble At 17 August 2025 at 10:45

I am working on a 4D numpy array (an array of arrays). Each nested array has shape (1, 100, 4):

trainset.shape
(159984, 1, 100, 4)

Within the nested arrays there are some nan values that I would like to handle. For example, the first nested array in trainset contains several:

trainset[0]
array([[[ 7.10669020e-02,  4.91383899e-03, -1.43700407e-02,
          1.52228864e-04],
        [ 7.59807410e-02, -9.45620170e-03,             nan,
          1.35892100e-04],
        [ 6.65245393e-02,             nan,             nan,
          8.98521456e-05],
        [            nan,             nan,             nan,
          1.41090006e-05],
        [            nan,             nan,             nan,
          6.68319391e-06],
        [            nan,             nan,             nan,
         -3.27272689e+01],
        [            nan,             nan,             nan,
         -1.09090911e+01],
        [            nan,             nan,             nan,
          8.25973981e+01],
        [            nan,             nan,             nan,
          1.12207785e+02],
        [            nan,             nan,             nan,
          1.65194797e+02],
        [            nan,             nan,             nan,
          2.25974015e+02],
        [            nan,             nan,             nan,
          2.78961026e+02],
        [ 3.87926649e-03,  1.81274134e-04, -1.08764481e-03,
          3.41298685e+02],
        ...
        [ 4.06054062e-03, -9.06370679e-04,  1.30517379e-03,
          3.10129855e+02]]])

How do I check all arrays in trainset for nan values and, where any are found, replace them with the column's median value?

EDIT

Using:

from sklearn.impute import SimpleImputer
imp_mean = SimpleImputer(missing_values=np.nan, strategy='median')

for data in trainset:
  transformed_data = imp_mean.fit(data)

ValueError: Found array with dim 3. Estimator expected <= 2.

This raises the ValueError shown above.

**Aramakus** · Accepted Answer

The simplest way is to use SimpleImputer with the median imputing strategy. SimpleImputer only accepts 2D input and computes its statistic along each column, so you will have to reshape your array to 2D before passing it through SimpleImputer(), and then reshape it back.
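To make "column-wise median" concrete, here is a minimal NumPy-only sketch on a small 2D array. The toy array `X` and the decision to treat Inf as a missing value are my assumptions for illustration; they are not from the question.

```python
import numpy as np

# Toy 2D array (hypothetical data): rows are observations, columns are features.
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, np.inf],
              [5.0, 40.0]])

# Assumption: +/-Inf should count as missing, so map it to NaN first.
X = np.where(np.isfinite(X), X, np.nan)

# Median of each column, ignoring NaN entries.
col_median = np.nanmedian(X, axis=0)

# Replace every NaN with the median of its own column (broadcast over rows).
X_filled = np.where(np.isnan(X), col_median, X)
print(X_filled)
```

Each NaN is filled from its own column only, so a column with many missing entries does not affect the others.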

To your edit: reshape the array to 2D, keeping the last axis (the 4 columns) intact, and afterwards reshape the result back to the original shape. Also, call fit_transform rather than fit, so that all columns are imputed in one go. The reshape will look something like this:

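import numpy as np

# Stand-in for trainset: same trailing shape, fewer samples
A = np.random.rand(15, 1, 100, 4)
print(A.shape)  # (15, 1, 100, 4)

init_shape = A.shape

# Collapse all leading axes into rows; keep the 4 columns
B = A.reshape(-1, init_shape[-1])
print(B.shape)  # (1500, 4)

# SimpleImputer goes here

# Restore the original 4D shape
B = B.reshape(init_shape)
print(B.shape)  # (15, 1, 100, 4)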
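Putting the pieces together, a possible end-to-end sketch looks like this. The random array `A` stands in for trainset, the planted NaN/Inf positions are made up, and treating Inf as missing is an assumption on my part (the title mentions Inf, the question body only nan).

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical stand-in for trainset: same trailing shape (1, 100, 4).
rng = np.random.default_rng(0)
A = rng.random((15, 1, 100, 4))
A[0, 0, 1, 2] = np.nan   # planted missing value
A[3, 0, 5, 1] = np.inf   # planted infinity

init_shape = A.shape

# Flatten all leading axes so each row is one 4-column observation.
B = A.reshape(-1, init_shape[-1])

# Assumption: Inf is treated like a missing value.
B = np.where(np.isfinite(B), B, np.nan)

# Median of each column over the valid entries, kept only for checking.
expected_median = np.nanmedian(B, axis=0)

# Fit and impute in one call; the median is taken per column.
imputer = SimpleImputer(missing_values=np.nan, strategy="median")
B = imputer.fit_transform(B)

# Back to the original 4D shape.
A_filled = B.reshape(init_shape)
print(A_filled.shape)
```

Note that the median here is taken per column over the whole flattened dataset, not per nested array. If you want each nested array filled from its own columns instead, something like np.nanmedian(A, axis=2, keepdims=True) combined with np.where(np.isnan(A), ..., A) should work, though a column that is entirely nan within one nested array would stay nan. Also, as far as I know, SimpleImputer by default drops a column that is entirely missing at fit time, in which case the final reshape would fail.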
