I'm trying to use SMOGN to balance my data, but it raises a TypeError or UFuncTypeError. How can I solve this?


I have image data (arrays) with labels, loaded from folders. The data is imbalanced, and I'm trying to balance it using SMOGN after creating a DataFrame. [Figure: histogram of the data]

Here's the code:

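    import os
    import cv2 as cv
    import numpy as np
    import pandas as pd
    import smogn

    r_labels = []
    im = []
    # folder: path to the image directory (defined elsewhere)
    for filename in os.listdir(folder):
        img = cv.imread(os.path.join(folder, filename))
        if img is not None:
            # the third '_'-separated field of the filename holds the flowering time
            aio_plant = filename.split("_")
            flowering_time = aio_plant[2].split(".")[0]
            im.append(np.asarray(img).astype(np.float32))
            r_labels.append(np.uint8(flowering_time))
    df = pd.DataFrame({'images': im, 'labels': r_labels})
    sm = smogn.smoter(
        data = df,  ## pandas dataframe
        y = 'labels'  ## string ('header name')
        )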

This gives an error: TypeError: unhashable type: 'numpy.ndarray'. I tried changing the label type like this:

            r_labels.append(flowering_time)

That gives: UFuncTypeError: ufunc 'subtract' did not contain a loop with signature matching types (dtype('<U2'), dtype('<U2')) -> None
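
For what it's worth, both messages seem to be reproducible with NumPy alone, which suggests they come from the column types rather than from any SMOGN setting. This is only a minimal sketch: the array shape and the label values "86" and "53" are made up for illustration, and it does not show where inside SMOGN the errors are raised.

```python
import numpy as np

# An ndarray cannot be hashed, which matches the first message.
try:
    hash(np.zeros((2, 2, 3), dtype=np.float32))
    hash_error = None
except TypeError as exc:
    hash_error = str(exc)

# Labels kept as strings form a '<U2' array, and NumPy has no
# 'subtract' loop for string dtypes, which matches the second message.
try:
    np.array(["86"]) - np.array(["53"])
    subtract_error = None
except TypeError as exc:  # UFuncTypeError is a subclass of TypeError
    subtract_error = str(exc)

print(hash_error)
print(subtract_error)
```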

The data looks like this:

                                                 images  labels
0     [[[0.0, 0.0, 255.0], [0.0, 255.0, 0.0], [0.0, ...      86
1     [[[255.0, 0.0, 0.0], [255.0, 0.0, 0.0], [0.0, ...      53
2     [[[255.0, 0.0, 0.0], [0.0, 255.0, 0.0], [255.0...      46
3     [[[255.0, 0.0, 0.0], [0.0, 255.0, 0.0], [0.0, ...      44
4     [[[255.0, 0.0, 0.0], [255.0, 0.0, 0.0], [255.0...      63
...                                                 ...     ...
998   [[[0.0, 0.0, 255.0], [0.0, 255.0, 0.0], [255.0...      86
999   [[[255.0, 0.0, 0.0], [0.0, 255.0, 0.0], [255.0...     215
1000  [[[0.0, 0.0, 255.0], [0.0, 0.0, 255.0], [0.0, ...      92
1001  [[[255.0, 0.0, 0.0], [0.0, 255.0, 0.0], [255.0...      61
1002  [[[255.0, 0.0, 0.0], [0.0, 255.0, 0.0], [255.0...     183

There are 1 best solutions below

Рим On BEST ANSWER

I solved the problem by converting the labels to hashable integers and the images column to a string representation of each NumPy array, then converting the images back after running SMOGN.

    import sys

    # Convert labels to hashable integers
    df['labels'] = df['labels'].astype(int)
    # Convert images column to string representation of the flattened NumPy array
    # (threshold=sys.maxsize stops large arrays from being abbreviated with "...")
    df['images'] = df['images'].apply(
        lambda x: np.array2string(x.flatten(), separator=',', threshold=sys.maxsize))

    sm = smogn.smoter(
        data = df,  ## pandas dataframe
        y = 'labels',  ## string ('header name')
        )
    # Convert the strings back to (flat) NumPy arrays and the labels back to integers
    sm['images'] = sm['images'].apply(lambda x: np.fromstring(x[1:-1], sep=','))
    sm['labels'] = sm['labels'].astype(int)
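
Note that the images come back flattened, so they need reshaping before use. The string round trip can be checked on its own, without SMOGN, along these lines. This is a rough sketch under a few assumptions: every image has the same shape, pixel values are whole numbers in 0..255, and the `encode`/`decode` helpers and the random stand-in images are hypothetical names and data introduced here for illustration.

```python
import sys
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real images: three 20x20 three-channel arrays
# (1200 values each, enough to trigger NumPy's default "..." summarisation).
shape = (20, 20, 3)
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=shape).astype(np.float32) for _ in range(3)]
df = pd.DataFrame({"images": images, "labels": [86, 53, 46]})

def encode(img):
    # threshold=sys.maxsize makes array2string write out every element
    return np.array2string(img.ravel(), separator=",", threshold=sys.maxsize)

def decode(text, shape):
    # Drop the surrounding brackets, parse each number, restore the image shape
    values = [float(tok) for tok in text.strip("[]").split(",")]
    return np.array(values, dtype=np.float32).reshape(shape)

df["images"] = df["images"].apply(encode)
# ... smogn.smoter(data=df, y="labels") would run here ...
restored = df["images"].apply(lambda s: decode(s, shape))

# Every image should survive the round trip unchanged
round_trip_ok = all(np.array_equal(a, b) for a, b in zip(images, restored))
print(round_trip_ok)
```

Without the `threshold` argument, arrays with more than 1000 elements are abbreviated by default, and the abbreviated string can no longer be parsed back into the full image.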