Regression with Self Organizing Map (SOM) / Kohonen Map

I am evaluating a SOM/Kohonen map as a regressor for a dataset. Unfortunately, it performs extremely badly, so badly that I suspect there is an error in my code. While the R2 score on the training set is usually only around 1-5%, the R2 score on the test set is ALWAYS extremely negative. For example:

Train: 1.09 %

Test: -5668908.61 %
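
For context, R2 is defined as 1 - SS_res / SS_tot, so a value this far below zero means the squared prediction error is many thousands of times larger than the variance of the targets. One common way to end up there is comparing predictions and targets that live on different scales. The following is a minimal, standard-library sketch of that effect; the helper `r2` and the numbers are made up for illustration and are not taken from my data.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (assumption, not real data).
y_true = [1.0, 2.0, 3.0, 4.0]
same_scale = [1.1, 1.9, 3.2, 3.8]             # predictions on the target's scale
wrong_scale = [p * 100 for p in same_scale]   # same predictions, wrong scale

r2_ok = r2(y_true, same_scale)    # close to 1
r2_bad = r2(y_true, wrong_scale)  # hugely negative
print(f"same scale:  {r2_ok * 100:.2f} %")
print(f"wrong scale: {r2_bad * 100:.2f} %")
```

A mismatch in scale alone is enough to push the score to values of this magnitude, which is why I am checking the scaling steps first.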

Even though I have gone over my code again and again, I want to make sure that I did not make a mistake in scaling the data or something similar that might cause the bad performance. Basically, I split the data into X and y and then use sklearn's train_test_split() to get the respective datasets.

I use sklearn's MinMaxScaler() to fit_transform() X_train and apply the same transformation to X_test so that there is no data leakage. For y_train I use a separate scaler (scalery).

After each model is trained, I use the y_train scaler (scalery) to invert the scaling on y_pred, y_pred_train and y_train.
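
The scaling workflow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data is synthetic, `KNeighborsRegressor` is a stand-in for the SOM regressor, and the variable names are hypothetical. The scaled and unscaled targets are kept in separate variables, because inverse-transforming the same array again on every loop iteration would compound the transform.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import MinMaxScaler

# Synthetic data, for illustration only (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) + 100.0).reshape(-1, 1)

# Chronological 80/20 split without shuffling.
split = int(len(X) * 0.8)
X_train_raw, X_test_raw = X[:split], X[split:]
y_train_raw, y_test_raw = y[:split], y[split:]

# Fit both scalers on the training portion only.
x_scaler = MinMaxScaler()
y_scaler = MinMaxScaler()
X_train = x_scaler.fit_transform(X_train_raw)
X_test = x_scaler.transform(X_test_raw)
y_train_scaled = y_scaler.fit_transform(y_train_raw)

scores = []
for k in (1, 3, 5):
    model = KNeighborsRegressor(n_neighbors=k)  # stand-in for the SOM regressor
    model.fit(X_train, y_train_scaled.ravel())

    # Only the predictions are mapped back; the raw targets are never modified.
    y_pred_train = y_scaler.inverse_transform(model.predict(X_train).reshape(-1, 1))
    y_pred_test = y_scaler.inverse_transform(model.predict(X_test).reshape(-1, 1))

    scores.append((r2_score(y_train_raw, y_pred_train),
                   r2_score(y_test_raw, y_pred_test)))

for train_r2, test_r2 in scores:
    print(f"Train: {train_r2 * 100:.2f} %  Test: {test_r2 * 100:.2f} %")
```

Scoring against `y_train_raw` and `y_test_raw` keeps both sides of `r2_score` on the original scale in every iteration.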

Is there a mistake in my approach? I just want to make sure that this type of model inherently performs badly on this data, and that the result is not caused by an error on my side.

Here is my code:

import math

import pandas as pd
import susi
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# load_dataset, currency, predictor and data_range are defined elsewhere
data = load_dataset(currency, 1440, predictor, data_range)
X = data.drop(predictor, axis=1)
y = data[[predictor]]

# Separate scalers for the features and the target
scaler = MinMaxScaler(feature_range=(0, 1))
scalery = MinMaxScaler(feature_range=(0, 1))

# Chronological split (no shuffling)
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    shuffle=False,
)

# Fit the scalers on the training data only
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
y_train = scalery.fit_transform(y_train)

map_size = int(5 * math.sqrt(X_test.shape[0]))  # vesanto

info_dict = {
    'currency': currency,
    'data_range': data_range,
    'epochs': 0,
}

# Try increasing numbers of training iterations
for i in range(100, 2100, 100):
    info_dict['epochs'] = i
    print(f"GridSearch Configuration: {map_size}x{map_size}")
    print(currency, data_range, i)
    som = susi.SOMRegressor(
        n_rows=map_size,
        n_columns=map_size,
        n_iter_unsupervised=i,
        n_iter_supervised=i,
        neighborhood_mode_unsupervised="linear",
        neighborhood_mode_supervised="linear",
        learn_mode_unsupervised="min",
        learn_mode_supervised="min",
        learning_rate_start=0.5,
        learning_rate_end=0.05,
        # do_class_weighting=True,
        random_state=None,
        n_jobs=1)

    som.fit(X_train, y_train.ravel())

    y_pred = som.predict(X_test)
    y_pred_train = som.predict(X_train)

    # Map predictions and training targets back to the original scale
    # (y_train is reassigned here on every iteration)
    y_pred = scalery.inverse_transform(pd.DataFrame(y_pred))
    y_train = scalery.inverse_transform(pd.DataFrame(y_train))
    y_pred_train = scalery.inverse_transform(pd.DataFrame(y_pred_train))

    print("Train: {0:.2f} %".format(r2_score(y_train, y_pred_train) * 100))
    print("Test: {0:.2f} %".format(r2_score(y_test, y_pred) * 100))
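
Regarding the map size marked "vesanto" in the code: as far as I understand it, the rule of thumb usually attributed to Vesanto suggests about 5 * sqrt(N) map units in total, with N the number of training samples, rather than that many units per side. A small sketch of how the side length of a square map would follow from that reading; the helper `vesanto_side_length` and the sample count are hypothetical.

```python
import math


def vesanto_side_length(n_samples: int) -> int:
    """Side length of a square map with roughly 5 * sqrt(n_samples) units in total."""
    total_units = 5 * math.sqrt(n_samples)
    return max(1, math.ceil(math.sqrt(total_units)))


n_train = 1600  # hypothetical number of training samples
side = vesanto_side_length(n_train)
per_side_as_in_code = int(5 * math.sqrt(n_train))

print(f"square map side from total-units reading: {side}")
print(f"side if 5*sqrt(N) is used per side:       {per_side_as_in_code}")
```

Under this reading, using 5 * sqrt(N) for both n_rows and n_columns yields far more units than samples, which may be worth ruling out as a contributing factor.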