Struggling to solve indexing error while cleaning data to train SOM


I am cleaning a dataset to train a SOM and keep running into the same KeyError. The relevant code is below for context; I would appreciate help finding the cause.

SOM_dataset = calgdf.drop(['Station Name', 'Total Duration (hh:mm:ss)', 'Charging Time (hh:mm:ss)',  'Latitude', 'Longitude', 'geometry'], axis=1)
# Check if there are any missing values
print(SOM_dataset.isna().any(), '\n')
# Show the rows that contain at least one missing value
print(SOM_dataset[SOM_dataset.isna().any(axis=1)])
# Drop missing values
SOM_dataset_clean = SOM_dataset.dropna()
# Check for missing values again
print(SOM_dataset_clean.isna().any(),'\n')
# Check the datatypes and the shape of the dataset
print(SOM_dataset_clean.dtypes, SOM_dataset_clean.shape,'\n')
print('Shape before index reset:', SOM_dataset_clean.shape, '\n\n\n')
# Reset index to ensure continuous indices
SOM_dataset_clean = SOM_dataset_clean.reset_index(drop=True)
print('Shape after index reset:', SOM_dataset_clean.shape)

from sklearn_som.som import SOM
# Setting up SOM
model = SOM(m=50, n=50, dim=5, lr=0.1)
model.fit_transform(SOM_dataset_clean)
print('Success') # confirmation

Running this raises the following error:

{
    "name": "KeyError",
    "message": "27353",
    "stack": "---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File f:\\Projects\\SmartGrid\\venv\\lib\\site-packages\\pandas\\core\\indexes\\base.py:3653, in Index.get_loc(self, key)
   3652 try:
-> 3653     return self._engine.get_loc(casted_key)
   3654 except KeyError as err:

File f:\\Projects\\SmartGrid\\venv\\lib\\site-packages\\pandas\\_libs\\index.pyx:147, in pandas._libs.index.IndexEngine.get_loc()

File f:\\Projects\\SmartGrid\\venv\\lib\\site-packages\\pandas\\_libs\\index.pyx:176, in pandas._libs.index.IndexEngine.get_loc()

File pandas\\_libs\\hashtable_class_helper.pxi:7080, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas\\_libs\\hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 27353

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
f:\\Projects\\SmartGrid\\SOM.ipynb Cell 15 line 4
      2 # Setting up SOM
      3 model = SOM(m=50, n=50, dim=5, lr=0.1)
----> 4 model.fit_transform(SOM_dataset_clean)
      5 print('Success') # confirmation

File f:\\Projects\\SmartGrid\\venv\\lib\\site-packages\\sklearn_som\\som.py:277, in SOM.fit_transform(self, X, **kwargs)
    258 \"\"\"
    259 Convenience method for calling fit(X) followed by transform(X). Unlike
    260 in sklearn, this is not implemented more efficiently (the efficiency is
   (...)
    274     from each item in X to each cluster center.
    275 \"\"\"
    276 # Fit to data
--> 277 self.fit(X, **kwargs)
    279 # Return points in cluster distance space
    280 return self.transform(X)

File f:\\Projects\\SmartGrid\\venv\\lib\\site-packages\\sklearn_som\\som.py:162, in SOM.fit(self, X, epochs, shuffle)
    160 if global_iter_counter > self.max_iter:
    161     break
--> 162 input = X[idx]
    163 # Do one step of training
    164 self.step(input)

File f:\\Projects\\SmartGrid\\venv\\lib\\site-packages\\pandas\\core\\frame.py:3761, in DataFrame.__getitem__(self, key)
   3759 if self.columns.nlevels > 1:
   3760     return self._getitem_multilevel(key)
-> 3761 indexer = self.columns.get_loc(key)
   3762 if is_integer(indexer):
   3763     indexer = [indexer]

File f:\\Projects\\SmartGrid\\venv\\lib\\site-packages\\pandas\\core\\indexes\\base.py:3655, in Index.get_loc(self, key)
   3653     return self._engine.get_loc(casted_key)
   3654 except KeyError as err:
-> 3655     raise KeyError(key) from err
   3656 except TypeError:
   3657     # If we have a listlike key, _check_indexing_error will raise
   3658     #  InvalidIndexError. Otherwise we fall through and re-raise
   3659     #  the TypeError.
   3660     self._check_indexing_error(key)

KeyError: 27353"
}

I have already converted all string values to integers and dropped the time columns, since I suspected they were interfering with the SOM training, but the error persists. P.S. This code comes from a notebook, hence the import in the middle.
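
Reading the traceback, the failing line is `input = X[idx]` inside `SOM.fit`, and the lookup then goes through `DataFrame.__getitem__` and `self.columns.get_loc(key)`. That suggests `X[idx]` is being treated as a column lookup rather than a row lookup, in which case resetting the row index would not change anything. A minimal sketch of the difference, assuming a small made-up DataFrame (`df` and its column names are hypothetical stand-ins for `SOM_dataset_clean`):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for SOM_dataset_clean: 5 numeric columns, default row index.
df = pd.DataFrame(
    np.arange(20, dtype=float).reshape(4, 5),
    columns=["a", "b", "c", "d", "e"],
)

idx = 2  # an integer row position, like the idx used in the traceback's X[idx]

# On a DataFrame, df[idx] looks for a *column* labelled 2, so it raises
# KeyError no matter what the row index looks like.
try:
    df[idx]
    dataframe_lookup_failed = False
except KeyError:
    dataframe_lookup_failed = True

# On a NumPy array, X[idx] selects row idx instead.
X = df.to_numpy(dtype=float)
row = X[idx]

print(dataframe_lookup_failed)
print(row.shape)
```

If that is what is happening here, passing `SOM_dataset_clean.to_numpy()` to `model.fit_transform` instead of the DataFrame itself may be enough, provided all remaining columns are numeric and their count matches `dim=5`.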
