AttributeError: 'NoneType' object has no attribute 'split' SMOTE

6.3k Views Asked by At

I'm resampling some data using SMOTE and getting an error like this:

AttributeError: 'NoneType' object has no attribute 'split'

my code :

sm = SMOTE(random_state = 42)
X_train_resampled, y_train_resampled = sm.fit_resample(X_train_final, y_train)

can someone help me fixing this problem? because seems i dont have any problem with my data

full problem :

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-49-9465f7b6ac21> in <module>
      1 #resample data using SMOTE
      2 sm = SMOTE(random_state = 42)
----> 3 X_train_resampled, y_train_resampled = sm.fit_resample(X_train_final, y_train)

~\AppData\Roaming\Python\Python38\site-packages\imblearn\base.py in fit_resample(self, X, y)
     81         )
     82 
---> 83         output = self._fit_resample(X, y)
     84 
     85         y_ = (

~\AppData\Roaming\Python\Python38\site-packages\imblearn\over_sampling\_smote\base.py in _fit_resample(self, X, y)
    322 
    323             self.nn_k_.fit(X_class)
--> 324             nns = self.nn_k_.kneighbors(X_class, return_distance=False)[:, 1:]
    325             X_new, y_new = self._make_samples(
    326                 X_class, y.dtype, class_sample, X_class, nns, n_samples, 1.0

~\AppData\Roaming\Python\Python38\site-packages\sklearn\neighbors\_base.py in kneighbors(self, X, n_neighbors, return_distance)
    761         )
    762         if use_pairwise_distances_reductions:
--> 763             results = PairwiseDistancesArgKmin.compute(
    764                 X=X,
    765                 Y=self._fit_X,

sklearn\metrics\_pairwise_distances_reduction.pyx in sklearn.metrics._pairwise_distances_reduction.PairwiseDistancesArgKmin.compute()

~\AppData\Roaming\Python\Python38\site-packages\sklearn\utils\fixes.py in threadpool_limits(limits, user_api)
    149         return controller.limit(limits=limits, user_api=user_api)
    150     else:
--> 151         return threadpoolctl.threadpool_limits(limits=limits, user_api=user_api)
    152 
    153 

C:\ProgramData\Anaconda3\lib\site-packages\threadpoolctl.py in __init__(self, limits, user_api)
    169             self._check_params(limits, user_api)
    170 
--> 171         self._original_info = self._set_threadpool_limits()
    172 
    173     def __enter__(self):

C:\ProgramData\Anaconda3\lib\site-packages\threadpoolctl.py in _set_threadpool_limits(self)
    266             return None
    267 
--> 268         modules = _ThreadpoolInfo(prefixes=self._prefixes,
    269                                   user_api=self._user_api)
    270         for module in modules:

C:\ProgramData\Anaconda3\lib\site-packages\threadpoolctl.py in __init__(self, user_api, prefixes, modules)
    338 
    339             self.modules = []
--> 340             self._load_modules()
    341             self._warn_if_incompatible_openmp()
    342         else:

C:\ProgramData\Anaconda3\lib\site-packages\threadpoolctl.py in _load_modules(self)
    371             self._find_modules_with_dyld()
    372         elif sys.platform == "win32":
--> 373             self._find_modules_with_enum_process_module_ex()
    374         else:
    375             self._find_modules_with_dl_iterate_phdr()

C:\ProgramData\Anaconda3\lib\site-packages\threadpoolctl.py in _find_modules_with_enum_process_module_ex(self)
    483 
    484                 # Store the module if it is supported and selected
--> 485                 self._make_module_from_path(filepath)
    486         finally:
    487             kernel_32.CloseHandle(h_process)

C:\ProgramData\Anaconda3\lib\site-packages\threadpoolctl.py in _make_module_from_path(self, filepath)
    513             if prefix in self.prefixes or user_api in self.user_api:
    514                 module_class = globals()[module_class]
--> 515                 module = module_class(filepath, prefix, user_api, internal_api)
    516                 self.modules.append(module)
    517 

C:\ProgramData\Anaconda3\lib\site-packages\threadpoolctl.py in __init__(self, filepath, prefix, user_api, internal_api)
    604         self.internal_api = internal_api
    605         self._dynlib = ctypes.CDLL(filepath, mode=_RTLD_NOLOAD)
--> 606         self.version = self.get_version()
    607         self.num_threads = self.get_num_threads()
    608         self._get_extra_info()

C:\ProgramData\Anaconda3\lib\site-packages\threadpoolctl.py in get_version(self)
    644                              lambda: None)
    645         get_config.restype = ctypes.c_char_p
--> 646         config = get_config().split()
    647         if config[0] == b"OpenBLAS":
    648             return config[1].decode("utf-8")

AttributeError: 'NoneType' object has no attribute 'split'

I have tried to look further into the data I have but I can't seem to find any problems.

3

There are 3 best solutions below

0
On

I am not providing a solution to the above problem(which I also faced when using SMOTE), but a small suggestion to work around the problem.

You can use other methods related to SMOTE like BorderlineSMOTE etc. In my case, using BorderlineSMOTE solved the problem.

Further, I felt that BorderlineSMOTE is slightly better. As SMOTE can have some problems. You can refer to the following sites for more info:-

  1. Compare over-sampling samplers
  2. Oversampling with SMOTE and ADASYN

I hope it provides some kind of help.

0
On

These kinds of errors occur due to inconsistent dependencies. Run these commands to check the versions of your libraries. Next, go to the official site of the distributor, in this case, https://scikit-learn.org/. Then check whether your packages' versions match the versions of the dependencies mentioned on the site.

import sklearn
print(sklearn.show_versions())

The above commands generate the results:

Python dependencies:
      sklearn: 1.2.1
          pip: 21.2.4
   setuptools: 58.0.4
        numpy: 1.22.4
        scipy: 1.7.1
       Cython: 0.29.24
       pandas: 1.3.4
   matplotlib: 3.4.3
       joblib: 1.2.0
threadpoolctl: 3.1.0

If you will install these versions you will not face any issues. Numpy and threadpoolctl versions matters most.

0
On

I found out that SMOTE cannot handle more than 15 columns in your data frame. More columns cause issues in SMOTE and a "None Type" exception will raise. You have to install threadpoolctl package to resolve the issue.

pip install -U threadpoolctl