Visualization (2D) of SVM in Python

I have the assignment below. I have done the first 5 tasks but am stuck on the last one: plotting the maps. Please explain how to do it. Thank you in advance.

*(I started learning SVM and ML only a few days ago, so please take that into account.)

**(I think the sequence of steps should be the same for every kernel type, so showing it for just one of them would be great. I will try to adapt your code to the others.)

The procedure to follow:

  1. Randomly take 100 samples from this map and load them into Python for SVC. The dataset includes Easting, Northing and Rock information.

  2. Split these 100 randomly selected samples, again at random, into train and test datasets.

  3. Try to run the SVC with the kernels of linear, polynomial, radial basis function, and tangent.

  4. Find the best of each, for instance, if you are using a radial basis function, which "C" and "gamma" can be the optimum one based on the accuracy that you get from accuracy scores.

  5. Once you have the fitted model and you calculated the accuracy scores (obtained from test dataset), then import the whole dataset into the obtained FIT MODELS and predict the output of all that 90,000 sample points that we have in the reference.csv.

  6. Show me the obtained maps and also the accuracy scores that you get from each FIT MODEL.

The dataset looks like:

[image: screenshot of the dataset]

90000 points in the same style.

Here is the code:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

### Importing Info

df = pd.read_csv("C:/Users/Admin/Desktop/RA/step 1/reference.csv", header=0)
df_model = df.sample(n = 100)
df_model.shape

## X-y split

X = df_model.loc[:,df_model.columns!="Rock"]
y = df_model["Rock"]
y_initial = df["Rock"]

### for whole dataset

X_wd = df.loc[:, df.columns!="Rock"]
y_wd = df["Rock"]

## Test-train split

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

## Standardizing the Data

from sklearn.preprocessing import StandardScaler

sc = StandardScaler().fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

## Linear
### Grid Search

from sklearn.model_selection import GridSearchCV
from sklearn import svm
from sklearn.metrics import accuracy_score, confusion_matrix

params_linear = {'C' : (0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, 500,1000)}
clf_svm_l = svm.SVC(kernel = 'linear')
svm_grid_linear = GridSearchCV(clf_svm_l, params_linear, n_jobs=-1,
                              cv = 3, verbose = 1, scoring = 'accuracy')

svm_grid_linear.fit(X_train_std, y_train)
svm_grid_linear.best_params_
linsvm_clf = svm_grid_linear.best_estimator_
accuracy_score(y_test, linsvm_clf.predict(X_test_std))

### training svm

clf_svm_l = svm.SVC(kernel = 'linear', C = 0.1)
clf_svm_l.fit(X_train_std, y_train)

### predicting model

y_train_pred_linear = clf_svm_l.predict(X_train_std)
y_test_pred_linear = clf_svm_l.predict(X_test_std)
y_test_pred_linear
clf_svm_l.n_support_

### whole dataset

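# the model was trained on standardized features, so scale the whole dataset the same way
y_pred_linear_wd = clf_svm_l.predict(sc.transform(X_wd))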

### map
        


## Poly
### grid search for poly

params_poly = {'C' : (0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, 500,1000),
         'degree' : (1,2,3,4,5,6)}
clf_svm_poly = svm.SVC(kernel = 'poly')
svm_grid_poly = GridSearchCV(clf_svm_poly, params_poly, n_jobs = -1,
                            cv = 3, verbose = 1, scoring = 'accuracy')
svm_grid_poly.fit(X_train_std, y_train)
svm_grid_poly.best_params_
polysvm_clf = svm_grid_poly.best_estimator_
accuracy_score(y_test, polysvm_clf.predict(X_test_std))

### training svm

clf_svm_poly = svm.SVC(kernel = 'poly', C = 50, degree = 2)
clf_svm_poly.fit(X_train_std, y_train)

### predicting model

y_train_pred_poly = clf_svm_poly.predict(X_train_std)
y_test_pred_poly = clf_svm_poly.predict(X_test_std)

clf_svm_poly.n_support_

### whole dataset

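# the model was trained on standardized features, so scale the whole dataset the same way
y_pred_poly_wd = clf_svm_poly.predict(sc.transform(X_wd))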

### map            


## RBF

### grid search rbf

params_rbf = {'C' : (0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, 500,1000),
         'gamma' : (0.001, 0.01, 0.1, 0.5, 1)}
clf_svm_r = svm.SVC(kernel = 'rbf')
svm_grid_r = GridSearchCV(clf_svm_r, params_rbf, n_jobs = -1,
                         cv = 10, verbose = 1, scoring = 'accuracy')
svm_grid_r.fit(X_train_std, y_train)
svm_grid_r.best_params_
rsvm_clf = svm_grid_r.best_estimator_
accuracy_score(y_test, rsvm_clf.predict(X_test_std))

### training svm

clf_svm_r = svm.SVC(kernel = 'rbf', C = 500, gamma = 0.5)
clf_svm_r.fit(X_train_std, y_train)

### predicting model

y_train_pred_r = clf_svm_r.predict(X_train_std)
y_test_pred_r = clf_svm_r.predict(X_test_std)

### whole dataset

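# the model was trained on standardized features, so scale the whole dataset the same way
y_pred_r_wd = clf_svm_r.predict(sc.transform(X_wd))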

### map            


## Tangent

### grid search

params_tangent = {'C' : (0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50),
         'gamma' : (0.001, 0.01, 0.1, 0.5, 1)}
clf_svm_tangent = svm.SVC(kernel = 'sigmoid')
svm_grid_tangent = GridSearchCV(clf_svm_tangent, params_tangent, n_jobs = -1,
                            cv = 10, verbose = 1, scoring = 'accuracy')
svm_grid_tangent.fit(X_train_std, y_train)
svm_grid_tangent.best_params_
tangentsvm_clf = svm_grid_tangent.best_estimator_
accuracy_score(y_test, tangentsvm_clf.predict(X_test_std))

### training svm

clf_svm_tangent = svm.SVC(kernel = 'sigmoid', C = 1, gamma = 0.1)
clf_svm_tangent.fit(X_train_std, y_train)

### predicting model

y_train_pred_tangent = clf_svm_tangent.predict(X_train_std)
y_test_pred_tangent = clf_svm_tangent.predict(X_test_std)

### whole dataset

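# the model was trained on standardized features, so scale the whole dataset the same way
y_pred_tangent_wd = clf_svm_tangent.predict(sc.transform(X_wd))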

### map

There are 2 best solutions below

Here is the code for plotting the decision regions of the linear kernel, for anyone who runs into the same problem. It should be easy to adapt this code to the other kernels.

# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train_std, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, clf_svm_l.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('darkblue', 'yellow')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    # use color= (not c=) for a single RGBA tuple to avoid matplotlib's ambiguity warning
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = ListedColormap(('blue', 'gold'))(i), label = j)
plt.title('SVM (Training set)')
plt.xlabel('Easting')
plt.ylabel('Northing')
plt.legend()
plt.show()
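
To get from the decision-region plot above to the full maps asked for in step 6, one option is to predict every point of the whole dataset and pivot the predictions back onto the Easting/Northing grid. The following is a minimal sketch on synthetic data, not the original `reference.csv`: the 60 x 60 grid, the circular rock body, and the output file name are assumptions for illustration, and the kernels use scikit-learn's default hyperparameters, which you would replace with the best values from your grid search. Wrapping `StandardScaler` and `SVC` in a pipeline means the whole dataset is scaled the same way as the training data before prediction.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in for reference.csv: a 60 x 60 grid with a circular rock body
east, north = np.meshgrid(np.arange(60), np.arange(60))
df = pd.DataFrame({"Easting": east.ravel(), "Northing": north.ravel()})
df["Rock"] = (((df["Easting"] - 30) ** 2 + (df["Northing"] - 30) ** 2) < 400).astype(int)

# Steps 1 and 2: draw 100 random samples, then split them into train and test sets
features = ["Easting", "Northing"]
sample = df.sample(n=100, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    sample[features], sample["Rock"], test_size=0.2, random_state=0
)

kernels = ["linear", "poly", "rbf", "sigmoid"]
scores, maps = {}, {}
fig, axes = plt.subplots(1, len(kernels), figsize=(16, 4))
for ax, kernel in zip(axes, kernels):
    # The pipeline fits the scaler on the training data and reuses it at predict time
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    model.fit(X_train, y_train)
    scores[kernel] = accuracy_score(y_test, model.predict(X_test))

    # Step 5: predict the whole dataset, then pivot to a Northing x Easting matrix
    predicted = df[features].assign(Rock=model.predict(df[features]))
    grid = predicted.pivot(index="Northing", columns="Easting", values="Rock")
    maps[kernel] = grid.to_numpy()

    # Step 6: draw the predicted map with northing increasing upwards
    ax.imshow(maps[kernel], origin="lower",
              extent=[grid.columns.min(), grid.columns.max(),
                      grid.index.min(), grid.index.max()])
    ax.set_title(f"{kernel} (test accuracy {scores[kernel]:.2f})")
    ax.set_xlabel("Easting")
    ax.set_ylabel("Northing")

fig.tight_layout()
fig.savefig("svm_maps.png")
```

The `scores` dictionary holds one test accuracy per kernel. Plotting the true `Rock` column with the same pivot gives a reference map to compare against.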

From your sample data, it looks like you are dealing with regularly spaced data, and the rows and columns are iterated in monotonically increasing order. Here is one way to reshape this dataset into a 2D array (one row of the array per northing value) and plot it accordingly:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# create sample data
data = {
    'Easting': [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3],
    'Northing': [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
    'Rocks': [0, 0, 1, 0, 0, 2, 0, 0, 0, 1, 0, 0],
}
df = pd.DataFrame(data)

# reshape data into a 2D matrix with one row per northing value
# (assumes a complete grid, sorted by northing and then by easting)
n_easting = df['Easting'].nunique()
img_data = df['Rocks'].to_numpy().reshape(-1, n_easting)

# plot as image
plt.imshow(img_data, origin='lower')  # origin='lower' puts the first northing row at the bottom
plt.show()

If you are dealing with irregularly spaced data, i.e. not every easting/northing combination has a value, you might look into plotting irregularly spaced data.
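
For the case where the points still sit on a grid but some easting/northing combinations are missing, one possible approach is to let pandas build the matrix with `DataFrame.pivot`, which leaves `NaN` in the empty cells. This is a sketch under that assumption, with made-up sample values and a hypothetical output file name. For truly scattered coordinates, a triangulation-based plot such as `matplotlib.pyplot.tricontourf` is probably a better fit.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical sample: easting 2 at northing 0 and easting 1 at northing 2 are missing
df = pd.DataFrame({
    "Easting":  [0, 1, 3, 0, 1, 2, 3, 0, 2, 3],
    "Northing": [0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
    "Rocks":    [0, 0, 0, 0, 2, 0, 0, 0, 0, 0],
})

# pivot places each value by its coordinates, so row order in df does not matter;
# combinations that never occur become NaN
grid = df.pivot(index="Northing", columns="Easting", values="Rocks")
img_data = grid.to_numpy(dtype=float)

# imshow leaves NaN cells blank
fig, ax = plt.subplots()
ax.imshow(img_data, origin="lower")
ax.set_xlabel("Easting")
ax.set_ylabel("Northing")
fig.savefig("rocks_map.png")
```

Because the cells are placed by coordinate rather than by position in the file, this also works when the rows are not sorted.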