How to add a decision boundary to a figure (data set) using matplotlib and the SVM algorithm?

My code:

import matplotlib.pyplot as plt
import pandas as pd

data = pd.read_csv('data/data.csv')
X = data[['x1','x2']]
y = data['y']

from sklearn.svm import SVC
classifier = SVC()
classifier.fit(X,y)

plt.scatter(data['x1'], data['x2'], c=y, s=50)
plt.show()

My data:

x1,x2,y
0.336493583877,-0.985950993354,0.0
-0.0110425297266,-0.10552856162,1.0
0.238159509297,-0.61741666482,1.0
-0.366782883496,-0.713818716912,1.0
1.22192307438,-1.03939898614,0.0

My current output: [scatter plot of the data points, with no decision boundary]

Probably a Support Vector Machine isn't the best algorithm to use here, but I would like to see the boundary it generates. How can I do that?


Edit: after applying Paul's excellent answer, this is the result: [scatter plot with the SVM decision regions overlaid]

There are 2 answers below.

BEST ANSWER

Building off of Sun Yi's answer, you can adapt the example code from here. You didn't include all the points from your data.csv in the question, but we can still produce a plot with the decision boundary like this:

import pandas as pd
import numpy as np
from matplotlib.colors import ListedColormap
from sklearn.svm import SVC
import matplotlib.pyplot as plt

# load the data
data = pd.read_csv('data/data.csv')
X = data[['x1','x2']]
y = data['y']

# fit the classifier
classifier = SVC(kernel='rbf')
classifier.fit(X,y)

# first we determine the grid of points -- i.e. the min and max for each of
# the axes -- and then build a mesh grid over that range
resolution=0.02
x1_min, x1_max = X["x1"].min() - 1, X["x1"].max() + 1
x2_min, x2_max = X["x2"].min() - 1, X["x2"].max() + 1
xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                       np.arange(x2_min, x2_max, resolution))

# setup marker generator and color map
markers = ('s', 'x', 'o', '^', 'v')
colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
cmap = ListedColormap(colors[:len(np.unique(y))])

# plot the classifier decision boundaries
Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
Z = Z.reshape(xx1.shape)
plt.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap)
plt.xlim(xx1.min(), xx1.max())
plt.ylim(xx2.min(), xx2.max())

# plot the data points
for idx, cl in enumerate(np.unique(y)):
    plt.scatter(x=X["x1"][y == cl].values, 
                y=X["x2"][y == cl].values,
                alpha=0.6, 
                color=cmap(idx),  # single RGBA color; avoids matplotlib's "c looks like a single RGB/RGBA" warning
                edgecolor='black',
                marker=markers[idx],
                label=cl)
plt.legend(loc='upper left')  # show which marker corresponds to which class label
plt.show()

This is taken heavily from the example code in the link above; I tried to include only what was needed to keep it simple. Here is the output image: [scatter plot with the filled decision regions behind the points]

You'll notice that I explicitly used the rbf kernel, since the full data in your example isn't linearly separable. For a nicer and more general answer on plotting these contours, this answer is a good reference.
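
If you are on a recent scikit-learn (1.1 or newer), a shorter route is the built-in DecisionBoundaryDisplay helper, which builds the mesh grid and draws the contours for you. A minimal sketch, assuming the same data/data.csv layout and an RBF-kernel SVC as above:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.inspection import DecisionBoundaryDisplay

# same columns as in the question: x1, x2, y
data = pd.read_csv('data/data.csv')
X = data[['x1', 'x2']]
y = data['y']

classifier = SVC(kernel='rbf').fit(X, y)

# DecisionBoundaryDisplay builds the grid and calls contourf internally
DecisionBoundaryDisplay.from_estimator(classifier, X, response_method='predict', alpha=0.4)
plt.scatter(data['x1'], data['x2'], c=y, s=50, edgecolor='black')
plt.show()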

SECOND ANSWER

Your data is not linearly separable, so you can use the SVM algorithm.
Your data is 2D, and this algorithm can map your data into a higher-dimensional space (e.g. 3D) by using a kernel function, where it becomes separable.
You can find this algorithm in sklearn, as sketched below.
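
A minimal sketch of that idea, using sklearn's make_circles as a hypothetical stand-in dataset (the full data.csv isn't shown), to compare a linear kernel with an RBF kernel:

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# toy 2D data that is not linearly separable
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# a linear kernel cannot separate the circles; the RBF kernel implicitly
# maps the points to a higher-dimensional space where they become separable
linear_svc = SVC(kernel='linear').fit(X, y)
rbf_svc = SVC(kernel='rbf').fit(X, y)

print('linear kernel accuracy:', linear_svc.score(X, y))
print('rbf kernel accuracy:', rbf_svc.score(X, y))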