How to count points inside a contour obtained from svm.SVC?


I generated a data set from two bivariate Gaussians to simulate a signal and a background. Then I tried to separate the two components using svm.SVC with kernel 'rbf'. Now I'd like to count how many background points lie inside the acceptance region, how many signal points lie outside it, and so on, in order to study the efficiency of the separation method.

I should say that I took parts of this code from an online example, because this is the first time I have done this type of analysis. I'm not sure I really understand how the contour line is created, so I don't know how to work with it or how to analyse its position with respect to the points.

Thanks

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm

mu_vec1 = np.array([0,0])
cov_mat1 = np.array([[0.3**2,0.5*0.3*0.3],[0.5*0.3*0.3,0.3**2]])
x1_samples = np.random.multivariate_normal(mu_vec1, cov_mat1, 1000)
#mu_vec1 = mu_vec1.reshape(1,2).T

mu_vec2 = np.array([2,1])
cov_mat2 = np.array([[1,0.4],[0.4,1]])
x2_samples = np.random.multivariate_normal(mu_vec2, cov_mat2, 1000)
#mu_vec2 = mu_vec2.reshape(1,2).T

X = np.concatenate((x1_samples,x2_samples), axis = 0)
Y = np.array([0]*1000 + [1]*1000)

plt.scatter(x1_samples[:,0],x1_samples[:,1], label='Signal')
plt.scatter(x2_samples[:,0],x2_samples[:,1],label='Background')

C = 1.0  # SVM regularization parameter
clf = svm.SVC(kernel = 'rbf', C=C, probability = True )
clf.fit(X, Y)

h = .02  # step size in the mesh
# create a mesh to plot in
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

print(clf.score(X,Y))

# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)

plt.contour(xx, yy, Z, cmap=plt.cm.Paired)

plt.xlabel('x')
plt.ylabel('y')
plt.legend()

There is 1 best solution below


I actually solved it using the `predict` method, like this:

false_pos=0
true_pos=0
true_neg=0
false_neg=0

pred_sig = clf.predict(np.c_[x1_samples[:,0], x1_samples[:,1]])

for i in range(0,len(pred_sig)):
    if pred_sig[i] == 1:
        false_neg=false_neg + 1
        
    if pred_sig[i] == 0:
        true_pos=true_pos + 1

pred_back = clf.predict(np.c_[x2_samples[:,0], x2_samples[:,1]])

for i in range(0,len(pred_back)):
    if pred_back[i] == 1:
        true_neg=true_neg + 1
        
    if pred_back[i] == 0:
        false_pos=false_pos + 1
        
print(true_pos+false_pos+true_neg+false_neg)

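# purezza (purity): fraction of points inside the acceptance region that are signal
purezza = true_pos/(true_pos+false_pos)
# rig_fondo: fraction of points outside the acceptance region that are background
rig_fondo = true_neg/(true_neg+false_neg)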
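
The same counts can probably be obtained without explicit loops. The following is only a sketch under a few assumptions that are not in the original post: a fixed random seed, `probability=True` dropped because it is not needed for counting, and signal (label 0) treated as the "positive" class as in the loops above. It also illustrates that, for a two-class `SVC`, the plotted contour is the set of points where `decision_function` changes sign, so "inside the acceptance region" is the same as "predicted as label 0".

```python
import numpy as np
from sklearn import svm
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility only

# Same distributions as in the question: label 0 = signal, label 1 = background
cov_sig = [[0.3**2, 0.5 * 0.3 * 0.3], [0.5 * 0.3 * 0.3, 0.3**2]]
cov_bkg = [[1, 0.4], [0.4, 1]]
x1_samples = rng.multivariate_normal([0, 0], cov_sig, 1000)
x2_samples = rng.multivariate_normal([2, 1], cov_bkg, 1000)
X = np.concatenate((x1_samples, x2_samples), axis=0)
Y = np.array([0] * 1000 + [1] * 1000)

clf = svm.SVC(kernel='rbf', C=1.0)
clf.fit(X, Y)

pred = clf.predict(X)

# Boolean masks replace the explicit loops; signal (0) is the "positive" class
true_pos = int(np.sum((Y == 0) & (pred == 0)))   # signal inside the region
false_neg = int(np.sum((Y == 0) & (pred == 1)))  # signal outside the region
false_pos = int(np.sum((Y == 1) & (pred == 0)))  # background inside the region
true_neg = int(np.sum((Y == 1) & (pred == 1)))   # background outside the region

# Cross-check: rows are true labels, columns are predicted labels
cm = confusion_matrix(Y, pred, labels=[0, 1])

# For a binary SVC, a negative decision value means classes_[0] (here: signal),
# so the contour drawn in the question is the zero level of decision_function
inside = clf.decision_function(X) < 0

signal_efficiency = true_pos / (true_pos + false_neg)
purity = true_pos / (true_pos + false_pos)
print(cm)
print(signal_efficiency, purity)
```

Note that these numbers are computed on the same points the classifier was trained on; for an unbiased efficiency estimate one would normally evaluate on an independent sample drawn from the same distributions.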