scikit-learn: Intercept computed from dual coefficients


I'm trying to compute the intercept of an SVM classifier (linear kernel) from the dual coefficients. I'm using the SVC class instead of LinearSVC, because the latter does not have a dual_coef_ attribute.

Here is the code:

from sklearn import svm
from sklearn.utils import shuffle
from sklearn.inspection import DecisionBoundaryDisplay
import matplotlib.pyplot as plt
import numpy as np
from numpy.testing import assert_array_almost_equal


seed = np.random.seed(123)

# Data
m = 1000

c2 = (-1, 1)
c1 = (2, 2.5)

r1 = 1.5
r2 = 3.5

rand_1_theta = np.random.uniform(0, 1, size=m)
rand_1_radius = np.random.uniform(0, 1, size=m)
ds1_x = (c1[0] + np.sqrt(rand_1_radius * r1) * np.cos(rand_1_theta * 2 * np.pi))
ds1_y = (c1[1] + np.sqrt(rand_1_radius * r1) * np.sin(rand_1_theta * 2 * np.pi))

rand_2_theta = np.random.uniform(0, 1, size=m)
rand_2_radius = np.random.uniform(0, 1, size=m)
ds2_x = (c2[0] + np.sqrt(rand_2_radius * r2) * np.cos(rand_2_theta * 2 * np.pi))
ds2_y = (c2[1] + np.sqrt(rand_2_radius * r2) * np.sin(rand_2_theta * 2 * np.pi))

X = np.stack((np.concatenate((ds1_x, ds2_x)), 
              np.concatenate((ds1_y, ds2_y))), 
              axis=1)
y = np.concatenate((np.zeros(m), np.ones(m)))
rgb = [(1,0,0) if val==0 else (0,0,1) for val in y]

# Preprocessing
X, y, rgb = shuffle(X, y, rgb, random_state=123)


# SVM
model = svm.SVC(probability=True, kernel='linear', random_state=123)
model.fit(X, y)


# Evaluation
support_vectors = model.support_vectors_
support_indexes = model.support_
coef = model.coef_
dual_coeff = model.dual_coef_
intercept = model.intercept_

primal_coeff = np.matmul(dual_coeff, support_vectors)


print(primal_coeff)
print(coef)

y_support = y[support_indexes]
all_bias = y_support - np.dot(support_vectors, primal_coeff.squeeze())

intercept_estimate_1 = np.mean(all_bias)
intercept_estimate_2 = np.min(all_bias)

print(intercept)
print(intercept_estimate_1)
print(intercept_estimate_2)

Notice the intercept estimates, computed with the mean and with the minimum, printed at the end of the script. As far as I know, the intercept is the mean of all intercept estimates; at least, most books present it this way. But it seems scikit-learn is using the minimum value instead. I don't understand why, and the documentation does not explain how it is computed.

Can anyone provide an explanation?


2 Answers

Accepted answer by hychou

The intercept is neither the mean nor the min of all the bias estimates.

Instead, for libsvm (and therefore sklearn) it is the mean of y*gradient over all free support vectors, i.e. those with 0 < alpha < C (negated, since the bias is -rho). You can see this in Section 4.1.5 of the LIBSVM paper.

[Image: the bias-averaging formula from Section 4.1.5 of the LIBSVM paper.]
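
For reference, the averaging it describes (reconstructed from the calculate_rho routine quoted below) is, in the paper's notation,

$$
\rho = \frac{\sum_{i:\,0 < \alpha_i < C} y_i \,\nabla f(\alpha)_i}{\bigl|\{\, i : 0 < \alpha_i < C \,\}\bigr|}, \qquad b = -\rho .
$$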

Because only the support vectors matter for prediction and libsvm does not save all of the coefficients, I don't think you can easily check this property without looking deeply into the core.

But if you are really interested, you can check the libsvm code that calculates the bias (= -rho) here:

double Solver::calculate_rho()
{
    double r;
    int nr_free = 0;
    double ub = INF, lb = -INF, sum_free = 0;
    for(int i=0;i<active_size;i++)
    {
        double yG = y[i]*G[i];

        if(is_upper_bound(i))
        {
            if(y[i]==-1)
                ub = min(ub,yG);
            else
                lb = max(lb,yG);
        }
        else if(is_lower_bound(i))
        {
            if(y[i]==+1)
                ub = min(ub,yG);
            else
                lb = max(lb,yG);
        }
        else
        {
            ++nr_free;
            sum_free += yG;
        }
    }

    if(nr_free>0)
        r = sum_free/nr_free;
    else
        r = (ub+lb)/2;

    return r;
}
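
For reference, a rough Python transcription of this routine might look like the sketch below; the arrays y (labels in {+1, -1}), G (gradient of the dual objective), alpha, and the box constraint C are assumed to be available, and the names are purely illustrative:

import numpy as np

def calculate_rho(y, G, alpha, C):
    # Rough Python sketch of libsvm's Solver::calculate_rho (see the C++ above).
    # y: labels in {+1, -1}; G: gradient of the dual objective; alpha: dual variables; C: box constraint.
    ub, lb = np.inf, -np.inf
    sum_free, nr_free = 0.0, 0
    for yi, Gi, ai in zip(y, G, alpha):
        yG = yi * Gi
        if ai >= C:        # upper-bounded alpha (alpha_i == C)
            if yi == -1:
                ub = min(ub, yG)
            else:
                lb = max(lb, yG)
        elif ai <= 0:      # lower-bounded alpha (alpha_i == 0, not a support vector)
            if yi == +1:
                ub = min(ub, yG)
            else:
                lb = max(lb, yG)
        else:              # free alpha: contributes to the average
            nr_free += 1
            sum_free += yG
    return sum_free / nr_free if nr_free > 0 else (ub + lb) / 2

The bias is then -rho; for the binary case this is what sklearn exposes as model.intercept_.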

Answer by Kevin

I would like to complement the accepted answer. I recently wrote a post on exactly this problem. It seems that Stack Overflow does not handle math very well, so I did not paste the post's source here; please refer to my original post for the details.

My claim

My claim is that, according to Section 4.1.5 of the LIBSVM paper, the intercept IS the mean of all intercept estimates:

In LIBSVM, for numerical stability, we average all these values.

where "all these values" refer to the intercept estimates. In my post, I show that negative y times gradient (in @hychou's answer) is just another form to represent the intercept. You are right in this part.

Where you are wrong

Basically,

  • The intercept estimates are NOT taken from all support vectors (@hychou's answer is right about this). Only the free alphas are taken into account; the bounded ones, in particular the upper-bounded ones (alpha = C), are excluded. Your code did not remove those alphas and their support vectors.
  • The y in the SVM formulation takes values +1 and -1, but in your code y_support takes 1 and 0. Yes, when feeding SVC, y can be 1 and 0; but when computing the intercept manually, following the math, +1 and -1 are your friends.

I rewrote your code, and it runs perfectly on my computer, at least with your provided seed. I've highlighted the parts you should notice with ###:

from sklearn import svm
from sklearn.utils import shuffle
from sklearn.inspection import DecisionBoundaryDisplay
import matplotlib.pyplot as plt
import numpy as np
from numpy.testing import assert_array_almost_equal


seed = np.random.seed(123)

# Data
m = 1000

c2 = (-1, 1)
c1 = (2, 2.5)

r1 = 1.5
r2 = 3.5

rand_1_theta = np.random.uniform(0, 1, size=m)
rand_1_radius = np.random.uniform(0, 1, size=m)
ds1_x = (c1[0] + np.sqrt(rand_1_radius * r1) * np.cos(rand_1_theta * 2 * np.pi))
ds1_y = (c1[1] + np.sqrt(rand_1_radius * r1) * np.sin(rand_1_theta * 2 * np.pi))

rand_2_theta = np.random.uniform(0, 1, size=m)
rand_2_radius = np.random.uniform(0, 1, size=m)
ds2_x = (c2[0] + np.sqrt(rand_2_radius * r2) * np.cos(rand_2_theta * 2 * np.pi))
ds2_y = (c2[1] + np.sqrt(rand_2_radius * r2) * np.sin(rand_2_theta * 2 * np.pi))

X = np.stack((np.concatenate((ds1_x, ds2_x)), 
              np.concatenate((ds1_y, ds2_y))), 
              axis=1)
y = np.concatenate((np.zeros(m), np.ones(m)))
rgb = [(1,0,0) if val==0 else (0,0,1) for val in y]

# Preprocessing
X, y, rgb = shuffle(X, y, rgb, random_state=123)


# SVM
model = svm.SVC(probability=True, kernel='linear', random_state=123)
model.fit(X, y)


# Evaluation
support_vectors = model.support_vectors_
support_indexes = model.support_
coef = model.coef_
dual_coeff = model.dual_coef_
intercept = model.intercept_

primal_coeff = np.matmul(dual_coeff, support_vectors)


print(primal_coeff)
print(coef)

### BEGIN NOTICE HERE ###
y_support = y[support_indexes] * 2 - 1       # map labels {0, 1} -> {-1, +1}
S = np.ravel(np.abs(dual_coeff)) < 1         # keep only free SVs: alpha_i < C (C=1 is SVC's default)
all_bias = y_support[S] - np.dot(support_vectors[S], primal_coeff.squeeze())
### END NOTICE HERE ###

intercept_estimate_1 = np.mean(all_bias)
intercept_estimate_2 = np.min(all_bias)

print(intercept)
print(intercept_estimate_1)
print(intercept_estimate_2)
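
With this change, intercept_estimate_1 should reproduce model.intercept_ up to numerical precision. If you want an explicit check (reusing the assert_array_almost_equal import already at the top; the tolerance here is just a suggestion):

# Sanity check: the mean bias over the free support vectors should match
# sklearn's intercept_ (which comes from libsvm's -rho).
assert_array_almost_equal(intercept_estimate_1, intercept[0], decimal=4)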