Log-odds decision boundary of multivariate distribution

I have some 2D data from two classes, and I am trying to compute the log odds:

ln(P(class=a|x)/P(class=b|x))

Then I want to plot the decision boundary, i.e. all points where the log odds equal 0. I have done this for 1D data. For 2D data, my intuition is that I need 2D histograms to estimate P(x), P(x | class = a) and P(x | class = b). Is this approach correct? I am also unsure where P(class = a) comes from: is it simply 0.5 because there are two classes with equal numbers of samples? Finally, I suspect the way I plot the decision boundary is wrong, because the result is not what I expected.
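For reference, here is how the log odds expand under Bayes' rule. This is a sketch, assuming p(x | a) and p(x | b) denote the class-conditional densities and P(a), P(b) the class priors; the count symbols N_a and N_b are introduced here for illustration only.

```latex
\ln\frac{P(a \mid x)}{P(b \mid x)}
  = \ln\frac{p(x \mid a)\,P(a)\,/\,p(x)}{p(x \mid b)\,P(b)\,/\,p(x)}
  = \ln\frac{p(x \mid a)}{p(x \mid b)} + \ln\frac{P(a)}{P(b)},
\qquad
P(a) \approx \frac{N_a}{N_a + N_b}, \quad
P(b) \approx \frac{N_b}{N_a + N_b}
```

The evidence p(x) cancels, so it does not have to be estimated to locate the boundary. If the priors are taken as the class fractions in the sample, then with N_a = N_b both equal 0.5, the second term is zero, and the boundary is where the two class-conditional densities are equal.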

import numpy as np
import matplotlib.pyplot as plt

N = 1000

mean_a = [0, 0]
cov_a = [[2, 0], [0, 2]]  # diagonal covariance

mean_b = [1, 2]
cov_b = [[1, 0], [0, 1]]  # diagonal covariance

#generate data
Xa = np.random.multivariate_normal(mean_a, cov_a, N)
Xb = np.random.multivariate_normal(mean_b, cov_b, N)
Xall = np.vstack((Xa,Xb))

def logratio(a, b, eps=1e-14): 
    # take log ( ratio of probabilities of (y vs not-y) )   
    a=a+eps # to prevent taking logs of 0 or infinity
    b=b+eps # to prevent taking logs of 0 or infinity
    return np.log(a/b)

P_a = 0.5 # since each class has equal number of samples
P_b = 0.5

(P_xn_if_a, x_bins, y_bins) = np.histogram2d(Xa[:, 0], Xa[:, 1])
(P_xn, x_bins, y_bins) = np.histogram2d(Xall[:, 0], Xall[:, 1])
(P_xn_if_b, x_bins, y_bins) = np.histogram2d(Xb[:, 0], Xb[:, 1])

P_b_if_xn = P_xn_if_b * P_b / (P_xn + 1e-16)  # prior of class b (was P_a)
P_a_if_xn = P_xn_if_a * P_a / (P_xn + 1e-16)
log_odds = logratio(P_a_if_xn, P_b_if_xn)

#plot only boundary
for i in range(0,10):
    for j in range(0,10):
        if log_odds[i][j] != 0:
            log_odds[i][j] = 0
        else:
            log_odds[i][j] = 1



fig, ax6 = plt.subplots(nrows=1, ncols=1,figsize=(15,8))
ax6.contour(x_bins[:-1], y_bins[:-1], log_odds,levels=[0], cmap="Greys_r")
ax6.scatter(Xa[:,0],Xa[:,1],color='r')
ax6.scatter(Xb[:,0],Xb[:,1],color='b')
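A few things in the code above may explain the unexpected plot. Each `np.histogram2d` call picks its own bin edges from the range of the data it is given, so the three histograms are on different grids, and `x_bins`, `y_bins` hold only the edges of the last call. The loop also overwrites every log-odds value with 0 or 1 before `contour` sees it. The following is a minimal sketch of one way to put both classes on a shared grid, not a definitive fix; the seed, `n_bins`, the helper `class_density`, and the use of `density=True` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility
N = 1000
Xa = rng.multivariate_normal([0, 0], [[2, 0], [0, 2]], N)
Xb = rng.multivariate_normal([1, 2], [[1, 0], [0, 1]], N)
Xall = np.vstack((Xa, Xb))

# One set of bin edges for both classes, so that cell (i, j) covers the
# same region of the plane in every histogram.
n_bins = 10  # hypothetical choice; matches the histogram2d default
x_edges = np.linspace(Xall[:, 0].min(), Xall[:, 0].max(), n_bins + 1)
y_edges = np.linspace(Xall[:, 1].min(), Xall[:, 1].max(), n_bins + 1)


def class_density(X):
    """Histogram estimate of p(x | class) on the shared grid."""
    H, _, _ = np.histogram2d(
        X[:, 0], X[:, 1], bins=[x_edges, y_edges], density=True
    )
    return H


p_x_if_a = class_density(Xa)
p_x_if_b = class_density(Xb)

# Priors estimated as class fractions; both are 0.5 for equal sample sizes.
P_a = len(Xa) / len(Xall)
P_b = len(Xb) / len(Xall)

# p(x) cancels in the ratio, so only likelihood * prior is needed.
eps = 1e-14  # keeps the log finite in cells where a class has no samples
log_odds = np.log((p_x_if_a * P_a + eps) / (p_x_if_b * P_b + eps))

# Bin centres are the natural coordinates for the cell values.
x_centers = 0.5 * (x_edges[:-1] + x_edges[1:])
y_centers = 0.5 * (y_edges[:-1] + y_edges[1:])
```

To draw the boundary, the unmodified array can be passed to `ax.contour(x_centers, y_centers, log_odds.T, levels=[0])`. The transpose is needed because `np.histogram2d` indexes its result as `[x, y]`, while `contour` expects rows to run along y. With only 10 bins per axis and empty cells in the tails, the resulting contour is expected to be coarse and noisy.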

