Why is the accuracy of my code not improving even after 10000 iterations?


I am doing binary classification with two classes (0, 1), and I generated some 2D random points using make_blobs for semi-supervised learning. It is an optimization problem and I want to use gradient descent to minimize my cost function, but whenever I run my code, whether for 50 or 10000 iterations, the accuracy of the algorithm is stuck at 0.5. I am not sure which part of my code is wrong such that the accuracy doesn't increase.

Here is my code:

import numpy as np
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.metrics import accuracy_score

x_labelled = np.random.randint(0,10,(3,2))
x_unlabelled = np.random.randint(0,10,(97,2)) 
y_labelled = np.random.randint(0,2,3)
y_unlabelled = np.random.randint(0,2,97)
w1 = 1 / (euclidean_distances(x_labelled, x_unlabelled) + 0.1)
w2 = 1 / (euclidean_distances(x_unlabelled, x_unlabelled) + 0.1)


def grad_func(y_labelled, y_unlabelled):
    
    grad = np.sum(w1* ( y_unlabelled.reshape(1, -1) - y_labelled.reshape(-1, 1))) +\
    np.sum(w2* ( y_unlabelled.reshape(1, -1) - y_unlabelled.reshape(-1, 1)))
    return grad    

def gradient_descent(max_iterations=1000, learning_rate=0.00001):

    # Initialize y_unlabelled with random values in [0, 1)
    current_y = np.random.uniform(0, 1, 97)

    for i in range(max_iterations):

        # Calculate the gradient
        y_derivative = grad_func(y_labelled, current_y)

        # Update current_y
        current_y = current_y - (learning_rate * y_derivative)

        # Print the parameters for each iteration
        print(f"Iteration {i+1}:  y {current_y}")

    current_y[current_y >= 0.5] = 1  # y's >= 0.5 go into class 1
    current_y[current_y < 0.5] = 0   # y's < 0.5 go into class 0

    print("accuracy", accuracy_score(y_unlabelled, current_y))

    return current_y

gradient_descent()

There is 1 answer below.


You created a problem that is impossible for any classifier to solve. Your classes come from the exact same distribution; it's like asking it to distinguish two identical twins.

Below I added two lines that make the problem a bit easier for the classifier by shifting the distribution of class 0. I got an accuracy of around 60%.

x_labelled = np.random.randint(0, 10, (3, 2))
x_unlabelled = np.random.randint(0, 10, (97, 2))
y_labelled = np.random.randint(0, 2, 3)
y_unlabelled = np.random.randint(0, 2, 97)

x_labelled[y_labelled == 0] -= 1
x_unlabelled[y_unlabelled == 0] -= 1
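To see why the shift matters: with the original data both classes have identical feature statistics, so no decision rule can beat 50%. Subtracting 1 from the class-0 points moves their mean a full unit away from class 1, which is what gives the classifier something to latch onto. A small self-contained sketch (the seed and variable names here are illustrative, not from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 10, (97, 2)).astype(float)
y = rng.integers(0, 2, 97)

# Shift only the class-0 points, as in the two added lines above
x_shifted = x.copy()
x_shifted[y == 0] -= 1

# Before the shift, both class means come from the same uniform
# distribution; after it, the class-0 mean is exactly 1 lower.
print("class means before:", x[y == 0].mean(), x[y == 1].mean())
print("class means after: ", x_shifted[y == 0].mean(), x_shifted[y == 1].mean())
```

The class-1 points are untouched, so any remaining gap between the two class means after the shift is pure signal rather than sampling noise.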