Why is the accuracy of my code not improving even after 10000 iterations?


I am doing binary classification with two classes (0, 1), and I generated some 2D random points using make_blobs for semi-supervised learning. It is an optimization problem, and I want to use gradient descent to minimize my cost function. But whenever I run my code, whether for 50 or 10000 iterations, the accuracy of the algorithm is stuck at 0.5. I am not sure which part of my code is wrong and why the accuracy doesn't increase.

Here is my code:

import numpy as np
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.metrics import accuracy_score

x_labelled = np.random.randint(0,10,(3,2))
x_unlabelled = np.random.randint(0,10,(97,2)) 
y_labelled = np.random.randint(0,2,3)
y_unlabelled = np.random.randint(0,2,97)
w1 = 1 / (euclidean_distances(x_labelled, x_unlabelled) + 0.1)
w2 = 1 / (euclidean_distances(x_unlabelled, x_unlabelled) + 0.1)


def grad_func(y_labelled, y_unlabelled):
    grad = np.sum(w1 * (y_unlabelled.reshape(1, -1) - y_labelled.reshape(-1, 1))) + \
           np.sum(w2 * (y_unlabelled.reshape(1, -1) - y_unlabelled.reshape(-1, 1)))
    return grad

def gradient_descent(max_iterations=1000, learning_rate=0.00001):

    # Initializing y_unlabelled
    current_y = np.random.uniform(0, 1, 97)
     
    for i in range(max_iterations):

        # Calculating the gradient
        y_derivative = grad_func(y_labelled, current_y)

        # Updating current_y
        current_y = current_y - (learning_rate * y_derivative)

        # Printing the parameters for each iteration
        print(f"Iteration {i+1}:  y {current_y}")
     

    current_y[current_y >= 0.5] = 1  # if your y's are >= 0.5, put them in class 1
    current_y[current_y < 0.5] = 0   # if your y's are < 0.5, put them in class 0

    print("accuracy", accuracy_score(y_unlabelled, current_y))

    return current_y
 
gradient_descent()

Answer by dankal444:

You created a problem that is impossible for any classifier to solve: your classes come from the exact same distribution. It's like asking it to distinguish two identical twins.

Below I added two lines that make the problem a bit easier for the classifier by shifting the distribution of class 0. With this change I got accuracy around 60%.

x_labelled = np.random.randint(0, 10, (3, 2))
x_unlabelled = np.random.randint(0, 10, (97, 2))
y_labelled = np.random.randint(0, 2, 3)
y_unlabelled = np.random.randint(0, 2, 97)

# Shift class-0 points so the two classes no longer overlap completely
x_labelled[y_labelled == 0] -= 1
x_unlabelled[y_unlabelled == 0] -= 1
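Beyond the data distribution, note that the question's `grad_func` collapses everything with `np.sum` into a single scalar, so every entry of `current_y` receives the identical update and the points can never separate. A minimal sketch of a per-point gradient is below, assuming the intended cost is the usual quadratic smoothness objective (that cost is my assumption; the question never states it explicitly):

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

rng = np.random.default_rng(0)
x_labelled = rng.integers(0, 10, (3, 2)).astype(float)
x_unlabelled = rng.integers(0, 10, (97, 2)).astype(float)
y_labelled = rng.integers(0, 2, 3).astype(float)

w1 = 1 / (euclidean_distances(x_labelled, x_unlabelled) + 0.1)    # shape (3, 97)
w2 = 1 / (euclidean_distances(x_unlabelled, x_unlabelled) + 0.1)  # shape (97, 97)

def grad_vector(y_l, y_u):
    """Per-point gradient of the assumed quadratic smoothness cost
    C = sum_ij w1[i,j] * (y_u[j] - y_l[i])**2
      + sum_jk w2[j,k] * (y_u[j] - y_u[k])**2
    Returns one gradient entry per unlabelled point, shape (97,)."""
    g1 = 2 * np.sum(w1 * (y_u[None, :] - y_l[:, None]), axis=0)
    diff = y_u[:, None] - y_u[None, :]   # diff[j, k] = y_u[j] - y_u[k]
    g2 = 4 * np.sum(w2 * diff, axis=1)   # factor 4 because w2 is symmetric
    return g1 + g2
```

With a vector-valued gradient, each unlabelled point moves toward the weighted average of its own neighbours instead of all 97 points moving in lockstep by the same scalar.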