I am trying to implement gradient descent on a dataset. Even though I tried everything, I couldn't make it work. So, I created a test case. I am trying my code on a random data and try to debug.
More specifically, what I am doing is, I am generating random vectors between 0-1 and random labels for these vectors. And try to over-fit the training data.
However, my weight vector gets bigger and bigger in each iteration. And then, I have infinities. So, I do not actually learn anything. Here is my code:
import numpy as np
import random
def getRandomVector(n):
return np.random.uniform(0,1,n)
def getVectors(m, n):
return [getRandomVector(n) for i in range(n)]
def getLabels(n):
return [random.choice([-1,1]) for i in range(n)]
def GDLearn(vectors, labels):
maxIterations = 100
stepSize = 0.01
w = np.zeros(len(vectors[0])+1)
for i in range(maxIterations):
deltaw = np.zeros(len(vectors[0])+1)
for i in range(len(vectors)):
temp = np.append(vectors[i], -1)
deltaw += ( labels[i] - np.dot(w, temp) ) * temp
w = w + ( stepSize * (-1 * deltaw) )
return w
vectors = getVectors(100, 30)
labels = getLabels(100)
w = GDLearn(vectors, labels)
print w
I am using LMS for loss function. So, in all iterations, my update is the following,
where w^i is the ith weight vector and R is the stepSize and E(w^i) is the loss function.
Here is my loss function. (LMS)
and here is how I derivated the loss function,
,
Now, my questions are:
- Should I expect good results in this random scenario using Gradient Descent? (What is the theoretical bounds?)
- If yes, what is my bug in my implementation?
PS: I tried several other maxIterations
and stepSize
parameters. Still not working.
PS2: This is the best way I can ask the question here. Sorry if the question is too specific. But it made me crazy. I really want to learn the problem.
Your code has a couple of faults:
GetVectors()
method, you did not actually use the input variablem
;GDLearn()
method, you have a double loop, but you use the same variablei
as the loop variables in both loops. (I guess the logic is still right, but it's confusing).labels[i] - np.dot(w, temp)
) has the wrong sign.Here is my revised code based on your original code.
Running result -- you can see the cost is decreasing with each iteration but with a diminishing return.