Computing mean of list of generator objects

# Initialize an m x n grid; each entry is a neural network
phi = [[None] * n for _ in range(m)]
for i in range(m):
    for j in range(n):
        phi[i][j] = NeuralNetwork()

# Let k, i be arbitrary indices
p1 = torch.nn.utils.parameters_to_vector(phi[k][i - 1].parameters())
p2 = torch.nn.utils.parameters_to_vector(mean of phi[:][i - 1])  # pseudocode: mean over column i-1

Basically, I want to compute the squared error between the parameters of phi[k][i - 1] and the average of the entire column phi[:][i - 1], i.e. ((p1 - p2)**2).sum(). I tried the following:

tmp = [x.parameters() for x in phi[:][i - 1]]
mean_params = torch.mean(torch.stack(tmp), dim=0)
p2 = torch.nn.utils.parameters_to_vector(mean_params)

But this doesn't work, because tmp is a list of generator objects rather than tensors. More specifically, my problem is computing the mean from those generator objects.
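For context, `model.parameters()` returns a generator of differently-shaped tensors, so `torch.stack` and `torch.mean` cannot consume it directly; `parameters_to_vector` is the usual way to flatten it into a single 1-D tensor. A minimal sketch with a toy `nn.Linear` (hypothetical stand-in for `NeuralNetwork`):

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 2)

params = model.parameters()          # a generator, not a tensor
print(type(params))                  # <class 'generator'>

# Flatten all parameters (2x3 weight + 2 bias) into one 1-D tensor
vec = torch.nn.utils.parameters_to_vector(model.parameters())
print(vec.shape)                     # torch.Size([8])
```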

1 Answer (accepted)

First, we can define a function that computes the average parameter vector for a list of models. To avoid materializing a copy of every model's parameters at once, we can compute it as a running sum. For example:

def average_parameters_vector(model_list):
    n = len(model_list)
    avg = 0 
    for model in model_list:
        avg = avg + torch.nn.utils.parameters_to_vector(model.parameters()) / n 
    return avg 
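As a quick sanity check (with hypothetical toy `nn.Linear` models, assuming every model in the list shares the same architecture), averaging two copies of the same network should reproduce that network's parameter vector:

```python
import copy
import torch
import torch.nn as nn

def average_parameters_vector(model_list):
    n = len(model_list)
    avg = 0
    for model in model_list:
        avg = avg + torch.nn.utils.parameters_to_vector(model.parameters()) / n
    return avg

net = nn.Linear(4, 3)
avg = average_parameters_vector([net, copy.deepcopy(net)])
ref = torch.nn.utils.parameters_to_vector(net.parameters())
print(torch.allclose(avg, ref))  # True
```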

Then you can create p1 and p2 and compute the mean squared error (use .sum() instead of .mean() if you want the summed squared error from the question):

p1 = torch.nn.utils.parameters_to_vector(phi[k][i - 1].parameters())
# Note: phi[:][i - 1] is just phi[i - 1] (row i-1 of the grid); to select
# column i-1 across all rows, build the list explicitly:
p2 = average_parameters_vector([row[i - 1] for row in phi])

mse = ((p1 - p2)**2).mean()

If you really want a one-line solution, which is possibly also the fastest, you can build a single tensor containing the parameter vectors of all the models in the column and then mean-reduce it. But as mentioned above, this materializes every parameter vector at once, which significantly increases memory usage, especially when your models have millions of parameters, as is often the case.

# Uses lots of memory but potentially the fastest solution
def average_parameters_vector(model_list):
    return torch.stack(
        [torch.nn.utils.parameters_to_vector(m.parameters()) for m in model_list]
    ).mean(dim=0)

At the other extreme, if you're very concerned about memory usage, you can average one individual parameter tensor at a time.

# more memory efficient than original solution but probably slower
def average_parameters_vector(model_list):
    n = len(model_list)
    num_params = len(list(model_list[0].parameters()))
    averages = [0] * num_params
    for model in model_list:
        for pidx, p in enumerate(model.parameters()):
            averages[pidx] = averages[pidx] + p.data.flatten() / n 
    return torch.cat(averages)
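One subtlety worth noting: `parameters_to_vector` concatenates each parameter flattened in `parameters()` iteration order, so the `torch.cat(averages)` result has the same layout and is directly comparable to p1. A quick check on hypothetical toy models:

```python
import torch
import torch.nn as nn

def average_parameters_vector(model_list):
    # Per-parameter running average, concatenated at the end
    n = len(model_list)
    num_params = len(list(model_list[0].parameters()))
    averages = [0] * num_params
    for model in model_list:
        for pidx, p in enumerate(model.parameters()):
            averages[pidx] = averages[pidx] + p.data.flatten() / n
    return torch.cat(averages)

models = [nn.Linear(3, 3) for _ in range(2)]
per_param = average_parameters_vector(models)
reference = torch.stack(
    [torch.nn.utils.parameters_to_vector(m.parameters()) for m in models]
).mean(dim=0)
print(torch.allclose(per_param, reference, atol=1e-6))  # True
```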