I have multiple pre-trained neural networks with the same architecture and different weights. I want to take a weighted average of the weights in these networks to make one network of the same size (in an attempt to improve generalizability without sacrificing size).
To be clear: I only want to learn the weights for the average, NOT the weights inside the pre-trained networks.
This is what I have so far:
import tensorflow as tf
from tensorflow.keras import layers, initializers

class Ensemble(layers.Layer):
    def __init__(self, modelWeights, model):
        super().__init__()
        self.modelWeights = modelWeights
        self.model = model
        # one learnable coefficient per pre-trained model, initialized to a uniform average
        self.w = self.add_weight('Weights', shape=(len(modelWeights),),
                                 initializer=initializers.Constant(1 / len(modelWeights)))
        self.b = self.add_weight('Bias', shape=(1,), initializer='zeros')

    def call(self, inputs):
        newWeights = []
        # zip(*...) groups the i-th weight array of every model together
        for weightsTuple in zip(*self.modelWeights):
            temp = []
            for weights in zip(*weightsTuple):
                weights = tf.convert_to_tensor(weights)
                # contract with the learnable mixing coefficients and add the bias
                temp += [tf.tensordot(weights, self.w, axes=[[0], [0]]) + self.b]
            newWeights += [temp]
        self.model.set_weights(newWeights)
        return self.model(inputs)
Here, modelWeights is a list containing model.get_weights() for each of the pre-trained models.
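(For context, it is built roughly like this, where pretrainedModels is a placeholder for my list of trained models:)

modelWeights = [m.get_weights() for m in pretrainedModels]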
Besides the error I'm currently getting (ValueError: Layer model weight shape (3, 4, 64) is not compatible with provided weight shape ()), I don't think Keras is going to let me call self.model.set_weights(newWeights) inside the call function.
Does anyone have a better way to do this?
Thanks in advance
I hope I've understood your idea correctly; if not, please correct me.
To average out the weights of multiple trained models, you can do the following (example with 3 models):
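Something like the snippet below should work; this is a minimal sketch, assuming model1, model2 and model3 are your pre-trained models and make_model() rebuilds the shared architecture:

import numpy as np

models = [model1, model2, model3]          # pre-trained models with identical architecture
allWeights = [m.get_weights() for m in models]

# average every weight array position-wise across the three models
avgWeights = [np.mean(np.stack(ws), axis=0) for ws in zip(*allWeights)]

mergedModel = make_model()                 # rebuild the same architecture
mergedModel.set_weights(avgWeights)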
Edit: The only other approach I can think of is to freeze all models and connect their outputs to a Dense(1, use_bias=False) node, with a kernel constraint that forces all weights to lie between 0 and 1 and sum to 1. This way the model could potentially learn how much say each network has in the final decision. But this will not merge the network weights together; it only weights the outputs. A rough sketch of this idea is below.
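Here is that sketch; it assumes three frozen single-output models (model1, model2, model3) that share the same input shape, with input_shape as a placeholder, and SumToOne is just one possible way to express the constraint:

import tensorflow as tf
from tensorflow.keras import layers, constraints, Model

class SumToOne(constraints.Constraint):
    # clip the kernel to [0, 1] and renormalize so its entries sum to 1
    def __call__(self, w):
        w = tf.clip_by_value(w, 0.0, 1.0)
        return w / (tf.reduce_sum(w) + tf.keras.backend.epsilon())

for m in (model1, model2, model3):
    m.trainable = False                    # freeze the pre-trained models

inputs = layers.Input(shape=input_shape)
stacked = layers.Concatenate()([m(inputs) for m in (model1, model2, model3)])
combined = layers.Dense(1, use_bias=False, kernel_constraint=SumToOne())(stacked)
ensemble = Model(inputs, combined)         # only the Dense(1) mixing weights are trainable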