Implement evolutionary algorithm with TensorFlow


I am trying to train a neural network with an evolutionary algorithm using TensorFlow and Keras, but something seems to be wrong with my implementation: the models don't improve at the task.

Currently, I am using the Tic-Tac-Toe game as the task. Essentially, my algorithm does the following:

It creates two models with the same architecture. In my case, each model consists of a positional embedding, a transformer block made of an attention layer and a dense layer, and a second transformer block made of an attention layer and an LSTM. Both transformer blocks use causal masking and normalize their outputs. Finally, the result passes through a dense layer before the output layer, which produces a probability distribution over the possible moves.
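For reference, the architecture described above might look roughly like this in Keras. The layer sizes, head counts, and names are my own guesses, not taken from the post, so treat this as a sketch of the described structure rather than the exact model:

```python
import tensorflow as tf
from tensorflow.keras import layers

class BoardEmbedding(layers.Layer):
    """Token + positional embedding for the 9 board cells (sizes are assumptions)."""
    def __init__(self, length=9, vocab=3, d_model=32, **kwargs):
        super().__init__(**kwargs)
        self.tok = layers.Embedding(vocab, d_model)
        self.pos = layers.Embedding(length, d_model)
        self.length = length

    def call(self, x):
        positions = tf.range(self.length)
        return self.tok(x) + self.pos(positions)

def build_model(board_size=9, d_model=32, heads=2):
    inp = layers.Input(shape=(board_size,), dtype="int32")
    x = BoardEmbedding(board_size, 3, d_model)(inp)
    # Block 1: causal self-attention + dense, each with residual + LayerNorm
    att = layers.MultiHeadAttention(num_heads=heads, key_dim=d_model)(
        x, x, use_causal_mask=True)
    x = layers.LayerNormalization()(x + att)
    ff = layers.Dense(d_model, activation="relu")(x)
    x = layers.LayerNormalization()(x + ff)
    # Block 2: causal self-attention followed by an LSTM that collapses the sequence
    att2 = layers.MultiHeadAttention(num_heads=heads, key_dim=d_model)(
        x, x, use_causal_mask=True)
    x = layers.LayerNormalization()(x + att2)
    x = layers.LSTM(d_model)(x)
    x = layers.Dense(d_model, activation="relu")(x)
    out = layers.Dense(board_size, activation="softmax")(x)  # distribution over moves
    return tf.keras.Model(inp, out)
```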

Once the "parent" models are created, they are saved as .h5 files. They predict with the .predict function, receiving a one-dimensional array of board positions encoded so that 0 corresponds to the current player's cells, 1 to empty cells, and 2 to the opponent's cells. At the end of each round, if there is a winner, its weights are saved with .save to an .h5 file and then randomly perturbed to create the opponent. The perturbation is scaled by a penalty score that punishes actions such as choosing an occupied square, but it is ultimately random. If the game is a draw or exceeds 20 moves, both models are adjusted.
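The board encoding described above could be sketched like this (the raw board representation with 0 = empty, 1 = player 1, 2 = player 2 is my assumption; only the output encoding is taken from the post):

```python
import numpy as np

def encode_board(board, player):
    # board: 9 ints, 0 = empty, 1 = player 1, 2 = player 2 (assumed representation)
    # Output encoding as in the post: 0 = current player's cells,
    # 1 = empty cells, 2 = the opponent's cells.
    enc = []
    for cell in board:
        if cell == 0:
            enc.append(1)
        elif cell == player:
            enc.append(0)
        else:
            enc.append(2)
    # Batch dimension of one, ready for model.predict
    return np.array(enc, dtype="int32")[None, :]
```

Encoding the board relative to the current player lets the same model play both sides.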

At first they seem to improve, but they plateau almost immediately. Below is the code I use for the adjustment; I hope someone can tell me what is failing.

Code:

import tensorflow as tf

def training(model, penalization):
    # Choose a learning rate and noise scale based on the penalty score.
    # Note: for 1 < penalization < 2, the defaults below are kept.
    learning_rate = 0.025
    aleatory_value = tf.random.normal(shape=(1,), mean=0.0, stddev=3.0)
    if penalization <= 1:
        learning_rate = 0.0025
        aleatory_value = tf.random.normal(shape=(1,), mean=0.0, stddev=1.0)
    elif penalization >= 2:
        learning_rate = 0.4
        aleatory_value = tf.random.normal(shape=(1,), mean=0.0, stddev=3.0)
    penalty = tf.cast(penalization, tf.float32)  # convert the penalty to float32
    for layer in model.layers:
        if isinstance(layer, tf.keras.layers.Dense):
            # layer.weights holds the layer's kernel and bias tensors.
            for var in layer.weights:
                current_weight = var.numpy()
                # Shrink the weights by the penalty and add noise. Note that
                # aleatory_value has shape (1,), so the *same* noise value is
                # broadcast to every weight adjusted in this call.
                adjusted_weight = (current_weight * (1 - learning_rate * penalty)
                                   + learning_rate * aleatory_value)
                var.assign(adjusted_weight)
    model.save("model1.h5")
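For comparison, the standard mutation in evolution strategies draws independent Gaussian noise for every weight, instead of one shared scalar per call as in the function above. A minimal sketch (the function name and `sigma` value are my own, not from the post):

```python
import tensorflow as tf

def mutate(model, sigma=0.05):
    # Evolution-strategy-style mutation: independent Gaussian noise per weight,
    # applied to every trainable variable in the model.
    for var in model.trainable_weights:
        noise = tf.random.normal(shape=var.shape, stddev=sigma)
        var.assign_add(noise)
```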

I'm not sure whether this method really works at all, since I built it from scratch; keep in mind that for the real task it is impossible to obtain a labeled dataset.
