Keras-RL2 and TensorFlow 1-2 Incompatibility


I am getting the error below while trying to fit a DDPG agent on a custom environment:

tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: Using a symbolic `tf.Tensor` as a Python `bool` is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.

Here is the CustomEnv class:

import random
import numpy as np
from gym import Env
from gym.spaces import Box, Tuple, MultiBinary

# return_Acc and source_dir are defined elsewhere in my script


class CustomEnv(Env):
    def __init__(self):
        print("Test_3 : Init")
        """NOTE: Bool array element definition for Box action space needs to be determined !!!!"""
        self.action_space = Tuple((Box(low=4, high=20, shape=(1, 1)),
                                   Box(low=0, high=1, shape=(1, 1)),
                                   MultiBinary(1),
                                   MultiBinary(1),
                                   Box(low=4, high=20, shape=(1, 1)),
                                   Box(low=0, high=1, shape=(1, 1)),
                                   MultiBinary(1),
                                   MultiBinary(1),
                                   Box(low=0, high=100, shape=(1, 1)),
                                   Box(low=0, high=100, shape=(1, 1))))
        """Accuracy array"""
        self.observation_space = Box(low=np.asarray([0]), high=np.asarray([100]))
        """Initial state"""
        self.state = return_Acc(directory=source_dir, input_array=self.action_space.sample())
        self.episode_length = 20
        print(f"Action Space sample = {self.action_space.sample()}")
        print("Test_3 : End Init")

    def step(self, action):
        print(f"Model Action Space Output = {action}")
        print("Test_2 : Step")
        accuracy_of_model = random.randint(0, 100)  # return_Acc(directory=source_dir, input_array=action)
        self.state = accuracy_of_model  # round(100*abs(accuracy_of_model))
        self.episode_length -= 1
        # Calculating the reward
        print(f"self.state = {self.state}, accuracy_of_model = {accuracy_of_model}")
        if self.state > 60:
            reward = self.state
        else:
            reward = -(60 - self.state) * 10
        # Episode ends after 20 steps
        if self.episode_length <= 0:
            done = True
        else:
            done = False
        # Setting the placeholder for info
        info = {}
        # Returning the step information
        print("Test_2 : End Step")
        return self.state, reward, done, info

    def reset(self):
        print("Test_1 : Reset")
        self.state = 50
        print(f"Self state = {self.state}")
        self.episode_length = 20
        print("Test_1 : End Reset")
        return self.state

The return_Acc function runs a Random Decision Forest model and returns its accuracy, which the DDPG agent uses to determine the next step's parameters.
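return_Acc itself is not shown here; roughly, it does something like the sketch below (the CSV layout and the mapping from action values to Random Forest hyperparameters are my assumptions, not the real code):

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def return_Acc(directory, input_array):
    # Hypothetical data layout: a CSV with a "label" column under `directory`
    data = pd.read_csv(f"{directory}/data.csv")
    X, y = data.drop(columns=["label"]), data["label"]
    # Flatten the sampled Tuple action and map parts of it to hyperparameters
    # (which element controls what is an assumption here)
    flat = np.concatenate([np.ravel(a) for a in input_array])
    clf = RandomForestClassifier(n_estimators=max(int(flat[8]), 1), max_depth=int(flat[0]))
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    clf.fit(X_train, y_train)
    # Return accuracy on the 0-100 scale the observation space expects
    return accuracy_score(y_test, clf.predict(X_test)) * 100

Finally, my DDPG model is given below: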

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Flatten, Concatenate, Dense, Activation
from tensorflow.keras.models import Model
from rl.agents import DDPGAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory

env = CustomEnv()

states = env.observation_space.shape
actions = np.asarray(env.action_space.sample()).size

print(f"states = {states}, actions = {actions}")

def model_creation(states, actions):
    # Actor network: maps an observation to an action vector
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Dense(32, activation='relu', input_shape=states))
    model.add(tf.keras.layers.Dense(24, activation='relu'))
    model.add(tf.keras.layers.Dense(actions, activation='linear'))
    model.build()
    return model

model = model_creation(states, actions)
model.summary()


def build_agent(model, actions, critic):
    # policy is left over from the earlier DQN attempt (see the commented lines);
    # it is not passed to the DDPG agent
    policy = BoltzmannQPolicy()
    memory = SequentialMemory(limit=50000, window_length=1)
    # Uses the global action_input defined below
    nafa = DDPGAgent(nb_actions=actions, actor=model, memory=memory, critic=critic,
                     critic_action_input=action_input)
    # dqn = DQNAgent(model=model, memory=memory, policy=policy,
    #                nb_actions=actions, nb_steps_warmup=10, target_model_update=1e-2)
    return nafa

# Critic network: takes the action and the observation, outputs a single Q-value
action_input = Input(shape=(actions,), name='action_input')
observation_input = Input(shape=(1,) + env.observation_space.shape, name='observation_input')

flattened_observation = Flatten()(observation_input)
x = Concatenate()([action_input, flattened_observation])
x = Dense(32)(x)
x = Activation('relu')(x)
x = Dense(32)(x)
x = Activation('relu')(x)
x = Dense(32)(x)
x = Activation('relu')(x)
x = Dense(1)(x)
x = Activation('linear')(x)
critic = Model(inputs=[action_input, observation_input], outputs=x)
print(critic.summary())

dqn = build_agent(model, actions, critic)
dqn.compile(tf.keras.optimizers.Adam(learning_rate=1e-3), metrics=['mae'])
dqn.fit(env, nb_steps=200, visualize=False, verbose=1)

results = dqn.test(env, nb_episodes=500, visualize=False)
print(f"episode_reward = {np.mean(results.history['episode_reward'])}")

I tried most of the solutions that I found here, like

tf.compat.v1.enable_eager_execution()

and combinations of it with other functions (such as enable_v2_behavior()), but I couldn't get it to work. If I don't run the RDF model inside the DDPG loop, no error occurs. If it's possible, how can I connect the RDF model's accuracy output to self.state as an input?
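For reference, the calls I tried sit at the very top of the script, before any graphs, ops, or tensors are created; none of them made a difference:

import tensorflow as tf

# Both calls need to run before any other TensorFlow work
tf.compat.v1.enable_v2_behavior()
tf.compat.v1.enable_eager_execution()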

keras-rl2                   1.0.5
tensorflow-macos            2.10.0

I'm using an M1-based Mac, in case that matters.

There is 1 answer below.

To anyone interested in the solution: I came up with a slower, but at least working, workaround. It's actually simpler than expected. Insert a command that runs the model script from the terminal and writes its output to a text file, then read that text file from the RL agent script; likewise, write the action-space values to a text file, which the model script can then read to create the observation.
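Roughly, the environment side of that ends up looking like the sketch below (the file names and run_model.py are placeholders for my actual scripts):

import subprocess
import numpy as np

ACTION_FILE = "action.txt"      # hypothetical file the RL script writes
ACCURACY_FILE = "accuracy.txt"  # hypothetical file the model script writes

def return_Acc(directory, input_array):
    # Flatten the sampled Tuple action space into plain numbers and write them out
    values = np.concatenate([np.ravel(a) for a in input_array])
    with open(ACTION_FILE, "w") as f:
        f.write(",".join(str(float(v)) for v in values))
    # Run the Random Forest script in its own Python process, outside the
    # keras-rl2/TensorFlow graph, and let it write its accuracy to a file
    subprocess.run(["python", "run_model.py", directory], check=True)
    with open(ACCURACY_FILE) as f:
        return float(f.read().strip())

This is slower because a new process (and a fresh model fit) starts on every step, but it keeps the Random Forest completely outside TensorFlow's graph execution.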