I have taken some reference implementations of the PPO algorithm and am trying to create an agent that can play Space Invaders. Unfortunately, from the 2nd trial onwards (after training the actor and critic networks for the first time), the probability distribution of the actions collapses onto a single action, and both the PPO loss and the critic loss converge to a single value.
I wanted to understand the probable reasons why this might occur. I can't really run the code on my cloud VMs without being sure that I am not missing anything, as the VMs are very costly to use. I would appreciate any help or advice in this regard; if required, I can post the code as well. The hyperparameters used are as follows:
clipping_val = 0.2
critic_discount = 0.5
entropy_beta = 0.001
gamma = 0.99
lambda = 0.95
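For context, here is a simplified sketch (in PyTorch, not my exact code) of how these hyperparameters typically enter the PPO objective; all tensor names are placeholders:

```python
import torch

def gae_advantages(rewards, values, masks, gamma=0.99, lam=0.95):
    # Generalized Advantage Estimation: gamma discounts rewards, lam (lambda)
    # trades off bias vs. variance. `values` holds one extra bootstrap entry.
    advantages = torch.zeros(len(rewards))
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] * masks[t] - values[t]
        gae = delta + gamma * lam * masks[t] * gae
        advantages[t] = gae
    return advantages

def ppo_loss(new_log_probs, old_log_probs, advantages, returns, values, entropy,
             clipping_val=0.2, critic_discount=0.5, entropy_beta=0.001):
    # Probability ratio between the updated policy and the old policy
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Clipped surrogate objective (PPO-clip)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clipping_val, 1 + clipping_val) * advantages
    actor_loss = -torch.min(unclipped, clipped).mean()
    # Critic is regressed towards the returns, weighted by critic_discount
    critic_loss = (returns - values).pow(2).mean()
    # The entropy bonus discourages premature collapse onto one action;
    # a small entropy_beta such as 0.001 may be too weak to prevent that
    return actor_loss + critic_discount * critic_loss - entropy_beta * entropy.mean()
```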
One of the reasons could be that you are not normalising the inputs to the CNN to the range [0, 1], and are thus saturating your neural networks. I suggest you use the preprocess() function in your code to transform your states (the inputs to the network).
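If you don't already have such a function, a minimal version for Atari frames might look like this (a sketch assuming raw 210x160 RGB frames from Gym; adapt the cropping/resizing to your own pipeline):

```python
import numpy as np

def preprocess(frame):
    # frame: raw Atari RGB frame, uint8, shape (210, 160, 3), values in [0, 255]
    gray = frame.mean(axis=2)                        # collapse RGB to grayscale
    downsampled = gray[::2, ::2]                     # (105, 80) via striding
    return (downsampled / 255.0).astype(np.float32)  # scale to [0, 1]
```

The key point is the division by 255, so the network never sees raw pixel intensities.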