How to clamp the output of a neuron in PyTorch


I am using a simple feed-forward model of nn.Linear layers (20 → 64 → 64 → 2) for deep reinforcement learning. I use this model to approximate policy gradients with the PPO algorithm, so the output layer produces 2 values: a mean and a standard deviation. These parameters are then used by the environment to sample more data. For the environment, both values must stay within thresholds: the mean in [min, max] and the std in [min, max].

While training, after some iterations the values produced by the output layer suddenly increase, and because of that the environment fails to sample more data. The environment consumes the model's output directly and I cannot change the environment to make it stable. Is there a way to restrict (clamp) the output values to a threshold? (The loss is computed from the data sampled by the environment, and then backpropagated.)

You can find sample code for the model below:

    def forward(self, x):
        # feed forward through the layers
        x = F.relu(self.linear_0(x))
        x = F.relu(self.linear_1(x))
        return self.linear_2(x)

I have tried the following structure:

    def forward(self, x):
        # feed forward through the layers
        x = F.relu(self.linear_0(x))
        x = F.relu(self.linear_1(x))
        x = self.linear_2(x)
        x[0] = torch.clamp(x[0], min=min, max=max)
        x[1] = torch.clamp(x[1], min=min, max=max)
        return x

but it raises an error during backpropagation:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [499, 2]], which is output 0 of SelectBackward, is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
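
From the error message, I suspect the problem is the in-place assignment to `x[0]` and `x[1]`. For what it's worth, here is an out-of-place sketch I am considering, though I'm not sure it is the right approach. (The bound names `mean_min`/`mean_max`/`std_min`/`std_max` are placeholders for my thresholds, and I assume the two outputs sit in the last dimension of the batch, i.e. `x` has shape `[batch, 2]` as in the error message.)

    import torch
    import torch.nn.functional as F

    def forward(self, x):
        # feed forward through the layers
        x = F.relu(self.linear_0(x))
        x = F.relu(self.linear_1(x))
        x = self.linear_2(x)
        # clamp each output column out of place: torch.clamp returns a
        # new tensor, so no tensor saved for backward gets modified
        mean = torch.clamp(x[:, 0], min=mean_min, max=mean_max)
        std = torch.clamp(x[:, 1], min=std_min, max=std_max)
        # re-assemble a [batch, 2] tensor instead of writing into x
        return torch.stack([mean, std], dim=1)

One thing I am unsure about: torch.clamp has zero gradient outside the bounds, so maybe squashing the outputs with a scaled torch.sigmoid or torch.tanh would be more appropriate for training?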

Thanks in advance!
