PettingZoo Classic Environments


I am currently trying to implement my own version of a Connect Four environment based on the one available in the PettingZoo library's GitHub repository (https://github.com/Farama-Foundation/PettingZoo/blob/master/pettingzoo/classic/connect_four/connect_four.py).

The documentation page for the classic environments (https://pettingzoo.farama.org/environments/classic/) states the following:

" Most [classic] environments only give rewards at the end of the games once an agent wins or losses, with a reward of 1 for winning and -1 for losing. "

It is not clear to me how to model the learning for non-terminal states if the reward signal (on which, I guess, the agents' whole learning is based) only occurs in terminal states.
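To make my current understanding explicit, here is a minimal sketch of what I think is supposed to happen with terminal-only rewards: plain tabular Q-learning, where the ±1 outcome reaches earlier states through the discounted bootstrap term. The names here are mine for illustration, not the PettingZoo API or my actual code:

```python
# Minimal tabular Q-learning sketch (hypothetical names): the only non-zero
# reward arrives at the end of the game, and earlier states receive a
# learning signal through the discounted value of the next state.

def q_learning_update(Q, state, action, reward, next_state, done,
                      n_actions=7, alpha=0.1, gamma=0.99):
    """One tabular update; Q is a dict mapping (state, action) -> value."""
    if done:
        target = reward  # the only non-zero reward: +1 for a win, -1 for a loss
    else:
        # reward is 0 here, but the eventual game outcome still flows back
        # into this state via max over the next state's action values
        target = reward + gamma * max(
            Q.get((next_state, a), 0.0) for a in range(n_actions)
        )
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (target - old)
```

If that reading is correct, the non-terminal states do get a learning signal, just a delayed one that propagates backwards through bootstrapping rather than through an explicit per-step reward. Is that how it is meant to work?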

I thought of modifying the setup by allowing the environment to emit rewards at every turn, something like (see the sketch after this list):

  • +1 for each (non-terminating) step of the game

  • +100 for a winning state

  • 0 for a draw

  • -100 for illegal moves (which also quits the current game/episode)

However, this setup would require a very high exploration rate for an $\epsilon$-greedy agent, given my current implementation. For each newly observed state, the agent takes a random move and, if the resulting state is not terminal, it assigns a state-action value of 1 to the action it just took and zero to all the others. From then on, the agent will pick that same action again with very high probability, which prevents any actual learning.
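To make the issue concrete, here is roughly what I have in mind, as a simplified sketch; the constants and function names (`shaped_reward`, `epsilon_greedy`) are hypothetical and not taken from the library or from my repository:

```python
import random

# Hypothetical shaped rewards as proposed above
STEP_REWARD = 1       # each non-terminal step
WIN_REWARD = 100      # winning state
DRAW_REWARD = 0       # draw
ILLEGAL_REWARD = -100 # illegal move, also ends the episode

def shaped_reward(outcome):
    """Map an outcome label to the per-turn reward I proposed."""
    return {"step": STEP_REWARD, "win": WIN_REWARD,
            "draw": DRAW_REWARD, "illegal": ILLEGAL_REWARD}[outcome]

# The failure mode I am describing: after one random move from a fresh
# state, that action's value becomes 1 while all untried actions stay at 0,
# so the greedy branch keeps re-selecting the same move unless epsilon is
# very large.
def epsilon_greedy(Q, state, legal_actions, epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(legal_actions)
    return max(legal_actions, key=lambda a: Q.get((state, a), 0.0))
```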

I am not so sure how to solve this problem, as allowing very high exploration rates does not seem like a good choice to me. My code is available at https://github.com/FMGS666/RLProject

Probably I should use the same setup as theirs in the GitHub repo, but I did not quite understand how to make it work given the problem described above.

I am probably missing something important, but thank you very much for the help anyway!
