I am currently trying to implement my own version of a Connect Four environment, based on the one available in the PettingZoo library's GitHub repository (https://github.com/Farama-Foundation/PettingZoo/blob/master/pettingzoo/classic/connect_four/connect_four.py).
From their documentation, on the page for the classic environments (https://pettingzoo.farama.org/environments/classic/), the following is stated:

"Most [classic] environments only give rewards at the end of the games once an agent wins or losses, with a reward of 1 for winning and -1 for losing."
It is not clear to me how to model learning for non-terminal states if the reward signal (on which, I guess, the agents' whole learning is based) occurs only at terminal states.
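To make sure I understand the convention, here is a minimal sketch of how I read their terminal-only reward scheme (the function name and signature are my own, not from their code): every non-terminal step pays 0, and only at the end does the winner get +1 and the loser -1.

```python
def reward_for(winner, agent, done):
    """Terminal-only reward, as I understand the PettingZoo classic convention.

    winner: identifier of the winning agent, or None for a draw.
    agent:  the agent whose reward we are computing.
    done:   whether the episode has ended.
    """
    if not done:
        return 0          # non-terminal step: no learning signal at all
    if winner is None:
        return 0          # draw: neither agent is rewarded
    return 1 if agent == winner else -1
```

So, as far as I can tell, the reward dict is all zeros on every turn until the very last one.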
I thought of modifying the setup by allowing the environment to emit a reward at every turn, something like:

+1 for each (non-terminal) step of the game
+100 for a winning state
0 for a draw
-100 for illegal moves (which also end the current game/episode)

However, this setup would require a very high exploration rate for an $\epsilon$-greedy agent, given my current setup. For each newly observed state, the agent takes a random move and, if the resulting state is not terminal, it assigns a state-action value of 1 to the action it just took and zero to all the others. From then on, the agent picks that same action with very high probability, which prevents any actual learning...
I am not so sure how to solve this problem, as allowing a very high exploration rate doesn't seem like a good choice to me... My code is available at https://github.com/FMGS666/RLProject
Probably I should use the same setup as in their GitHub repo, but I didn't quite understand how to do that given the aforementioned problem.
I'm probably missing something important, but thank you very much for the help anyway!