I am using the 3DBall example environment, but I am getting some really weird results and I don't understand why. My code so far is just a for loop that prints the reward and fills in the required actions with random values. However, a negative reward was never shown, and at random points there would be no decision steps. That would make sense on its own, but shouldn't the environment just keep simulating until there is a decision step? Any help would be greatly appreciated, as other than the documentation there are few to no resources out there for this.
import random

import numpy
from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment()  # no file_name: connects to the Unity Editor once Play is pressed
env.reset()
for step in range(50):
    arr = []
    behavior_names = env.behavior_specs
    for name in behavior_names:
        print(name)
    # get_steps returns a tuple; index 0 is the DecisionSteps
    DecisionSteps = env.get_steps("3DBall?team=0")
    print(DecisionSteps[0].reward, len(DecisionSteps[0].reward))
    print(DecisionSteps[0].action_mask)  # for some reason this is False when DecisionSteps[0].reward is empty and None when it is not
    # one random 2-value continuous action per agent that requested a decision
    for agent in range(len(DecisionSteps[0])):
        arr.append([])
        for b in range(2):
            arr[-1].append(random.uniform(-10, 10))
    if len(DecisionSteps[0]) != 0:
        env.set_actions("3DBall?team=0", numpy.array(arr))
    env.step()
env.close()
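I think your problem is that when an agent's episode ends and it needs to be reset, the agent does not come back in the decision steps but in the terminal steps (get_steps returns a tuple, and the terminal steps are its second element, which your code never reads). In 3DBall this happens when the agent drops the ball, and the reward reported in its terminal step is -1.0. That is most likely also why the decision steps are sometimes empty: on those steps, the only agents that reported back did so through the terminal steps. I have taken your code and made some changes, and now it runs fine (except that you probably want to change it so that you don't reset every time one of the agents drops its ball).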