Why doesn't Monkey generate the same sequence of events?


I'm running stress tests on Android using Monkey to generate the workload. For the type of analysis I have to do, I need to generate the same sequence of events for each experiment. So, in every experiment, I always use the adb shell monkey command with the same seed. For instance:

while true; do
    adb shell monkey -p com.android.chrome -s 1 --pct-appswitch 100 --ignore-crashes --ignore-timeouts --ignore-security-exceptions --monitor-native-crashes -v -v 1
    sleep 1
    adb shell monkey -p com.facebook.katana -s 1 --pct-appswitch 100 --ignore-crashes --ignore-timeouts --ignore-security-exceptions --monitor-native-crashes -v -v 1
    sleep 1
    adb shell monkey -p com.google.android.apps.maps -s 1 --pct-appswitch 100 --ignore-crashes --ignore-timeouts --ignore-security-exceptions --monitor-native-crashes -v -v 1
done

I launch ten applications in total. However, the measurement data suggest that the system is stressed differently in each run. In particular, the average launch time of the applications degrades to a different extent in each experiment. Is this expected, or am I doing something wrong?


There is 1 best solution below


Because the entire point of Monkey is to exercise the whole app by firing random events, which increases the chance of hitting unusual input patterns that break your app. That's literally why it's called "monkey": it emulates a monkey hitting the keyboard (or the screen, in this case). So this is totally expected. Also expect weird cases that aren't humanly possible; I had plenty of bugs found by Monkey that required millisecond precision to trigger.

You can set a seed with -s, and your commands do (-s 1; the trailing 1 is the event count). That makes the random number generator produce the same event sequence on every run, but it does not guarantee the same behavior: a few milliseconds' difference in the app's response could cause touches to land on different screens. The result is a butterfly effect, where two test runs diverge more and more as time goes on.
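
To tell apart the two effects described above (a different event stream versus the same events hitting a device in a different state), one option is to capture Monkey's verbose log from two runs with the same seed and compare only the event lines. The following is a minimal sketch, not a definitive method. The helper names run_monkey and same_events are hypothetical, and the filter assumes that with -v -v Monkey prints injected events on lines beginning with ':' (such as ":Switch:" or ":Sending"); check the log format on your own device before relying on it.

```shell
#!/bin/sh
# Sketch: check whether two Monkey runs with the same seed log the same events.
# Assumption: event lines in the -v -v output start with ':'; timestamped
# comment lines (starting with '//') are dropped so they cannot cause diffs.

# run_monkey PACKAGE SEED COUNT OUTFILE  (hypothetical helper)
# Runs Monkey once and writes only the event lines to OUTFILE.
run_monkey() {
    adb shell monkey -p "$1" -s "$2" --pct-appswitch 100 \
        --ignore-crashes --ignore-timeouts --ignore-security-exceptions \
        --monitor-native-crashes -v -v "$3" \
        | tr -d '\r' | grep '^:' > "$4"
}

# same_events PACKAGE SEED COUNT  (hypothetical helper)
# Returns 0 if two consecutive runs produced identical event lines.
same_events() {
    a=$(mktemp)
    b=$(mktemp)
    run_monkey "$1" "$2" "$3" "$a"
    run_monkey "$1" "$2" "$3" "$b"
    cmp -s "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return $status
}
```

For example, after sourcing the script, `same_events com.android.chrome 1 1 && echo identical` would report whether both runs logged the same events. If they match while launch times still differ between experiments, the variation more likely comes from device state (caches, background processes, memory pressure) than from Monkey itself.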