Java Google App Engine Task Queue freezes / stalls


I have been using task queues to simulate usage in my app. From time to time, the task queue stalls and some tasks in the queue get stuck. The frozen tasks can have ETAs that are already a few minutes in the past.

Here are the logs of an example task queue run that registers 50 users. Notice that after executing 11 tasks, there is a 2-minute pause before the remaining 39 are executed. There are no errors or retries in any of the tasks.

 I 2014-11-19 17:16:08.436 200 0 B 2301ms /registerWorker
 I 2014-11-19 17:16:08.727 200 0 B 2661ms /registerWorker
 I 2014-11-19 17:16:08.728 200 0 B 2800ms /registerWorker
 I 2014-11-19 17:16:08.729 200 0 B 2663ms /registerWorker
 I 2014-11-19 17:16:08.730 200 0 B 2667ms /registerWorker
 I 2014-11-19 17:16:08.731 200 0 B 2662ms /registerWorker
 I 2014-11-19 17:16:08.731 200 0 B 2666ms /registerWorker
 I 2014-11-19 17:16:08.883 200 0 B 2748ms /registerWorker
 I 2014-11-19 17:16:08.940 200 0 B 2875ms /registerWorker
 I 2014-11-19 17:16:08.941 200 0 B 3014ms /registerWorker
 I 2014-11-19 17:16:09.070 200 0 B 630ms /registerWorker
 I 2014-11-19 17:18:09.132 200 0 B 1116ms /registerWorker
 I 2014-11-19 17:18:09.134 200 0 B 1116ms /registerWorker
 I 2014-11-19 17:18:09.224 200 0 B 1208ms /registerWorker
 I 2014-11-19 17:18:09.227 200 0 B 1210ms /registerWorker
 I 2014-11-19 17:18:09.228 200 0 B 1212ms /registerWorker
 I 2014-11-19 17:18:09.229 200 0 B 1213ms /registerWorker
 I 2014-11-19 17:18:09.231 200 0 B 1213ms /registerWorker
 I 2014-11-19 17:18:09.231 200 0 B 1215ms /registerWorker
 I 2014-11-19 17:18:09.232 200 0 B 1215ms /registerWorker
 I 2014-11-19 17:18:09.233 200 0 B 1216ms /registerWorker
 I 2014-11-19 17:18:12.128 200 0 B 952ms /registerWorker
 I 2014-11-19 17:18:12.135 200 0 B 961ms /registerWorker
 I 2014-11-19 17:18:12.232 200 0 B 1053ms /registerWorker
 I 2014-11-19 17:18:12.233 200 0 B 1057ms /registerWorker
 I 2014-11-19 17:18:12.325 200 0 B 1149ms /registerWorker
 I 2014-11-19 17:18:12.326 200 0 B 1151ms /registerWorker
 I 2014-11-19 17:18:12.327 200 0 B 1152ms /registerWorker
 I 2014-11-19 17:18:12.328 200 0 B 1150ms /registerWorker
 I 2014-11-19 17:18:12.328 200 0 B 1151ms /registerWorker
 I 2014-11-19 17:18:12.329 200 0 B 1154ms /registerWorker
 I 2014-11-19 17:18:13.735 200 0 B 1032ms /registerWorker
 I 2014-11-19 17:18:13.736 200 0 B 1034ms /registerWorker
 I 2014-11-19 17:18:13.737 200 0 B 1035ms /registerWorker
 I 2014-11-19 17:18:13.737 200 0 B 1035ms /registerWorker
 I 2014-11-19 17:18:13.827 200 0 B 1124ms /registerWorker
 I 2014-11-19 17:18:13.828 200 0 B 1126ms /registerWorker
 I 2014-11-19 17:18:13.830 200 0 B 1127ms /registerWorker
 I 2014-11-19 17:18:13.831 200 0 B 1128ms /registerWorker
 I 2014-11-19 17:18:13.834 200 0 B 1131ms /registerWorker
 I 2014-11-19 17:18:14.052 200 0 B 1350ms /registerWorker
 I 2014-11-19 17:18:14.331 200 0 B 544ms /registerWorker
 I 2014-11-19 17:18:14.333 200 0 B 544ms /registerWorker
 I 2014-11-19 17:18:14.334 200 0 B 546ms /registerWorker
 I 2014-11-19 17:18:14.925 200 0 B 731ms /registerWorker
 I 2014-11-19 17:18:14.928 200 0 B 733ms /registerWorker
 I 2014-11-19 17:18:14.929 200 0 B 735ms /registerWorker
 I 2014-11-19 17:18:14.930 200 0 B 736ms /registerWorker
 I 2014-11-19 17:18:14.931 200 0 B 736ms /registerWorker
 I 2014-11-19 17:18:14.937 200 0 B 743ms /registerWorker

My settings for this task as defined in queue.xml are:

  <queue>
    <name>register-user</name>
    <rate>25/s</rate>
    <bucket-size>100</bucket-size>
    <max-concurrent-requests>10</max-concurrent-requests>
    <retry-parameters>
      <task-retry-limit>3</task-retry-limit>
      <task-age-limit>1m</task-age-limit>
    </retry-parameters>
  </queue>

Sometimes all of the tasks execute as fast as expected, and sometimes they don't. Is this an App Engine bug?


There is 1 best solution below


Your tasks take roughly 0.5 to 3 seconds to execute. You set the concurrent limit to 10 and the execution rate to 25/s. With at most 10 tasks in flight and each one taking about a second or more, the queue can complete at most about 10 tasks per second, so the 25/s rate cannot be reached with these execution times.
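To make that arithmetic concrete, here is a minimal sketch of the ceiling that a concurrency cap puts on throughput (concurrency divided by per-task time). The class and method names are hypothetical and not part of any App Engine API. It is only a back-of-the-envelope bound, not a model of how the scheduler actually dispatches tasks.

```java
public final class QueueThroughput {
    private QueueThroughput() {}

    /**
     * Upper bound on completed tasks per second when at most
     * maxConcurrentRequests tasks run at once and each takes taskSeconds.
     */
    public static double maxTasksPerSecond(int maxConcurrentRequests, double taskSeconds) {
        if (maxConcurrentRequests <= 0 || taskSeconds <= 0) {
            throw new IllegalArgumentException("concurrency and task time must be positive");
        }
        return maxConcurrentRequests / taskSeconds;
    }

    /** The rate you can actually expect: the smaller of the configured rate and the ceiling. */
    public static double effectiveRate(double configuredRate, int maxConcurrentRequests, double taskSeconds) {
        return Math.min(configuredRate, maxTasksPerSecond(maxConcurrentRequests, taskSeconds));
    }

    public static void main(String[] args) {
        // Values taken from the question: rate 25/s, max-concurrent-requests 10.
        System.out.println(effectiveRate(25.0, 10, 1.0)); // tasks taking 1 s each
        System.out.println(effectiveRate(25.0, 10, 2.5)); // tasks taking 2.5 s each
    }
}
```

With the settings from the question, the bound comes out to 10 tasks/s for 1-second tasks and 4 tasks/s for 2.5-second tasks, both well under the configured 25/s.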

Notice how App Engine quickly accepted your first 10 tasks, accepted the 11th after a slight pause, then realized that none of the tasks had completed and decided to wait. After the long pause, it worked through the remaining 39 tasks in bursts of at most 10, with short pauses between bursts.
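If you want to check this pattern on your own logs rather than by eye, one option is to split the log timestamps into bursts wherever two consecutive entries are further apart than some threshold. The sketch below assumes the timestamp format shown in the question (`yyyy-MM-dd HH:mm:ss.SSS`) and that the entries are already in chronological order. The class name and the threshold are illustrative choices, not anything App Engine provides.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public final class LogBursts {
    // Matches timestamps such as "2014-11-19 17:16:08.436".
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    private LogBursts() {}

    /**
     * Returns the size of each burst, where a new burst starts whenever the
     * gap to the previous timestamp exceeds gapMillis.
     */
    public static List<Integer> burstSizes(List<String> timestamps, long gapMillis) {
        List<Integer> sizes = new ArrayList<>();
        LocalDateTime previous = null;
        int current = 0;
        for (String ts : timestamps) {
            LocalDateTime t = LocalDateTime.parse(ts, FMT);
            if (previous != null && Duration.between(previous, t).toMillis() > gapMillis) {
                sizes.add(current); // close the burst that just ended
                current = 0;
            }
            current++;
            previous = t;
        }
        if (current > 0) {
            sizes.add(current); // close the final burst
        }
        return sizes;
    }
}
```

For example, feeding it the four timestamps around the stall (`17:16:08.941`, `17:16:09.070`, `17:18:09.132`, `17:18:09.134`) with a 1000 ms threshold yields two bursts of two entries each, split at the 2-minute gap.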

I do not see any bug here. The behavior is consistent with your settings.