How to Quickly Create X Tasks in Google App Engine
We send push notifications from GAE. Say we need to send 50,000 notifications over C2DM (Cloud to Device Messaging). To do this we:
- Read everyone who wants to receive notifications from the datastore
- Loop through them and create a "push task" for each notification
The problem is that creating the tasks takes time, so this doesn't scale as the user base grows. In my experience, we spend 20-30 seconds just creating the tasks when there are many of them. The reason for one task per push message is that we can retry the task if something fails, and the failure only affects one subscriber. Also, C2DM only supports sending to one user at a time.
Would it be faster if we:
- Read everyone who wants to receive notifications from the datastore
- Loop through them and create a "pool task" for every 100 subscribers
- Have each pool task generate 100 push tasks when it executes
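To make the proposed fan-out concrete, here is a minimal sketch in plain Python. It does not use the App Engine task queue API: a `deque` stands in for the queue, and `enqueue_pool_tasks`, `run_pool_task`, and `drain` are hypothetical names chosen for illustration.

```python
from collections import deque

BATCH_SIZE = 100  # subscribers per pool task, as proposed above


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def enqueue_pool_tasks(subscriber_ids, queue):
    """Initial request: enqueue one pool task per batch of subscribers."""
    for batch in chunked(subscriber_ids, BATCH_SIZE):
        queue.append(("pool", batch))


def run_pool_task(batch, queue):
    """Pool task: fan out into one push task per subscriber."""
    for subscriber_id in batch:
        queue.append(("push", subscriber_id))


def drain(queue):
    """Stand-in for the queue workers; returns the IDs that got a push task."""
    pushed = []
    while queue:
        kind, payload = queue.popleft()
        if kind == "pool":
            run_pool_task(payload, queue)
        else:
            pushed.append(payload)  # a real push task would call C2DM here
    return pushed
```

With 50,000 subscribers the initial request would enqueue only 500 pool tasks, and the remaining 50,000 enqueue operations would be spread across whichever workers pick up the pool tasks.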
Executing the tasks is very fast, so in our scenario the bottleneck seems to be creating the tasks, not running them. That is why I thought of this approach as a way to increase the parallelism of the application. I would guess that it leads to faster execution, but then again I may be wrong :-)
We do something similar with APNS (Apple Push Notification service): we create one task for a batch of notifications at a time (a pool task, as you call it). When the task runs, we iterate over the batch and send each notification to the server.
The difference with your setup is that we have a separate server for push communication as APNS only supports socket communication.
The only drawback is that if an error occurs, the whole task is retried, so some users may receive the same notification twice.
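The duplicate-delivery effect can be reproduced with a small simulation. This is only an illustrative sketch: `FlakySender` and `run_with_retry` are hypothetical stand-ins, not part of APNS or App Engine, and the retry loop only imitates a queue re-running a failed task from the start.

```python
class FlakySender:
    """Hypothetical sender that fails once at a chosen subscriber."""

    def __init__(self, fail_on):
        self.fail_on = fail_on
        self.delivered = []
        self._already_failed = False

    def send(self, subscriber_id):
        if subscriber_id == self.fail_on and not self._already_failed:
            self._already_failed = True
            raise IOError("simulated transport error")
        self.delivered.append(subscriber_id)


def run_batch_task(batch, sender):
    """One batch task: send to every subscriber in order."""
    for subscriber_id in batch:
        sender.send(subscriber_id)


def run_with_retry(batch, sender, max_attempts=3):
    """Imitate the queue: on failure, re-run the whole task from the start."""
    for _ in range(max_attempts):
        try:
            run_batch_task(batch, sender)
            return True
        except IOError:
            continue
    return False
```

If the send to the third of four subscribers fails once, the first two subscribers are notified again on the retry, which is exactly the drawback described above.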
It sounds like it depends on the number of alerts you need to send, how long it takes to send each alert, and the number of active instances you have.
My guess is that sending a C2DM notification takes a few milliseconds to a few tens of milliseconds, while spinning up an instance takes a few seconds, so you probably need to send a few hundred to a few thousand notifications per task to justify starting another instance. The ratio of the time it takes to send each C2DM message to the time it takes to start an instance determines how many messages you want to send in one task.
If you already have enough instances running, though, there is no delay waiting for new instances to start.
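As a rough worked example of that ratio, the sketch below computes the batch size at which the time spent sending equals the instance startup time. The function name and the timings are assumptions for illustration, not measurements.

```python
import math


def break_even_batch_size(instance_startup_ms, send_time_ms):
    """Number of messages whose total send time equals the startup time.

    Below this size, starting a new instance costs more time than the
    work it takes over; above it, the extra instance pays for itself.
    """
    return math.ceil(instance_startup_ms / send_time_ms)


# Assumed figures: 3 s to start an instance, 10 ms per message.
print(break_even_batch_size(3000, 10))
```

With these assumed figures the break-even point is 300 messages per task; at 1 ms per message it rises to 3,000, which matches the "few hundred to a few thousand" range above.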
By the way, this looks like a perfect use case for the MapReduce API. It basically does what you describe in your second version, except that it takes your original query and breaks it into subqueries, each of which returns one "page" of the result set. One task runs per subquery and processes all the elements on its "page". This improves on what you describe because you don't have to spend time looping through the original result set.
I believe the default implementation in the MapReduce API just queries all entities of a particular kind (i.e. all user entities), but you can change the filter it uses.
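The splitting idea can be sketched without the MapReduce library itself. The code below is not that library's interface: it assumes subscribers have integer IDs in a known range, uses a dict as a stand-in for the datastore, and the names `split_id_range`, `run_shard`, and `wants_alerts` are hypothetical.

```python
def split_id_range(min_id, max_id, shard_count):
    """Split the inclusive ID range into contiguous sub-ranges.

    Only the range bounds are needed, so no record is read while splitting.
    """
    total = max_id - min_id + 1
    size = -(-total // shard_count)  # ceiling division
    ranges = []
    start = min_id
    while start <= max_id:
        end = min(start + size - 1, max_id)
        ranges.append((start, end))
        start = end + 1
    return ranges


def run_shard(id_range, subscribers, notify):
    """One shard task: process only the records on its own 'page'."""
    first, last = id_range
    for subscriber_id in range(first, last + 1):
        record = subscribers.get(subscriber_id)
        # The filter is applied inside the shard, not in the initial request.
        if record is not None and record["wants_alerts"]:
            notify(subscriber_id)
```

Each range would become one task, so the initial request only has to compute the ranges and enqueue them.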