Celery + Eventlet pool does not improve the execution speed of asynchronous web requests
As mentioned in the Celery docs, the eventlet pool should be faster than the prefork pool for evented I/O such as asynchronous HTTP requests.
They even mention that
"In an informal test with a feed hub system the Eventlet pool could fetch and process hundreds of feeds every second, while the prefork pool spent 14 seconds processing 100 feeds."
However, we are unable to reproduce any results like this. Running the example tasks, urlopen and crawl, exactly as described and opening thousands of URLs, the prefork pool almost always performs better.
We tested all sorts of concurrency settings (prefork with concurrency 200; eventlet with concurrency 200, 2000 and 5000). In all of these cases the tasks complete in fewer seconds using the prefork pool. The machine is a 2014 MacBook Pro running a RabbitMQ server.
We are looking to make thousands of asynchronous HTTP requests at once and are wondering whether the eventlet pool is even worth implementing. If it is, what are we missing?
The output of python -V && pip freeze is:
Python 2.7.6
amqp==1.4.6
anyjson==0.3.3
billiard==3.3.0.20
bitarray==0.8.1
celery==3.1.18
dnspython==1.12.0
eventlet==0.17.3
greenlet==0.4.5
kombu==3.0.26
pybloom==1.1
pytz==2015.2
requests==2.6.2
wsgiref==0.1.2
The test code used (almost exactly from the docs):
>>> from tasks import urlopen
>>> from celery import group
>>> LIST_OF_URLS = ['http://127.0.0.1'] * 10000 # 127.0.0.1 was just a local web server, also used 'http://google.com' and others
>>> result = group(urlopen.s(url)
... for url in LIST_OF_URLS).apply_async()
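
For comparing the two pools, the "fewer seconds" figure can be measured as the wall-clock time from dispatching the group until every result is back. A minimal sketch of such a timing helper follows, using only the standard library; the name `timed` is hypothetical and not part of Celery, and it is not necessarily how the numbers above were obtained.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds).

    Hypothetical helper for illustration; measures wall-clock time
    around a single blocking call.
    """
    start = time.time()
    result = fn(*args, **kwargs)
    elapsed = time.time() - start
    return result, elapsed
```

Assuming a result backend is configured, it could wrap the whole dispatch-and-wait step, for example `sizes, seconds = timed(lambda: group(urlopen.s(url) for url in LIST_OF_URLS).apply_async().get())`, run once against a prefork worker and once against an eventlet worker.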