Processing computationally intensive tasks in a Django web app
I have a desktop application that I am porting to a Django web app. The app has some pretty computationally intensive parts (using numpy, scipy and pandas, among other libraries). Obviously, importing computationally intensive code into a web app and running it inside the request is not a great idea, as it will keep the client waiting for a response.
So I'll have to hand these tasks off to a background process that notifies the client (via AJAX, I think) and/or stores the results in the database when finished.
You also don't want all of these tasks to run at the same time when there are multiple concurrent users, as this is a great way to bring your server to its knees even with a small number of concurrent requests. Ideally, each instance of the web client would place its tasks in a job queue, which then executes them automatically in an optimal way (based on the number of cores, available memory, etc.).
Are there any good Python libraries for solving this problem? Are there general strategies that people use in these situations? Or is it just a matter of choosing a good batch scheduler and creating a new Python interpreter for each process?
We've developed a Django web application that does heavy computation (each process takes 11 to 88 hours to complete on high-end servers).
Celery: Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but also supports scheduling.
Celery offers:
- Asynchronous execution of tasks.
- Distributed execution of expensive processes.
- Periodic and/or scheduled tasks.
- Retrying tasks if something goes wrong.
This is just the tip of the iceberg; Celery has many more features. Take a look at the documentation and FAQ.
You also need to design a good canvas (Celery's term for composing tasks into workflows). For example, you don't want all tasks to run at the same time when there are multiple concurrent users, as this consumes a lot of resources. You can also schedule tasks based on which users are currently online.
You also need a very good database design, efficient algorithms, and so on.