What is the best way to manage multiple tasks in Celery?
After some research I ran into more problems. Below is my example in more detail:
- Load the list of URLs and assign them to a job with an id (a dynamic queue name needs to be created so the queue can be cleaned up afterwards).
- Use a Celery task to crawl each URL, like extract.delay(job_id, url), and save the result to the DB. There are probably many jobs here (job1, job2, job3); all the tasks in the jobs are the same extract task, but there is only one worker process for all the queues (how? I can't specify every queue name for one worker).
- Check the DB: select count(id) from xxx where job_id = yyy equals len(urls), or else have Celery tell me the status of job_id yyy (see the sketch after this list).
- Show the job status (running or completed) on the website, and allow clearing the job queue from the web.
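For illustration, the flow I have in mind looks roughly like this (a sketch; xxx is the placeholder table name from above, urls is the loaded list, and cursor is whatever DB-API cursor the application uses):

# dispatch one extract task per URL
for url in urls:
    extract.delay(job_id, url)

# later: compare the rows saved so far against the number of URLs
def job_is_done(cursor, job_id, urls):
    cursor.execute("select count(id) from xxx where job_id = %s", [job_id])
    return cursor.fetchone()[0] == len(urls)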
I have never dealt with this situation before; does Celery have an easy way to solve my problem?
I need to add work dynamically: one job contains many tasks, and all the tasks are the same. How can I create different jobs with different queue names, programmatically, while a single worker process consumes all the queues?
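Roughly, what I would like to write (the queue names here are made up for illustration):

# send each job's tasks to its own queue
extract.apply_async(args=(job_id, url), queue='job_%s' % job_id)

# ...but then the single worker would need every queue name listed
# up front, which I can't know in advance:
#   celery -A proj worker -Q job_1,job_2,job_3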
I don't know the details of your web application, but this can be kept pretty simple (the examples below use Django syntax).
You can create two DB models/tables: one represents your batch, and one represents each URL to scrape.
from django.db import models

class ScrapeBatch(models.Model):
    id = models.AutoField(primary_key=True)  # Django would add this pk automatically if omitted

class ScrapeJob(models.Model):
    batch = models.ForeignKey(ScrapeBatch, on_delete=models.CASCADE)
    url = models.CharField(max_length=100)  # for example
    done = models.BooleanField(default=False)
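Creating a batch from the list of URLs could then look like this (a sketch; urls is assumed to be the list you loaded):

batch = ScrapeBatch.objects.create()
ScrapeJob.objects.bulk_create(
    [ScrapeJob(batch=batch, url=url) for url in urls]
)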
Then, when you run the Celery task, use the ScrapeJob model as the reference:
from celery import shared_task

@shared_task
def scrape_url_celery_task(job_id):
    job = ScrapeJob.objects.get(id=job_id)
    scrape_url(job)  # your actual scraping logic
    job.done = True
    job.save()
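Dispatching a whole batch is then just a loop over its jobs (a sketch, reusing the batch object from above):

for job in batch.scrapejob_set.all():
    scrape_url_celery_task.delay(job.id)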
Then, in your web view, you can simply check whether all the jobs in a batch have completed:
def batch_done(batch):
    return not batch.scrapejob_set.filter(done=False).exists()
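A minimal status view could expose this to the website (a sketch; URL routing is omitted):

from django.http import JsonResponse

def batch_status(request, batch_id):
    batch = ScrapeBatch.objects.get(id=batch_id)
    status = 'completed' if batch_done(batch) else 'running'
    return JsonResponse({'batch': batch.id, 'status': status})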
So, in a nutshell:
- a DB table containing your URLs
- a DB table storing something like a batch number (referenced by a foreign key from the URLs table)
- the Celery task marks a URL as scraped in the database after it completes
- a simple lookup in the URL table tells you whether the batch is still running; you can show this value on the website