What is the best way to track multiple tasks in Celery?

After some research I found more problems. Below is the example in more detail:

  • Load the list of URLs and assign them a job_id (I need to create a dynamic queue name so it can be cleaned up afterwards).
  • Use Celery tasks to crawl each URL, like extract.delay(job_id, url), and save the result to the DB (rough sketch after this list).
  • There may be many jobs at once (job1, job2, job3). All tasks in these jobs use the same extract task, but there is only one worker process for all the queues (how? I can't specify every queue name for a worker).
  • Check the DB with select count(id) from xxx where job_id = yyy and see whether it equals len(urls), or otherwise have Celery tell me when job_id yyy is done.
  • Show the job status (running or completed) on the website, and allow the job queue to be cleared from the website.
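
For example, the crawl step might look roughly like this (a sketch of what I mean; extract is the Celery task from the list above, and the actual scraping details are omitted):

import uuid
from celery import shared_task

@shared_task
def extract(job_id, url):
    ...  # crawl the url and save the result to the DB under job_id

def start_job(urls):
    # give this batch of URLs its own job_id so results can be grouped later
    job_id = uuid.uuid4().hex
    for url in urls:
        extract.delay(job_id, url)  # one Celery task per URL
    return job_id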

I have never encountered this situation before. Does Celery have an easy way to solve my problem?

I need to add work dynamically: one job contains many tasks, and all the tasks are the same. How can I create different jobs with different queue names, programmatically, while having only one worker process handle all the queues?



1 answer


I don't know the details of your web application, but it might be pretty simple.

(using Django syntax)

You can create two DB models/tables: one represents your batch, and one represents each job URL.

from django.db import models

class ScrapeBatch(models.Model):
    # Django would add an auto primary key anyway; declared here for clarity
    id = models.AutoField(primary_key=True)

class ScrapeJob(models.Model):
    batch = models.ForeignKey(ScrapeBatch, on_delete=models.CASCADE)
    url = models.CharField(max_length=100)  # for example
    done = models.BooleanField(default=False)

Then, when you run your Celery tasks, use the ScrapeJob model as the reference:



from celery import shared_task

@shared_task
def scrape_url_celery_task(job_id):
    job = ScrapeJob.objects.get(id=job_id)
    scrape_url(job)  # your actual scraping logic
    job.done = True
    job.save()
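
Kicking off a batch could then look something like this (a sketch; start_batch is just an illustrative name, not part of your code):

def start_batch(urls):
    # one row for the batch, one row per URL to scrape
    batch = ScrapeBatch.objects.create()
    for url in urls:
        job = ScrapeJob.objects.create(batch=batch, url=url)
        scrape_url_celery_task.delay(job.id)  # one Celery task per URL
    return batch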

So, in your web view, you can simply check whether all the jobs in a batch have completed:

def batch_done(batch):
    # the batch is complete once no unfinished jobs remain
    return not batch.scrapejob_set.filter(done=False).exists()
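
To display the status on the website, a minimal view could look like this (a sketch; the view name and JSON shape are illustrative):

from django.http import JsonResponse

def batch_status(request, batch_id):
    batch = ScrapeBatch.objects.get(id=batch_id)
    jobs = batch.scrapejob_set
    return JsonResponse({
        "status": "completed" if batch_done(batch) else "running",
        "done": jobs.filter(done=True).count(),
        "total": jobs.count(),
    })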

So, in a nutshell:

  • a DB table containing your URLs
  • a DB table storing something like a batch number (with foreign key relationships to your URLs table)
  • the Celery task marks each URL as scraped in the database after it completes
  • a simple lookup in the URL table tells you whether the job is still running; you can show this value on the website








