What's the easiest way to quickly check Scrapy behavior / error?
I sometimes try to answer Scrapy questions on StackOverflow, but I usually don't test my ideas because I don't know how to do it quickly without setting up a whole Scrapy project and parsing a real web page.
What's the fastest way to test a problem or solution with a standalone example file, without creating a whole new Scrapy project?
To run a spider from one file
If your spider doesn't depend on pipelines or any common stuff used in Scrapy projects, one idea is to create a standalone file and start the spider with the command:
scrapy runspider file_with_my_spider.py
Scrapy will look for the first spider in the file (a class extending scrapy.Spider or one of its subclasses, such as scrapy.CrawlSpider) and run it.
If you are trying to isolate spider code that originally lives in a Scrapy project, you will also have to copy the item classes and any other dependencies into that single file.
To run a spider for a test site
For offline testing, you can replicate the site structure by saving HTML pages into a directory and then running python3 -m http.server (python -m SimpleHTTPServer on Python 2) from that directory: this starts a local server at http://localhost:8000/ serving those pages, which your spider can then crawl.
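The same setup can also be scripted, which is handy for repeatable tests. A standard-library-only sketch (the directory is a stand-in for wherever you keep your saved pages; here a temporary one is created and populated with a sample page, then fetched back to confirm the server works):

```python
import http.server
import os
import tempfile
import threading
import urllib.request

# Stand-in for a directory of saved HTML pages.
site_dir = tempfile.mkdtemp()
with open(os.path.join(site_dir, 'index.html'), 'w') as f:
    f.write('<html><body><h1>Test page</h1></body></html>')

# Serve site_dir on an ephemeral port (directory= needs Python 3.7+).
def handler(*args, **kwargs):
    return http.server.SimpleHTTPRequestHandler(
        *args, directory=site_dir, **kwargs)

server = http.server.ThreadingHTTPServer(('localhost', 0), handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Fetch the page back over HTTP, as a spider would.
url = 'http://localhost:%d/index.html' % server.server_address[1]
page = urllib.request.urlopen(url).read().decode()
print(page)
server.shutdown()
```

This is essentially what python3 -m http.server does for you; scripting it just makes the port and directory explicit.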
To make it easy to decide when you want to work with a local server and a real site, you can make your spider look like this:
import scrapy

class MySpider(scrapy.Spider):
    name = 'my-spider'
    start_urls = ['http://www.some-real-site-url.com']

    def __init__(self, start_url=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        if start_url:
            self.start_urls = [start_url]
    ...
With this in your spider, you will be able to:
scrapy runspider file_with_my_spider.py -a start_url=http://localhost:8000/
to run the spider against the site served by the local server, while plain scrapy runspider file_with_my_spider.py keeps crawling the real site.
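This works because Scrapy passes each -a name=value pair as a keyword argument to the spider's constructor. The override pattern itself can be sketched in plain Python, without Scrapy (MySpider here is a simplified stand-in, not a real spider):

```python
# Stand-in for a spider class: -a start_url=... becomes the
# start_url keyword argument of __init__.
class MySpider:
    # Default target: the real site.
    start_urls = ['http://www.some-real-site-url.com']

    def __init__(self, start_url=None, *args, **kwargs):
        if start_url:
            # The command-line value replaces the default list.
            self.start_urls = [start_url]

print(MySpider().start_urls)
print(MySpider(start_url='http://localhost:8000/').start_urls)
```

Note that -a values always arrive as strings, so anything non-string (numbers, flags) has to be converted inside __init__.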