What's the easiest way to quickly check Scrapy behavior / error?
I sometimes try to answer Scrapy questions on StackOverflow, but I usually don't test my ideas because I don't know how to do it quickly without setting up a whole Scrapy project and parsing a real web page.
What's the fastest way to test a problem or solution with a standalone example file, without creating a whole new Scrapy project?
To run a spider from one file
If your spider doesn't depend on pipelines or any common stuff used in Scrapy projects, one idea is to create a standalone file and start the spider with the command:
scrapy runspider file_with_my_spider.py
Scrapy will look for the first spider in the file (a class extending scrapy.Spider or one of its subclasses, such as scrapy.CrawlSpider) and run it.
If you are trying to isolate spider code that originally lives in a Scrapy project, you will also have to copy the item classes and any other dependencies into that single file.
To run a spider for a test site
For offline testing, you can replicate the site structure by saving HTML pages into a directory and then running python3 -m http.server (python -m SimpleHTTPServer on Python 2) from that directory: this starts a local server at http://localhost:8000/ serving those pages, which your spider can then crawl.
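The same setup can also be scripted, which is handy for repeatable tests. A standard-library-only sketch (the directory is a stand-in for wherever you keep your saved pages; here a temporary one is created and populated with a sample page, then fetched back to confirm the server works):

```python
import http.server
import os
import tempfile
import threading
import urllib.request

# Stand-in for a directory of saved HTML pages.
site_dir = tempfile.mkdtemp()
with open(os.path.join(site_dir, 'index.html'), 'w') as f:
    f.write('<html><body><h1>Test page</h1></body></html>')

# Serve site_dir on an ephemeral port (directory= needs Python 3.7+).
def handler(*args, **kwargs):
    return http.server.SimpleHTTPRequestHandler(
        *args, directory=site_dir, **kwargs)

server = http.server.ThreadingHTTPServer(('localhost', 0), handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Fetch the page back over HTTP, as a spider would.
url = 'http://localhost:%d/index.html' % server.server_address[1]
page = urllib.request.urlopen(url).read().decode()
print(page)
server.shutdown()
```

This is essentially what python3 -m http.server does for you; scripting it just makes the port and directory explicit.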
To make it easy to decide when you want to work with a local server and a real site, you can make your spider look like this:
import scrapy

class MySpider(scrapy.Spider):
    name = 'my-spider'
    start_urls = ['http://www.some-real-site-url.com']

    def __init__(self, start_url=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        if start_url:
            self.start_urls = [start_url]
    ...
With this in your spider, you will be able to:
scrapy runspider file_with_my_spider.py -a start_url=http://localhost:8000/
to run the spider against the site served by the local server, while plain scrapy runspider file_with_my_spider.py keeps crawling the real site.
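This works because Scrapy passes each -a name=value pair as a keyword argument to the spider's constructor. The override pattern itself can be sketched in plain Python, without Scrapy (MySpider here is a simplified stand-in, not a real spider):

```python
# Stand-in for a spider class: -a start_url=... becomes the
# start_url keyword argument of __init__.
class MySpider:
    # Default target: the real site.
    start_urls = ['http://www.some-real-site-url.com']

    def __init__(self, start_url=None, *args, **kwargs):
        if start_url:
            # The command-line value replaces the default list.
            self.start_urls = [start_url]

print(MySpider().start_urls)
print(MySpider(start_url='http://localhost:8000/').start_urls)
```

Note that -a values always arrive as strings, so anything non-string (numbers, flags) has to be converted inside __init__.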