Using Scrapy, getting "Error: ImportError: No module named testspiders.spiders.followall"
I am trying to run Scrapy from a script, following the tutorial on the Scrapy site. I am running into this error: ImportError: No module named testspiders.spiders.followall. I've searched for a solution but haven't found a match yet.
I am actually running this Python script through Node.js, using a module named python-shell, which lets you run a Python script with the following code:
var PythonShell = require('python-shell');
PythonShell.run('my_script.py', function (err) {
  if (err) throw err;
  console.log('finished');
});
My code is copied verbatim from the Scrapy site:
from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from testspiders.spiders.followall import FollowAllSpider
from scrapy.utils.project import get_project_settings
spider = FollowAllSpider(domain='scrapinghub.com')
settings = get_project_settings()
crawler = Crawler(settings)
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run()
My directory structure is the default Express layout, changed only by adding one directory with a Python file in it, plus a few lines of code that use python-shell:
-python-node
  -bin
  -node_modules
  -public
  -python
    -my_script.py
  -routes
  -views
  -app.js
  -package.json
Note: it also doesn't work if I go into the python directory and run python my_script.py directly. I get the same error message: ImportError: No module named testspiders.spiders.followall
When you run the crawler with the scrapy command, the project root (the parent directory of testspiders/) is automatically added to the import path. When you run a script with plain python, it is not. Python then searches only the script's own directory, the entries in PYTHONPATH, and the standard installation locations.
You can check the current import path in Python by inspecting sys.path.
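A minimal sketch of that check, for illustration only. The helper name describe_search_path is made up here; the only real API used is sys.path. If the parent directory of testspiders/ does not appear in the printed list, the import will fail.

```python
import sys


def describe_search_path():
    """Return a copy of the directories Python searches for imports, in order."""
    return list(sys.path)


if __name__ == "__main__":
    # Print one entry per line; the first entry is normally the directory
    # of the script being run.
    for entry in describe_search_path():
        print(entry)
```

Putting these lines at the top of my_script.py, before the failing import, shows the path the script actually sees when it is launched from Node.js.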
So, to make the import statements work with plain python, you can do one of the following:
- add the parent directory of testspiders/ to the path using sys.path.append() (you need to do this before import testspiders ...)
- add that parent directory to the PYTHONPATH environment variable
- run python from the parent directory of testspiders/, with the script located there (Python puts the script's directory, not the working directory, on the path)
- edit the import statements so that they resolve against your current path
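The first option can be sketched as follows. This is an illustration under assumptions, not the asker's actual setup: the helper add_to_sys_path and the path "/path/to/parent-of-testspiders" are hypothetical and must be replaced with the real directory that contains the testspiders/ package.

```python
import os
import sys


def add_to_sys_path(directory):
    """Put `directory` on sys.path (only once) so packages inside it can be imported."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.append(directory)
    return directory


if __name__ == "__main__":
    # Hypothetical location: the directory that CONTAINS testspiders/.
    add_to_sys_path("/path/to/parent-of-testspiders")

    # The import has to come after the sys.path change.
    try:
        from testspiders.spiders.followall import FollowAllSpider  # noqa: F401
    except ImportError as exc:
        print("testspiders is still not importable:", exc)
```

The PYTHONPATH option has the same effect without touching the script: setting that environment variable to the same directory before Python starts adds it to sys.path automatically.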