Using Scrapy, getting "Error: ImportError: No module named testspiders.spiders.followall"

I am trying to run Scrapy from a script, following the tutorial here. I am running into this error: ImportError: No module named testspiders.spiders.followall. I've searched for a solution but haven't found a match yet.

I am actually running this Python script through Node.js, using a module named python-shell, which lets you run a Python script with the following code:

var PythonShell = require('python-shell');

PythonShell.run('my_script.py', function (err) {
  if (err) throw err;
  console.log('finished');
});

      

My code is copied verbatim from the Scrapy site:

from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from testspiders.spiders.followall import FollowAllSpider
from scrapy.utils.project import get_project_settings

spider = FollowAllSpider(domain='scrapinghub.com')
settings = get_project_settings()
crawler = Crawler(settings)
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run() 

      

My directory structure is the default Express layout, changed only by adding a directory with a Python file and a few lines of code that use python-shell:

-python-node
    -bin
    -node_modules
    -public
    -python 
        -my_script.py
    -routes
    -views
    -app.js
    -package.json 

      

NOTE: It also doesn't work if I go into the python directory and run python my_script.py. I get the same error message: ImportError: No module named testspiders.spiders.followall



1 answer


When you run the crawler with the scrapy command, the project root directory (the parent directory of testspiders/) is automatically added to the import path. When you run a script with python, it is not. The import search path then contains only the script's own directory, anything listed in PYTHONPATH, and the standard installation locations.

You can check the current path in python with sys.path
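
As a minimal sketch of that check (assuming Python 3; the helper name is_importable is made up for illustration), the snippet below prints every entry of sys.path and reports whether the top-level testspiders package can be found from it:

```python
import importlib.util
import sys


def is_importable(top_level_name):
    """Return True if a top-level module or package can be found on sys.path."""
    # find_spec returns None when nothing on the search path matches the name.
    return importlib.util.find_spec(top_level_name) is not None


# Print the directories Python will search, in order.
for entry in sys.path:
    print(entry)

# If this prints False, the directory containing testspiders/ is not on the path.
print("testspiders importable:", is_importable("testspiders"))
```

Running this from the same place and in the same way as my_script.py shows whether the directory that contains testspiders/ is among the entries.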



So, to make the import statements work when running with python, you can:

  • add the parent directory of testspiders/ to the path using sys.path.append() (you need to do this before the import testspiders... line)
  • add that parent directory to the PYTHONPATH environment variable
  • run the script from the parent directory of testspiders/ (this works when the script itself lives there, since Python adds the script's directory to the path)
  • edit the import statements so they resolve against your current path
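
The first option could look roughly like this. It is only a sketch: the question does not show where testspiders/ lives, so the path "../testspiders-checkout" and the helper add_to_search_path are hypothetical placeholders to be replaced with the real location.

```python
import os
import sys


def add_to_search_path(directory):
    """Append a directory to sys.path (only once) and return its absolute path."""
    absolute = os.path.abspath(directory)
    if absolute not in sys.path:
        sys.path.append(absolute)
    return absolute


# Hypothetical location: the directory that CONTAINS the testspiders/ package.
project_root = add_to_search_path(os.path.join("..", "testspiders-checkout"))

# The import has to come after the path is extended:
# from testspiders.spiders.followall import FollowAllSpider
```

The import is left commented out so the sketch runs even where the package is absent. Setting PYTHONPATH to the same directory before starting python has the same effect without touching the script.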