NameError: name "hxs" is not defined when using Scrapy

I launched the Scrapy shell and fetched a Wikipedia page successfully.

scrapy shell

I am sure this step worked, judging by the verbose output Scrapy printed.

Next, I would like to see what happens when I write hxs.select('/html').extract()

At this point, I am getting the error:

NameError: name 'hxs' is not defined

What is the problem? I know Scrapy is installed correctly and that it accepted the destination URL, so why does the hxs command fail?




3 answers

I suspect you are using a version of Scrapy that no longer has the hxs shortcut.

Use sel instead (deprecated after 0.24, see below):

$ scrapy shell
>>> sel.xpath('//title/text()').extract()[0]
u'Wikipedia, the free encyclopedia'


Or, as of Scrapy 1.0, you should use the response object directly, with its convenience methods .xpath() and .css():


$ scrapy shell
>>> response.xpath('//title/text()').extract()[0]
u'Wikipedia, the free encyclopedia'


FYI, a quote from "Using selectors" in the Scrapy documentation:

... after the shell loads, you'll have the response available as the response shell variable, and its attached selector in the response.selector attribute.

Querying responses using XPath and CSS is so common that responses include two convenience shortcuts: response.xpath() and response.css():


>>> response.xpath('//title/text()')
[<Selector (text) xpath=//title/text()>]
>>> response.css('title::text')
[<Selector (text) xpath=//title/text()>]



Look at the verbose output that Scrapy prints when the shell starts.

$ scrapy shell


If your output looks like this:

2014-09-20 23:02:14-0400 [scrapy] INFO: Scrapy 0.14.4 started (bot: scrapybot)
2014-09-20 23:02:14-0400 [scrapy] DEBUG: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, MemoryUsage, SpiderState
2014-09-20 23:02:15-0400 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-09-20 23:02:15-0400 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-09-20 23:02:15-0400 [scrapy] DEBUG: Enabled item pipelines: 
2014-09-20 23:02:15-0400 [scrapy] DEBUG: Telnet console listening on
2014-09-20 23:02:15-0400 [scrapy] DEBUG: Web service listening on
2014-09-20 23:02:15-0400 [default] INFO: Spider opened
2014-09-20 23:02:15-0400 [default] DEBUG: Crawled (200) <GET> (referer: None)
[s] Available Scrapy objects:
[s]   hxs        <HtmlXPathSelector xpath=None data=u'<html lang="en" dir="ltr" class="client-'>
[s]   item       {}
[s]   request    <GET>
[s]   response   <200>
[s]   settings   <CrawlerSettings module=None>
[s]   spider     <BaseSpider 'default' at 0xb5d95d8c>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser
Python 2.7.6 (default, Mar 22 2014, 22:59:38) 
Type "copyright", "credits" or "license" for more information.


then the output lists the "Available Scrapy objects".

Whether you should use hxs or sel depends on what that list shows. hxs is not available in your case, so you will need to use sel (newer versions of Scrapy). In short, hxs works for some versions, and sel is what others will need to use.
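As a rough illustration of that version dependence, here is a minimal sketch of a helper that tries the shortcuts from newest to oldest. run_xpath is a hypothetical name, not part of Scrapy, and it assumes the shortcuts (response, sel, hxs) are ordinary names in the namespace dictionary you pass in:

```python
def run_xpath(namespace, query):
    """Run an XPath query with whichever selector shortcut is available.

    Hypothetical helper, not part of Scrapy. `namespace` is a dict of
    names, for example the interactive shell's globals().
    """
    # Scrapy 1.0+: the response object has its own .xpath() method.
    response = namespace.get("response")
    if hasattr(response, "xpath"):
        return response.xpath(query).extract()

    # Versions that provide the sel shortcut (later deprecated).
    sel = namespace.get("sel")
    if sel is not None:
        return sel.xpath(query).extract()

    # Old versions: hxs is an HtmlXPathSelector queried with .select().
    hxs = namespace.get("hxs")
    if hxs is not None:
        return hxs.select(query).extract()

    raise NameError("no selector shortcut (response.xpath, sel, hxs) available")
```

Inside the shell this could be called as run_xpath(globals(), '/html'), assuming the shortcuts are visible in the interactive namespace. In practice it is simpler to read the "Available Scrapy objects" list and type the matching call directly.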



The "sel" shortcut is deprecated; you should use response.xpath('/html').extract() instead.


