Scrapy / Python / XPath - How to extract data from data?

Question

I am new to Scrapy and I have just started learning XPath.

I am trying to extract the titles and links from HTML list items inside a div. This is how I thought I would do it (selecting the ul inside the div by its id, then looping through the list items):

def parse(self, response):
    for t in response.xpath('//*[@id="categories"]/ul'):
        for x in t.xpath('//li'):
            item = TgmItem()
            item['title'] = x.xpath('a/text()').extract()
            item['link'] = x.xpath('a/@href').extract()
            yield item

But I got the same results as this attempt:

def parse(self, response):
    for x in response.xpath('//li'):
        item = TgmItem()
        item['title'] = x.xpath('a/text()').extract()
        item['link'] = x.xpath('a/@href').extract()
        yield item

In both cases, the exported CSV file contains li data from the whole source, top to bottom ...

I am not an expert and have made several attempts; if anyone could shed some light on this, it would be appreciated.

python xpath web-scraping scrapy

Alex legg 13 Sep 14 at 19:24

1 answer

alecxe · Accepted Answer · 2014-09-13T19:43:00+0000

You need to prefix the XPath expression used in the inner loop with a dot:

for t in response.xpath('//*[@id="categories"]/ul'):
    for x in t.xpath('.//li'):

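This makes the expression relative to the current element (t), so it searches only inside the selected ul rather than the entire page. Without the dot, //li is an absolute path that starts at the document root, even when it is called on a nested selector, which is why both of your attempts returned the same items.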

See "Working with relative XPaths" in the Scrapy selectors documentation for more explanation.
