Scrapy / Python / XPath - How to extract data from data?
I am new to Scrapy and I have just started learning XPath.
I am trying to extract headers and links from HTML list items inside a div. The following code is how I thought I would do it (selecting the ul inside the div by its id, then looping through the list items):
def parse(self, response):
    for t in response.xpath('//*[@id="categories"]/ul'):
        for x in t.xpath('//li'):
            item = TgmItem()
            item['title'] = x.xpath('a/text()').extract()
            item['link'] = x.xpath('a/@href').extract()
            yield item
But I got the same results as this attempt:
def parse(self, response):
    for x in response.xpath('//li'):
        item = TgmItem()
        item['title'] = x.xpath('a/text()').extract()
        item['link'] = x.xpath('a/@href').extract()
        yield item
In both cases the exported CSV file contains li data from the whole source, top to bottom ...
I am not an expert and I have made several attempts. If anyone could shed some light on this, it would be appreciated.
1 answer
You need to start the XPath expression used in the inner loop with a dot:
for t in response.xpath('//*[@id="categories"]/ul'):
    for x in t.xpath('.//li'):
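The leading dot makes the expression relative to the current element, so it searches only that element's descendants rather than the entire page. Without the dot, //li is an absolute path that always starts from the document root, even when it is called on a nested selector.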
See "Working with relative XPaths" in the Scrapy selectors documentation for more explanation.