Need help with this regex

I am new to Scrapy and am trying to crawl a site with CrawlSpider. I want it to follow the Next button recursively, but it only scrapes the landing page and never moves on to the next page. I think the problem is in the regex, but I have checked it many times and cannot find the error.

# -*- coding: utf-8 -*-

start_urls = ['https://shopping.yahoo.com/merchantrating/?mid=13652']

rules = (
    Rule(LinkExtractor(allow=r"/merchantrating/;_ylt=Anf3hF19R8MGFPwuYuJUny4cEb0F\?mid=13652&sort=1&start=\d+"), callback='parse_start_url', follow=True),
)

def parse_start_url(self, response):
    sel = Selector(response)
    contents = sel.xpath('//p')
    for content in contents:
        item = BedbugsItem()
        item['pageContent'] = content.xpath('text()').extract()
        self.items.append(item)
    return self.items

      



1 answer


Select the Next link with XPath instead of matching its URL with a regex:



rules = (
    Rule(
        LinkExtractor(
            restrict_xpaths=["//div[@class='pagination']//a[contains(., 'Next')]"],
        ),
        callback='parse_start_url',
        follow=True,
    ),
)
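
As for why the regex may fail: LinkExtractor applies each `allow` pattern to the absolute URL of every link it finds. If the `;_ylt=...` part of the Next link is a token that changes between requests (this is an assumption, the question does not confirm it), a pattern that hard-codes one token value will never match. The sketch below checks this with plain `re`, outside Scrapy. The URL in `next_url` and its token are made up for illustration, and `LOOSE` is just one possible relaxed pattern.

```python
import re

# Pattern from the question: the _ylt value is hard-coded.
STRICT = r"/merchantrating/;_ylt=Anf3hF19R8MGFPwuYuJUny4cEb0F\?mid=13652&sort=1&start=\d+"

# Relaxed pattern (an assumption, not from the question):
# accept any _ylt value, or no _ylt segment at all.
LOOSE = r"/merchantrating/(;_ylt=[^?]*)?\?mid=13652&sort=1&start=\d+"

# Hypothetical "Next" link carrying a different token.
next_url = (
    "https://shopping.yahoo.com/merchantrating/"
    ";_ylt=HYPOTHETICALTOKEN123?mid=13652&sort=1&start=10"
)


def matches(pattern, url):
    """Return True if the pattern is found anywhere in the URL."""
    return re.search(pattern, url) is not None


print(matches(STRICT, next_url))  # False: the token differs
print(matches(LOOSE, next_url))   # True: any token is accepted
```

If that is what is happening, either the relaxed pattern or the XPath rule above should pick up the Next link. The XPath version has the advantage of not depending on the URL format at all.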

      
