SOLVED with updated code: Scrapy cannot scrape second page with ItemLoader
Update: 7/29 at 22:06: Issue resolved with the updated code.
Update: 7/29, 9:29 PM: After reading this post, I updated my code.
UPDATE: 7/28/15 at 7:35 PM: Following Martin's suggestion, I updated the post, but there are still no items listed and no database entries.
ORIGINAL: I can successfully scrape one page (the base page). Now I am trying to scrape one of the item fields from another URL found on the base page, using a Request with a callback. But it doesn't work. The spider is here:
from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy import Request
import re
from datetime import datetime, timedelta
from CAPjobs.items import CAPjobsItem
from CAPjobs.items import CAPjobsItemLoader

class CAPjobSpider(Spider):
    name = "naturejob3"
    download_delay = 2
    #allowed_domains = ["nature.com/naturejobs/"]
    start_urls = [
        "http://www.nature.com/naturejobs/science/jobs?utf8=%E2%9C%93&q=pathologist&where=&commit=Find+Jobs"]

    def parse_subpage(self, response):
        il = response.meta['il']
        location = response.xpath('//div[@id="extranav"]//ul[@class="job-addresses"]/li/text()').extract()
        il.add_value('loc_pj', location)
        yield il.load_item()

    def parse(self, response):
        hxs = Selector(response)
        sites = hxs.xpath('//div[@class="job-details"]')
        for site in sites:
            il = CAPjobsItemLoader(CAPjobsItem(), selector=site)
            il.add_xpath('title', 'h3/a/text()')
            il.add_xpath('post_date', 'normalize-space(ul/li[@class="when"]/text())')
            il.add_xpath('web_url', 'concat("http://www.nature.com", h3/a/@href)')
            url = il.get_output_value('web_url')
            yield Request(url, meta={'il': il}, callback=self.parse_subpage)
The scraper is now fully functional. :)
You initialize the ItemLoader like this:
il = CAPjobsItemLoader(CAPjobsItem, sites)
The documentation does it like this:
l = ItemLoader(item=Product(), response=response)
So, I think you are missing the parentheses on CAPjobsItem, and your line should read:
il = CAPjobsItemLoader(CAPjobsItem(), sites)
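To see why the parentheses matter, here is a minimal plain-Python sketch that does not use Scrapy at all. The `Product` class and the `fill` helper are hypothetical stand-ins (a dict subclass in place of a real Item, a function in place of a loader), so this only illustrates the general difference between passing a class and passing an instance. It is not a reproduction of what Scrapy does internally.

```python
class Product(dict):
    """Hypothetical stand-in for an item: a dict-like container."""


def fill(item):
    """Hypothetical stand-in for a loader: stores a value on the item."""
    item["title"] = "Pathologist"
    return item


# Passing an instance (with parentheses) works as expected.
filled = fill(Product())

# Passing the class itself (no parentheses) fails, because a class
# object does not support item assignment.
error = None
try:
    fill(Product)
except TypeError as exc:
    error = exc
```

In this sketch `filled` ends up holding the title, while the call without parentheses raises a `TypeError` that is captured in `error`.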