Scrapy: If the key exists, why am I getting a KeyError?
In items.py I have defined:
import scrapy

class CraigslistSampleItem(scrapy.Item):
    title = scrapy.Field()
    link = scrapy.Field()
and I populate each item in a spider, thus:
item = CraigslistSampleItem()
item["title"] = $someXpath.extract()
item["link"] = $someOtherXpath.extract()
When I add the items to a list (returned by parse()) and save it as CSV, I get two columns of data, titles and links, as expected. If I comment out the XPath for the link and save as CSV, I still get two columns of data, and the values in the link column are empty strings. This seemed to make sense, since both title and link are attributes of every CraigslistSampleItem instance. I would have thought I could then do something like this (with the XPath for the link still commented out):
if item["link"] == '':
    print "link has not been given a value"
However, trying to access the link attribute of each item fails like this:
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/item.py", line 50, in __getitem__
return self._values[key]
exceptions.KeyError: 'link'
If every item instance does indeed have a value for link (albeit an empty string), why can't I access that key?
The Scrapy Item class provides a dictionary-like interface for storing the extracted data. Declaring a Field does not set any default value on the instance: a field that was never assigned simply has no key, rather than an empty string.

To check whether a field has been set, test for the field key in the item instance:
if 'link' not in item:
    print "link has not been given a value"
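Since Item exposes a dict-like interface, the usual dict safe-access patterns apply as well. A minimal sketch, using a plain dict as a stand-in for the Item (so it runs without scrapy installed):

```python
# A plain dict standing in for the dict-like Scrapy Item.
item = {"title": "Sample listing"}  # 'link' was never assigned

# Indexing an unset field raises KeyError, just as in the traceback above...
try:
    item["link"]
except KeyError:
    print("link has not been given a value")

# ...while a membership test or .get() with a default is safe:
print("link" in item)        # False
print(item.get("link", ""))  # falls back to the empty-string default
```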
Demo:
In [1]: import scrapy
In [2]: class CraigslistSampleItem(scrapy.Item):
...: title = scrapy.Field()
...: link = scrapy.Field()
...:
In [3]: item = CraigslistSampleItem()
In [4]: item["title"] = "test"
In [5]: item
Out[5]: {'title': 'test'}
In [6]: "link" in item
Out[6]: False
In [7]: item["link"] = "test link"
In [8]: "link" in item
Out[8]: True
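If you want every declared field to show up with an empty-string value (matching what the CSV export displays), one option is to fill in the missing keys before exporting. A sketch, with a plain dict `values` and a `declared_fields` list standing in for the Item and its declared fields:

```python
# Stand-ins for a Scrapy Item: 'declared_fields' mimics the fields declared
# on the class, 'values' mimics the keys the spider actually assigned.
declared_fields = ["title", "link"]
values = {"title": "Sample listing"}  # 'link' was never assigned

# Give every unset field an empty-string default before export,
# mirroring the empty cells the CSV exporter writes for missing keys.
for field in declared_fields:
    values.setdefault(field, "")

print(values)  # both keys are now present
```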