Scrapy: If the key exists, why am I getting a KeyError?

Question

In items.py, I have defined:

import scrapy 

class CraigslistSampleItem(scrapy.Item):
    title = scrapy.Field()
    link = scrapy.Field()

and I am populating each item through a spider, thus:

item = CraigslistSampleItem()
item["title"] = $someXpath.extract() 
item["link"] = $someOtherXpath.extract()

When I add the items to the list returned by parse() and save it as CSV, I get two columns of data, title and link, as expected. If I comment out the XPath for link and save as CSV, I still get two columns, and the values in the link column are empty strings. This seems to make sense, since both title and link are attributes of every CraigslistSampleItem instance. Given that, I would have thought I could do something like this (with the XPath for link still commented out):

  if item["link"] == '':
      print "link has not been given a value"

However, trying to access the link field of each item fails like this:

File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/item.py", line 50, in __getitem__
    return self._values[key]
exceptions.KeyError: 'link'

If every item instance does indeed have a value for link (albeit an empty string), why can't I access that key?

python list key scrapy scrapy-spider

Pyderman May 28 '15 at 15:56

1 answer

alecxe · Accepted Answer · 2015-05-28T15:59:56+0000

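The Scrapy Item class provides a dictionary-like interface for storing the extracted data. No default values are set for item fields, so declaring a field with scrapy.Field() does not create a key for it; the key exists only once a value has been assigned.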
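The difference between a declared field and a stored value can be sketched with a small stand-in class. This is a simplified model for illustration, not Scrapy's actual source: the name SketchItem and its layout are assumptions, and only the `self._values[key]` lookup is taken from the traceback in the question.

```python
# Simplified stand-in for illustration only; this is NOT Scrapy's source code.
class SketchItem(object):
    # Declared field names (hypothetical equivalent of the scrapy.Field() lines).
    fields = ("title", "link")

    def __init__(self):
        # Starts empty: declaring a field does not give it a default value.
        self._values = {}

    def __setitem__(self, key, value):
        # Only declared fields may be assigned in this sketch.
        if key not in self.fields:
            raise KeyError("%s is not a declared field" % key)
        self._values[key] = value

    def __getitem__(self, key):
        # Same lookup as the line shown in the traceback.
        return self._values[key]

    def __contains__(self, key):
        # Membership reflects assigned values, not declared fields.
        return key in self._values

    def get(self, key, default=None):
        # Dict-style lookup that falls back to a default instead of raising.
        return self._values.get(key, default)


item = SketchItem()
item["title"] = "test"

has_link = "link" in item             # declared, but never assigned
link_or_empty = item.get("link", "")  # falls back to the default

try:
    item["link"]
    raised = False
except KeyError:
    raised = True
```

In this sketch, "link" is listed in fields but missing from _values, so item["link"] raises KeyError while the membership test and get() do not. Real Scrapy items are dict-like as well, so a get() call with a default should behave the same way there.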

To check whether a field has been set, test for the field key in the item instance:

if 'link' not in item:
    print "link has not been given a value"

Demo:

In [1]: import scrapy

In [2]: class CraigslistSampleItem(scrapy.Item):
   ...:         title = scrapy.Field()
   ...:         link = scrapy.Field()
   ...:     

In [3]: item = CraigslistSampleItem()

In [4]: item["title"] = "test"

In [5]: item
Out[5]: {'title': 'test'}

In [6]: "link" in item
Out[6]: False

In [7]: item["link"] = "test link"

In [8]: "link" in item
Out[8]: True
