Scrapy: If the key exists, why am I getting a KeyError?

With items.py defined thus:

import scrapy 

class CraigslistSampleItem(scrapy.Item):
    title = scrapy.Field()
    link = scrapy.Field()

      

and populating each element through a spider, thus:

item = CraigslistSampleItem()
item["title"] = $someXpath.extract() 
item["link"] = $someOtherXpath.extract()

      

When I add the items to a list (returned by parse()) and save it as CSV, I get two columns of data, title and link, as expected. If I comment out the XPath for the link and save as CSV again, I still get two columns, and the values in the link column are empty strings. This seems to make sense, since both title and link are attributes of every CraigslistSampleItem instance. I would have thought that I could do something like this (with the XPath for the link still commented out):

  if item["link"] == '':
      print "link has not been given a value"

      

However, trying to read the link field of any item fails like this:

File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/item.py", line 50, in __getitem__
    return self._values[key]
exceptions.KeyError: 'link'

      

If every item instance does indeed have a value for link (albeit an empty string), why can't I access that key?


1 answer


The Scrapy Item class provides a dictionary-like interface for storing the extracted data. Fields have no default values: declaring a field does not assign it a value, so reading a field that was never set raises a KeyError.

To check whether a field has been set, test for the field key in the item instance:

if 'link' not in item:
    print "link has not been given a value"
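
The traceback in the question shows __getitem__ returning self._values[key], which suggests that declared fields and stored values are kept separately. The following is a minimal sketch of that idea, assuming Python 3; SketchItem is a hypothetical stand-in written for illustration, not Scrapy's actual implementation. The empty strings in the CSV are most likely filled in by the exporter for missing fields at export time, rather than stored on the item itself.

```python
from collections.abc import MutableMapping


class SketchItem(MutableMapping):
    """Simplified stand-in for scrapy.Item (hypothetical, not Scrapy's code)."""

    # Declared field names; declaring a field does not store a value for it.
    fields = {"title", "link"}

    def __init__(self):
        self._values = {}  # holds only the fields that were actually assigned

    def __getitem__(self, key):
        return self._values[key]  # KeyError if the field was never set

    def __setitem__(self, key, value):
        if key not in self.fields:
            raise KeyError("%s is not a declared field" % key)
        self._values[key] = value

    def __delitem__(self, key):
        del self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


item = SketchItem()
item["title"] = "test"

has_link = "link" in item               # declared, but never assigned
link_or_default = item.get("link", "")  # falls back to "" instead of raising
```

Because the interface is dictionary-like, get() with a default is an alternative to the membership test when a fallback value is all that is needed; Scrapy's own Item supports the same call.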

      



Demo:

In [1]: import scrapy

In [2]: class CraigslistSampleItem(scrapy.Item):
   ...:         title = scrapy.Field()
   ...:         link = scrapy.Field()
   ...:     

In [3]: item = CraigslistSampleItem()

In [4]: item["title"] = "test"

In [5]: item
Out[5]: {'title': 'test'}

In [6]: "link" in item
Out[6]: False

In [7]: item["link"] = "test link"

In [8]: "link" in item
Out[8]: True

      
