Why is Scrapy Field a dict?

Basically I have a really standard setup: a spider subclassed from CrawlSpider and an item with three fields that looks like this:

from scrapy.item import Item, Field

class AppdexItem(Item):
    name = Field()
    url = Field()
    desc = Field()

      

When my spider parses the response, it fills in an item like this:

i = AppdexItem()
name = hxs.select("//h1[@class='doc-banner-title']/text()")
i['name'] = name.extract()[0]

      

Now I got confused when I read what Field really is. This is literally its implementation:

 class Field(dict):
     """Container of field metadata"""

      

It's a simple dict. I wondered why this is the case and looked at the implementation for a while. It still didn't make any sense. So I ran scrapy shell on a page that was supposed to be parsed into an item, and this is what I got:

In [16]: item = spider.parse_app(response)

In [17]: item.fields
Out[17]: {'desc': {}, 'name': {}, 'url': {}}

In [18]: item['name']
Out[18]: u'Die Kleine Meerjungfrau'

      

What? Either I am doing something completely wrong (I did everything the way the official tutorials and examples told me to), or Field, being a dict, is completely pointless.

Can someone explain this to me?



2 answers


Historical reasons. Fields used to carry metadata, and that metadata was stored in a dict. I assume dict was chosen because it has a convenient keyword constructor (key=value). You can see that the last use of this was removed in this commit. At the moment it makes very little difference, and Field could just as well be a plain object (although that might be hard to change if some code still assumes it is a dict).
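To illustrate the keyword-constructor point, here is a minimal sketch that re-declares Field exactly as quoted in the question instead of importing Scrapy. The metadata keys `serializer` and `required` are illustrative assumptions only; nothing in the class itself gives them any meaning.

```python
class Field(dict):
    """Container of field metadata (same definition as quoted in the question)."""


# With no arguments, Field is just an empty dict, which is why
# item.fields in the question shows {} for every field.
plain = Field()

# dict's keyword constructor turns keyword arguments into metadata entries.
# The keys below are illustrative; the class does not interpret them.
annotated = Field(serializer=str, required=True)

print(plain)                  # {}
print(annotated["required"])  # True
```

The values an item holds (such as item['name']) live elsewhere; the Field objects only describe the fields.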





Field is a dict so that it can store metadata. One use case is specifying input and output processors for the ItemLoader. See http://doc.scrapy.org/en/master/topics/loaders.html#declaring-input-and-output-processors .
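As a rough sketch of that use case, the snippet below shows how a loader could look up processors in a field's metadata. It does not use Scrapy: Field is re-declared as in the question, and `strip_all`, `take_first`, `fields`, and `load_value` are hypothetical helpers written for this illustration. Only the metadata key names `input_processor` and `output_processor` follow the linked documentation.

```python
class Field(dict):
    """Stand-in for Scrapy's Field: a dict of field metadata."""


def strip_all(values):
    """Hypothetical input processor: strip whitespace from every value."""
    return [value.strip() for value in values]


def take_first(values):
    """Hypothetical output processor: return the first non-empty value."""
    for value in values:
        if value is not None and value != "":
            return value
    return None


# Field declarations, as they might appear on an item class.
fields = {
    "name": Field(input_processor=strip_all, output_processor=take_first),
    "url": Field(),  # no metadata, so values pass through unchanged
}


def load_value(field_name, raw_values):
    """Run raw values through the processors found in the field metadata."""
    meta = fields[field_name]
    identity = lambda values: values
    process_in = meta.get("input_processor", identity)
    process_out = meta.get("output_processor", identity)
    return process_out(process_in(raw_values))


loaded_name = load_value("name", ["  Die Kleine Meerjungfrau  ", "ignored"])
loaded_url = load_value("url", ["http://example.com/app"])
print(loaded_name)  # Die Kleine Meerjungfrau
print(loaded_url)   # ['http://example.com/app']
```

The point of the design is that the item declaration carries the per-field configuration, and whatever consumes the item decides which metadata keys it cares about.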



I personally think it would be helpful for Scrapy to support plain dicts without any metadata, but that is another question.
