Why is Scrapy Field a dict?

Question


Basically I have a really standard setup: a spider subclassed from CrawlSpider and an item with three fields that looks like this:

from scrapy.item import Item, Field

class AppdexItem(Item):
    name = Field()
    url = Field()
    desc = Field()

When my spider parses the response, it populates an item like this:

i = AppdexItem()
name = hxs.select("//h1[@class='doc-banner-title']/text()")
i['name'] = name.extract()[0]

What confused me was reading what Field really is. This is literally its implementation:

 class Field(dict):
     """Container of field metadata"""

It's just a plain dict. I wondered why this is the case and stared at the implementation for a while. It still didn't make any sense. So I ran scrapy shell on a page that was supposed to be parsed into an item, and this is what I got:

In [16]: item = spider.parse_app(response)

In [17]: item.fields
Out[17]: {'desc': {}, 'name': {}, 'url': {}}

In [18]: item['name']
Out[18]: u'Die Kleine Meerjungfrau'

What? Either I am doing something completely wrong (I did everything the way the official tutorials and examples told me to), or Field, being a dict, is completely pointless.

Can someone explain this to me?

+3

python scrapy

dAnjou 15 Feb 13 at 17:10


2 answers

Field is a dict so that it can store metadata. One use case is specifying input and output processors for an ItemLoader. See http://doc.scrapy.org/en/master/topics/loaders.html#declaring-input-and-output-processors .
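As a rough illustration of the metadata idea, here is a minimal sketch that does not import Scrapy. The Field class below is a stand-in copied from the definition quoted in the question, strip_whitespace is a hypothetical processor, and the lookup at the end only imitates what a loader might do; it is not Scrapy's actual loader code. The key name input_processor is the one used in the linked loaders documentation.

```python
class Field(dict):
    """Stand-in for Scrapy's Field: a dict that holds per-field metadata."""


def strip_whitespace(value):
    # Hypothetical processor, named here only for illustration.
    return value.strip()


# Metadata is passed as keyword arguments and ends up as ordinary dict entries.
name_field = Field(input_processor=strip_whitespace)

# A field declared without arguments carries no metadata at all.
plain_field = Field()

# A loader-like consumer can look the metadata up and fall back to a no-op.
processor = name_field.get("input_processor", lambda v: v)
cleaned = processor("  Die Kleine Meerjungfrau  ")

print(cleaned)
print(plain_field)
```

With no keyword arguments, a Field is simply an empty dict, which is why the fields in the question all show up as {}.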

I personally think it would be good for Scrapy to support plain dicts as items, without any metadata, but that is another question.

+3

Mikhail Korobov 01 Aug '14 at 20:00


Rcxdude · Accepted Answer · 2013-02-17T20:32:20+0000

Historical reasons. Fields used to have metadata attached, and that metadata was stored in a dict. I assume a dict was chosen because it has a convenient keyword constructor (key=value). You can see that the last use of this was removed in this commit . At the moment it makes very little difference, and Field could just as well be a plain object (although that might be difficult to change if there is still code that assumes it is a dict for some reason).
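The shell session in the question is consistent with this: the declared fields hold only metadata (none here, hence the empty dicts), while the scraped values are stored separately on the item instance. A minimal sketch of that separation follows. MiniItem is a hypothetical, heavily simplified stand-in written for this example and is not Scrapy's real Item implementation.

```python
class Field(dict):
    """Stand-in mirroring the definition quoted in the question."""


class MiniItem:
    # Hypothetical simplified item: declared fields carry metadata only.
    fields = {"name": Field(), "url": Field(), "desc": Field()}

    def __init__(self):
        # Scraped values live here, separate from the field declarations.
        self._values = {}

    def __setitem__(self, key, value):
        # Only declared fields may be assigned.
        if key not in self.fields:
            raise KeyError(f"{key} is not a declared field")
        self._values[key] = value

    def __getitem__(self, key):
        return self._values[key]


item = MiniItem()
item["name"] = "Die Kleine Meerjungfrau"

print(item.fields)
print(item["name"])
```

In this sketch the Field objects never see the value, so a Field without metadata stays an empty dict no matter what is assigned to the item.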
