Why is Scrapy Field a dict?
Basically I have a really standard setup: a spider subclassed from CrawlSpider
and an item with three fields that looks like this:
from scrapy.item import Item, Field

class AppdexItem(Item):
    name = Field()
    url = Field()
    desc = Field()
When my spider parses the response, it fills in an item like this:
i = AppdexItem()
name = hxs.select("//h1[@class='doc-banner-title']/text()")
i['name'] = name.extract()[0]
Now I was confused when I read what a Field really is. This is literally its implementation:
class Field(dict):
    """Container of field metadata"""
It's a simple dict. I wondered why this is the case and looked at the implementation for a while. It still didn't make any sense. So I ran scrapy shell on a page that was supposed to be parsed into an item, and this is what I got:
In [16]: item = spider.parse_app(response)
In [17]: item.fields
Out[17]: {'desc': {}, 'name': {}, 'url': {}}
In [18]: item['name']
Out[18]: u'Die Kleine Meerjungfrau'
What? Either I am doing something completely wrong (I did everything the way the official tutorials and examples told me), or Field, being a dict, is completely pointless.
Can someone explain this to me?
Historical reasons. There used to be metadata attached to fields, and it was stored in a dict. I assume a dict was chosen because it has a convenient constructor (key=value). You can see that the last use of this was removed in this commit. At the moment it makes very little difference, and Field could just as well be a plain object (although that might be difficult to change if there is still code that assumes it is a dict for some reason).
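The convenient constructor mentioned above can be seen without Scrapy installed. This is only a minimal sketch: Field is redefined locally to mirror the implementation quoted in the question, and the metadata keys serializer and max_length are illustrative names chosen for the example, not a claim about what Scrapy does with them.

```python
class Field(dict):
    """Container of field metadata (local stand-in mirroring the quoted class)."""


# Because Field subclasses dict, keyword arguments become metadata entries.
plain = Field()                                   # no metadata at all
annotated = Field(serializer=str, max_length=80)  # illustrative keys only

# Metadata is read back with ordinary dict access.
print(plain == {})
print(annotated["max_length"])
print(sorted(annotated.keys()))
```

A Field declared without arguments is therefore just an empty dict, which matches the empty {} values in the item.fields output shown in the question.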
Field is a dict so that it can store metadata. One use case is specifying input and output processors for the ItemLoader. Check out http://doc.scrapy.org/en/master/topics/loaders.html#declaring-input-and-output-processors .
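To make the idea concrete, here is a simplified sketch that runs without Scrapy. Only the key names input_processor and output_processor come from the linked documentation; the Field class is a local stand-in, load_value is a hypothetical helper rather than the ItemLoader API, and the URL is a placeholder. The real loader applies its processors at different points in time, so treat this as an illustration of metadata lookup only.

```python
class Field(dict):
    """Local stand-in for Scrapy's Field: a dict of metadata."""


def strip_all(values):
    """Input processor: trim whitespace from every extracted string."""
    return [v.strip() for v in values]


def take_first(values):
    """Output processor: return the first non-empty value, or None."""
    for v in values:
        if v:
            return v
    return None


# Metadata is declared per field through keyword arguments.
fields = {
    "name": Field(input_processor=strip_all, output_processor=take_first),
    "url": Field(),  # no metadata: values pass through unchanged
}


def load_value(field_name, raw_values):
    """Hypothetical helper: apply the processors found in the field metadata."""
    meta = fields[field_name]
    identity = lambda values: values
    processed = meta.get("input_processor", identity)(raw_values)
    return meta.get("output_processor", identity)(processed)


name = load_value("name", ["  Die Kleine Meerjungfrau  "])
url = load_value("url", ["http://example.com/app"])  # placeholder URL
print(name)
print(url)
```

The point of the design is that the item class itself never interprets the metadata. Other components look up the keys they care about and ignore the rest.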
I personally think it would be helpful for Scrapy to support plain dicts without any metadata, but that is another question.