Can I specify any method as a callback when creating a Scrapy request object?
I am trying to create a request, and previously I passed a function from my spider class as the callback. I have since moved that function to a subclass of Item, because I want to have several item types, and the callback may differ for each item type (for example, at the moment I plan to raise DropItem if the content type is not what I expect, and each item type has a different set of valid MIME types). So what I was wondering is: can I pass a method of the Item subclass as the callback parameter? Basically, like this:
item = MyCustomItem() # Extends scrapy.item.Item
# bunch of code here...
req = Request(urlparse.urljoin(response.url, url), method="HEAD", callback=item.parse_resource_metadata)
At the moment item.parse_resource_metadata is never called. Printing req.callback gives

<bound method ZipResource.parse_resource_metadata of {correct data for this Item object}>

so at least the request is constructed the way I was hoping.
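For context, here is a minimal sketch of the setup described above, written in the same Python 2 style as the question. ZipResource, parse_resource_metadata, and the list of accepted MIME types are guesses at the asker's code, not Scrapy API (note also that DropItem is only treated specially when raised from an item pipeline, not from a callback):

from scrapy.exceptions import DropItem
from scrapy.item import Item, Field

class ZipResource(Item):
    # One of several item types, each with its own notion of valid content.
    url = Field()
    content_type = Field()

    # Hypothetical per-item-type whitelist of MIME types.
    VALID_MIME_TYPES = ['application/zip', 'application/x-zip-compressed']

    def parse_resource_metadata(self, response):
        # Intended callback: reject responses whose Content-Type does not
        # match this item type.
        content_type = response.headers.get('Content-Type', '')
        if not any(mime in content_type for mime in self.VALID_MIME_TYPES):
            raise DropItem("unexpected content type: %s" % content_type)
        self['content_type'] = content_type
        return self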
[edit] Mea culpa, the callback was not called because the start page was never crawled (I had to override parse_start_url()). But it turns out I was doing something wrong anyway, so it's good that I asked!
A callback is just a callable that takes a response as its argument.

Item objects, on the other hand, are only field containers; they exist to store data, and you shouldn't put logic in them.

Better to define a method on the spider and pass the Item instance along in meta:
import urlparse

from scrapy.http import Request

def parse(self, response):
    ...
    item = MyCustomItem()
    ...
    # Stash the item in the request's meta dict; Scrapy carries meta
    # over to the response that is handed to the callback.
    yield Request(urlparse.urljoin(response.url, url),
                  method="HEAD",
                  meta={'item': item},
                  callback=self.my_callback)

def my_callback(self, response):
    # Retrieve the item that was attached in parse().
    item = response.meta['item']
    ...
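To tie this back to the original goal, the per-item-type MIME check can then live in that spider callback. Here is a rough sketch assuming a hypothetical VALID_MIME_TYPES mapping keyed by item class name (neither the mapping nor the item class names come from Scrapy):

VALID_MIME_TYPES = {
    'ZipResource': ['application/zip', 'application/x-zip-compressed'],
    'PdfResource': ['application/pdf'],  # hypothetical second item type
}

def my_callback(self, response):
    item = response.meta['item']
    content_type = response.headers.get('Content-Type', '')
    allowed = VALID_MIME_TYPES.get(type(item).__name__, [])
    if not any(mime in content_type for mime in allowed):
        # Dropping the item is as simple as not yielding it.
        self.logger.info("skipping %s: unexpected content type %s",
                         response.url, content_type)
        return
    item['content_type'] = content_type
    yield item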
I'm not entirely sure what you are trying to achieve, but you may also want to take a closer look at Item Loaders and their Input and Output Processors.
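For example, a minimal Item Loader sketch, assuming MyCustomItem has name and url fields (the import paths below are those of Scrapy 1.x):

from scrapy.loader import ItemLoader
from scrapy.loader.processors import MapCompose, TakeFirst

class MyItemLoader(ItemLoader):
    default_item_class = MyCustomItem
    default_output_processor = TakeFirst()     # output: collapse collected values to the first one
    name_in = MapCompose(lambda s: s.strip())  # input: strip whitespace from each scraped value

def parse(self, response):
    loader = MyItemLoader(response=response)
    loader.add_css('name', 'h1::text')         # runs through name_in, then the output processor
    loader.add_value('url', response.url)
    yield loader.load_item()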