Can I specify any method as a callback when creating a Scrapy request object?

I am trying to create a request and have previously passed a method of my spider class as the callback. However, I have since moved this function to a subclass of Item, because I would like to have different item types, and the callback may be different for each item type (for example, at the moment I am going to raise DropItem if the content type is not as expected, and there is a different set of valid MIME types for each item type). So I was wondering whether I could pass a method of the Item subclass as the callback parameter. Basically, like this:

item = MyCustomItem()  # Extends scrapy.item.Item
# bunch of code here...
req = Request(urlparse.urljoin(response.url, url), method="HEAD", callback=item.parse_resource_metadata)

      

item.parse_resource_metadata is not called at the moment. Printing req.callback gives

<bound method ZipResource.parse_resource_metadata of {(correct data for this Item object}>

      

so it at least constructs the request as I was hoping.

[edit] Mea culpa, the callback was not called because the start page was not crawled (I had to override parse_start_url()). But it turns out I was doing something wrong anyway, so it's good I asked!



1 answer


In theory, this is doable, because callback is just a callable that takes the response as an argument.
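To see why a bound method qualifies, here is a minimal sketch in plain Python, with no Scrapy involved. The ZipResource class below is a hypothetical stand-in for the Item subclass in the question, and the string passed as the "response" is only a placeholder.

```python
class ZipResource:
    """Hypothetical stand-in for the Item subclass in the question."""

    def __init__(self, name):
        self.name = name

    def parse_resource_metadata(self, response):
        # A real callback would receive a Scrapy response; any object works here.
        return "{} handled {}".format(self.name, response)


item = ZipResource("archive")
callback = item.parse_resource_metadata  # bound method, not called yet

is_callable = callable(callback)           # it can be handed over as a callback
same_instance = callback.__self__ is item  # the method carries its instance
result = callback("fake-response")         # roughly what a framework does later
```

The bound method keeps a reference to its instance, which is why printing req.callback shows the item's data.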

However, Item objects are only field containers: they are for storing data, and you shouldn't put logic there.

It is better to create a method in the spider and pass the Item instance inside meta:



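def parse(self, response):
    ...
    item = MyCustomItem()
    ...
    # Attach the item to the request so the callback can retrieve it later.
    yield Request(urlparse.urljoin(response.url, url),
                  method="HEAD",
                  meta={'item': item},
                  callback=self.my_callback)

def my_callback(self, response):
    # The item travels with the request and comes back on response.meta.
    item = response.meta['item']
    ...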
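For the per-item-type MIME check mentioned in the question, one possible arrangement is to keep the allowed types as plain data and do the comparison in a helper that the spider callback calls. This is only a sketch: VALID_MIME_TYPES, mime_type_of, is_valid_for, PdfResource and the dict-based ZipResource stand-in are all hypothetical names, not part of Scrapy.

```python
# Hypothetical table: item type name -> MIME types accepted for that type.
VALID_MIME_TYPES = {
    "ZipResource": {"application/zip", "application/x-zip-compressed"},
    "PdfResource": {"application/pdf"},
}


def mime_type_of(content_type):
    """Return the bare MIME type from a Content-Type header value."""
    if content_type is None:
        return None
    if isinstance(content_type, bytes):
        content_type = content_type.decode("latin-1")
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    return content_type.split(";", 1)[0].strip().lower()


def is_valid_for(item, content_type):
    """Check a header value against the set allowed for the item's type."""
    allowed = VALID_MIME_TYPES.get(type(item).__name__, set())
    return mime_type_of(content_type) in allowed


class ZipResource(dict):
    """Stand-in for a Scrapy Item subclass (assumed here to be dict-like)."""


item = ZipResource()
ok = is_valid_for(item, b"application/zip; charset=binary")
rejected = is_valid_for(item, "text/html")
```

Inside my_callback this could be called as is_valid_for(item, response.headers.get('Content-Type')), yielding the item only when it returns True. DropItem is intended for item pipelines, so in a spider callback simply not yielding the item is probably the simpler choice.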

      


I'm not entirely sure what you are trying to achieve, but you can also take a closer look at Item Loaders and Input and Output Processors.
