Throttling requests with multiple proxies

I am currently assigning random proxies to requests through a custom middleware. I would like to key the download delay to the specific proxy each request uses, but as far as I can tell, this is only possible when keying on domain or IP address. I am concerned that embedding the throttling logic in the proxy middleware will cause thread-safety issues. Has anyone done this before? Any pointers would be appreciated.



1 answer


As recommended on the Scrapy mailing list, there is a special request meta key, download_slot, that the AutoThrottle extension obeys. It lets you group and throttle requests programmatically.

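In my custom proxy middleware:



import random

self.proxies = get_proxies()  # list of proxy addresses
proxy_address = random.choice(self.proxies)  # pick one proxy per request
request.meta['proxy'] = proxy_address
# Requests that share a download_slot value are throttled together
request.meta['download_slot'] = hash(proxy_address) % MAX_CONCURRENT_REQUESTS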

      

I use the hash function as a cheap way to bucket requests under an externally defined limit (MAX_CONCURRENT_REQUESTS), so each proxy always maps to the same slot.
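For reference, here is a minimal sketch of how the fragment above might look as a complete downloader middleware. The class name RandomProxySlotMiddleware, the slot_for helper, the max_slots parameter and the "proxy-bucket-" prefix are all made up for illustration. It also swaps the built-in hash() for zlib.crc32, on the assumption that you want the same bucket for a proxy across runs (str hashes are randomized per process by default). It does not import Scrapy, since a downloader middleware only needs a process_request method.

```python
import random
import zlib


class RandomProxySlotMiddleware:
    """Sketch: assign a random proxy and group requests by that proxy."""

    def __init__(self, proxies, max_slots):
        self.proxies = list(proxies)  # list of proxy URLs
        self.max_slots = max_slots    # externally defined bucket limit

    def slot_for(self, proxy_address):
        # crc32 gives the same value in every process, unlike hash() on str
        bucket = zlib.crc32(proxy_address.encode("utf-8")) % self.max_slots
        return f"proxy-bucket-{bucket}"

    def process_request(self, request, spider):
        proxy_address = random.choice(self.proxies)
        request.meta["proxy"] = proxy_address
        # Requests sharing a download_slot share delay and concurrency limits
        request.meta["download_slot"] = self.slot_for(proxy_address)
        return None  # let Scrapy continue handling the request
```

If one slot per proxy is acceptable, using the proxy address itself as the download_slot value would avoid hash collisions between proxies. Either way, the class would still need to be constructed with your proxy list and registered in DOWNLOADER_MIDDLEWARES.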







