Throttling requests with multiple proxies
I am currently assigning random proxies to requests through a custom middleware. I would like throttling to be keyed on the specific proxy that a request uses, but as far as I can tell, this is only possible when keying on domains or IP addresses. I am concerned that embedding the throttling logic in the proxy middleware will cause thread safety issues. Has anyone done this before? Any pointers would be appreciated.
As recommended on the Scrapy mailing list, there is a special request meta key called download_slot that the AutoThrottle extension obeys. It lets you control programmatically how requests are grouped for throttling.
In my custom proxy middleware:
self.proxies = get_proxies()  # list of proxy addresses
proxy_address = random.choice(self.proxies)
request.meta['proxy'] = proxy_address
# requests that share a download_slot value are throttled together
request.meta['download_slot'] = hash(proxy_address) % MAX_CONCURRENT_REQUESTS
I use a hash function as a cheap way to bucket requests by proxy into a fixed number of slots, with the number of slots set by an externally defined limit.
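For anyone who wants to see the whole thing in one place, here is a minimal sketch of how the fragment above might sit inside a downloader middleware. The class name RandomProxyMiddleware, the helper slot_for_proxy, and the max_slots parameter are my own hypothetical names, not anything from Scrapy or from the original middleware. I have also swapped the built-in hash() for zlib.crc32, since hash() on strings changes between Python 3 processes, and I format the slot as a string; both are assumptions rather than requirements.

```python
import random
import zlib


def slot_for_proxy(proxy_address, max_slots):
    """Map a proxy address to one of `max_slots` slot names (hypothetical helper)."""
    # crc32 gives the same value on every run, unlike hash() on str in Python 3.
    bucket = zlib.crc32(proxy_address.encode("utf-8")) % max_slots
    return "proxy-%d" % bucket


class RandomProxyMiddleware:
    """Hypothetical downloader middleware: random proxy plus a per-proxy slot."""

    def __init__(self, proxies, max_slots):
        self.proxies = list(proxies)  # assumed to be a non-empty list of proxy URLs
        self.max_slots = max_slots

    def process_request(self, request, spider):
        proxy_address = random.choice(self.proxies)
        request.meta["proxy"] = proxy_address
        # Requests that share a download_slot value are grouped for throttling.
        request.meta["download_slot"] = slot_for_proxy(proxy_address, self.max_slots)
        return None  # returning None lets Scrapy keep processing the request
```

Note that with a modulo, two different proxies can land in the same slot when there are more proxies than slots. If you want exactly one slot per proxy, using the proxy address itself as the download_slot value should also work.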