Throttling requests with multiple proxies

I am currently assigning random proxies to requests through a custom middleware. I would like to key the download throttling to the specific proxy a request uses, but as far as I can tell, this is only possible when keying on domains or IP addresses. I am concerned that building the throttling logic into the proxy middleware will cause thread-safety issues. Has anyone done this before? Any pointers would be appreciated.



1 answer

As recommended on the Scrapy mailing list, there is a special request meta key, download_slot, that the AutoThrottle extension respects. It lets you control programmatically how requests are grouped for throttling.

In my custom proxy middleware:

import random

self.proxies = get_proxies()  # list of proxies
proxy_address = random.choice(self.proxies)
request.meta['proxy'] = proxy_address
# requests that share a download_slot value are throttled together
request.meta['download_slot'] = hash(proxy_address) % MAX_CONCURRENT_REQUESTS


I use the hash function as a cheap way to bucket requests into slots, with MAX_CONCURRENT_REQUESTS as an externally defined cap on the number of slots.
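For illustration, here is a minimal self-contained sketch of the same idea written as a downloader-middleware-style class. The names RandomProxyMiddleware, slot_for and MAX_SLOTS, as well as the proxy URLs, are hypothetical and not from the original answer. The sketch also swaps the built-in hash() for zlib.crc32, because Python 3 salts string hashes per process, so hash() would give a different slot for the same proxy on each run; this is a choice made here, not something the answer requires. A SimpleNamespace stands in for a Scrapy Request so the snippet runs without Scrapy installed.

```python
import random
import zlib
from types import SimpleNamespace

MAX_SLOTS = 8  # hypothetical cap on the number of download slots


def slot_for(proxy_address, max_slots=MAX_SLOTS):
    """Map a proxy URL to a slot name that is stable across runs."""
    checksum = zlib.crc32(proxy_address.encode("utf-8"))
    return "proxy-%d" % (checksum % max_slots)


class RandomProxyMiddleware:
    """Picks a random proxy per request and groups requests by proxy."""

    def __init__(self, proxies):
        self.proxies = list(proxies)

    def process_request(self, request, spider=None):
        proxy_address = random.choice(self.proxies)
        request.meta["proxy"] = proxy_address
        # Requests with the same download_slot value share one slot.
        request.meta["download_slot"] = slot_for(proxy_address)
        return None  # returning None lets processing continue


# Stand-in for a Scrapy Request: only the .meta dict is used here.
request = SimpleNamespace(meta={})
middleware = RandomProxyMiddleware(
    ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]  # placeholder proxies
)
middleware.process_request(request)
print(request.meta)
```

Two different proxies can still land in the same slot when their checksums collide modulo MAX_SLOTS, which is the same trade-off the modulo in the original snippet makes.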


