Crawling Google returns 503 Service Unavailable
I have a strange problem when I query the Google search engine with wget, curl, or Python from my servers. Google redirects me to an address of the form [ipv4 | ipv6].google.fr/sorry/IndexRedirect... and finally returns a 503 Service Unavailable error.
The requests sometimes succeed and sometimes fail over the course of the day, and I've tried almost everything: forcing IPv4/IPv6 instead of the hostname, changing the referrer and user agent, using a VPN, switching between .com and .fr, going through a proxy or Tor, ...
I am guessing this is a bug on Google's servers. Any ideas? Thanks!
wget "http://google.fr/search?q=test"
--2015-06-03 10:19:52-- http://google.fr/search?q=test
Resolving google.fr (google.fr)... 2a00:1450:400c:c05::5e, 173.194.67.94
Connecting to google.fr (google.fr)|2a00:1450:400c:c05::5e|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://ipv6.google.com/sorry/IndexRedirect?continue=http://google.fr/search%3Fq%3Dtest&q=CGMSECABQdAAUQABAAAAAAAAH1QYqPG6qwUiGQDxp4NLQuHgP_i-oiUu0ZShPumAZRF3u_0 [following]
--2015-06-03 10:19:53-- http://ipv6.google.com/sorry/IndexRedirect?continue=http://google.fr/search%3Fq%3Dtest&q=CGMSECABQdAAUQABAAAAAAAAH1QYqPG6qwUiGQDxp4NLQuHgP_i-oiUu0ZShPumAZRF3u_0
Resolving ipv6.google.com (ipv6.google.com)... 2a00:1450:400c:c05::64
Connecting to ipv6.google.com (ipv6.google.com)|2a00:1450:400c:c05::64|:80... connected.
HTTP request sent, awaiting response... 503 Service Unavailable
2015-06-03 10:19:53 ERROR 503: Service Unavailable.
Google has triggers to detect bots and other abuse of its Terms of Service, so it sets a limit (a "throttle") on the number of calls that the same IP address can make in a given period of time. I believe this is about 10 calls per minute. For example, if you paste your URL into a browser while it is failing with a 503 error, you will get a CAPTCHA request from Google asking you to prove you are not a bot.
I use the pattern.web module to do pretty much the same thing you are doing (for harmless research purposes, of course!), and the documentation for this library lists throttling limits for most of the popular APIs (Google, Bing, Twitter, Facebook, ...).
Try spacing your requests 15 or more seconds apart to avoid hitting the throttle limit.
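If it helps, here is a rough sketch of that kind of spacing in plain Python, using only the standard library rather than pattern.web. The `Throttle` class and `fetch` function are names I made up for illustration, and the 15-second default simply follows the suggestion above; it is not a documented Google limit.

```python
import time
import urllib.error
import urllib.request


class Throttle:
    """Hypothetical helper: keep at least min_interval seconds between calls."""

    def __init__(self, min_interval=15.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # assumed value, not an official limit
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call; None before the first one

    def wait(self):
        """Sleep just long enough to respect min_interval, then record the time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch(url, throttle, user_agent="Mozilla/5.0"):
    """Wait for the throttle, then request url; returns (status_code, body)."""
    throttle.wait()
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # A 503 from the /sorry/ page ends up here instead of raising further.
        return err.code, b""


# Example use (not run here, since it needs network access):
#   throttle = Throttle(min_interval=15.0)
#   for query in ("test", "example"):
#       status, body = fetch("http://google.fr/search?q=" + query, throttle)
#       print(query, status)
```

The first call goes out immediately and every later call waits only for whatever is left of the interval, so slow responses do not add extra delay. If you still get 503s, increasing `min_interval` is the first thing I would try.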