Limiting requests with the Google Python module
I am pulling about 10,000 values from a spreadsheet, running a Google search on each one, and checking the first result to see whether it is http or https. The script works well enough for my purposes, but I get a 503 error after the 70th iteration of the loop.
Any thoughts/ideas/suggestions on how to get through the number of requests I need?
code:
import pandas as pd
import time

from google import search  # the 'google' module provides search()

library_list = pd.read_csv("PLS_FY2014_AE_pupld14a.csv")

count = 0       # number of names processed so far
with_https = 0  # how many first results use https

for name in library_list['LIBNAME']:
    # fetch only the first search result for each library name
    for url in search(name, num=1, start=0, stop=1):
        time.sleep(5)  # pause between requests
        count += 1
        print(count)
        if url.startswith('https'):  # 'https' in url could also match an http URL
            with_https += 1
I was trying to do the same thing and was getting a 503 error after 30-50 results. I ended up making the script wait a random amount of time, between 30 and 60 seconds, before each search. I read about other people hitting the same problem, and they said Google caps automated searches at around 50 per hour. The code I used is:
import random
import time

import arcpy

try:
    from google import search
except ImportError:
    print("No module named 'google' found")
    raise

# 'facilities' is the feature class being updated; it is defined earlier
# in the full script.
with arcpy.da.UpdateCursor(facilities, ["NAME", "Weblinks", "ADDRESSSTATECODE", "MP_TYPE"]) as rows:
    for row in rows:
        if row[1] is None:          # only fill in rows with no web link yet
            if row[3] != "xxxxxx":  # skip one facility type
                query = str(row[0])
                print("The query will be " + query)
                wt = random.uniform(30, 60)
                print("Script will wait " + str(wt) + " seconds before the next search.")
                # pause=wt makes the module itself wait between its HTTP requests
                for j in search("recreation.gov " + query + ", " + str(row[2]),
                                tld="co.in", num=1, stop=1, pause=wt):
                    row[1] = str(j)
                    rows.updateRow(row)
                    print(row[1])
                    time.sleep(5)
                print("")
My script has been running for seven days non-stop with no errors. It might be slow, but it gets the job done eventually; at 30-60 seconds per search, the roughly 18,000 searches in this round work out to about 6-13 days of runtime.
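If you want the loop to survive the occasional 503 instead of dying, you can also wrap the search in a small retry helper. Below is a minimal sketch, assuming the 503 surfaces as urllib.error.HTTPError (which is how the google module reports it in Python 3); the helper name rate_limited_search and the retry and backoff values are illustrative, not part of the module's API.

import random
import time
import urllib.error

from google import search

def rate_limited_search(query, min_wait=30, max_wait=60, retries=3):
    # Hypothetical helper: return the first result for query, pacing
    # requests and backing off whenever Google answers with HTTP 503.
    for attempt in range(retries):
        try:
            pause = random.uniform(min_wait, max_wait)  # 30-60 s between requests
            return next(search(query, num=1, stop=1, pause=pause), None)
        except urllib.error.HTTPError as e:
            if e.code != 503:
                raise  # some other HTTP problem; don't hide it
            backoff = 60 * (attempt + 1)  # wait longer after each 503
            print("Got a 503, sleeping " + str(backoff) + " seconds before retrying")
            time.sleep(backoff)
    return None  # gave up after several 503s in a row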