Mechanize browser timeout not working

I have a problem with the mechanize timeout. On most pages it works fine: if the URL does not load within the given time, it raises an error: urllib2.URLError: <urlopen error timed out>. However, on some pages the timeout does not fire and the program stops responding, even to a keyboard interrupt. Here's an example page where this happens:

import mechanize

url = 'https://web.archive.org/web/20141104183547/http://www.dallasnews.com/'

br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Firefox')]
# timeout set extremely low so it should fire on every page; instead it hangs here
html = br.open(url, timeout=0.01).read()


First, does this script hang for other people on that particular URL too? Second, what could be wrong, and how should I debug it?
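One way to narrow this down (a sketch, assuming Python 2 and that the stall happens at the socket level) is to set a global socket timeout. Unlike the per-request timeout, socket.setdefaulttimeout() applies to every socket the process creates, including any that the per-request timeout may not reach:

import socket
import mechanize

socket.setdefaulttimeout(10)  # global fallback timeout; 10 s is an arbitrary choice

url = 'https://web.archive.org/web/20141104183547/http://www.dallasnews.com/'

br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Firefox')]

try:
    # if this now raises socket.timeout or urllib2.URLError instead of hanging,
    # the stall was in a socket call the per-request timeout missed
    html = br.open(url, timeout=0.01).read()
except Exception as e:
    print 'request failed: %r' % e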



1 answer


I don't know why this URL request hangs under mechanize, but with urllib2 the request returns fine (a quick timing probe is sketched after the code below). Maybe the server has code that recognizes mechanize even though robot handling is set to False.

I think urllib2 should be a good solution for your situation.



import mechanize
import urllib2

url = 'https://web.archive.org/web/20141104183547/http://www.dallasnews.com/'

try:
    br = mechanize.Browser()
    br.set_handle_robots(False)  # skip fetching robots.txt first
    br.addheaders = [('User-Agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
    html = br.open(url, timeout=10).read()  # timeout added so the fallback can trigger; 10 s is arbitrary
except Exception:  # a bare except would also swallow KeyboardInterrupt; catch Exception instead
    # fall back to plain urllib2 with a browser-like User-Agent
    req = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0 (iPhone; U; CPU iPhone OS 3_0 like Mac OS X; en-us) AppleWebKit/528.18 (KHTML, like Gecko) Version/4.0 Mobile/7A341 Safari/528.16'})
    con = urllib2.urlopen(req, timeout=10)  # urllib2.urlopen also accepts a timeout (Python 2.6+)
    html = con.read()
print html
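To double-check the claim that only mechanize hangs on this URL, here is a minimal timing probe (a sketch, assuming Python 2; the 5-second per-request timeouts and the 15-second global safety net are arbitrary choices, not values from the original post):

import time
import socket
import urllib2
import mechanize

url = 'https://web.archive.org/web/20141104183547/http://www.dallasnews.com/'
socket.setdefaulttimeout(15)  # safety net so neither probe can hang forever

def fetch_urllib2():
    urllib2.urlopen(url, timeout=5).read()

def fetch_mechanize():
    br = mechanize.Browser()
    br.set_handle_robots(False)
    br.open(url, timeout=5).read()

for label, fetch in [('urllib2', fetch_urllib2), ('mechanize', fetch_mechanize)]:
    start = time.time()
    try:
        fetch()
        print '%s: ok in %.1fs' % (label, time.time() - start)
    except Exception as e:
        print '%s: gave up after %.1fs: %r' % (label, time.time() - start, e)

If urllib2 returns promptly while mechanize times out or errors, that points at how mechanize negotiates the connection (or how the server treats it) rather than at your timeout handling.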
