Python and mechanize.open ()

I have some code that uses a mechanized and password protected site. I can log in just fine and get the expected results. However, as soon as I log in, I don't want to bind the "click", I want to iterate over the list of urls. Unfortunately, every call to .open () just gets redirected to the login page, which is what I would expect if I were logged out or tried to login to another browser. This makes me think this is cookie handling, but I'm at a loss.

def main():
    browser = mechanize.Browser()
    browser.set_handle_robots(False)
    # The below code works perfectly
    page_stats = login_to_BOE(browser)
    print page_stats

    # This code ALWAYS gets the login page again NOT the desired 
    # behaviour of getting the new URL. This is the behaviour I would
    # expect if I had logged out of our site.
    for page in PAGES:
        print '%s%s' % (SITE, page)
        page = browser.open('%s%s' % (SITE, page))
        page_stats = get_page_statistics(page.get_data())
        print page_stats

      

+2


source to share


3 answers


This is not an answer, but it can lead you in the right direction. Attempt to enable Mechanize extensive debugging tools using some combination of the instructions below:

browser.set_debug_redirects(True)
browser.set_debug_responses(True)
browser.set_debug_http(True)

      

This will provide a stream of HTTP information which I found very useful when I developed my one and only engine based application.



I should note that I am not doing much (if anything) in my application than what you showed in your question. I create a browser object in the same way and then pass it to this login function:

def login(browser):
    browser.open(config.login_url)
    browser.select_form(nr=0)
    browser[config.username_field] = config.username
    browser[config.password_field] = config.password
    browser.submit()
    return browser

      

I can then open pages with the required authentication using browser.open (url) and all cookie handling is handled transparently and automatically for me.

+1


source


Instead of using for each link:

browser.open('www.google.com')

      

Try the following after your first login:



browser.follow_link(text = 'a href text')

      

I am assuming the call to open is about reloading your cookies.

+2


source


Will be,

Your suggestion pointed me in the right direction.

Every web browser I have ever used answered the following:

http://www.foo.com//bar/baz/trool.html

      

Since I hate that the content is not concatenated correctly, my SITE variable was " http://www.foo.com/ "

Also, all other urls were "/bar/baz/trool.html"

My calls for discovery turned out to be .open('http://www.foo.com//bar/baz/trool.html')

, and the mechanization browser is obviously not massaging like a "real" browser. Apache didn't like the url.

+2


source







All Articles