How can I clear this frame?

Question

If you visit this link right now, you are likely to get a VBScript error.

On the other hand, if you visit this link first and then the above link (in the same session), the page goes through.

As this application is configured, the first page is intended to serve as a frame within the second (main) page. If you click around a little, you can see how it works.

My question is, how do I clear the first page using Python? I've tried everything I can think of - urllib, urllib2, mechanize - and all I get is 500 errors or timeouts.

I suspect the answer lies in mechanize, but my mechanize-fu is not good enough to crack this. Can anyone please help?

+2

python vbscript screen-scraping mechanize

hanksims 21 Aug 09 at 20:28

2 answers

You can also try BeautifulSoup in addition to mechanize. I'm not sure, but you may have to parse the DOM of the framed page.
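As a rough illustration of what parsing the framed page could involve, here is a minimal sketch that lists the URLs a frameset points at. It uses the standard library's `html.parser` rather than BeautifulSoup, purely to stay dependency-free. The names `FrameFinder` and `find_frames` and the example.com URLs are made up for the example, not taken from the site in question.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameFinder(HTMLParser):
    """Collect the src URLs of <frame> and <iframe> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.frame_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                # Resolve relative frame sources against the page URL.
                self.frame_urls.append(urljoin(self.base_url, src))


def find_frames(html, base_url):
    """Return absolute URLs of every frame referenced in the given HTML."""
    finder = FrameFinder(base_url)
    finder.feed(html)
    return finder.frame_urls


# Assumed sample markup; the real page will differ.
sample = '<html><frameset><FRAME SRC="top.asp"><frame src="/main.asp?x=1"></frameset></html>'
frames = find_frames(sample, "http://example.com/app/index.asp")
```

Each URL found this way would then be fetched as its own request, which is where the session handling comes in.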

I also find Tamper Data quite a useful plugin when writing scrapers.

+1

Yancy 21 Aug 09 at 20:38

Joel Coehoorn · Accepted Answer · 2009-08-21T20:46:55+0000

It always comes down to the request/response model. You just need to craft a series of HTTP requests that produce the responses you want. In this case, you also need the server to treat each request as part of the same session. To do that, you need to figure out how the server tracks sessions. It could be anything from cookies to hidden inputs, form actions, POST data, or query strings. If I had to guess, I would put my money on a cookie in this case (I didn't check the links). If that is correct, you need to send the first request, save the cookie that comes back, and then send that cookie along with the second request.
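If the cookie guess is right, the two-request sequence could be sketched roughly like this with the Python 3 standard library (`urllib.request` plus `http.cookiejar`, the successors to `urllib2` and `cookielib`). The function names and the `frame_url` / `main_url` parameters are placeholders of mine, and whether this particular server actually tracks the session with a cookie is an assumption.

```python
import http.cookiejar
import urllib.request


def make_session_opener():
    """Build an opener that stores cookies and replays them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_in_session(opener, frame_url, main_url, timeout=30):
    """Visit the frame page first, then fetch the main page in the same session."""
    # First request: the server is ASSUMED to set its session cookie here.
    with opener.open(frame_url, timeout=timeout) as resp:
        resp.read()
    # Second request: the opener sends any stored cookie back automatically.
    with opener.open(main_url, timeout=timeout) as resp:
        return resp.read()
```

You would call it as `opener, jar = make_session_opener()` followed by `fetch_in_session(opener, first_link, second_link)`, substituting the two links from the question. Iterating over `jar` afterwards shows which cookies, if any, the server handed out, which is a quick way to test the cookie theory.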

It could also be that buttons or links on the start page are what take you to the second page. Such a link would look something like <A href="http://cad.chp.ca.gov/iiqr.asp?Center=RDCC&LogNumber=0197D0820&t=Traffic%20Hazard&l=3358%20MYRTLE&b=">

where most of the gobbledygook is generated by the first page.

The part "Center=RDCC&LogNumber=0197D0820&t=Traffic%20Hazard&l=3358%20MYRTLE&b="

encodes some of the session information that you need to obtain from the first page.
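To see what such a link carries, you can split its query string into individual parameters. A small sketch using `urllib.parse`, with the example URL above as input; the helper names `link_params` and `rebuild_link` are my own, and which of these parameters the server really needs is something you would have to find out by experiment.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit


def link_params(href):
    """Return the query-string parameters of a link as a dict."""
    query = urlsplit(href).query
    # keep_blank_values=True preserves empty parameters such as "b=".
    return dict(parse_qsl(query, keep_blank_values=True))


def rebuild_link(href, params):
    """Re-encode params into href's query string, using %20 for spaces."""
    parts = urlsplit(href)
    return parts._replace(query=urlencode(params, quote_via=quote)).geturl()


href = ("http://cad.chp.ca.gov/iiqr.asp?Center=RDCC&LogNumber=0197D0820"
        "&t=Traffic%20Hazard&l=3358%20MYRTLE&b=")
params = link_params(href)
# params["LogNumber"] is "0197D0820"; params["t"] is "Traffic Hazard"
```

In a real scraper the `href` would come from the HTML of the first page rather than being hard-coded, since values like `LogNumber` are generated there.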

And of course, you may need to handle both.
