Link Checker (Spider Crawler)

I am looking for a link checker to find broken links on my website. The problem is that the site requires a login page before anything else. What I need is a link checker that can submit the login details and then crawl the rest of the site.

Any ideas would be appreciated.



2 answers


I recently solved a similar problem:



# Python 2: urllib2 and cookielib became urllib.request and http.cookiejar in Python 3
import urllib
import urllib2
import cookielib

login = 'user@host.com'
password = 'secret'

# an opener that stores cookies and resends them on every subsequent request
cookiejar = cookielib.CookieJar()
urlOpener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))

# adjust the keys to match the login form's field names
values = {'username': login, 'password': password}
data = urllib.urlencode(values)

# POST the credentials; the session cookie is captured by the cookiejar
request = urllib2.Request('http://target.of.POST-method', data)
response = urlOpener.open(request)

# from now on we are authenticated and can fetch the rest of the site
response = urlOpener.open('http://rest.of.user.area')
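For modern Python 3, the same flow uses `urllib.request` and `http.cookiejar` (the placeholder URLs and form field names are carried over from the answer above; adjust them to your site):

```python
import urllib.parse
import urllib.request
import http.cookiejar

login = 'user@host.com'
password = 'secret'

# an opener that stores cookies and resends them on every subsequent request
cookiejar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookiejar))

# adjust the keys to match the login form's field names;
# in Python 3 the POST body must be bytes, hence the .encode()
values = {'username': login, 'password': password}
data = urllib.parse.urlencode(values).encode('utf-8')

# attaching data to a Request makes it a POST
request = urllib.request.Request('http://target.of.POST-method', data)
# response = opener.open(request)                      # logs in; cookie lands in cookiejar
# response = opener.open('http://rest.of.user.area')   # now authenticated
```

The two `opener.open()` calls are commented out only because the URLs are placeholders; with real URLs they run as-is.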

      

+3


source


You want to take a look at the cookielib module: http://docs.python.org/library/cookielib.html . It provides a full cookie implementation, which lets you keep a login session alive. Once you have a CookieJar, you just need to collect the credentials from the user (say, from the console) and send the correct POST request.
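Once authenticated, a link checker only needs to extract the `<a href>` targets from each page and try to fetch them. A minimal sketch using the standard library's `html.parser` (the class and function names here are my own, illustrative only):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    # resolve relative links against the page they appeared on
                    self.links.append(urljoin(self.base_url, value))

def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```

Feed this the body of each page fetched through the cookie-aware opener; any extracted URL whose request raises an `HTTPError` is a broken link.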




