Link Checker (Spider Crawler)

I am looking for a link checker to find broken links on my website. The problem is that the site requires a login page before anything else. What I need is a link checker that can submit the login details and then crawl the rest of the site.

Any ideas would be appreciated.



2 answers


I recently solved a similar problem:



# Python 2: urllib2 and cookielib became urllib.request and http.cookiejar in Python 3
import urllib
import urllib2
import cookielib

login = 'user@host.com'
password = 'secret'

# an opener that stores cookies and resends them on every subsequent request
cookiejar = cookielib.CookieJar()
urlOpener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))

# adjust the keys to match the login form's field names
values = {'username': login, 'password': password}
data = urllib.urlencode(values)

# POST the credentials; the session cookie is captured by the cookiejar
request = urllib2.Request('http://target.of.POST-method', data)
response = urlOpener.open(request)

# from now on we are authenticated and can fetch the rest of the site
response = urlOpener.open('http://rest.of.user.area')
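For modern Python 3, the same flow uses `urllib.request` and `http.cookiejar` (the placeholder URLs and form field names are carried over from the answer above; adjust them to your site):

```python
import urllib.parse
import urllib.request
import http.cookiejar

login = 'user@host.com'
password = 'secret'

# an opener that stores cookies and resends them on every subsequent request
cookiejar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookiejar))

# adjust the keys to match the login form's field names;
# in Python 3 the POST body must be bytes, hence the .encode()
values = {'username': login, 'password': password}
data = urllib.parse.urlencode(values).encode('utf-8')

# attaching data to a Request makes it a POST
request = urllib.request.Request('http://target.of.POST-method', data)
# response = opener.open(request)                      # logs in; cookie lands in cookiejar
# response = opener.open('http://rest.of.user.area')   # now authenticated
```

The two `opener.open()` calls are commented out only because the URLs are placeholders; with real URLs they run as-is.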

      

+3


source


You want to take a look at the cookielib module: http://docs.python.org/library/cookielib.html . It provides a full cookie implementation, which lets you keep a login session alive. Once you have a CookieJar, you just need to collect the credentials from the user (say, from the console) and send the correct POST request.
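Once authenticated, a link checker only needs to extract the `<a href>` targets from each page and try to fetch them. A minimal sketch using the standard library's `html.parser` (the class and function names here are my own, illustrative only):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    # resolve relative links against the page they appeared on
                    self.links.append(urljoin(self.base_url, value))

def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```

Feed this the body of each page fetched through the cookie-aware opener; any extracted URL whose request raises an `HTTPError` is a broken link.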




