(Python) socket.gaierror on every add ... except http://www.reddit.com?
I am just playing around and I am trying to grab information from websites. Unfortunately with the following code:
import sys
import socket
import re
from urlparse import urlsplit
url = urlsplit(sys.argv[1])
sock = socket.socket()
sock.connect((url[0] + '://' + url[1],80))
path = url[2]
if not path:
    path = '/'
print path
sock.send('GET ' + path + ' HTTP/1.1\r\n'
+ 'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/525.19 (KHTML, like Gecko) Chrome/0.3.154.9 Safari/525.19\r\n'
+ 'Accept: */*\r\n'
+ 'Accept-Language: en-US,en\r\n'
+ 'Accept-Charset: ISO-8859-1,*,utf-8\r\n'
+ 'Host: 68.33.143.182\r\n'
+ 'Connection: Keep-alive\r\n'
+ '\r\n')
I am getting the following error:
Traceback (most recent call last):
  File "D:\Development\Python\PyCrawler\PyCrawler.py", line 10, in <module>
    sock.connect((url[0] + '://' + url[1],80))
  File "<string>", line 1, in connect
socket.gaierror: (11001, 'getaddrinfo failed')
The only time I don't get the error is if the URL passed is http://www.reddit.com . Every other url I've tried comes up with socket.gaierror. Can someone explain this? And maybe give a solution?
Please, please, please, please, please, please don't.
urllib and urllib2 are your friends.
Read the "missing" urllib2 manual if you have any problems with it.
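As a rough illustration of that advice, here is a minimal sketch of fetching a page through the library instead of a raw socket. The function name fetch, the User-Agent string, and the 10-second timeout are placeholders chosen for this example, not anything from the question. The import fallback is only there so the same sketch also runs on Python 3, where urllib2 became urllib.request.

```python
try:
    # Python 2, the version used in the question
    from urllib2 import Request, urlopen
except ImportError:
    # Python 3 moved the same names into urllib.request
    from urllib.request import Request, urlopen
from contextlib import closing


def fetch(url, user_agent='PyCrawler-sketch/0.1', timeout=10):
    """Return the raw body of url as a byte string.

    fetch, user_agent and timeout are hypothetical names/values
    picked for this sketch.
    """
    # The library splits the URL, resolves the host name and fills in
    # the Host header, so none of that has to be done by hand.
    request = Request(url, headers={'User-Agent': user_agent})
    with closing(urlopen(request, timeout=timeout)) as response:
        return response.read()
```

Calling fetch(sys.argv[1]) would then replace the whole connect/send sequence from the question, since the full URL (scheme included) is what urlopen expects.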
Have you ever modified your hosts file? If it has an entry for Reddit but nothing else, that might explain why that one site is the only one that works.
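One way to test this is to ask the resolver directly. Note that the code in the question passes url[0] + '://' + url[1] to connect(), so the name being looked up is the whole string 'http://www.reddit.com' rather than 'www.reddit.com'. The sketch below is only an illustration under that reading; connect_target and resolves are hypothetical helper names, and the import fallback is there so it also runs on Python 3.

```python
import socket
try:
    from urlparse import urlsplit        # Python 2, as in the question
except ImportError:
    from urllib.parse import urlsplit    # Python 3


def connect_target(url, default_port=80):
    """Return the (host, port) pair that socket.connect() expects.

    Hypothetical helper for this sketch.
    """
    parts = urlsplit(url)
    # parts.hostname is the bare host name, without 'http://' or a port
    return parts.hostname, parts.port or default_port


def resolves(host, port=80):
    """Return True if the local resolver (hosts file or DNS) knows host.

    Hypothetical helper for this sketch.
    """
    try:
        socket.getaddrinfo(host, port)
        return True
    except socket.gaierror:
        return False
```

If resolves('http://www.reddit.com') returns True on your machine while the same check with any other 'http://...' string returns False, a stray hosts entry is a plausible cause. Either way, sock.connect(connect_target(sys.argv[1])) passes only the bare host name to the resolver.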