(Python) socket.gaierror on every address ... except http://www.reddit.com?

I'm just playing around, trying to grab information from websites. Unfortunately, with the following code:

```python
import sys
import socket
import re
from urlparse import urlsplit

url = urlsplit(sys.argv[1])


sock = socket.socket()
sock.connect((url[0] + '://' + url[1],80))
path = url[2]
if not path:
    path = '/'

print path
sock.send('GET ' + path + ' HTTP/1.1\r\n'
    + 'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/525.19 (KHTML, like Gecko) Chrome/0.3.154.9 Safari/525.19\r\n'
    + 'Accept: */*\r\n'
    + 'Accept-Language: en-US,en\r\n'
    + 'Accept-Charset: ISO-8859-1,*,utf-8\r\n'
    + 'Host: 68.33.143.182\r\n'
    + 'Connection: Keep-alive\r\n'
    + '\r\n')
```

I am getting the following error:

```
Traceback (most recent call last):
  File "D:\Development\Python\PyCrawler\PyCrawler.py", line 10, in <module>
    sock.connect((url[0] + '://' + url[1],80))
  File "<string>", line 1, in connect
socket.gaierror: (11001, 'getaddrinfo failed')
```

The only time I don't get the error is when the URL passed is http://www.reddit.com. Every other URL I've tried raises socket.gaierror. Can someone explain this, and maybe suggest a solution?

0




5 answers


Please, please, please, please, please, please don't write HTTP by hand on raw sockets.

urllib and urllib2 are your friends.

Read "urllib2 - The Missing Manual" if you run into problems with it.
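For reference, here is a minimal sketch of the urllib2 route in Python 3, where urllib2's functionality lives in urllib.request (the question's code is Python 2). The helper names `build_request` and `fetch` and the User-Agent value are illustrative assumptions. urllib splits the URL into host, port and path itself, so the "URL passed as a host name" mistake cannot happen; a failed name lookup surfaces as `urllib.error.URLError`, whose `reason` attribute holds the underlying `socket.gaierror`.

```python
import urllib.error
import urllib.request

# Hypothetical User-Agent string; any descriptive value works.
USER_AGENT = "PyCrawler/0.1"


def build_request(url):
    """Build a GET request; urllib derives host, port and path from the URL."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url, timeout=10):
    """Return the body of `url` as bytes, or None if urlopen raises URLError."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            return resp.read()
    except urllib.error.URLError as err:
        # For DNS failures, err.reason is the socket.gaierror from the lookup;
        # HTTPError (a URLError subclass) lands here too, e.g. for a 404.
        print("could not fetch %s: %s" % (url, err.reason))
        return None
```

Usage would be along the lines of `page = fetch("http://www.python.org/")`.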

+3




```python
sock.connect((url[0] + '://' + url[1],80))
```

Don't do this; do this instead:

```python
sock.connect((url[1], 80))
```



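connect() expects a (host, port) pair whose first element is a host name such as www.example.com (here url[1]), not a whole URL. A string like "http://www.example.com" gets looked up as if it were a host name, and that lookup fails with getaddrinfo failed.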
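If you do want to stay with raw sockets, the sketch below shows the corrected flow in Python 3 (the question uses Python 2). It is a minimal illustration under stated assumptions, not a full HTTP client: the helper names are hypothetical, it speaks plain HTTP only (no TLS), it does not decode chunked transfer encoding, and it sends `Connection: close` so the server closes the socket once the response is complete, which gives the read loop a clear end. It also takes the `Host` header from the URL instead of the hard-coded IP address in the question.

```python
import socket
from urllib.parse import urlsplit


def split_target(url):
    """Return (host, port, path) for an http:// URL."""
    parts = urlsplit(url)
    host = parts.hostname        # host name only: no scheme, port or user info
    port = parts.port or 80      # default HTTP port when the URL names none
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return host, port, path


def build_get(host, port, path):
    """Build a minimal HTTP/1.1 GET request as bytes."""
    host_header = host if port == 80 else "%s:%d" % (host, port)
    lines = [
        "GET %s HTTP/1.1" % path,
        "Host: %s" % host_header,          # the real host, taken from the URL
        "User-Agent: PyCrawler/0.1",
        "Accept: */*",
        "Connection: close",               # server closes the socket after replying
        "",                                # these two empty items make join() end
        "",                                # with \r\n\r\n, terminating the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_raw(url, timeout=10):
    """Return the raw response (status line, headers and body) as bytes."""
    host, port, path = split_target(url)
    # connect() wants a (host, port) pair; create_connection also resolves the name.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_get(host, port, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:                   # empty read: server closed, response complete
                break
            chunks.append(data)
    return b"".join(chunks)
```

Using `urlsplit(...).hostname` rather than `url[1]` also keeps an explicit port (as in `example.com:8080`) out of the name being resolved.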

In fact, you should probably use something higher-level than raw sockets for HTTP. Maybe httplib.
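In Python 3, httplib is called http.client. A minimal sketch of the same request with it, assuming plain http:// or https:// URLs; the helper names `open_connection` and `get` are hypothetical. Creating the connection object does not touch the network: the name is resolved and the socket opened only when the request is sent, and http.client fills in the `Host` header itself.

```python
import http.client
from urllib.parse import urlsplit


def open_connection(url, timeout=10):
    """Return (connection, path) for an http:// or https:// URL.

    No network traffic happens here; http.client connects lazily.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        # port=None makes http.client fall back to its default (80 for HTTP)
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return conn, path


def get(url):
    """Fetch `url` and return (status_code, body_bytes)."""
    conn, path = open_connection(url)
    try:
        conn.request("GET", path, headers={"User-Agent": "PyCrawler/0.1"})
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
```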

+3




Have you ever modified your hosts file? If it has an entry for Reddit but for nothing else, that might explain why only that site works.
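One way to test this hypothesis is to look at what each name actually resolves to. A minimal sketch in Python 3 (the helper name `resolve_ipv4` is hypothetical) that asks the system resolver, which on typical setups consults the hosts file as well as DNS:

```python
import socket


def resolve_ipv4(name, port=80):
    """Return the sorted IPv4 addresses `name` resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(name, port, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror as err:
        # The same error the question's connect() call raised.
        print("%r: %s" % (name, err))
        return []
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})
```

For example, compare `resolve_ipv4("www.reddit.com")` with `resolve_ipv4("www.example.com")`, and with the full URL string the question passes. An unexpected address, or a name that resolves only on your machine, points at the hosts file.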

+2




You forgot to resolve the hostname:

```python
addr = socket.gethostbyname(url[1])  # url[1] is the host part (netloc) of the URL
# ... (rest of the script as before)
sock.connect((addr, 80))
```
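Strictly speaking, the explicit lookup is optional: connect() resolves a host name itself, as long as it receives just the name. If you do resolve first, note that a name can map to several addresses (IPv4 and IPv6). A minimal sketch, with the hypothetical helper name `connect_any`, that tries each address in turn, similar in spirit to what `socket.create_connection` does internally:

```python
import socket


def connect_any(host, port, timeout=10):
    """Resolve `host` and return a socket connected to the first address that works."""
    last_error = None
    for family, type_, proto, _canon, addr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        sock = socket.socket(family, type_, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(addr)       # addr is a ready-made address tuple for this family
            return sock
        except OSError as err:
            last_error = err         # remember the failure and try the next address
            sock.close()
    raise last_error or OSError("no addresses found for %r" % host)
```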

      

+1




Use urllib2 to fetch the page, and BeautifulSoup to parse the HTML.
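BeautifulSoup is a third-party package, so as a dependency-free stand-in this sketch uses the standard library's html.parser to collect the links from a page fetched with urllib.request (Python 3's home for urllib2). The class and function names are hypothetical. With BeautifulSoup installed, the parsing step would shrink to something like `[a.get("href") for a in BeautifulSoup(html, "html.parser").find_all("a")]`.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return the absolute URLs of all links in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def crawl_page(url):
    """Fetch `url` and return the links on it."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return extract_links(html, url)
```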

0








