Python - get header information from url

I've searched all around for a Python 3.x example code to get the HTTP header information.

Something as simple as the PHP equivalent of get_headers doesn't seem easy to find in Python. Or maybe I'm just not sure how to wrap my head around it.

In essence, I would like to write code that checks whether a URL exists or not,

something along the lines of:

h = get_headers(url)
if h[0] == 200:
    print("Bingo!")


So far I have tried

h = http.client.HTTPResponse('http://docs.python.org/')


But I always got an error


4 answers


To get the HTTP response code in Python 3.x, use the urllib.request module:

>>> import urllib.request
>>> response = urllib.request.urlopen(url)
>>> response.getcode()
200
>>> if response.getcode() == 200:
...     print('Bingo')
... 
Bingo


The returned HTTPResponse object gives you access to all of the headers. For example:

>>> response.getheader('Server')
'Apache/2.2.16 (Debian)'
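If you want something shaped like PHP's get_headers(), you can wrap the above in a small helper. The function below is a hypothetical sketch built on urllib.request (the name get_headers is borrowed from PHP, it is not a standard-library API):

```python
import urllib.request

def get_headers(url):
    """Rough analogue of PHP's get_headers(): returns a list whose
    first element is the status line, followed by 'Name: value'
    strings for each response header. (Hypothetical helper.)"""
    response = urllib.request.urlopen(url)
    status_line = 'HTTP/1.1 {} {}'.format(response.getcode(), response.reason)
    header_lines = ['{}: {}'.format(name, value)
                    for name, value in response.getheaders()]
    return [status_line] + header_lines
```

As with PHP, h[0] is then the status line (a string such as 'HTTP/1.1 200 OK', not the bare integer 200), and the remaining elements are the individual headers.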





If the urllib.request.urlopen() call fails, an HTTPError exception is raised. You can handle it to get the response code:

import urllib.request
try:
    response = urllib.request.urlopen(url)
    if response.getcode() == 200:
        print('Bingo')
    else:
        print('The response code was not 200, but: {}'.format(
            response.getcode()))
except urllib.error.HTTPError as e:
    print('''An error occurred: {}
The response code was {}'''.format(e, e.getcode()))
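HTTPError covers the case where the server answered with an error status; a URL whose host cannot be reached at all raises the more general urllib.error.URLError (of which HTTPError is a subclass). A sketch of an exists-style check handling both cases might look like this (the name url_exists is made up for illustration):

```python
import urllib.request
import urllib.error

def url_exists(url):
    """Return True if the URL answers with HTTP 200, False on any
    HTTP error status or connection failure. (Hypothetical helper.)"""
    try:
        return urllib.request.urlopen(url).getcode() == 200
    except urllib.error.HTTPError:
        return False  # the server answered, but with an error status
    except urllib.error.URLError:
        return False  # host unreachable, DNS failure, etc.
```

Since HTTPError subclasses URLError, catching URLError alone would also work; listing both just makes the two failure modes explicit.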



For Python 2.x

urllib, urllib2 or httplib can be used here. Note, however, that urllib and urllib2 both use httplib under the hood. So if you plan to run this check many times (say, 1000 times), it is better to use httplib directly. More documentation and examples here .

Sample code:



import httplib
try:
    h = httplib.HTTPConnection("www.google.com")
    h.connect()
except Exception as ex:
    print "Could not connect to page."



For Python 3.x

A similar story applies: what was said about urllib (or urllib2) versus httplib in Python 2.x holds for the urllib.request and http.client libraries in Python 3.x. Again, http.client should be faster. For more documentation and examples take a look here .

Sample code:



import http.client

try:
    conn = http.client.HTTPConnection("www.google.com")
    conn.connect()    
except Exception as ex:
    print("Could not connect to page.")


If you want to check the status codes, you will need to replace

conn.connect()

with

conn.request("GET", "/index.html")  # Could also use "HEAD" instead of "GET".
res = conn.getresponse()
if res.status == 200 or res.status == 302:  # Specify codes here.
    print("Page Found!")



Note that in both examples, if you want to catch the specific exception raised when the URL's host does not exist, rather than all exceptions, catch socket.gaierror instead (see the socket documentation ).
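Putting the Python 3.x pieces together, a sketch might look like the following (page_found is a hypothetical helper name, and the 200/302 codes mirror the example above):

```python
import http.client
import socket

def page_found(host, path="/"):
    """Send a HEAD request and report whether the server answered
    with 200 or 302. Catches socket.gaierror so that a non-existent
    host is reported as not found. (Illustrative sketch.)"""
    try:
        conn = http.client.HTTPConnection(host, timeout=10)
        conn.request("HEAD", path)  # HEAD: status and headers only, no body
        return conn.getresponse().status in (200, 302)
    except socket.gaierror:
        # Name resolution failed: the host itself does not exist.
        return False
```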


You can use the urllib2 library (Python 2.x only; in Python 3 it was merged into urllib.request):

import urllib2
if urllib2.urlopen(url).code == 200:
    print "Bingo"




You can use the requests module to test it:

import requests
url = "http://www.example.com/"
res = requests.get(url)
if res.status_code == 200:
    print("bingo")


You can also inspect just the headers, without downloading the whole page body, by sending a HEAD request with requests.head(url).
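For example, this hypothetical check_headers_first helper (the name is made up for illustration) sends a HEAD request first and only downloads the body when the status is 200:

```python
import requests

def check_headers_first(url):
    """Fetch only the status line and headers with a HEAD request,
    and download the body only if the server reports success.
    (Hypothetical helper, for illustration.)"""
    head = requests.head(url)
    if head.status_code == 200:
        # The body is transferred only by this second, full GET request.
        return requests.get(url).text
    return None
```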
