Python - get header information from url
I've searched all around for a Python 3.x example code to get the HTTP header information.
Something as simple as the PHP equivalent of get_headers isn't easy to find in Python. Or maybe I'm just not sure how best to wrap my head around it.
In essence, I would like to code something where I can see whether the URL exists or not,
something along the lines of:
h = get_headers(url)
if(h[0] == 200)
{
print("Bingo!")
}
So far I have tried
h = http.client.HTTPResponse('http://docs.python.org/')
But I always got an error
To get the HTTP response code in Python 3.x, use the urllib.request module:
>>> import urllib.request
>>> response = urllib.request.urlopen(url)
>>> response.getcode()
200
>>> if response.getcode() == 200:
...     print('Bingo')
...
Bingo
The returned HTTPResponse object will give you access to all headers. For example:
>>> response.getheader('Server')
'Apache/2.2.16 (Debian)'
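To get every header at once, closer to what PHP's get_headers returns, something like the following might work. This is only a minimal sketch: fetch_headers is a hypothetical helper name, and the 10-second timeout is an arbitrary assumption.

```python
import urllib.request


def fetch_headers(url, timeout=10):
    """Return (status_code, headers) for url, with headers as a plain dict.

    fetch_headers is a hypothetical helper name, not a standard library function.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # response.headers behaves like a mapping of header names to values;
        # converting to dict keeps only one value for repeated header names.
        return response.status, dict(response.headers.items())
```

Calling fetch_headers('http://docs.python.org/') would return the status code together with a dictionary of the response headers. Note that urlopen() raises an exception for 4xx and 5xx responses, so this helper only returns for successful requests.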
If the call to urllib.request.urlopen() fails, an HTTPError exception is raised. You can handle this to get the response code:
import urllib.error
import urllib.request

try:
    response = urllib.request.urlopen(url)
    if response.getcode() == 200:
        print('Bingo')
    else:
        print('The response code was not 200, but: {}'.format(
            response.getcode()))
except urllib.error.HTTPError as e:
    # Raised for 4xx/5xx responses; the exception carries the status code.
    print('''An error occurred: {}
The response code was {}'''.format(e, e.getcode()))
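If you only care whether the URL exists, you can avoid downloading the body by sending a HEAD request. The sketch below is one possible way to wrap that up; head_status is a hypothetical helper name, it assumes the server answers HEAD requests, and returning None for connection problems is just a design choice made for this example.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived.

    head_status is a hypothetical helper name used for illustration.
    """
    # method="HEAD" asks the server for headers only, without the body.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as e:
        # 4xx/5xx: the server did answer, so there is a status code to report.
        return e.code
    except urllib.error.URLError:
        # DNS failure, refused connection and similar: no HTTP response at all.
        return None
```

With this, the check from the question becomes `if head_status(url) == 200: print('Bingo')`. HTTPError has to be caught before URLError because it is a subclass of it.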
For Python 2.x
urllib, urllib2 or httplib can be used here. However, note that urllib and urllib2 are built on top of httplib. So if you plan on running this check a lot (thousands of times), it would be better to use httplib directly. More documentation and examples can be found in the httplib documentation.
Sample code:
import httplib
try:
    h = httplib.HTTPConnection("www.google.com")
    h.connect()
except Exception as ex:
    print "Could not connect to page."
For Python 3.x
The same story as with urllib (or urllib2) and httplib in Python 2.x applies to the urllib.request and http.client libraries in Python 3.x. Again, http.client should be faster. For more documentation and examples, take a look at the http.client documentation.
Sample code:
import http.client
try:
    conn = http.client.HTTPConnection("www.google.com")
    conn.connect()
except Exception as ex:
    print("Could not connect to page.")
and if you want to check the status codes you will need to replace
conn.connect()
with
conn.request("GET", "/index.html") # Could also use "HEAD" instead of "GET".
res = conn.getresponse()
if res.status == 200 or res.status == 302: # Specify codes here.
    print("Page Found!")
Note that in both examples, if you want to catch the specific exception raised when the host name of the URL cannot be resolved, rather than catching everything, catch the socket.gaierror exception instead (see the socket documentation).
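Putting the Python 3.x pieces together, a sketch of checking several paths on one host with http.client could look like this. check_paths is a hypothetical helper name, the default port and timeout are assumptions, and returning None when the host name cannot be resolved is just one way to signal that case.

```python
import http.client
import socket


def check_paths(host, paths, port=80, timeout=10):
    """Map each path to its status code, using one HTTPConnection to host.

    check_paths is a hypothetical helper name used for illustration.
    Returns None if the host name cannot be resolved.
    """
    results = {}
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        for path in paths:
            conn.request("HEAD", path)
            response = conn.getresponse()
            # Finish reading the (empty) HEAD response before the next request.
            response.read()
            results[path] = response.status
    except socket.gaierror:
        # Raised when the host name lookup fails.
        return None
    finally:
        conn.close()
    return results
```

For example, `check_paths("www.google.com", ["/", "/index.html"])` would return a dictionary of status codes keyed by path. If the server closes the connection after a response, HTTPConnection reconnects automatically on the next request.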