Urllib3 download file using specified user agent
What is the correct way to update user agent information in urllib3?
How can I verify that user agent information has indeed been changed and is being used?
For example:
import urllib3

user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0'}
http = urllib3.PoolManager(10, headers=user_agent)
r1 = http.request('GET', 'http://example.com/')
if r1.status == 200:  # compare by value; "is" tests object identity
    # r1.data is bytes, so the file is opened in binary mode
    with open('somefile', 'wb') as f:
        f.write(r1.data)
After creating the PoolManager as http, I looked at dir(http) and saw that http.headers, which is empty by default, had been updated to the specified user agent information. But is it actually being used? Is there a way to check this without going through the Apache logs?
This is what I actually see in /var/log/apache2/access.log after trying to update the user agent:
>>> import urllib3
>>> user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0'}
>>> http = urllib3.PoolManager(2, headers=user_agent)
>>> r = http.request('GET','localhost')
>>> with open('/var/log/apache2/access.log','r') as f:
...     last_line = f.readlines()[-1]
...
>>> last_line
'127.0.0.1 - - [08/Dec/2014:20:42:04 -0500] "GET / HTTP/1.1" 200 461 "-" "-"\n'
The header argument should be headers:
http = urllib3.PoolManager(10, headers=user_agent)
You can confirm that the headers are set correctly using a site such as httpbin.org:
>>> import urllib3
>>> user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 6.3; rv:36.0) ..'}
>>> http = urllib3.PoolManager(10, headers=user_agent)
>>> r1 = http.urlopen('GET', 'http://httpbin.org/headers')
>>> print(r1.data)
{
  "headers": {
    "Accept-Encoding": "identity",
    "Connect-Time": "1",
    "Connection": "close",
    "Host": "httpbin.org",
    "Total-Route-Time": "0",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0",
    "Via": "1.1 vegur",
    "X-Request-Id": "5ef53f21-6caf-4e45-8123-98e417cd05ba"
  }
}
Alternatively, you can use a packet sniffer such as Wireshark.
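If you would rather not depend on an external site or on Apache logs, one more option is to point urllib3 at a throwaway local server that records the User-Agent it receives. The following is only a minimal sketch using the standard library's http.server; the names EchoHandler and seen_user_agents are made up for this illustration and are not part of urllib3.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import urllib3

seen_user_agents = []  # hypothetical list, filled in by the handler below


class EchoHandler(BaseHTTPRequestHandler):  # hypothetical helper class
    def do_GET(self):
        # Record the User-Agent the client actually sent, then echo it back.
        agent = self.headers.get("User-Agent", "")
        seen_user_agents.append(agent)
        body = agent.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the default per-request console logging


# Port 0 asks the OS for any free port; the server runs in a background thread.
server = HTTPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

user_agent = {"user-agent": "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0"}
http = urllib3.PoolManager(10, headers=user_agent)
r = http.request("GET", "http://127.0.0.1:%d/" % server.server_port)
echoed = r.data.decode("utf-8")
print(r.status, echoed)

server.shutdown()
server.server_close()
```

If the header was applied, the echoed string and the single entry in seen_user_agents should both match the value passed to PoolManager.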