How do I download a file from urllib3?
This is based on another question on this site: "What is the best way to upload a file using urllib3". I cannot comment there, so I am asking a new question:
How do I download a (larger) file from urllib3?
I tried the same code that works with urllib2 ( Download file from internet in Python 3 ), but it doesn't work with urllib3:
http = urllib3.PoolManager()
with http.request('GET', url) as r, open(path, 'wb') as out_file:
    #shutil.copyfileobj(r.data, out_file) # this writes a zero file
    shutil.copyfileobj(r.data, out_file)
This fails with an error saying that the 'bytes' object has no 'read' attribute.
Then I tried to use the code in this question but it gets stuck in an infinite loop because the data is always "0":
http = urllib3.PoolManager()
r = http.request('GET', url)
with open(path, 'wb') as out:
    while True:
        data = r.read(4096)
        if data is None:
            break
        out.write(data)
r.release_conn()
However, if I read everything in memory, the file loads correctly:
http = urllib3.PoolManager()
r = http.request('GET', url)
with open(path, 'wb') as out:
    out.write(r.data)
I don't want to do this, as I may be downloading very large files. Unfortunately, the urllib3 documentation does not cover best practices in this topic.
(Also, please do not suggest requests or urllib2 because they are not flexible enough when it comes to self-signed certificates.)
You were very close; the missing piece is setting preload_content=False (this will be the default in the next version). You can also treat the response itself as a file-like object instead of using the .data attribute (which is a magic property that will hopefully be deprecated someday).
- with http.request('GET', url) ...
+ with http.request('GET', url, preload_content=False) ...
This code should work:
http = urllib3.PoolManager()
with http.request('GET', url, preload_content=False) as r, open(path, 'wb') as out_file:
    shutil.copyfileobj(r, out_file)
urllib3's response object also supports an io interface, so you can also do things like:
import io
response = http.request(..., preload_content=False)
buffered_response = io.BufferedReader(response, 2048)
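To illustrate what the buffered wrapper makes possible, here is a minimal sketch that iterates over a binary stream line by line. The helper name iter_lines is hypothetical, and an io.BytesIO object stands in for the urllib3 response so the example runs without a network connection; it assumes the real response behaves as a readable raw stream, as described above.

```python
import io

def iter_lines(raw, buffer_size=2048):
    """Yield lines from a raw binary stream through a buffered reader."""
    reader = io.BufferedReader(raw, buffer_size)
    for line in reader:
        yield line

# Stand-in for a response opened with preload_content=False (assumption).
fake_response = io.BytesIO(b"first\nsecond\nthird")
lines = list(iter_lines(fake_response))
```

With a real request, the response returned by http.request(..., preload_content=False) would be passed in place of fake_response.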
As long as you add preload_content=False to any of the three attempts and treat the response as a file-like object, they should all work.
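One caveat for the manual read loop: a file-like read() signals the end of the stream with an empty bytes object rather than None, so the loop has to test for emptiness in order to terminate. A minimal sketch of such a loop follows. The helper name copy_in_chunks and the 4096-byte default are illustrative and not part of urllib3, and an io.BytesIO object stands in for the response so the sketch runs offline.

```python
import io

def copy_in_chunks(response, out_file, chunk_size=4096):
    """Copy a file-like response to out_file; return the bytes written."""
    total = 0
    while True:
        data = response.read(chunk_size)
        if not data:  # b'' means the stream is exhausted
            break
        out_file.write(data)
        total += len(data)
    return total

# Stand-in for a urllib3 response so the sketch runs without a network.
fake_response = io.BytesIO(b"x" * 10000)
sink = io.BytesIO()
written = copy_in_chunks(fake_response, sink)
```

With urllib3, the same helper would presumably be called on the object returned by http.request('GET', url, preload_content=False), followed by r.release_conn() once the copy finishes.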
"Unfortunately, the urllib3 documentation does not cover best practices in this topic."
You are completely correct. Hopefully you will consider helping us document this use case by submitting a pull request here: https://github.com/shazow/urllib3