Create a large file and send it
I have a fairly large .csv file (up to 1 million lines) that I want to generate and send when the browser asks for it.
The current code looks like this (except that I don't actually generate the data this way):
    import random
    import tornado.ioloop
    import tornado.web

    class CSVHandler(tornado.web.RequestHandler):
        def get(self):
            self.set_header('Content-Type', 'text/csv')
            self.set_header('Content-Disposition', 'attachment; filename=dump.csv')
            self.write('lineNumber,measure\r\n')  # File header
            for line in range(0, 1000000):
                self.write(','.join([str(line), str(random.random())]) + '\r\n')  # mock data

    app = tornado.web.Application([(r"/csv", CSVHandler)])
    app.listen(8080)
    tornado.ioloop.IOLoop.current().start()
The problems I have with the above method are as follows:
- The browser does not download the chunks as they are written; it hangs while the server appears to be preparing the entire response.
- The web server is blocked while it processes this request, which causes other clients' requests to hang.
By default, all data is buffered in memory until the end of the request, so that the response can be replaced with an error page if an exception occurs. To send the response incrementally, your handler must be asynchronous (so that writing the response can be interleaved with other requests on the IOLoop) and must use the RequestHandler.flush() method.

Note that "being asynchronous" is not the same as "using the @tornado.web.asynchronous decorator"; in this case I recommend @tornado.gen.coroutine instead of @asynchronous. This lets you simply use the yield operator with every flush:
    class CSVHandler(tornado.web.RequestHandler):
        @tornado.gen.coroutine
        def get(self):
            self.set_header('Content-Type', 'text/csv')
            self.set_header('Content-Disposition', 'attachment; filename=dump.csv')
            self.write('lineNumber,measure\r\n')  # File header
            for line in range(0, 1000000):
                self.write(','.join([str(line), str(random.random())]) + '\r\n')  # mock data
                yield self.flush()
self.flush() starts writing the data to the network, and yield waits until that data has reached the kernel. This lets other handlers run, and it also helps manage memory consumption (by limiting how far you can get ahead of the client's download speed). Flushing after every line of the CSV file is a little expensive, so you may want to flush only after every 100 or 1000 lines.
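The "flush every N lines" idea can be sketched as a small batching helper; `chunk_lines` is a name I'm introducing for illustration, not part of Tornado:

```python
def chunk_lines(lines, size):
    """Group an iterable of CSV lines into batches of `size` lines,
    so the handler can flush once per batch instead of once per line."""
    batch = []
    for line in lines:
        batch.append(line)
        if len(batch) == size:
            yield ''.join(batch)
            batch = []
    if batch:  # emit any leftover partial batch
        yield ''.join(batch)

# Inside the coroutine handler, the loop would then become:
#     for batch in chunk_lines(rows, 1000):
#         self.write(batch)
#         yield self.flush()
```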
Note that if an exception occurs after the download has started, there is no way to show an error page to the client; you can only cut the download off partway through. Try to validate the request and do anything that might fail before the first call to flush().
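One way to follow that advice is to validate any user-supplied parameters up front, before writing any output. This is a minimal sketch; the `rows` parameter and `parse_row_count` helper are illustrative names, not part of the original code:

```python
MAX_ROWS = 1000000

def parse_row_count(raw):
    """Validate a user-supplied row count before streaming starts.

    Raising ValueError here, before the first flush(), means the
    handler can still return a normal error response (for example
    by raising tornado.web.HTTPError(400)) instead of truncating
    a partially-sent download.
    """
    try:
        n = int(raw)
    except (TypeError, ValueError):
        raise ValueError("rows must be an integer")
    if not 1 <= n <= MAX_ROWS:
        raise ValueError("rows out of range")
    return n
```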
- For your first problem, you need to call flush() to push the buffered chunks out to the network. From the documentation:
  RequestHandler.write(chunk)
  Writes the given chunk to the output buffer.
  To flush the output to the network, use the flush() method.
- Regarding your application hanging: you are serving the request on the main thread, so everything waits for your operation to complete. Instead, you should use Tornado's iostream module for this operation. From the tornado.iostream documentation:
  tornado.iostream - Convenient wrappers for non-blocking sockets. Utility classes for writing to and reading from non-blocking files and sockets.