Create a large file and send it

I have a fairly large .csv file (up to 1 million lines) that I want to generate and send when the browser asks for it.

The current code looks like this (except that the real data is not generated this way):

import random

import tornado.ioloop
import tornado.web

class CSVHandler(tornado.web.RequestHandler):
    def get(self):
        self.set_header('Content-Type', 'text/csv')
        self.set_header('Content-Disposition', 'attachment; filename=dump.csv')
        self.write('lineNumber,measure\r\n')  # file header
        for line in range(1000000):
            self.write(','.join([str(line), str(random.random())]) + '\r\n')  # mock data

app = tornado.web.Application([(r"/csv", CSVHandler)])
app.listen(8080)
tornado.ioloop.IOLoop.current().start()

The problems I have with the above method are as follows:

  • The browser does not start downloading the chunks as they are sent. It hangs while the web server appears to be preparing the entire response.
  • The web server is blocked while it processes this request, which causes other clients to hang.


2 answers


By default, all data is buffered in memory until the end of the request, so that the response can be replaced with an error page if an exception occurs. To send a response incrementally, your handler must be asynchronous (so that writing the response can be interleaved with other requests on the IOLoop) and must use the RequestHandler.flush() method.

Note that "being asynchronous" is not the same as "using the @tornado.web.asynchronous decorator"; in this case I recommend @tornado.gen.coroutine instead of @asynchronous. That lets you simply use the yield operator with every flush:

class CSVHandler(tornado.web.RequestHandler):
    @tornado.gen.coroutine
    def get(self):
        self.set_header('Content-Type', 'text/csv')
        self.set_header('Content-Disposition', 'attachment; filename=dump.csv')
        self.write('lineNumber,measure\r\n')  # file header
        for line in range(1000000):
            self.write(','.join([str(line), str(random.random())]) + '\r\n')  # mock data
            yield self.flush()


self.flush() starts the process of writing the data to the network, and yield waits until that data has reached the kernel. This lets other handlers run, and it also helps manage memory consumption (by limiting how far you can get ahead of the client's download speed). Flushing after every line of the CSV file is a little expensive, so you may want to flush only after every 100 or 1000 lines.

Note that if an exception is raised after the download has started, there is no way to show an error page to the client; you can only cut the download off partway through. Try to validate the request and do anything that is likely to fail before the first call to flush().
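As an illustration of the "flush every 100 or 1000 lines" suggestion, the batching itself can be separated from Tornado. The helper below (the name csv_chunks and its parameters are mine, not from the answer) groups rows so the handler would call write() and yield self.flush() once per chunk instead of once per row:

```python
import random

def csv_chunks(n_rows, chunk_size=1000):
    """Yield CSV text in chunks of chunk_size rows, so a handler can
    write() and flush() once per chunk instead of once per row.
    (Hypothetical helper, for illustration only.)"""
    buf = []
    for line in range(n_rows):
        buf.append('%d,%f\r\n' % (line, random.random()))
        if len(buf) == chunk_size:
            yield ''.join(buf)
            buf = []
    if buf:  # emit any leftover rows as a final, shorter chunk
        yield ''.join(buf)

# Inside the coroutine handler above, this would be used as:
#   for chunk in csv_chunks(1000000):
#       self.write(chunk)
#       yield self.flush()
```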



  • For your first problem, you need flush() to push the written chunks from the output buffer to the network.

    From the documentation:

    RequestHandler.write(chunk)

    Writes the given chunk to the output buffer.

    To flush the output to the network, use the flush() method.

  • Regarding your application hanging: you are serving the request on the main thread, so everything waits for your operation to complete. Instead, you should use Tornado's iostream for this operation. From the tornado.iostream documentation:

    tornado.iostream - Convenient wrappers for non-blocking sockets. Utility classes for writing to and reading from non-blocking files and sockets.
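The write()/flush() contract quoted above can be illustrated with a toy stand-in (this mock class is purely for demonstration and is not part of Tornado): write() only appends to a buffer, and nothing reaches the network until flush() is called:

```python
class FakeHandler:
    """Toy mock of RequestHandler's output buffering (illustrative only)."""
    def __init__(self):
        self._write_buffer = []   # write() appends here
        self.sent = []            # flush() moves data "to the network"

    def write(self, chunk):
        # Like RequestHandler.write: buffers the chunk, sends nothing
        self._write_buffer.append(chunk)

    def flush(self):
        # Like RequestHandler.flush: pushes the buffered chunks out
        self.sent.append(''.join(self._write_buffer))
        self._write_buffer = []

h = FakeHandler()
h.write('a,1\r\n')
h.write('b,2\r\n')
assert h.sent == []                  # nothing on the wire yet
h.flush()
assert h.sent == ['a,1\r\nb,2\r\n']  # both chunks sent together
```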


