How can I improve the performance of a Python CGI script that reads a large file and returns it as a download?

I have this Python CGI script that checks whether it has been accessed too many times from the same IP, and, if everything is OK, reads a large file (11 MB) and then returns it as a download.

It works, but the performance sucks. The bottleneck seems to be reading this huge file over and over:

import os

def download_demo():
    """
    Prints the demo file to stdout as an attachment download.
    """

    file = open(FILENAME, 'rb')  # binary mode, so the payload isn't mangled
    buff = file.read()
    file.close()

    print "Content-Type:application/x-download\nContent-Disposition:attachment;filename=%s\nContent-Length:%s\n\n%s" % (os.path.split(FILENAME)[-1], len(buff), buff)

How can I make it faster? I was thinking about using a RAM disk to hold the file, but there must be a better solution. Would using mod_wsgi instead of CGI help? Can I keep the large file in Apache's memory?

Any help is greatly appreciated.

+2

4 answers


Use mod_wsgi with something similar to:

def application(environ, start_response):
    status = '200 OK'
    response_headers = [('Content-Type', 'text/plain')]
    start_response(status, response_headers)

    # Hand the open file object back to the server; mod_wsgi can then
    # stream it with sendfile()/mmap() instead of reading it in Python.
    file = open('/usr/share/dict/words', 'rb')
    return environ['wsgi.file_wrapper'](file)

In other words, use the WSGI wsgi.file_wrapper extension so that Apache/mod_wsgi can send the file content in an optimized way using sendfile/mmap. The application then avoids ever reading the file into memory.
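
Note that wsgi.file_wrapper is an optional server extension under PEP 333, so a portable version might fall back to manual chunking when it is absent. A minimal sketch of that (the 8192-byte block size is an arbitrary choice here):

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'application/x-download')])
    file = open('/usr/share/dict/words', 'rb')

    # wsgi.file_wrapper is optional per PEP 333, so fall back to a
    # plain chunked iterator if the server doesn't provide it.
    wrapper = environ.get('wsgi.file_wrapper')
    if wrapper is not None:
        return wrapper(file, 8192)
    return iter(lambda: file.read(8192), '')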

+9

Why are you printing everything in one print statement? Python has to generate several temporary strings to assemble the content headers, and because of the final %s it has to hold the entire contents of the file in two different strings at once. This should be better:

print "Content-Type:application/x-download\nContent-Disposition:attachment;filename=%s\nContent-Length:%s\n\n" %    (os.path.split(FILENAME)[-1], len(buff))
print buff

You might also consider reading the file with the raw io module so that Python doesn't create temporary buffers you never use.
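
A minimal sketch of that idea, assuming Python 2.6+ where the io module is available and FILENAME is the same constant as in the question:

import io

# buffering=0 asks for a raw FileIO object, so read() pulls the bytes
# straight from the OS with no intermediate buffer layer.
f = io.open(FILENAME, 'rb', buffering=0)
buff = f.read()
f.close()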

+2

Try reading and outputting the file in chunks of, say, 16 KB at a time. Python is probably doing something slow behind the scenes, and manual buffering could be faster. A sketch of that loop, assuming the headers have already been printed and FILENAME is the constant from the question:
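
import sys

CHUNK_SIZE = 16 * 1024  # 16 KB per read

f = open(FILENAME, 'rb')
while True:
    chunk = f.read(CHUNK_SIZE)
    if not chunk:  # empty string signals EOF
        break
    sys.stdout.write(chunk)
f.close()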

You shouldn't need a RAM disk, by the way: the OS page cache should keep the file contents in memory for you.

+1

mod_wsgi or FastCGI will help in the sense that you don't have to reload the Python interpreter every time the script runs. However, they would do little to improve the performance of reading the file itself (if that really is your bottleneck). I would advise you to look at something along the lines of memcached.
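
A rough sketch of that idea, using the python-memcached client. The key name 'demo_file' is made up here, and note that memcached's default item size limit is 1 MB, so the server would need to be configured to accept an 11 MB value:

import memcache

mc = memcache.Client(['127.0.0.1:11211'])

buff = mc.get('demo_file')  # 'demo_file' is a hypothetical cache key
if buff is None:
    # Cache miss: read the file from disk once, then keep it in memcached
    # so later requests skip the disk entirely.
    f = open(FILENAME, 'rb')
    buff = f.read()
    f.close()
    mc.set('demo_file', buff)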

+1
