How can I improve the performance of a Python CGI script that reads a large file and returns it as a download?

I have this Python CGI script that checks whether it has been accessed too many times from the same IP, and, if everything is OK, reads a large file (11 MB) and then returns it as a download.

It works, but the performance sucks. The bottleneck seems to be reading this huge file over and over:

import os

def download_demo():
    """
    Prints the demo file to stdout as an attachment download.
    """

    file = open(FILENAME, 'rb')  # binary mode, so the payload isn't mangled
    buff = file.read()
    file.close()

    print "Content-Type:application/x-download\nContent-Disposition:attachment;filename=%s\nContent-Length:%s\n\n%s" % (os.path.split(FILENAME)[-1], len(buff), buff)

How can I make it faster? I was thinking about using a RAM disk to hold the file, but there must be a better solution. Would using mod_wsgi instead of CGI help? Can I keep the large file in Apache's memory?

Any help is greatly appreciated.

+2

4 answers


Use mod_wsgi with something similar to:

def application(environ, start_response):
    status = '200 OK'
    response_headers = [('Content-Type', 'text/plain')]
    start_response(status, response_headers)

    # Hand the open file object back to the server; mod_wsgi can then
    # stream it with sendfile()/mmap() instead of reading it in Python.
    file = open('/usr/share/dict/words', 'rb')
    return environ['wsgi.file_wrapper'](file)

In other words, use the WSGI wsgi.file_wrapper extension so that Apache/mod_wsgi can send the file content in an optimized way using sendfile/mmap. The application then avoids ever reading the file into memory.
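
Note that wsgi.file_wrapper is an optional server extension under PEP 333, so a portable version might fall back to manual chunking when it is absent. A minimal sketch of that (the 8192-byte block size is an arbitrary choice here):

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'application/x-download')])
    file = open('/usr/share/dict/words', 'rb')

    # wsgi.file_wrapper is optional per PEP 333, so fall back to a
    # plain chunked iterator if the server doesn't provide it.
    wrapper = environ.get('wsgi.file_wrapper')
    if wrapper is not None:
        return wrapper(file, 8192)
    return iter(lambda: file.read(8192), '')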

+9

Why are you printing everything in one print statement? Python has to generate several temporary strings to assemble the content headers, and because of the final %s it has to hold the entire contents of the file in two different strings at once. This should be better:

print "Content-Type:application/x-download\nContent-Disposition:attachment;filename=%s\nContent-Length:%s\n\n" %    (os.path.split(FILENAME)[-1], len(buff))
print buff

You might also consider reading the file with the raw io module so that Python doesn't create temporary buffers you never use.
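
A minimal sketch of that idea, assuming Python 2.6+ where the io module is available and FILENAME is the same constant as in the question:

import io

# buffering=0 asks for a raw FileIO object, so read() pulls the bytes
# straight from the OS with no intermediate buffer layer.
f = io.open(FILENAME, 'rb', buffering=0)
buff = f.read()
f.close()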

+2

Try reading and outputting the file in chunks of, say, 16 KB at a time. Python is probably doing something slow behind the scenes, and manual buffering could be faster. A sketch of that loop, assuming the headers have already been printed and FILENAME is the constant from the question:
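
import sys

CHUNK_SIZE = 16 * 1024  # 16 KB per read

f = open(FILENAME, 'rb')
while True:
    chunk = f.read(CHUNK_SIZE)
    if not chunk:  # empty string signals EOF
        break
    sys.stdout.write(chunk)
f.close()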

You shouldn't need a RAM disk, by the way: the OS page cache should keep the file contents in memory for you.

+1

mod_wsgi or FastCGI will help in the sense that you don't have to reload the Python interpreter every time the script runs. However, they would do little to improve the performance of reading the file itself (if that really is your bottleneck). I would advise you to look at something along the lines of memcached.
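
A rough sketch of that idea, using the python-memcached client. The key name 'demo_file' is made up here, and note that memcached's default item size limit is 1 MB, so the server would need to be configured to accept an 11 MB value:

import memcache

mc = memcache.Client(['127.0.0.1:11211'])

buff = mc.get('demo_file')  # 'demo_file' is a hypothetical cache key
if buff is None:
    # Cache miss: read the file from disk once, then keep it in memcached
    # so later requests skip the disk entirely.
    f = open(FILENAME, 'rb')
    buff = f.read()
    f.close()
    mc.set('demo_file', buff)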

+1
