Using gzip file as stdin for commands executed with subprocess.call

I have a Python script that executes multiple commands using subprocess.call(). I need to pipe data from a gzipped file to one of these commands via stdin, but no matter what I do, the command seems to receive the still-compressed data.

This is what I think should work:

import gzip
from subprocess import call

in_fname = 'test.gz'
out_fname = 'test.txt'

gz = gzip.open(in_fname, 'rb')
txt = open(out_fname, 'w')

call(['cat'], stdin=gz, stdout=txt)


But at the end, "test.txt" is compressed and is exactly the same size as the gzipped input file.

If I call gz.read () then I get the correct unpacked data as expected. What do I need to do to use a gzipped file as stdin?

1 answer


After a bit of research, the root of the problem is that your operating system doesn't know that the file descriptor of a gzipped file is anything special. gzip provides a file-like interface in Python, but the subprocess (cat in this case) only sees the underlying file descriptor, so it has no idea the data should be decompressed. It just reads the file byte for byte and prints the bytes it reads.
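To see this concretely, here is a minimal sketch (assuming a test.gz exists): reads through the GzipFile object are decompressed in Python, while its fileno() still refers to the raw compressed file on disk, and a file descriptor is all that subprocess hands to the child.

import gzip
import os

with gzip.open('test.gz', 'rb') as gz:
    print(gz.read(10))  # decompressed bytes: gzip decodes these in Python
    os.lseek(gz.fileno(), 0, os.SEEK_SET)  # rewind the underlying descriptor
    print(os.read(gz.fileno(), 2))  # b'\x1f\x8b': the raw gzip magic bytes,
                                    # i.e. what a child process would read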

My next idea was to read the entire file into Python (which knows the file is compressed and will decompress it) and then pass the resulting data to the subprocess. I messed around with wrapping the decompressed content in a StringIO object, but that didn't work. Another answer (Use StringIO as stdin with Popen) mentioned a slightly different subprocess call:



import gzip
from subprocess import Popen, PIPE

in_fname = 'test.gz'
out_fname = 'test.txt'

# Decompress the entire file into memory
with gzip.open(in_fname, 'rb') as f:
    data = f.read()

# Send the decompressed bytes to the child through a pipe
with open(out_fname, 'wb') as txt:
    process = Popen(['cat'], stdin=PIPE, stdout=txt)
    process.communicate(data)

This works. Note that it requires reading the entire file into memory, which can be a problem for really large files.
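If memory is a concern, a streaming variant along these lines should also work (a sketch, not tested against the original setup): decompress in chunks with shutil.copyfileobj and feed them to the child's stdin as you go.

import gzip
import shutil
from subprocess import Popen, PIPE

# Stream decompressed data to the child in fixed-size chunks,
# so the whole file never sits in memory at once.
with gzip.open('test.gz', 'rb') as gz, open('test.txt', 'wb') as txt:
    process = Popen(['cat'], stdin=PIPE, stdout=txt)
    shutil.copyfileobj(gz, process.stdin)  # read, decompress, write in chunks
    process.stdin.close()                  # close the pipe so cat sees EOF
    process.wait()                         # wait for the child to finish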
