Python tarfile size

I can calculate the size of the files in the tarfile as follows:

import tarfile
with tarfile.open(name='my.tgz', mode='r') as tf:
    # sum the (uncompressed) size recorded in each member's header
    total = sum(member.size for member in tf.getmembers())
print(total)

But the total returned is the sum of the sizes of the items in the tarfile, not the compressed size of the file (at least that's what I believe after trying this). Is there a way to get the compressed size of the entire tar file without checking it through something like os.path.getsize?

1 answer


No.

With .tar.gz, the whole tar archive is compressed by gzip as a single stream; to read it, the file is piped back through gzip to get a plain tar archive. tar(1) does not know that the archive was compressed in the first place, so it cannot know about the compressed sizes [*].
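A minimal sketch of what this means in Python: the sizes tarfile reports come from the member headers of the decompressed stream, so the compressed size of the archive has to come from the file itself, for example via os.path.getsize. The helper name tar_sizes is hypothetical, and it assumes the archive is a file on disk.

```python
import os
import tarfile


def tar_sizes(path):
    """Return (sum of uncompressed member sizes, compressed size on disk).

    Hypothetical helper for illustration.
    """
    # mode "r" transparently detects gzip, bz2 or xz compression
    with tarfile.open(path, mode="r") as tf:
        # each TarInfo.size is the uncompressed size from the tar header
        uncompressed = sum(member.size for member in tf.getmembers())
    # tarfile has no per-member compressed sizes; ask the filesystem instead
    on_disk = os.path.getsize(path)
    return uncompressed, on_disk
```

For a highly compressible archive the first number will be much larger than the second.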

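Archive formats such as ZIP work differently: they compress each member individually, so the archive itself records both the compressed and the uncompressed size of every entry.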
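For comparison, Python's zipfile exposes both numbers per member through ZipInfo.file_size and ZipInfo.compress_size. A sketch with a hypothetical helper zip_sizes; note that the compressed total covers member data only, not the local headers and central directory, so it is somewhat smaller than the file on disk.

```python
import zipfile


def zip_sizes(path):
    """Return (total uncompressed size, total compressed size) of a ZIP archive.

    Hypothetical helper for illustration.
    """
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
        # ZIP stores both sizes for every entry, so no decompression is needed
        uncompressed = sum(info.file_size for info in infos)
        compressed = sum(info.compress_size for info in infos)
    return uncompressed, compressed
```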



The advantage of the tar approach is that you can use whatever compression you like. If some better compressor comes along, you can easily repack your archives. In addition, since everything is packaged in one big data stream, the compression ratio is slightly better and metadata such as filenames are compressed as well.
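Repacking with a different compressor can be sketched with tarfile alone. The helper name repack and the default "w:xz" target are illustrative choices, not part of the original answer; "w:xz" assumes Python was built with the lzma module.

```python
import tarfile


def repack(src_path, dst_path, dst_mode="w:xz"):
    """Copy every member of a tar archive into a new archive with another compressor.

    Hypothetical helper for illustration. dst_mode is any tarfile write mode,
    e.g. "w:gz", "w:bz2" or "w:xz".
    """
    # "r" auto-detects the source compression
    with tarfile.open(src_path, mode="r") as src, \
            tarfile.open(dst_path, mode=dst_mode) as dst:
        for member in src:
            # only regular files carry data; directories and links are header-only
            fileobj = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, fileobj)
```

The tar headers are copied unchanged, so only the outer compression differs between the two archives.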

The downside is that, to unpack an individual item, you have to decompress and scan the archive from the beginning until you reach it.
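A sketch of such a lookup, using a hypothetical helper read_member: in streaming mode ("r|*") tarfile decompresses from the start and walks the headers in order until it finds the requested name. Even in random-access mode, TarFile.getmember() first reads the headers of the whole archive, because a compressed tar has no index to jump to.

```python
import tarfile


def read_member(path, name):
    """Return the bytes of member `name`, or None if it is not a regular file in the archive.

    Hypothetical helper for illustration.
    """
    # "r|*": sequential stream with transparent decompression, no seeking back
    with tarfile.open(path, mode="r|*") as tf:
        for member in tf:  # headers are visited strictly in archive order
            if member.name == name:
                fileobj = tf.extractfile(member)  # None for non-regular members
                return fileobj.read() if fileobj is not None else None
    return None
```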

[*]: The first tar(1) implementations did not have the -z option; it was added later, when people started using gzip a lot. In the early days, the standard compressor was compress(1), which produced .tar.Z files.
