How is gzip file size encoded?

The gzip file format stores the uncompressed (original) file size in the last 4 bytes of the compressed file. The "gzip -l" command reports the compressed and uncompressed sizes, the compression ratio, and the original filename.

Looking back through Stack Overflow, there are several mentions of decoding the size stored in the last 4 bytes.

How is the size encoded? Is it big-endian (high-order byte first) or little-endian (low-order byte first), and is the value signed or unsigned?

This piece of code works for me:

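// requires <stdio.h> and <sys/stat.h>
FILE* fh; // assume the file is already open in binary mode
unsigned char szbuf[4];
struct stat statbuf;
fstat(fileno(fh), &statbuf); // fstat() takes a file descriptor, not a FILE*
unsigned long clen = statbuf.st_size;
fseek(fh, clen - 4, SEEK_SET); // the size is in the last 4 bytes
size_t count = fread(szbuf, 1, 4, fh); // count should be 4
// assemble the bytes little-endian: szbuf[0] is the low-order byte
unsigned long ulen = ((unsigned long)szbuf[3] << 24) | ((unsigned long)szbuf[2] << 16) | ((unsigned long)szbuf[1] << 8) | szbuf[0];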

      

Here are some related posts that seem to imply little-endian and unsigned long (0..4GB-1).

Determine Uncompressed GZIP File Size

GZIPOutputStream does not update Gzip size bytes

Determine file size in gzip

Gzip.org more information on Gzip



1 answer


The RFC says it's modulo 2^32, which means uint32_t, and experimenting with .NET GZipStream gives it as little-endian.
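To make that concrete, here is a minimal sketch in C of reading the field as an unsigned 32-bit little-endian value. The function names gzip_decode_isize and gzip_read_isize are made up for this illustration, and the sketch assumes fseek() with SEEK_END works on the stream (true on POSIX systems, not guaranteed by the C standard for binary streams). Because the stored value is the size modulo 2^32, it wraps around for originals of 4 GiB or more.

```c
#include <stdint.h>
#include <stdio.h>

/* Decode the 4-byte size field from the gzip trailer.
   The bytes are stored least-significant byte first (little-endian),
   and the result is unsigned: 0 .. 2^32 - 1. */
static uint32_t gzip_decode_isize(const unsigned char b[4])
{
    /* Cast each byte before shifting so the shift happens in
       unsigned 32-bit arithmetic and cannot touch a sign bit. */
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Hypothetical helper: read the size field from the last 4 bytes
   of an already-open file. Returns 0 on success, -1 on failure.
   Assumes the stream supports seeking relative to SEEK_END. */
static int gzip_read_isize(FILE *fh, uint32_t *out)
{
    unsigned char buf[4];

    if (fseek(fh, -4L, SEEK_END) != 0)
        return -1;
    if (fread(buf, 1, sizeof buf, fh) != sizeof buf)
        return -1;

    *out = gzip_decode_isize(buf);
    return 0;
}
```

Typical use would be to open the file with fopen(path, "rb"), call gzip_read_isize(fh, &size), and check the return value before trusting size.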



RFC 1952







