Find file size inside GZIP file

Is there a way in Java to find out the size of the original file stored inside a GZIP file?

For example, I have a 15 MB a.txt file that was gzipped into a 3 MB a.gz. I want to know the size of the a.txt inside a.gz without unpacking a.gz.

+3




4 answers


There is no truly reliable way other than gunzipping the stream. You don't need to save the decompressed result, so you can determine the size simply by reading and decoding the entire file, without taking up any space for the output.

There is an unreliable way to determine the uncompressed size, which is to look at the last four bytes of the gzip file. They hold the uncompressed length of that entry modulo 2^32, in little-endian order.

This is unreliable because: a) the uncompressed data can be longer than 2^32 bytes, and b) the gzip file can consist of multiple gzip streams, in which case you will only find the length of the last of those streams.



If you control the source of the gzip files, you know they consist of a single gzip stream, and you know they are less than 2^32 bytes uncompressed, then and only then can you use those last four bytes with confidence.
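To illustrate point b), here is a minimal sketch that builds two small gzip streams in memory, concatenates them, and compares the length stored in the last four bytes with the length obtained by decoding everything. The class name `TrailerPitfall` and its helper methods are made up for this illustration, and it assumes a Java runtime whose `GZIPInputStream` reads concatenated streams (Java 7 and later should).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not part of any library.
public class TrailerPitfall {

    // Compress data into one complete gzip stream (header, deflate data, trailer).
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        }
        return bos.toByteArray();
    }

    // Interpret the last four bytes as an unsigned little-endian 32-bit length.
    static long trailerLength(byte[] gz) {
        int isize = ByteBuffer.wrap(gz, gz.length - 4, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        return Integer.toUnsignedLong(isize);
    }

    // Decode everything and count the bytes, discarding the output.
    static long decodedLength(byte[] gz) throws IOException {
        long total = 0;
        byte[] buf = new byte[8192];
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] first = gzip(new byte[1000]);
        byte[] second = gzip(new byte[250]);

        // Two gzip streams back to back still form a valid gzip file.
        ByteArrayOutputStream joined = new ByteArrayOutputStream();
        joined.write(first);
        joined.write(second);
        byte[] both = joined.toByteArray();

        System.out.println("trailer: " + trailerLength(both));
        System.out.println("decoded: " + decodedLength(both));
    }
}
```

Under those assumptions the trailer should report only the 250 bytes of the second stream, while full decoding should count all 1250 bytes.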

pigz (which can be found at http://zlib.net/pigz/ ) can do it both ways. pigz -l will give you an unreliable length very quickly. pigz -lt will decode the entire input and give you a reliable length.

+19




Below is one approach to this problem. It is certainly not the best approach; however, since Java does not provide an API method for this (unlike when working with Zip files), it is the only way I could think of other than the one mentioned in the comments above, which reads the last 4 bytes (and only works if the uncompressed size is below 4 GB, or below 2 GB if the value is read into a signed Java int).

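import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

public class GzipSize {
    public static void main(String[] args) throws IOException {
        long size = 0;
        byte[] buf = new byte[8192]; // reused for every read; the contents are discarded

        // Decompress the whole file and count the bytes without storing them.
        try (GZIPInputStream zis = new GZIPInputStream(new FileInputStream("myFile.gz"))) {
            int read;
            // read() returns -1 at end of stream, which is more dependable than available().
            while ((read = zis.read(buf)) != -1) {
                size += read;
            }
        }

        System.out.println("File size: " + size + " bytes");
    }
}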

      



As you can see, the gzip file is read and the number of bytes read is added to the size of the uncompressed file.

While this method works, I cannot really recommend it for very large files, since the whole file has to be decompressed and that can take several seconds (unless time is not a constraint for you).

+3




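import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ReadStream {

    public static void main(String[] args) {
        // try-with-resources closes the file even if reading fails.
        try (RandomAccessFile raf = new RandomAccessFile(new File("D:/temp/temp.gz"), "r")) {
            // The last four bytes of a gzip file hold the uncompressed length
            // modulo 2^32, stored least significant byte first.
            raf.seek(raf.length() - 4);

            long b4 = raf.read();
            long b3 = raf.read();
            long b2 = raf.read();
            long b1 = raf.read();
            // Combine into a long so that lengths of 2^31 and above stay positive.
            long val = (b1 << 24) | (b2 << 16) | (b3 << 8) | b4;

            System.out.println(val);
        } catch (IOException ex) {
            // Also covers FileNotFoundException, which is a subclass of IOException.
            Logger.getLogger(ReadStream.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}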

      

+1




GZIP doesn't let you know the size of the content in advance. These are the ways to manage it that I can think of depending on your requirements:

  • Unzip the stream on the fly and abort if it turns out to be too large.
  • Unzip the stream without writing out the contents. This gets you the size of the uncompressed data without taking up space; the only cost is the processing needed to read and inflate it.
  • Switch to zip files (whose entries can tell you the length in advance).
  • If you know the type of data you usually get, you can statistically estimate the size from the size of the gzip-compressed file.
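
The first two options can be combined in one small routine. The following is only a sketch: the class name `BoundedGzipSize`, the method `sizeUpTo` and the convention of returning -1 when the limit is exceeded are all assumptions made for this example, not an existing API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of any library.
public class BoundedGzipSize {

    // Returns the uncompressed size, or -1 as soon as it exceeds limit bytes.
    static long sizeUpTo(InputStream gzipped, long limit) throws IOException {
        long total = 0;
        byte[] buf = new byte[8192]; // inflated data is counted, then discarded
        try (GZIPInputStream in = new GZIPInputStream(gzipped)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
                if (total > limit) {
                    return -1; // too large: stop inflating early
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Build a small gzip payload in memory so the example is self-contained.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(new byte[100_000]);
        }
        byte[] gz = bos.toByteArray();

        System.out.println(sizeUpTo(new ByteArrayInputStream(gz), 1_000_000));
        System.out.println(sizeUpTo(new ByteArrayInputStream(gz), 50_000));
    }
}
```

With the 100,000-byte payload used here, the first call should print the full size and the second should give up and print -1. In real use you would pass a `FileInputStream` or a network stream instead of the in-memory array.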
0

