Most efficient way to read compressed Zlib file in Golang?

I am reading, and at the same time parsing (decoding), a file in a custom format that is compressed with zlib. My question is: how can I efficiently decompress and then parse the uncompressed content without growing the slice? I would like to parse it while reading it into a reusable buffer.

This is for a speed-critical application, so I would like to read it as efficiently as possible. Normally I would just ioutil.ReadAll and then loop over the data again to parse it. This time I would like to parse the data as it is read, without having to grow the buffer it is read into, for maximum efficiency.
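For reference, this is roughly the pattern I use today; it decompresses everything into memory before any parsing starts (fi is just an illustrative name for the opened file):

// Current approach: decompress the whole file up front, then parse.
zr, err := zlib.NewReader(fi)
if err != nil {
    return nil, err
}
defer zr.Close()
data, err := ioutil.ReadAll(zr) // entire uncompressed payload held in memory
if err != nil {
    return nil, err
}
// ...then iterate over data to parse it...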

Basically, I think that if I can find a buffer of the ideal size, then I can read into it, parse it, overwrite the buffer with the next read, parse that, and so on. The problem here is that the zlib reader reads an arbitrary number of bytes each time Read(b) is called; it does not fill the slice. Because of this I don't know what the ideal buffer size would be. I am also concerned that data I originally wrote in one piece might be split into two parts, which would make it difficult to parse, because a single uint64 could be split across two reads and therefore never appear in the same buffer - or perhaps that can never happen, and the data is always read back in chunks of the same size it was originally written in?

  • What is the optimal buffer size, or is there a way to calculate it?
  • If I have written data to the zlib writer with f.Write(b []byte), is it possible for that same data to be split across two reads when the compressed data is read back (meaning I would have to keep history while parsing), or is it always returned in a single read? (A stripped-down sketch of this scenario follows the list.)
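To make the second question concrete, here is a stripped-down, self-contained sketch of the situation; the stream of uint64 values and the 2048-byte buffer are made up for illustration and are not my real format:

package main

import (
    "bytes"
    "compress/zlib"
    "encoding/binary"
    "fmt"
    "io"
    "log"
)

func main() {
    // Write a stream of uint64 values through a zlib writer (stand-in for the real format).
    var compressed bytes.Buffer
    zw := zlib.NewWriter(&compressed)
    b := make([]byte, 8)
    for i := uint64(0); i < 1000; i++ {
        binary.LittleEndian.PutUint64(b, i)
        if _, err := zw.Write(b); err != nil {
            log.Fatal(err)
        }
    }
    zw.Close()

    // Read it back with a fixed-size buffer. Read may return fewer bytes than
    // len(buf), so a value written in one Write call can straddle two Read calls.
    zr, err := zlib.NewReader(&compressed)
    if err != nil {
        log.Fatal(err)
    }
    defer zr.Close()

    buf := make([]byte, 2048)
    for {
        n, err := zr.Read(buf)
        fmt.Printf("Read returned %d bytes\n", n)
        if err == io.EOF {
            break
        }
        if err != nil {
            log.Fatal(err)
        }
    }
}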

2 answers


OK, so I figured it out in the end using my own reader implementation.

Basically, the structure looks like this:

type reader struct {
    at  int           // offset of the first unread byte in buf
    n   int           // number of unread bytes currently in buf
    f   io.ReadCloser // the underlying zlib reader
    buf []byte        // reusable scratch buffer
}

This can be connected to a zlib reader:



// Open file for reading
fi, err := os.Open(filename)
if err != nil {
    return nil, err
}
defer fi.Close()
// Attach zlib reader
r := new(reader)
r.buf = make([]byte, 2048)
r.f, err = zlib.NewReader(fi)
if err != nil {
    return nil, err
}
defer r.f.Close()

Then x bytes at a time can be read from the zlib reader through this type using the following function:

mydata := r.readx(10)

func (r *reader) readx(x int) []byte {
    for r.n < x {
        // Not enough unread bytes buffered: shift them to the front of buf
        // and top it up from the zlib reader. Assumes x <= len(r.buf).
        copy(r.buf, r.buf[r.at:r.at+r.n])
        r.at = 0
        m, err := r.f.Read(r.buf[r.n:])
        if err != nil {
            panic(err)
        }
        r.n += m
    }
    // Return a copy of the next x bytes and advance past them.
    tmp := make([]byte, x)
    copy(tmp, r.buf[r.at:r.at+x]) // must be copied to avoid memory leak
    r.at += x
    r.n -= x
    return tmp
}

Please note that I don't need to check for EOF, because my parser is expected to stop at exactly the right place.
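For example, fixed-width and length-prefixed fields can then be decoded straight from the returned slices; the little-endian layout below is only an assumption for illustration, not part of the actual format ("encoding/binary" must be imported):

id := binary.LittleEndian.Uint64(r.readx(8))    // fixed 8-byte field
count := binary.LittleEndian.Uint32(r.readx(4)) // 4-byte length prefix
payload := r.readx(int(count))                  // variable-length field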


You can wrap your zlib reader in a bufio reader and then layer a specialized reader like the one above on top of it, one that rebuilds your chunks of data by reading from the bufio reader until a full chunk has been read. Be aware that bufio.Read calls Read at most once on the underlying reader, so you need to call ReadByte in a loop; bufio will, however, take care of the unpredictable sizes returned by the zlib reader for you.

If you don't want to implement a specialized reader, you can just go with the bufio reader and read as many bytes as needed with ReadByte() to populate a given data type (see the sketch below). The optimal buffer size is at least the size of your largest data structure, up to whatever you can comfortably fit into memory.
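A minimal sketch of that idea, assuming zr is the io.ReadCloser returned by zlib.NewReader and that the field is a little-endian uint64 (both are assumptions for illustration; "bufio" and "encoding/binary" must be imported):

br := bufio.NewReader(zr)

readUint64 := func() (uint64, error) {
    var b [8]byte
    for i := range b {
        c, err := br.ReadByte() // refills bufio's buffer from zr as needed
        if err != nil {
            return 0, err
        }
        b[i] = c
    }
    return binary.LittleEndian.Uint64(b[:]), nil
}

v, err := readUint64() // one complete value, regardless of how zlib chunked the reads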



If you are reading directly from the zlib reader, there is no guarantee that your data will not be split between two reads.

Another, possibly cleaner, solution is to implement an io.Writer for your data and then use io.Copy(your_writer, zlib_reader).
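A rough sketch of that approach, with a hypothetical parser type that implements io.Writer and buffers partial records between Write calls (the actual record-parsing logic is left out):

type parser struct {
    pending []byte // bytes carried over between Write calls
}

func (p *parser) Write(b []byte) (int, error) {
    p.pending = append(p.pending, b...)
    // ...parse any complete records out of p.pending, keep the remainder...
    return len(b), nil
}

// usage:
//   p := &parser{}
//   if _, err := io.Copy(p, zr); err != nil { ... } // zr is the zlib reader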
