The best compression algorithm with the following characteristics

What is the best compression algorithm with the following features:

  • takes less time to decompress (may require significantly more time compression)
  • should be able to compress sorted data (approximate list of 3,000,000 lines / integers ...)

Please suggest along with the metric: compression ratio, algorithmic complexity of compression and decompression (if possible)?

+1


source to share


4 answers


The entire site dedicated to compression benchmarking is here



+11


source


Ok, if you just want speed, then standard ZIP compression is just fine and most likely integrated into your language / frame already (ex: .NET has this, it has Java). Sometimes the most versatile solution is the best, ZIP is a very mature format, any ZIP library and application will work with any other.

But if you want to improve compression, I would suggest 7-Zip as the author is very smart, easy to get around and encourages people to use the format.



It is not possible to provide you with compression times as it is directly related to your hardware. If you need a test, you must do it yourself.

+1


source


You don't need to worry about decompression times. The time taken for a higher level of compression basically finds the longest match pattern.

Decompression either

1) Writes the literal 
2) for (backward position, length)=(m,n) pair, 
   goes back, in the output buffer, m bytes, 
   reads n bytes and 
   writes n bytes at the end of the buffer.

      

Thus, the decompression time is independent of the compression level. And, having experience with the Universal Decompression Virtual Machine (RFC3320), I think the same is true for any decompression algorithm.

+1


source


This is an interesting question. With such sorted data of strings and integers, I would expect the difference coding compression approaches to outperform any out-of-the-box text compression approach, since LZ77 or LZ78 in terms of compression ratio. The general purpose encoder does not use special data properties.

0


source







All Articles