Sending a highly compressed text file over the network

I have a text file that I want to send over the network; it can vary in size from 1 KB to 500 KB.
What algorithms / techniques can I use to compress this file as tightly as possible before sending it, so that as few bytes as possible are transferred over the network?

+2




3 answers


For compression I would consider gzip, bzip2 and LZMA (this is not an exhaustive list, but these are IMO the best known).

Then I would search the net for benchmarks and collect metrics for different file types (text, binary, mixed) and sizes (small, large, huge). Even if the compression ratio is what you care about most, you should also look at compression time, memory usage, and decompression time.

According to Quick Test: Gzip vs Bzip2 vs LZMA:

[...] gzip is very fast and has a small memory footprint. According to this benchmark, neither bzip2 nor lzma can compete with gzip in terms of speed or memory usage. bzip2 has a significantly better compression ratio than gzip, which is probably the reason for its popularity; it is slower than gzip, especially at decompression, and uses more memory. However, the memory requirements of bzip2 are not a problem nowadays, even on older hardware.

[...]

LZMA clearly has the potential to become the third widely used general-purpose compression format on *NIX systems. It mainly competes with bzip2, offering a significantly better compression ratio while keeping decompression speed relatively close to that of gzip.

This is confirmed in LZMA - better than bzip2:

The description is impressive; in short:

  • The best compression ratio (at the best settings, gzip reaches 38%, bzip2 34%, LZMA 25%).
  • The compression gain shows up mostly on binaries.
  • Decompression is much faster (3-4 times) than bzip2's.
  • The algorithm allows parallel execution (but the tool described there is single-threaded).

There are also disadvantages:

  • Compression (except at the lower compression levels) is much slower than bzip2's.
  • Compression requires much more memory than bzip2.


So, for compressing text files, the same site reports:

The first thing I did was use LZMA to compress my zip archive. I chose a 528 MB spam file (mail in mbox format) and used the maximum compression setting. During compression the lzma process grew to 370 MB of memory, which is a lot :) bzip2 stayed below 7 MB. Compressing the file took almost 15 minutes with lzma and less than 4 minutes with bzip2. The compression ratio was very similar: the output file is 373 MB for bzip2 and 370 MB for lzma. Decompression times are 1m12s for lzma and 1m48s for bzip2.

Finally, here is another resource with graphical results: Compression tools: lzma, bzip2 and gzip

I would recommend that you run your own benchmark (since you will only compress text and fairly small files) to get real metrics in your environment, but my bet is that LZMA will not provide a significant advantage on small text files, so bzip2 would be a worthy choice (even though the time and memory overhead of LZMA may be low on small files).
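
If you do run such a benchmark from Java, a minimal sketch could look like the one below. It assumes Apache Commons Compress on the classpath (and, for the xz entry, the additional XZ for Java library); the input file name is a placeholder for one of your real files:

    import java.io.ByteArrayOutputStream;
    import java.io.OutputStream;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    import org.apache.commons.compress.compressors.CompressorStreamFactory;

    public class CompressionBench {
        public static void main(String[] args) throws Exception {
            // "sample.txt" is a placeholder: use one of your real text files.
            byte[] input = Files.readAllBytes(Paths.get("sample.txt"));
            CompressorStreamFactory factory = new CompressorStreamFactory();

            String[] algos = {
                CompressorStreamFactory.GZIP,   // "gz"
                CompressorStreamFactory.BZIP2,  // "bzip2"
                CompressorStreamFactory.XZ      // "xz" (LZMA-based; needs XZ for Java)
            };
            for (String algo : algos) {
                ByteArrayOutputStream sink = new ByteArrayOutputStream();
                long start = System.nanoTime();
                try (OutputStream out = factory.createCompressorOutputStream(algo, sink)) {
                    out.write(input);
                }
                long ms = (System.nanoTime() - start) / 1_000_000;
                // Rough single-run timing; average several runs for real numbers.
                System.out.printf("%-6s %8d -> %8d bytes (%.1f%%) in %d ms%n",
                        algo, input.length, sink.size(),
                        100.0 * sink.size() / input.length, ms);
            }
        }
    }

Run it over a handful of your real files and you get the ratio/time trade-off directly in your environment.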

If you plan on doing the compression from Java, you will find an LZMA implementation here, a bzip2 implementation here (from Apache Ant, AFAIK), and gzip is included in the JDK. If you don't want to or cannot rely on a third-party library, use gzip.
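
For the gzip-only route, everything needed ships with the JDK (java.util.zip). Here is a minimal sketch of streaming a file through gzip straight onto a socket; the host, port and file name are placeholders:

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.Socket;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.zip.GZIPOutputStream;

    public class GzipSend {
        public static void main(String[] args) throws Exception {
            // Placeholder host/port/path: adapt to your environment.
            try (Socket socket = new Socket("example.com", 9000);
                 OutputStream raw = socket.getOutputStream();
                 GZIPOutputStream gzip = new GZIPOutputStream(raw);
                 InputStream in = Files.newInputStream(Paths.get("data.txt"))) {
                in.transferTo(gzip); // stream the file through the gzip filter
                gzip.finish();       // flush the trailing gzip block before close
            }
        }
    }

On the receiving side, wrap the socket's input stream in a GZIPInputStream and read as usual.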

+6




The answer depends on the content. GZip is included in the JDK. Tests on random strings appear to shrink them by about 33% on average.



[edit: content, not context]
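
That content dependence is easy to demonstrate with the JDK alone; a quick, rough sketch (Java 11+ for String.repeat) comparing gzip on random versus repetitive text:

    import java.io.ByteArrayOutputStream;
    import java.nio.charset.StandardCharsets;
    import java.util.Random;
    import java.util.zip.GZIPOutputStream;

    public class RatioDemo {
        static int gzipSize(byte[] data) throws Exception {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
                gz.write(data);
            }
            return sink.size();
        }

        public static void main(String[] args) throws Exception {
            // 100 KB of random lowercase letters vs 100 KB of repeated text.
            Random rnd = new Random();
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append((char) ('a' + rnd.nextInt(26)));
            }
            byte[] randomText = sb.toString().getBytes(StandardCharsets.US_ASCII);
            byte[] repetitive = "the quick brown fox ".repeat(5_000)
                                                      .getBytes(StandardCharsets.US_ASCII);

            System.out.println("random:     " + randomText.length + " -> " + gzipSize(randomText));
            System.out.println("repetitive: " + repetitive.length + " -> " + gzipSize(repetitive));
        }
    }

Random letters compress only modestly (26 equiprobable letters carry about 4.7 bits per character), while the repetitive text collapses to a tiny fraction of its size.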

+4




It depends. Can you control the size of the network packets? Are you going to batch messages if more than one fits in a packet? Are you CPU-limited at either end? Not really an answer, but it is worth remembering that compressing and decompressing can sometimes take longer than just sending the bytes.

0








