What is the most efficient way to encode two positions between 0 and 64?

I have 64-bit values that I want to compress by exploiting the fact that only a portion somewhere in the middle contains data, with zeros before and after it.

Let's say the actual data is l bits long, padded with n 0s at the front and m 0s at the end, so that n + l + m = 64. Instead of transferring / storing 64 bits, I can transfer l bits plus whatever is needed to encode the position of the data within the 64-bit interval.

For example, if I were storing l, m and the data bits, I would restore the original 64-bit pattern by reading l, reading the l data bits, reading m, and shifting the data bits left by m.

The smallest overhead I could come up with is twice 6 bits, to store two of l, n, and m (each can be between 0 and 64). Can this number be reduced?
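For illustration, a minimal Python sketch of this split (the function names are hypothetical; only the l middle bits plus the two counts are kept):

```python
def encode(x: int) -> tuple[int, int, int]:
    """Split a 64-bit value into (n, m, data): n leading zeros,
    m trailing zeros, and the l = 64 - n - m data bits."""
    if x == 0:
        return 64, 0, 0                 # all zeros: n + m = 64, no data bits
    m = (x & -x).bit_length() - 1       # count of trailing zeros
    n = 64 - x.bit_length()             # count of leading zeros
    return n, m, x >> m                 # x >> m is the l middle bits

def decode(n: int, m: int, data: int) -> int:
    """Restore the original 64-bit pattern by shifting the data
    bits left past the m trailing zeros."""
    return data << m

# Round trip: 45 = 0b101101 placed 22 bits up gives n=36, m=22.
x = 45 << 22
assert encode(x) == (36, 22, 45)
assert decode(*encode(x)) == x
```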

+2




5 answers


l can be between 0 and 64, so don't send l; send n and m instead, since each can be zero and they only have to be able to sum to 64.

The l data bits must start and end with a 1, so those two bits do not need to be transmitted.

- send 6 bits for n
- send up to 6 bits for m (see below)
- calculate l = 64 - (n + m)
- if l = 0, the number is 0; send nothing else
- if l = 1, the number is 1 * 2^m; send nothing else
- if l = 2, the number is 3 * 2^m; send nothing else
- otherwise, send the middle l - 2 bits



Maximum overhead = 10 bits.

The reduction in bits for m comes from the fact that n bounds m:

- if n > 32, then you know m < 32, so only 5 bits are required
- if n > 48, then you know m < 16, so only 4 bits are required
- if n > 56, then you know m < 8, so only 3 bits are required
- if n > 60, then you know m < 4, so only 2 bits are required
- if n = 63, then you know m < 2, so only 1 bit is required
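A sketch of this scheme in Python (the function names are mine; the m-field widths follow the table above, and the all-zeros value is encoded by the convention n = 63, m = 1 so that n fits in 6 bits):

```python
def m_width(n: int) -> int:
    """Bits needed for m, given n (widths from the table above)."""
    if n == 63: return 1
    if n > 60: return 2
    if n > 56: return 3
    if n > 48: return 4
    if n > 32: return 5
    return 6

def encode_bits(x: int) -> str:
    """Encode a 64-bit value as a bit string: 6 bits of n, a
    variable-width m field, then the middle l - 2 bits (the
    outermost two 1-bits of the data are implied)."""
    assert 0 <= x < 2 ** 64
    if x == 0:
        n, m = 63, 1                    # convention: n + m = 64 signals l = 0
    else:
        m = (x & -x).bit_length() - 1   # trailing zeros
        n = 64 - x.bit_length()         # leading zeros
    out = format(n, "06b") + format(m, f"0{m_width(n)}b")
    l = 64 - n - m
    if l > 2:
        middle = (x >> (m + 1)) & ((1 << (l - 2)) - 1)
        out += format(middle, f"0{l - 2}b")
    return out

def decode_bits(bits: str) -> int:
    n = int(bits[:6], 2)
    w = m_width(n)
    m = int(bits[6:6 + w], 2)
    l = 64 - n - m
    if l == 0:
        return 0
    if l == 1:
        return 1 << m
    middle = int(bits[6 + w:], 2) if l > 2 else 0
    # reassemble: implied leading 1, middle bits, implied trailing 1
    return ((1 << (l - 1)) | (middle << 1) | 1) << m
```

For instance, 96 = 0b1100000 has n = 57 and m = 5, so m takes only 3 bits and l = 2 means no data bits are sent at all: 9 bits in total.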

+2




Your analysis sounds right for individual values. But if you are transferring a large number of such values together, a general-purpose entropy coder such as gzip will probably do better, since it can eliminate the runs of zeros entirely and also exploit redundancy in the data itself.
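As a quick experiment (using Python's zlib, which implements the same DEFLATE algorithm as gzip; the value distribution below is a made-up assumption):

```python
import random
import struct
import zlib

# Generate 1000 64-bit values with the question's structure: a short
# run of data bits surrounded by leading/trailing zeros.
random.seed(0)
values = []
for _ in range(1000):
    l = random.randint(1, 8)
    m = random.randint(0, 64 - l)
    data = random.getrandbits(l) | (1 << (l - 1)) | 1   # starts and ends with 1
    values.append(data << m)

raw = struct.pack(">1000Q", *values)    # 8000 bytes uncompressed
packed = zlib.compress(raw, 9)          # DEFLATE, as used by gzip
print(len(raw), len(packed))            # the long runs of zero bytes shrink
```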



+4




As you stated the problem, you cannot do better than the suggested solution.

However, if the distribution of zeros in the numbers is skewed, you may get better compression on average by using Huffman codes or a similar method to represent the counts. Another possibility is delta coding, if the zero distribution is highly correlated from one 64-bit value to the next.

In any case, you will need a variable number of bits to represent the zero counts. And if your assumptions about skew or correlation turn out to be false, you may end up using more bits on average than with the simple approach.
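For illustration, a minimal sketch of delta coding the leading-zero counts (the numbers here are made-up example data):

```python
# Leading-zero counts of six consecutive values; the alignment
# drifts slowly, so consecutive counts are correlated.
ns = [12, 13, 13, 12, 14, 13]

# Delta coding: store the first count, then successive differences.
deltas = [ns[0]] + [b - a for a, b in zip(ns, ns[1:])]
print(deltas)   # small magnitudes cluster near zero: cheap to encode

# Decoding is a running sum.
restored = [deltas[0]]
for d in deltas[1:]:
    restored.append(restored[-1] + d)
assert restored == ns
```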

+3




Your solution seems pretty good.
Huffman coding is another way to compress your values, especially if some values occur with high frequency.

It is not very difficult to implement, but it may be overkill if you don't have much data to transfer.

+1




There are 64 possible starting positions n for a sequence of ones, and the length of the sequence l can be at most 64 - n. So, in total:

    r = sum(n = 0..63, 64 - n) + 1

The + 1 accounts for the all-zeros sequence. Doing the math gives:

    r = 64 * 64 - (63 * 64) / 2 + 1
      = 2081

Representing 2081 possible values requires log2(2081) = 11.023 bits, so 12 bits in practice. Your suggestion to encode the information using two 6-bit numbers, requiring 12 bits in total, is optimal (assuming an equal distribution over all possible values).

0








