Least number of characters

I need to represent both very large and very small unsigned numbers in the shortest possible string. I've tried plain Base64 encoding, but for some smaller numbers the encoded string is longer than the number stored as a decimal string. What is the best way to store a very large or very small number in the shortest string that is still URL-safe?

+3




3 answers


I've tried just Base64 encoded but for some smaller numbers the encoded string is longer than just storing the number as a string

Base64 encoding of binary byte data makes it about a third longer. It is not meant to shorten the data; it lets you safely carry binary data through formats that are not binary-safe.
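One plausible reason the Base64 string came out longer, assuming the number was first turned into bytes and then run through a standard Base64 encoder: every 3 input bytes become 4 output characters, padded up to a multiple of 4. The sketch below uses java.util.Base64 on the ASCII bytes of the decimal string; that is only a guess at what was tried, and the class name Base64Growth and the method base64OfDecimal are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64Growth {
    // Standard Base64 of the ASCII bytes of the decimal string
    // (an assumed reading of what the question describes).
    static String base64OfDecimal(long x) {
        byte[] ascii = Long.toString(x).getBytes(StandardCharsets.US_ASCII);
        return Base64.getEncoder().encodeToString(ascii);
    }

    public static void main(String[] args) {
        long x = 42;
        // prints "42 -> NDI=": 2 characters became 4
        System.out.println(x + " -> " + base64OfDecimal(x));
    }
}
```

Encoding the value itself in base 64, instead of its bytes, avoids this growth.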

However, base 64 is more compact than the decimal representation of a number, even though it is less compact than base 256 (raw byte data). If you write the numeric value itself in base 64, rather than Base64-encoding its bytes, the result is never longer than the decimal string and is usually shorter. This will do it:



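// URL-safe alphabet: the position of each character is its digit value (0..63).
private static final String base64Chars =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

static String encodeNumber(long x) {
    // Unsigned input only: a negative x would produce a negative index below.
    if (x < 0) throw new IllegalArgumentException("negative value: " + x);
    char[] buf = new char[11];  // 11 base-64 digits cover every non-negative long
    int p = buf.length;
    do {
        // Fill the buffer from the right, least significant digit first.
        buf[--p] = base64Chars.charAt((int)(x % 64));
        x /= 64;
    } while (x != 0);
    return new String(buf, p, buf.length - p);
}

static long decodeNumber(String s) {
    long x = 0;
    for (char c : s.toCharArray()) {
        int charValue = base64Chars.indexOf(c);
        // Reject any character that is not in the alphabet.
        if (charValue == -1) throw new NumberFormatException(s);
        x *= 64;
        x += charValue;
    }
    return x;
}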

      

Using this encoding scheme, Long.MAX_VALUE becomes the string H__________, which is 11 characters long, compared with 19 characters for its decimal representation 9223372036854775807. Numbers up to about 16 million (below 64^4 = 16,777,216) fit in just 4 characters. That is about as short as you can get. (Technically, there are two other characters that don't need to be URL-encoded: . and ~. You can include them to get base 66, which will be slightly shorter for some numbers, although that seems a little pedantic.)
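To check the lengths quoted above, here is a small sketch that counts how many digits a non-negative value needs in a given base. The class name EncodedLength and the method digitsNeeded are hypothetical helpers, not part of the answer's code.

```java
final class EncodedLength {
    // Number of digits needed to write a non-negative value in the given base.
    static int digitsNeeded(long x, int base) {
        if (x < 0 || base < 2) throw new IllegalArgumentException("need x >= 0 and base >= 2");
        int digits = 0;
        do {
            digits++;
            x /= base;
        } while (x != 0);
        return digits;
    }

    public static void main(String[] args) {
        long[] samples = {9, 1_000, 16_777_215L, 16_777_216L, Long.MAX_VALUE};
        for (long x : samples) {
            System.out.println(x + ": base10=" + digitsNeeded(x, 10)
                    + " base64=" + digitsNeeded(x, 64)
                    + " base66=" + digitsNeeded(x, 66));
        }
    }
}
```

Base 66 only helps near the boundaries: 16,777,216 needs 5 characters in base 64 but 4 in base 66, while Long.MAX_VALUE needs 11 in both.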

+3




Assuming you are not doing compression, and that you are restricting yourself to URL-safe characters, the following procedure will give you the most compact encoding.

  • List all the URL-safe characters.
  • Count them. Suppose there are N.
  • Write your number in base N, using the first character as the digit 0, the second as 1, etc.
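The steps above can be sketched with BigInteger, so that arbitrarily large unsigned values work. This is a minimal sketch, not a definitive implementation: the alphabet is assumed to be the 66 characters that need no URL-encoding (letters, digits, and - _ . ~), and the class name BaseN with its encode and decode methods is made up for illustration.

```java
import java.math.BigInteger;

final class BaseN {
    // Assumed alphabet: 66 characters that need no URL-encoding.
    // The position of a character is its digit value, so N = 66 here.
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~";
    static final BigInteger N = BigInteger.valueOf(ALPHABET.length());

    static String encode(BigInteger x) {
        if (x.signum() < 0) throw new IllegalArgumentException("unsigned values only");
        StringBuilder sb = new StringBuilder();
        do {
            BigInteger[] qr = x.divideAndRemainder(N);  // qr[0] = quotient, qr[1] = remainder
            sb.append(ALPHABET.charAt(qr[1].intValue()));
            x = qr[0];
        } while (x.signum() != 0);
        return sb.reverse().toString();  // digits were produced least significant first
    }

    static BigInteger decode(String s) {
        if (s.isEmpty()) throw new NumberFormatException("empty string");
        BigInteger x = BigInteger.ZERO;
        for (int i = 0; i < s.length(); i++) {
            int v = ALPHABET.indexOf(s.charAt(i));
            if (v < 0) throw new NumberFormatException(s);
            x = x.multiply(N).add(BigInteger.valueOf(v));
        }
        return x;
    }

    public static void main(String[] args) {
        // Largest unsigned 64-bit value, which does not fit in a signed long.
        BigInteger maxUnsigned64 = BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE);
        String s = encode(maxUnsigned64);
        System.out.println(s + " (" + s.length() + " chars) -> " + decode(s));
    }
}
```

Changing ALPHABET is enough to change the base, since N is derived from its length.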

So how about compression ...

If you assume that the numbers you represent are evenly distributed over their range, then there is no real possibility of compression.



Otherwise, there is a potential for compression. If some numbers occur more often than others, you can give the common ones shorter encodings and usually make a saving overall. This is how Huffman encoding works.

But the downside is that compression at this level is not ideal across the whole range of numbers. It reduces the size of some numbers, but it inevitably increases the size of others.


So what does this mean for your use case?

I think this means that you are looking at the problem incorrectly. You shouldn't be aiming for a minimum coded size for each number. You should aim to minimize the size on average ... averaged over the actual distribution of your numbers.
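As a rough illustration of measuring size on average, the sketch below computes the mean encoded length over a sample of values for a given base. The sample is invented purely for the example and stands in for whatever distribution your real numbers have; AverageLength, digitsNeeded and averageLength are hypothetical names.

```java
final class AverageLength {
    // Number of digits needed to write a non-negative value in the given base.
    static int digitsNeeded(long x, int base) {
        if (x < 0 || base < 2) throw new IllegalArgumentException("need x >= 0 and base >= 2");
        int digits = 0;
        do {
            digits++;
            x /= base;
        } while (x != 0);
        return digits;
    }

    // Mean encoded length over a sample that stands in for the real distribution.
    static double averageLength(long[] sample, int base) {
        if (sample.length == 0) throw new IllegalArgumentException("empty sample");
        long total = 0;
        for (long x : sample) total += digitsNeeded(x, base);
        return (double) total / sample.length;
    }

    public static void main(String[] args) {
        // Invented sample, skewed toward small values.
        long[] sample = {3, 7, 12, 40, 63, 64, 5000, 1_000_000};
        System.out.println("base 10 average: " + averageLength(sample, 10));
        System.out.println("base 64 average: " + averageLength(sample, 64));
    }
}
```

Running the same measurement on a sample of your own data, for each candidate encoding, tells you which one is actually shortest for your use case.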

+1




To extend Stephen C's answer, here is a code snippet that converts to base 62 (you can extend it by adding more characters to the string digits; just pick which characters are valid for you):

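public static String toString(long n) {
   // Digit alphabet; its length is the base (62 here).
   String digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
   int base = digits.length();
   if (n < 0) throw new IllegalArgumentException("unsigned values only");
   // Zero needs its own case; the loop below would return an empty string.
   if (n == 0) return String.valueOf(digits.charAt(0));
   StringBuilder s = new StringBuilder();
   while (n > 0) {
      int d = (int) (n % base);   // charAt takes an int index
      s.append(digits.charAt(d));
      n = n / base;
   }
   // Digits were appended least significant first, so reverse them.
   return s.reverse().toString();
}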

      

This will never make the string representation longer than the decimal one.
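The snippet above only encodes. A matching decoder might look like the sketch below, which assumes the same 62-character digit string; the class name Base62 and the method fromString are made up here, and the encoder is repeated in compact form only so that the example is self-contained.

```java
final class Base62 {
    static final String DIGITS =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Compact encoder, included so the round trip can be run on its own.
    static String toString(long n) {
        if (n < 0) throw new IllegalArgumentException("unsigned values only");
        int base = DIGITS.length();
        StringBuilder s = new StringBuilder();
        do {
            s.append(DIGITS.charAt((int) (n % base)));
            n /= base;
        } while (n > 0);
        return s.reverse().toString();
    }

    // Decoder: accumulate digit values from the most significant character down.
    static long fromString(String s) {
        if (s.isEmpty()) throw new NumberFormatException("empty string");
        int base = DIGITS.length();
        long n = 0;
        for (int i = 0; i < s.length(); i++) {
            int d = DIGITS.indexOf(s.charAt(i));
            if (d < 0) throw new NumberFormatException(s);
            // multiplyExact/addExact throw ArithmeticException instead of silently overflowing.
            n = Math.addExact(Math.multiplyExact(n, (long) base), (long) d);
        }
        return n;
    }

    public static void main(String[] args) {
        String encoded = toString(Long.MAX_VALUE);
        System.out.println(encoded + " -> " + fromString(encoded));
    }
}
```

If you add characters to the digit string, the decoder follows automatically because the base is taken from its length.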

+1








