Generating a URL shortcode in C#

I am using this article to generate a shortcode for a URL.

I've been working on this for a while and the pseudocode just doesn't make any sense to me. It states in "loop1" that I should loop from the first 4 bytes to the 4th 4 bytes, cast the bytes to an integer, and then convert that to bits. I end up with 32 bits for each 4 bytes, but "loop3" uses 5 bytes, which does not divide evenly into 32. I don't understand what the author is trying to say.

Then I noticed that "loop2" is closed at the bottom, after the shortcode has been written to the database. That doesn't make sense to me, because I would be writing the same shortcode to the database over and over again.

"loop1" also appears to run forever, and again I can't see why I would need to update the database indefinitely.

I tried to follow the example by stepping through it in the debugger line by line, but it still doesn't make sense.

Here is the code I have so far, based on what I was able to understand:

    private void button1_Click(object sender, EventArgs e)
    {
        string codeMap = "abcdefghijklmnopqrstuvwxyz012345"; // 32 bytes

        // Compute MD5 Hash
        MD5 md5 = MD5.Create();
        byte[] inputBytes = Encoding.ASCII.GetBytes(txtURL.Text);
        byte[] hash = md5.ComputeHash(inputBytes);

        // Loop from the first 4 bytes to the 4th 4 bytes
        byte[] FourBytes = new byte[4];
        for (int i = 0; i <= 3; i++)
        {
            FourBytes[i] = hash[i];
            //int CastedBytes = FourBytes[i];
            BitArray binary = new BitArray(FourBytes);
            int CastedBytes = 0;
            for(int ii = 0; i <=5; i++)
            {
                CastedBytes = CastedBytes + ii;
            }

        }
    }

Can someone help me figure out what I am doing wrong so I can get this program to work? I just need to convert URLs to short, unique 6-character codes.

Thanks.

+3


2 answers


Your MD5 hash is 128 bits. The idea is to turn those 128 bits into 6-character codes while losing as little information as possible.

The code map contains 32 characters:

    string codeMap = "abcdefghijklmnopqrstuvwxyz012345";

Note that 2^5 is also 32. The third loop takes 5 bits of the hash at a time and converts those 5 bits to a character in codeMap. For example, the bit pattern

    00001 00011 00100
      b     d     e

maps to indexes 1, 3 and 4, which are the characters b, d and e.

The algorithm uses 6 groups of 5 bits, so 30 bits in total. The remaining 2 bits of each 32-bit integer are "lost".
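The 5-bits-per-character mapping can be sketched as follows. This is only an illustration, not the article's code: the class name `FiveBitEncoder` is hypothetical, and reading the groups starting from the least significant bits is an assumption (the article may order them differently).

```csharp
using System;

public static class FiveBitEncoder
{
    // Same 32-character map as in the question: index 0 is 'a', index 31 is '5'.
    private const string CodeMap = "abcdefghijklmnopqrstuvwxyz012345";

    // Encodes the low 30 bits of value as 6 characters, taking the least
    // significant 5-bit group first (an assumed order). The top 2 bits of
    // the 32-bit value are never read, so they are the "lost" bits.
    public static string Encode(uint value)
    {
        var chars = new char[6];
        for (int i = 0; i < 6; i++)
        {
            int index = (int)((value >> (5 * i)) & 0x1F); // 5 bits: 0..31
            chars[i] = CodeMap[index];
        }
        return new string(chars);
    }
}
```

With this ordering, `FiveBitEncoder.Encode(1u | (3u << 5) | (4u << 10))` yields "bdeaaa": the groups 1, 3 and 4 become b, d and e, and the three remaining zero groups become a.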



Note that the 128-bit MD5 hash is consumed 4 bytes at a time, and each group of 4 bytes is converted to an integer. This is one way to consume the bits of the MD5 hash, but certainly not the only one. It involves bit masking and bit shifting.
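A minimal sketch of the mask-and-shift approach, assuming each 4-byte chunk of the hash produces one candidate code (four candidates per URL). The names `HashChunks` and `Candidates` are hypothetical, the bit order is an assumption, and UTF-8 is used where the question uses ASCII.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class HashChunks
{
    private const string CodeMap = "abcdefghijklmnopqrstuvwxyz012345";

    // Returns four candidate codes, one per 4-byte chunk of the MD5 hash.
    public static string[] Candidates(string url)
    {
        byte[] hash;
        using (MD5 md5 = MD5.Create())
        {
            hash = md5.ComputeHash(Encoding.UTF8.GetBytes(url)); // 16 bytes
        }

        var codes = new string[4];
        for (int chunk = 0; chunk < 4; chunk++)
        {
            // Bytes chunk*4 .. chunk*4+3 as one 32-bit integer
            // (byte order follows the platform's endianness).
            uint value = BitConverter.ToUInt32(hash, chunk * 4);
            value &= 0x3FFFFFFF; // bit mask: keep only the low 30 bits

            var chars = new char[6];
            for (int i = 0; i < 6; i++)
            {
                chars[i] = CodeMap[(int)(value & 0x1F)]; // low 5 bits -> index
                value >>= 5;                             // shift to the next group
            }
            codes[chunk] = new string(chars);
        }
        return codes;
    }
}
```

Having several candidates per URL gives a fallback when the first code is already taken by a different URL. Whether the article intends exactly that is not something this sketch can confirm.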

You may find it easier to implement this with a BitArray. It is probably slightly less efficient, but that hardly matters here. If you go this route, initialize the BitArray with the bytes of your MD5 hash, then take 5 bits at a time, converting each group to a number in the range 0..31 and using it as an index into codeMap.
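The BitArray route could look roughly like this. It is a sketch under assumptions: the class name `BitArrayEncoder` is hypothetical, only the first 30 bits are used, and the order in which the 5 bits of a group are combined is an arbitrary choice.

```csharp
using System;
using System.Collections;

public static class BitArrayEncoder
{
    private const string CodeMap = "abcdefghijklmnopqrstuvwxyz012345";

    // Builds a 6-character code from the first 30 bits of the given bytes.
    public static string Encode(byte[] hash)
    {
        if (hash == null || hash.Length < 4)
            throw new ArgumentException("Need at least 4 bytes (30 bits).", nameof(hash));

        // BitArray indexes bits starting at the least significant bit of hash[0].
        var bits = new BitArray(hash);
        var chars = new char[6];
        for (int g = 0; g < 6; g++)
        {
            int index = 0;
            for (int b = 0; b < 5; b++)
            {
                // Append the next bit; the first bit read ends up as the highest.
                index = (index << 1) | (bits[g * 5 + b] ? 1 : 0);
            }
            chars[g] = CodeMap[index]; // index is always in 0..31
        }
        return new string(chars);
    }
}
```

Passing the 16-byte result of `MD5.ComputeHash` to `BitArrayEncoder.Encode` gives one 6-character code. Starting each further code at a later bit offset would be a straightforward extension.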

This bit from the article is misleading:

"6-character shortcodes can be used to map 32^6 (1,073,741,824) URLs, so they are unlikely to be used up in the near future"

Because of the potential for hash collisions, the system can handle far fewer than 1 billion URLs before there is a significant risk of the same short URL being assigned to two different long URLs. See the birthday problem for details.
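As a rough illustration, using the standard birthday approximation and assuming the codes are uniformly distributed over N = 32^6 = 2^30 values, the probability p(n) of at least one collision among n URLs, and the n at which that probability reaches one half, are approximately:

```latex
p(n) \approx 1 - e^{-n^2 / (2N)}, \qquad N = 32^6 = 2^{30}

n_{1/2} \approx \sqrt{2 \ln 2 \cdot N} \approx 1.177 \times 2^{15} \approx 3.9 \times 10^{4}
```

Under those assumptions, a collision becomes more likely than not after roughly 39,000 URLs, far below the 1,073,741,824 the article quotes.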

+3


If you don't expect your shortener to become hugely popular, just use the base-16 or base-64 representation of an auto-incrementing column.

With 6 characters, base 16 provides about 16 million (16^6) unique URLs. Base 64 provides 2^36 (about 68.7 billion).
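A minimal sketch of encoding an auto-incremented ID in base 64. The class name `IdEncoder` and the 64-character alphabet are hypothetical choices; any 64 distinct URL-safe characters would work.

```csharp
using System;
using System.Text;

public static class IdEncoder
{
    // Assumed URL-safe alphabet: 26 + 26 + 10 + 2 = 64 characters.
    private const string Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

    // Converts a non-negative database ID to its base-64 digit string.
    public static string Encode(long id)
    {
        if (id < 0) throw new ArgumentOutOfRangeException(nameof(id));
        if (id == 0) return Alphabet[0].ToString();

        var sb = new StringBuilder();
        while (id > 0)
        {
            sb.Insert(0, Alphabet[(int)(id % 64)]); // lowest digit goes rightmost
            id /= 64;
        }
        return sb.ToString();
    }
}
```

Because every ID is unique, the codes cannot collide, and the code length grows only as the ID grows: IDs below 64^6 fit in at most 6 characters.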

+1

