Generating a URL shortcode in C#
I am using this article to generate a shortcode for a URL.
I've been working on this for a while, and the pseudocode just doesn't make any sense to me. It states in "loop1" that I should loop from the first 4 bytes to the 4th 4 bytes, cast those bytes to an integer, and then convert that to bits. I get 32 bits for every 4 bytes, but it uses 5 bytes in "loop3", which does not divide evenly into 32. I don't understand what the author is trying to say.
Then I noticed that it closes "loop2" at the bottom, after the shortcode has been written to the database. That doesn't make sense to me, because I would be writing the same shortcode into the database over and over again.
Then "loop1" seems to loop forever; again, I can't see why I would need to update the database ad infinitum.
I tried to follow the example and step through it in the debugger line by line, but it still doesn't make sense.
Here is the code I have so far, according to what I was able to understand:
private void button1_Click(object sender, EventArgs e)
{
    string codeMap = "abcdefghijklmnopqrstuvwxyz012345"; // 32 characters
    // Compute MD5 hash
    MD5 md5 = MD5.Create();
    byte[] inputBytes = Encoding.ASCII.GetBytes(txtURL.Text);
    byte[] hash = md5.ComputeHash(inputBytes);
    // Loop from the first 4 bytes to the 4th 4 bytes
    byte[] FourBytes = new byte[4];
    for (int i = 0; i <= 3; i++)
    {
        FourBytes[i] = hash[i];
        //int CastedBytes = FourBytes[i];
        BitArray binary = new BitArray(FourBytes);
        int CastedBytes = 0;
        // Inner loop uses its own counter (ii), not the outer one (i)
        for (int ii = 0; ii <= 5; ii++)
        {
            CastedBytes = CastedBytes + ii;
        }
    }
}
Can someone help me figure out what I am doing wrong so I can get this program to work? I just need to convert URLs to short, unique 6-character codes.
Thanks.
Your MD5 hash is 128 bits. The idea is to derive a 6-character code from those bits. Six characters from a 32-symbol alphabet can carry only 30 bits, so each code necessarily represents just part of the hash.
The code map contains 32 characters:
string codeMap = "abcdefghijklmnopqrstuvwxyz012345";
Note that 2^5 is also 32. The third loop uses 5 bits of the hash at a time and converts those 5 bits to a character in codeMap. For example, for the bit pattern:
00001 00011 00100
  b     d     e
The algorithm uses 6 sets of 5 bits, so 30 bits in total. The remaining 2 bits of each 32-bit integer are "lost".
Note that the article consumes the 128-bit MD5 hash 4 bytes at a time and converts each group of 4 bytes to an integer. This is one way to get at the bits of the MD5 hash, but certainly not the only one. It involves bit masking and bit shifting.
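The masking-and-shifting approach can be sketched roughly as follows. This is only an illustration under assumptions, not the article's exact algorithm: the names MaskShiftSketch, Encode and Candidates are made up for this example, the lowest 5 bits are mapped first, and BitConverter.ToUInt32 reads bytes in the machine's byte order, so the article's bit order may differ.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, for illustration only.
public static class MaskShiftSketch
{
    private const string CodeMap = "abcdefghijklmnopqrstuvwxyz012345"; // 32 symbols

    // Maps the low 30 bits of value to 6 characters, 5 bits per character.
    // The top 2 bits of the 32-bit value are ignored.
    public static string Encode(uint value)
    {
        var sb = new StringBuilder(6);
        for (int i = 0; i < 6; i++)
        {
            sb.Append(CodeMap[(int)(value & 0x1F)]); // mask: low 5 bits -> index 0..31
            value >>= 5;                             // shift: move on to the next 5 bits
        }
        return sb.ToString();
    }

    // Produces one candidate code per 4-byte chunk of the 16-byte MD5 hash.
    public static string[] Candidates(string url)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(url));
            var codes = new string[4];
            for (int chunk = 0; chunk < 4; chunk++)
            {
                codes[chunk] = Encode(BitConverter.ToUInt32(hash, chunk * 4));
            }
            return codes;
        }
    }
}
```

Under this ordering, a value whose low 15 bits hold the groups 1, 3 and 4 (that is, 1 | 3 << 5 | 4 << 10) encodes to "bde" followed by "aaa" for the zero bits above them. Which candidate gets stored, and what happens when a code is already taken, is left out of the sketch.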
You may find it easier to implement this with BitArray. While that is probably slightly less efficient, it hardly matters. If you go this route, initialize the BitArray with the bits of your MD5 hash, then take 5 bits at a time, converting each group to a number in the range 0..31 and using it as an index into codeMap.
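A minimal sketch of the BitArray route might look like this. The names BitArraySketch, Encode and FromUrl are invented for the example, and treating the first bit of each group as the least significant bit of the index is an assumption; any consistent order would do.

```csharp
using System;
using System.Collections;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, for illustration only.
public static class BitArraySketch
{
    private const string CodeMap = "abcdefghijklmnopqrstuvwxyz012345"; // 32 symbols

    // Reads the first length * 5 bits of hash and maps each 5-bit group
    // to one character of CodeMap. The hash must hold at least that many bits.
    public static string Encode(byte[] hash, int length = 6)
    {
        // BitArray index 0 is the least significant bit of hash[0].
        var bits = new BitArray(hash);
        var sb = new StringBuilder(length);
        for (int c = 0; c < length; c++)
        {
            int index = 0;
            for (int b = 0; b < 5; b++)
            {
                if (bits[c * 5 + b])
                {
                    index |= 1 << b; // assumed order: first bit of the group is the lowest
                }
            }
            sb.Append(CodeMap[index]); // index is always in 0..31
        }
        return sb.ToString();
    }

    // Hashes the URL with MD5 and encodes the first 30 bits of the result.
    public static string FromUrl(string url)
    {
        using (MD5 md5 = MD5.Create())
        {
            return Encode(md5.ComputeHash(Encoding.UTF8.GetBytes(url)));
        }
    }
}
```

A call such as BitArraySketch.FromUrl(txtURL.Text) would then return a 6-character code. The remaining 98 bits of the hash are simply unused here.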
This bit from the article is misleading:
6 shortcode characters can be used to map 32^6 (1,073,741,824) URLs, so they are unlikely to be used up in the near future
Because of the potential for hash collisions, the system can handle far fewer than 1 billion URLs before there is a significant risk of the same short URL being assigned to two long URLs. See the birthday problem for details.
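As a rough estimate of how soon collisions appear, assume the shortcodes behave like uniformly random values over N = 32^6 possibilities (an idealization, not something the article states). The standard birthday approximation for n stored URLs then gives:

```latex
\[
p(n) \approx 1 - e^{-n(n-1)/(2N)}, \qquad N = 32^{6} = 2^{30}
\]
\[
n_{1/2} \approx \sqrt{2N \ln 2} \approx 1.1774 \times 2^{15} \approx 38{,}600
\]
```

Under that assumption, a collision becomes more likely than not after roughly 38,600 URLs, and about 10,000 URLs already give a collision probability near 4.5%.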