Why do you need a canonical format for a GUID?

One hard day at work, I noticed that the GUIDs I generated the usual .NET way, with Guid.NewGuid(), all had the same digit 4 at the beginning of the third block:

efeafa5f-fe21-4ab4-ba82-b9eefd5fa225
480b64d0-6762-4afe-8496-ac7cf3292898
397579c2-a4f4-4611-9fda-16e9c1e52d6a
...


They appeared on the screen, ten of them, about once a second. I spotted the pattern right after the fifth GUID. When the last one also had the 4 in the same place, I decided I was a lucky guy. I went home feeling that the whole world was open to an exceptional person like me. The next week I found a new job, cleaned my room, and called my parents.

But today I ran into the same pattern again. Thousands of times. And I don't feel like the Chosen One anymore.

I googled it, and now I know about UUIDs and their canonical format, with 4 bits reserved for the version and 2 for the variant.
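In the canonical text form those reserved bits are easy to spot. Here is a minimal sketch, assuming a well-formed 36-character 8-4-4-4-12 string without braces; UuidText and its methods are hypothetical helper names. The version is the first hex digit of the third group, and for the RFC 4122 variant the first hex digit of the fourth group is always 8, 9, a or b (which is exactly what the GUIDs above show).

```csharp
using System;

// Hypothetical helper: reads the version and variant fields straight from the
// canonical 8-4-4-4-12 text form of a UUID (assumes no braces, 36 characters).
public static class UuidText
{
    // Character index 14 is the first hex digit of the third group: the version.
    public static int Version(string uuid)
    {
        return Convert.ToInt32(uuid.Substring(14, 1), 16);
    }

    // Character index 19 is the first hex digit of the fourth group.
    // For RFC 4122 UUIDs its top two bits are binary 10, so the digit is 8, 9, a or b.
    public static bool IsRfc4122Variant(string uuid)
    {
        int digit = Convert.ToInt32(uuid.Substring(19, 1), 16);
        return (digit & 0b1100) == 0b1000;
    }
}
```

For example, UuidText.Version("efeafa5f-fe21-4ab4-ba82-b9eefd5fa225") gives 4, and the digit b at index 19 passes the variant check.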

Here's a snippet to experiment with:

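using System;

class Program
{
    static void Main(string[] args)
    {
        while (true)
        {
            var g = Guid.NewGuid();
            // Raw bytes as .NET stores them (first three groups little-endian)
            Console.WriteLine(BitConverter.ToString(g.ToByteArray()));
            // Canonical text form: the third group starts with 4
            Console.WriteLine(g.ToString());
            // Press Enter to generate the next GUID
            Console.ReadLine();
        }
    }
}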
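When you run the snippet, the 4 does not appear where the string suggests: Guid.ToByteArray() returns the first three groups (4, 2 and 2 bytes) little-endian, so the version nibble lands in the high half of byte 7 rather than byte 6. A sketch that reads the fields from the byte array under that layout; GuidBytes is a hypothetical name.

```csharp
using System;

// Hypothetical helper for .NET's Guid.ToByteArray() layout: the first three groups
// are written little-endian, the last 8 bytes in the same order as the text form.
public static class GuidBytes
{
    public static int Version(Guid g)
    {
        byte[] b = g.ToByteArray();
        // Byte 7 holds the high byte of the third group, whose top nibble is the version.
        return b[7] >> 4;
    }

    public static bool IsRfc4122Variant(Guid g)
    {
        byte[] b = g.ToByteArray();
        // The top two bits of byte 8 must be binary 10.
        return (b[8] & 0xC0) == 0x80;
    }
}
```

For efeafa5f-fe21-4ab4-ba82-b9eefd5fa225 the snippet prints 5F-FA-EA-EF-21-FE-B4-4A-BA-82-B9-EE-FD-5F-A2-25, with the version in 4A and the variant in BA.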


But there is still one thing I don't understand (apart from how to go on living). Why do we need these reserved bits? I can see how they might do harm: they reveal details of the internal implementation, they make collisions more likely (still nothing to worry about, but one day...), and they deepen my existential dread. But I don't see any benefit. Can you help me find one?
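To put a rough number on the "more collisions" worry: a version 4 UUID keeps 122 random bits out of 128, since 4 go to the version and 2 to the variant. The sketch below uses the standard birthday-bound approximation p ≈ 1 - exp(-n(n-1) / 2^(bits+1)); CollisionOdds is a hypothetical name, and the estimate assumes the random bits are uniformly distributed.

```csharp
using System;

// Hypothetical helper: birthday-bound estimate of the probability that at least two
// of n values drawn uniformly from 2^randomBits possibilities are equal.
public static class CollisionOdds
{
    public static double Probability(double n, int randomBits)
    {
        double space = Math.Pow(2, randomBits);
        double x = n * (n - 1) / (2 * space);
        // For tiny x, 1 - exp(-x) rounds to 0 in double precision,
        // so fall back to the series x - x^2/2.
        return x < 1e-5 ? x - x * x / 2 : 1 - Math.Exp(-x);
    }
}
```

For a billion GUIDs, CollisionOdds.Probability(1e9, 122) is about 9.4e-20. Giving up 6 bits multiplies that probability by 2^6 = 64 compared with 128 fully random bits, which is still far too small to matter in practice.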

Inside the GUID generation algorithm



1 answer


The reserved bits are there so that when the generation algorithm changes, this number can change with it. Otherwise two different algorithms could produce the same UUID for entirely different reasons, resulting in a collision. This number is the version identifier.

For example, consider this contrived, simplified UUID format:

00000000-00000000
  time  -   ip


Now suppose we change this format for some reason:



00000000-00000000
   ip   -  time


This can cause a collision: a machine with IP 12.34.56.78 generates a UUID with the first method at time 01234567, and later a second machine with IP 01.23.45.67 generates one with the newer method at time 12345678. Both get 01234567-12345678. If we reserve some bits for a version identifier, the two UUIDs always differ in those bits, so they cannot collide.
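The toy scenario can be replayed in code. This is a sketch of the contrived format only, not a real UUID layout: ToyIds and its methods are hypothetical names, and IP octets and times are written as plain decimal digits just as in the example. For simplicity the version tag is prepended here, whereas a real UUID carves its version bits out of its fixed 128 bits.

```csharp
using System;

// Hypothetical model of the contrived two-field ID format (not a real UUID layout).
public static class ToyIds
{
    // "12.34.56.78" -> "12345678": octets written as plain digits, as in the example.
    static string Ip(string dotted) => dotted.Replace(".", "");

    // Old layout: time-ip
    public static string V1(string time, string ip) => $"{time}-{Ip(ip)}";

    // New layout: ip-time
    public static string V2(string time, string ip) => $"{Ip(ip)}-{time}";

    // Same layouts with a version digit in front: IDs from different layouts
    // now differ in their first character, so they can never be equal.
    public static string V1Tagged(string time, string ip) => "1" + V1(time, ip);
    public static string V2Tagged(string time, string ip) => "2" + V2(time, ip);
}
```

Here ToyIds.V1("01234567", "12.34.56.78") and ToyIds.V2("12345678", "01.23.45.67") both return 01234567-12345678, while the tagged versions return 101234567-12345678 and 201234567-12345678.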

A value of 4 specifically means a randomly generated UUID, which relies on the vanishingly small chance of a collision among that many random bits, as opposed to other versions that combine time, MAC address, pid, or other time and space identifiers to guarantee uniqueness.
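To make "version 4 means random" concrete, here is a sketch of the approach in RFC 4122 section 4.4 (creating a UUID from truly random or pseudo-random numbers): fill 16 bytes from a cryptographic RNG, then force the 4 version bits and the 2 variant bits, leaving 122 random bits. RandomUuid, NewV4 and FromBytes are hypothetical names, and this is not necessarily how Guid.NewGuid() is implemented internally; it only illustrates the bit layout.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper: builds a version 4 UUID string. Bytes are kept in the RFC's
// big-endian order, so none of System.Guid's byte swapping is involved.
public static class RandomUuid
{
    public static string NewV4()
    {
        byte[] b = new byte[16];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(b);
        }
        return FromBytes(b);
    }

    // Separate from NewV4 so the bit manipulation can be checked with fixed input.
    public static string FromBytes(byte[] random)
    {
        byte[] b = (byte[])random.Clone();
        b[6] = (byte)((b[6] & 0x0F) | 0x40); // version field = 0100 (4)
        b[8] = (byte)((b[8] & 0x3F) | 0x80); // variant bits = 10
        string hex = BitConverter.ToString(b).Replace("-", "").ToLowerInvariant();
        return $"{hex.Substring(0, 8)}-{hex.Substring(8, 4)}-{hex.Substring(12, 4)}-{hex.Substring(16, 4)}-{hex.Substring(20, 12)}";
    }
}
```

Even with all-zero input, FromBytes returns 00000000-0000-4000-8000-000000000000: the 4 and the 8 are the only bits the algorithm does not leave to chance.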

See the relevant spec here: https://tools.ietf.org/html/rfc4122#section-4.1.3
