Using gettimeofday or clock_gettime to generate a unique number

My application starts multiple instances (processes), each with multiple threads, and they all write to the same database. When a request is placed, a unique request id is generated for the entry to be added to the proprietary db. Our limitations: the id cannot be longer than 9 characters, and its first 6 characters must be hhmmss. We decided to use milliseconds for the last 3 digits to fill the 9 characters, and we get everything from gettimeofday(). However, with increasing traffic there are now collisions when multiple requests are placed within the same millisecond. The fact that gettimeofday() is not precise makes the collisions more frequent. I tried clock_gettime, but in testing it does not seem accurate either, as I noticed from the test program at the end of this post. Further constraints:

  • We cannot use static or global variables due to threading issues.
  • We cannot use random numbers, as the ids must be sequential.

Appreciate any help.

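#include <stdio.h>
#include <time.h>

int main(void)
{
    volatile long i;   /* volatile so the compiler cannot remove the busy loop */
    struct timespec start, stop;
    double gap_ms;

    clock_gettime(CLOCK_REALTIME, &start);

    for (i = 0; i < 123456789; i++)
        ;

    clock_gettime(CLOCK_REALTIME, &stop);

    /* Seconds and nanoseconds are both converted to milliseconds
       in floating point, so no fractional part is lost. */
    gap_ms = (stop.tv_sec - start.tv_sec) * 1000.0
           + (stop.tv_nsec - start.tv_nsec) / 1000000.0;
    printf("%f ms\n", gap_ms);
    return 0;
}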

      

4 answers


Using a timestamp as a unique identifier will never work reliably unless you limit yourself to one transaction per smallest clock tick (in this case, 1 millisecond).

Since you are stuck with using the time value for the first 6 of the 9 bytes, you need to get the largest possible range out of the last 3 bytes.

If you can avoid restricting the last three bytes to ASCII characters, do so, because that restriction greatly reduces the number of values they can hold. If possible, treat these bytes as a 24-bit integer (a range of 16777216 values) and simply increment the counter on every operation. You can then reset it to 0 every time gettimeofday tells you that the time has changed (or you can set up a repeating SIGALRM to tell you when to call gettimeofday again, update your time, and reset the 24-bit integer to 0).
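A minimal sketch of the counter idea above, assuming the db accepts raw bytes in the last three positions. The names `id_state`, `put2` and `make_id` are made up for illustration, and the code is deliberately not thread-safe: it only shows the reset-on-tick logic.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical state: the last second seen and the counter within it. */
struct id_state {
    time_t   last_sec;
    uint32_t counter;   /* only the low 24 bits are ever used */
};

/* Writes two ASCII decimal digits for v (0..99) at p. */
static void put2(unsigned char *p, int v)
{
    p[0] = (unsigned char)('0' + v / 10);
    p[1] = (unsigned char)('0' + v % 10);
}

/* Fills out[0..8] with "hhmmss" in ASCII followed by a 24-bit
 * big-endian counter. Returns 0 on success, -1 if the counter is
 * exhausted for this second or the time cannot be converted. */
int make_id(struct id_state *st, time_t now, unsigned char out[9])
{
    struct tm tm_buf;

    if (now != st->last_sec) {      /* the clock ticked: restart at 0 */
        st->last_sec = now;
        st->counter = 0;
    }
    if (st->counter > 0xFFFFFFu)    /* more than 2^24 ids in one second */
        return -1;
    if (localtime_r(&now, &tm_buf) == NULL)
        return -1;

    put2(out,     tm_buf.tm_hour);
    put2(out + 2, tm_buf.tm_min);
    put2(out + 4, tm_buf.tm_sec);
    out[6] = (unsigned char)(st->counter >> 16);
    out[7] = (unsigned char)(st->counter >> 8);
    out[8] = (unsigned char)st->counter;
    st->counter++;
    return 0;
}
```

The counter is stored most significant byte first, so a bytewise comparison of two ids issued in the same second follows the order in which they were generated.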



If you are forced to use printable ASCII characters for these bytes, things get a little more complicated. The easiest way to expand the range is to use hexadecimal rather than decimal digits. This increases your range from 1000 to 4096 values. However, you can do better with an even larger number base. If you add the first 22 letters of the alphabet to the digits (the same way hexadecimal adds the first 6 letters), you can represent 32x32x32 values, which equals 32768. That would be a lot of transactions per second. You can do even better by expanding your digit alphabet further, but it gets messier as you go, since you will probably want to keep certain characters from appearing in the value. Using a representation that strtol or strtoul can parse will be easier to program.
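As a rough illustration of the base-32 variant, here is a sketch that writes a counter as three characters drawn from the digits plus the letters a to v, and reads it back with strtoul. The function names `encode_base32_3` and `decode_base32_3` are hypothetical; strtoul itself accepts any base from 2 to 36.

```c
#include <stdlib.h>

/* Encodes value (0..32767) as exactly three base-32 digits plus a
 * terminating NUL. Returns 0 on success, -1 if value does not fit. */
int encode_base32_3(unsigned long value, char out[4])
{
    static const char digits[] = "0123456789abcdefghijklmnopqrstuv";

    if (value >= 32UL * 32UL * 32UL)
        return -1;
    out[0] = digits[(value >> 10) & 31];   /* most significant digit */
    out[1] = digits[(value >> 5) & 31];
    out[2] = digits[value & 31];           /* least significant digit */
    out[3] = '\0';
    return 0;
}

/* Decodes a three-character base-32 string produced by the encoder. */
unsigned long decode_base32_3(const char *s)
{
    return strtoul(s, NULL, 32);
}
```

Because the digits sort before the lowercase letters in ASCII, the encoded strings compare in the same order as the counters they represent, which keeps the ids sequential within a second.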

If your application is multithreaded, you might want to reserve part of your numeric range for a thread ID and let each thread keep its own transaction counter. This makes the relative order of two transactions processed by different threads harder to determine, but it keeps the threads from all contending to increment the same memory location (which may require a mutex or semaphore).
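One way to picture the split is to pack a thread index into the upper bits of the 24-bit value and the per-thread counter into the lower bits. This is only a sketch: `pack_suffix` and the idea of a small, dense thread index are assumptions, and how many bits go to each part depends on your thread count and request rate.

```c
#include <stdint.h>

#define SUFFIX_BITS 24u   /* three bytes are available after hhmmss */

/* Packs thread_idx into the top thread_bits bits and counter into the
 * remaining low bits of a 24-bit value. Returns 0 on success, -1 if
 * either part does not fit in its share of the bits. */
int pack_suffix(unsigned thread_bits, uint32_t thread_idx,
                uint32_t counter, uint32_t *out)
{
    unsigned counter_bits;

    if (thread_bits == 0 || thread_bits >= SUFFIX_BITS)
        return -1;
    counter_bits = SUFFIX_BITS - thread_bits;

    if (thread_idx >= (UINT32_C(1) << thread_bits))
        return -1;
    if (counter >= (UINT32_C(1) << counter_bits))
        return -1;

    *out = (thread_idx << counter_bits) | counter;
    return 0;
}
```

With 8 thread bits, for example, each of up to 256 threads gets 65536 ids per second, and no thread ever touches another thread's counter.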


The kind of problem you are describing has more or less already been solved by UUIDs. They are designed to address all the problems you mention, and some others.

Linux library: http://linux.die.net/man/3/uuid



More information is available here: http://en.wikipedia.org/wiki/Universally_unique_identifier


It is generally a bad idea to rely on wall-clock time in a heavily loaded system, especially at a resolution like this. Threads will sample the timestamp and then be descheduled in the middle of the operation, so you will see things arrive out of order.

Three characters are not much to encode things unambiguously. Try at least to use a denser encoding such as base64.

If you are using gcc as your compiler, you have Thread Local Storage (TLS) as an extension, and it is quite efficient. Just prefix the declaration of your static variable with __thread. If you are restricted to plain pthreads, there are also facilities for creating thread-specific keys (pthread_key_create, used together with pthread_setspecific and pthread_getspecific). But it would be better to keep the information on the thread's stack wherever possible.
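A small sketch of the TLS approach, assuming gcc (or a compatible compiler) and pthreads. The names `request_counter`, `next_serial`, `worker` and `run_two_workers` are invented for the example; the only point is that every thread sees its own copy of the counter, so no locking is needed.

```c
#include <pthread.h>
#include <stdint.h>

/* One private counter per thread. __thread is the GCC extension;
 * C11 spells the same thing _Thread_local. */
static __thread uint32_t request_counter = 0;

/* Returns this thread's next serial number, starting at 0. */
uint32_t next_serial(void)
{
    return request_counter++;
}

/* Draws 1000 serials and stores the last one through arg. */
static void *worker(void *arg)
{
    uint32_t *last = arg;
    int i;

    for (i = 0; i < 1000; i++)
        *last = next_serial();
    return NULL;
}

/* Runs two workers concurrently. Returns 0 on success, -1 if a
 * thread could not be created. */
int run_two_workers(uint32_t *last_a, uint32_t *last_b)
{
    pthread_t a, b;

    if (pthread_create(&a, NULL, worker, last_a) != 0)
        return -1;
    if (pthread_create(&b, NULL, worker, last_b) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}
```

Each worker ends on serial 999 no matter how the two threads interleave, because the counters are never shared. Depending on the toolchain, the program may need to be built with -pthread.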

To build the serial number for your request, combine:

  • your hhmmss stamp, as is
  • as many bits as you need to identify your threads
  • the remaining bits for a per-thread serial number as above, which must not wrap around within one second

You can even cheat and make a thread yield if it issues too many requests within one second.


I think you could give each thread of each process a unique id at startup. I guess that will take only one of the 3 available characters, as long as you don't have hundreds of threads. Then you can use a thread-local counter to set the last two characters (using base64 or an even larger alphabet, depending on which characters are allowed, to get enough range).

In this scheme, the only case where a collision can occur is when a thread's counter wraps around within the same second.
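For what it's worth, here is how such an id could be assembled. This is a sketch under assumptions: the 64-character alphabet below is not standard base64, it is a hypothetical ASCII-ordered set chosen so that ids still sort in generation order, and whether the db accepts those characters is something you would have to check. `build_req_id` is an invented name.

```c
/* 64 characters in ascending ASCII order (assumed to be acceptable
 * to the db): '-', digits, uppercase, '_', lowercase. */
static const char ALPHABET[] =
    "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz";

/* Builds "hhmmss" + one thread character + two counter characters,
 * NUL-terminated, in out. Returns 0 on success, -1 if an argument is
 * out of range (thread_idx >= 64 or counter >= 4096). */
int build_req_id(int hh, int mm, int ss,
                 unsigned thread_idx, unsigned counter, char out[10])
{
    if (hh < 0 || hh > 23 || mm < 0 || mm > 59 || ss < 0 || ss > 60)
        return -1;
    if (thread_idx >= 64u || counter >= 64u * 64u)
        return -1;

    out[0] = (char)('0' + hh / 10);
    out[1] = (char)('0' + hh % 10);
    out[2] = (char)('0' + mm / 10);
    out[3] = (char)('0' + mm % 10);
    out[4] = (char)('0' + ss / 10);
    out[5] = (char)('0' + ss % 10);
    out[6] = ALPHABET[thread_idx];      /* which thread issued the id */
    out[7] = ALPHABET[counter / 64u];   /* high digit of the counter */
    out[8] = ALPHABET[counter % 64u];   /* low digit of the counter */
    out[9] = '\0';
    return 0;
}
```

With this layout each of up to 64 threads can issue 4096 ids per second before its counter would wrap.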

Of course this is a dirty hack. The correct way would be to share the resource between threads / processes. This might be the simplest solution in your case.
