How to generate a smaller unique number from a larger (11 bytes) unique number? General C

I have multiple devices. Each of them has its own unique identifier, consisting of 11 bytes, for example: 0x34 0x57 0x36 0x31 0x38 0x39 0x15 0x0e 0x00 0x0f 0x00. I would like to generate a unique number (0-32767 or 0-65535) to defer a response when the FindDevices command is passed. When the devices start responding at the same moment, there is a problem of RS485 bus competition, so I would like to avoid this.

My first attempt was to sum all 11 bytes of the unique ID, call srand (sum) on the seed generator, and then call rand () to get my value. But unfortunately this is a bad solution and in my batch of devices I have 2 devices with a unique ID, but the same "(not that) unique" number :(

UID1: 34 57 36 31 38 39 15 0e 00 0f 00 sum: 405 decimal generated number: 23860

UID2: 34 57 33 31 35 39 04 18 00 1c 00 sum: 405 decimal generated number: 23860

Devices don't know what number was generated on other devices or what unique identifiers they have, so they can't just compare them.

Any idea for creating such a unique number (0-32767 or 0-65535)?

Edit: List of unique ids (as hex) of the batch from my bench: 01. 34 57 36 31 38 39 15 0E 00 02 00 02. 34 57 36 31 38 39 15 0E 00 06 00 03. 34 57 36 31 38 39 15 0E 00 0A 00 04. 34 57 36 31 38 39 15 0E 00 0E 00 05. 34 57 36 31 38 39 15 0A 00 14 00 06. 34 57 36 31 38 39 15 0A 00 1C 00 07. 34 57 36 31 38 39 15 09 00 23 00 08. 34 57 36 31 38 39 15 0A 00 24 00 09. 34 57 36 31 38 39 0E 1D 00 1A 00 10. 34 57 36 31 38 39 15 0E 00 09 00 11. 34 57 33 31 35 39 04 10 00 20 00 12. 34 57 33 31 35 39 04 18 00 1C 00 13. 34 57 36 31 38 39 15 0E 00 0F 00 14. 34 57 36 31 38 39 15 0E 00 13 00 15. 34 57 36 31 38 39 15 0E 00 17 00 16. 34 57 36 31 38 39 15 0E 00 1F 00 17. 34 57 36 31 38 39 15 0A 00 25 00

It looks like they are unique, but many bytes are repeated / constant. A good solution is to generate values ​​placed in the entire range even if the input values ​​are close to each other :)

Edit2: Here are the results for all solutions from your answers:

Test results:
OP:  1977, H1L: 14759, H1H: 13938, H2L:  7189, H2H: 36686, H3L: 14759, H3H: 13938, H4:  2652, PRS: 61086
OP:  3669, H1L: 13735, H1H: 12914, H2L:  8213, H2H: 37710, H3L: 13735, H3H: 12914, H4:  6748, PRS: 25852
OP:  5361, H1L: 16807, H1H: 15986, H2L:  5141, H2H: 34638, H3L: 16807, H3H: 15986, H4: 10844, PRS: 40974
OP:  7053, H1L: 15783, H1H: 14962, H2L:  6165, H2H: 35662, H3L: 15783, H3H: 14962, H4: 14940, PRS: 19836
OP:  7899, H1L: 18507, H1H: 25943, H2L:  3441, H2H: 24681, H3L: 18507, H3H: 25943, H4: 21076, PRS:  4898
OP: 11283, H1L: 20555, H1H: 27991, H2L:  1393, H2H: 22633, H3L: 20555, H3H: 27991, H4: 29268, PRS: 10065
OP: 13821, H1L:   391, H1H: 26260, H2L: 21557, H2H: 24364, H3L:   391, H3H: 26260, H4: 36434, PRS: 63904
OP: 14667, H1L: 30795, H1H: 38231, H2L: 23902, H2H: 12393, H3L: 30795, H3H: 38231, H4: 37460, PRS: 46300
OP: 15513, H1L: 23009, H1H: 40628, H2L: 31688, H2H:  9996, H3L: 23009, H3H: 40628, H4: 27066, PRS: 21678
OP: 21322, H1L: 17063, H1H: 16242, H2L:  4885, H2H: 34382, H3L: 17063, H3H: 16242, H4:  9820, PRS: 60787
OP: 22168, H1L: 31736, H1H: 54522, H2L: 22961, H2H: 61623, H3L: 31736, H3H: 54522, H4: 32801, PRS: 20737
OP: 23860, H1L:  3760, H1H: 10032, H2L: 18188, H2H: 40592, H3L:  3760, H3H: 10032, H4: 28721, PRS: 50696
OP: 23860, H1L: 15527, H1H: 14706, H2L:  6421, H2H: 35918, H3L: 15527, H3H: 14706, H4: 15964, PRS: 28319
OP: 25552, H1L: 10407, H1H:  9586, H2L: 11541, H2H: 41038, H3L: 10407, H3H:  9586, H4: 20060, PRS: 60097
OP: 27244, H1L:  9383, H1H:  8562, H2L: 12565, H2H: 42062, H3L:  9383, H3H:  8562, H4: 24156, PRS:  5512
OP: 30628, H1L: 11431, H1H: 10610, H2L: 10517, H2H: 40014, H3L: 11431, H3H: 10610, H4: 32348, PRS: 55107
OP: 31474, H1L: 30539, H1H: 37975, H2L: 24158, H2H: 12649, H3L: 30539, H3H: 37975, H4: 38484, PRS:  4379
OP: 20035, H1L:     0, H1H:     0, H2L: 21948, H2H: 50624, H3L:     0, H3H:     0, H4:     0, PRS: 26124

      

OP is my original method, which is bad and generates unique numbers for the UIDs that were listed. H1L and H1H are simplified solutions presented by chux. H2L, H2H, H3L, H3H are my modifications (adding ~ to the bottom or top) to see if it gives better results. H4 is the solution provided by alain. PRS is Pearson returning uint16_t suggested by mattinbits.

And the winner is ... PRS! :) All your suggestions generate unique numbers from the listed UIDs, so they are correct, but Pearson's hashing offers better variance of results (tested in Excel;)). Thanks for your help!

+3


source to share


3 answers


What you are trying to do is efficiently find the hash for your ids, where the hash output is smaller than the input, so there will always be a risk of collisions i.e. two device IDs producing the same short ID. However, you will manage to use the correct hashing function than your approach to sum. Just find a simple 8 or 16 bit hash function and use it. For example. https://en.m.wikipedia.org/wiki/Pearson_hashing



+3


source


To generate a unique number in the range of 15 or 16 bits from large numbers (11 bytes or 88 bits), it is > and collision-prone, as happened with OP 23860.

The sum of all 11 bytes is very weak, since the sum of 11 bytes ranges from 0 to 11 * 255 or 2805 with a very uneven distribution. So the code only generates 2806 different seeds for srand()

. It would be better to use a wider integer grouping than 8-bit. Recommend 64-bit groups exclusively or with each other.

Usage srand() / rand()

is a method, but has a weakness for consistency: 1) Portability: the same data on different platforms can give different numbers as C doesn't describe the method much rand()

. 2) Since the two functions are using a global state variable, the code must ensure that another thread / interrupt does not mess up the situation, or that these calls are inconsistent with the consistency in other functions with rand()

. The big problem with rand()

is on systems where RAND_MAX

is 32767, its minimum set value and the code tries [0 ... 65536].

I found consistency to be important as being able to use the same test code across multiple platforms: a significant advantage in code maintenance.

@mattinbits well recommends a good 8/16 bit solution. Why reinvent the wheel? But I don’t drive wheeled vehicles, so ...

If the OP really doesn't need an ID in the entire range [0 ... 32767] or [0 ... 65536] (did the OP have 65535?), Consider a simple portable iterable hash method that relies on %

using prime next to limit to mix the bits well.



// return numbers 0 ... 32748 or 0 ... 65536
unsigned long Hash(const unsigned char ID[11]) {
  unsigned long long Upper;
  unsigned long Lower;
  Upper = (ID[0]*1ULL<<56) | (ID[1]*1ULL<<48) | (ID[2]*1ULL<<40) | (ID[3]*1ULL<<32) |
      (ID[4]*1UL<<24)  | (ID[5]*1UL<<16) | (ID[6]*1U<<8) | ID[7];
  Lower = (ID[8]*1UL<<16) | (ID[9]*1U<<8) | ID[10];
  // Greatest prime <= 32768
  #define Prime_LE_32768 32749
  return (Upper ^ Lower) % Prime_LE_32768;

  // or 
  // Greatest prime <= 65537
  #define Prime_LE_65537 65537
  return (Upper ^ Lower) % Prime_LE_65537;
}

      

[Edit] with possible simplifications.

unsigned Hash(const uint8_t ID[11], unsigned prime) {
  uint64_t H[2] = {0};
  memcpy(H, ID, 11);
  return (H[0] ^ H[1]) % prime;
}

const unsigned char ID[][11] = {
    { 0x34, 0x57, 0x36, 0x31, 0x38, 0x39, 0x15, 0x0E, 0x00, 0x02, 0x00 },
    { 0x34, 0x57, 0x36, 0x31, 0x38, 0x39, 0x15, 0x0E, 0x00, 0x06, 0x00 },
    { 0x34, 0x57, 0x36, 0x31, 0x38, 0x39, 0x15, 0x0E, 0x00, 0x0A, 0x00 },
    { 0x34, 0x57, 0x36, 0x31, 0x38, 0x39, 0x15, 0x0E, 0x00, 0x0E, 0x00 },
    { 0x34, 0x57, 0x36, 0x31, 0x38, 0x39, 0x15, 0x0A, 0x00, 0x14, 0x00 },
    { 0x34, 0x57, 0x36, 0x31, 0x38, 0x39, 0x15, 0x0A, 0x00, 0x1C, 0x00 },
    { 0x34, 0x57, 0x36, 0x31, 0x38, 0x39, 0x15, 0x09, 0x00, 0x23, 0x00 },
    { 0x34, 0x57, 0x36, 0x31, 0x38, 0x39, 0x15, 0x0A, 0x00, 0x24, 0x00 },
    { 0x34, 0x57, 0x36, 0x31, 0x38, 0x39, 0x0E, 0x1D, 0x00, 0x1A, 0x00 },
    { 0x34, 0x57, 0x36, 0x31, 0x38, 0x39, 0x15, 0x0E, 0x00, 0x09, 0x00 },
    { 0x34, 0x57, 0x33, 0x31, 0x35, 0x39, 0x04, 0x10, 0x00, 0x20, 0x00 },
    { 0x34, 0x57, 0x33, 0x31, 0x35, 0x39, 0x04, 0x18, 0x00, 0x1C, 0x00 },
    { 0x34, 0x57, 0x36, 0x31, 0x38, 0x39, 0x15, 0x0E, 0x00, 0x0F, 0x00 },
    { 0x34, 0x57, 0x36, 0x31, 0x38, 0x39, 0x15, 0x0E, 0x00, 0x13, 0x00 },
    { 0x34, 0x57, 0x36, 0x31, 0x38, 0x39, 0x15, 0x0E, 0x00, 0x17, 0x00 },
    { 0x34, 0x57, 0x36, 0x31, 0x38, 0x39, 0x15, 0x0E, 0x00, 0x1F, 0x00 },
    { 0x34, 0x57, 0x36, 0x31, 0x38, 0x39, 0x15, 0x0A, 0x00, 0x25, 0x00 },
    { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 } };

#define Prime_LE_32768 32749
#define Prime_LE_65536 65521u
void test() {
  int i, j;
  for (i = 0; i < sizeof ID / sizeof ID[0]; i++) {
    const char *comma = "";
    for (j = 0; j < 11; j++) {
      printf("%s%02X", comma, ID[i][j]);
      comma = "-";
    }
    printf(" %5u", Hash(ID[i], Prime_LE_32768));
    printf(" %5u\n", Hash(ID[i], Prime_LE_65536));
  }
  puts("");
}

      

Output

34-57-36-31-38-39-15-0E-00-02-00 14759 13938
34-57-36-31-38-39-15-0E-00-06-00 13735 12914
34-57-36-31-38-39-15-0E-00-0A-00 16807 15986
34-57-36-31-38-39-15-0E-00-0E-00 15783 14962
34-57-36-31-38-39-15-0A-00-14-00 18507 25943
34-57-36-31-38-39-15-0A-00-1C-00 20555 27991
34-57-36-31-38-39-15-09-00-23-00   391 26260
34-57-36-31-38-39-15-0A-00-24-00 30795 38231
34-57-36-31-38-39-0E-1D-00-1A-00 23009 40628
34-57-36-31-38-39-15-0E-00-09-00 17063 16242
34-57-33-31-35-39-04-10-00-20-00 31736 54522
34-57-33-31-35-39-04-18-00-1C-00  3760 10032
34-57-36-31-38-39-15-0E-00-0F-00 15527 14706
34-57-36-31-38-39-15-0E-00-13-00 10407  9586
34-57-36-31-38-39-15-0E-00-17-00  9383  8562
34-57-36-31-38-39-15-0E-00-1F-00 11431 10610
34-57-36-31-38-39-15-0A-00-25-00 30539 37975
00-00-00-00-00-00-00-00-00-00-00     0     0

      

+4


source


A solution in which each device generates an identifier on its own, for example using a hash approach, has the advantage of being unconnected. On the other hand, the number must be large enough to make collisions very unlikely. 16 bits is not much.

One practical way I see is to use a number based on a property of the device, such as the processor serial number or the RS485 bus address (but I suspect you want to use the ID we're talking about for this purpose). If the 11 byte unique identifier or serial number you have access to is not random, you can use this non-random function to make a better hash function than any generic one. This is because a general purpose hash function has to mix bits evenly because it doesn't know which bits change more than others. If you know which bits are different from each other, you can use that knowledge to create a better hash function. For example, if you have a counter, the least significant bits are selected for "unique"identifier.

If this is not possible, I will try to create a communication scheme that detects the counter IDs. This should be possible because every device sees all communication on the RS485 bus.

Edit: Based on the example data, I would pick these bits:

00 00 00 00 01 00 1B 1F 00 3F 00

      

and implement a hash function like:

unsigned short hash(const unsigned char ID[11]) {
    return (ID[9] << 10)
            | ((ID[6] & 0x18) << 5)
            | ((ID[6] & 0x03) << 6)
            | ((ID[7] & 0x1F) << 1)
            | (ID[4] & 0x01);
}

      

This feature will most likely create unique identifiers for devices added in the future.

+1


source







All Articles