Sorting algorithms for equal-length strings in C++

I need to sort about 100,000 ASCII strings alphabetically and by length. I sort by length by putting the strings into a 2D vector indexed by string length, then sort each inner vector alphabetically using quicksort. But is there a faster sort for strings of equal length? I have heard that radix sort is great, but I find it difficult to understand. What would be the best way to sort equal-length strings without using the sort() function? If you need the code, I can post it.

+3




2 answers


I think constructing a trie and then extracting the keys with a pre-order traversal is about as efficient as you can get for sorting strings, and it is effectively a form of radix sort. Here is a detailed document that discusses this method. As of 2006, at least, it was the fastest method for sorting strings.
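To make the trie idea concrete, here is a minimal sketch, not the method from the linked document. It assumes 7-bit ASCII input, and the names TrieNode, trie_insert, trie_collect and trie_sort are made up for illustration. Each string is inserted character by character, and a pre-order walk that visits children in ascending character order emits the strings already sorted.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node layout for this sketch: one child slot per 7-bit ASCII code.
struct TrieNode {
    std::array<std::unique_ptr<TrieNode>, 128> child;
    std::size_t count = 0;  // number of input strings that end at this node
};

// Insert one string. Assumes 7-bit ASCII; the mask only keeps the index in
// range, so non-ASCII bytes would be sorted incorrectly.
void trie_insert(TrieNode& root, const std::string& s) {
    TrieNode* node = &root;
    for (unsigned char c : s) {
        std::unique_ptr<TrieNode>& next = node->child[c & 0x7F];
        if (!next) next = std::make_unique<TrieNode>();
        node = next.get();
    }
    ++node->count;
}

// Pre-order walk: emit the strings ending at this node first, then visit the
// children in ascending character order.
void trie_collect(const TrieNode& node, std::string& prefix,
                  std::vector<std::string>& out) {
    for (std::size_t i = 0; i < node.count; ++i) out.push_back(prefix);
    for (std::size_t c = 0; c < node.child.size(); ++c) {
        if (node.child[c]) {
            prefix.push_back(static_cast<char>(c));
            trie_collect(*node.child[c], prefix, out);
            prefix.pop_back();
        }
    }
}

// Build the trie from the input, then read the keys back in sorted order.
std::vector<std::string> trie_sort(const std::vector<std::string>& input) {
    TrieNode root;
    for (const std::string& s : input) trie_insert(root, s);
    std::vector<std::string> out;
    out.reserve(input.size());
    std::string prefix;
    trie_collect(root, prefix, out);
    return out;
}
```

The fixed 128-slot array keeps the code short but costs a lot of memory per node, so for 100,000 strings a more compact child representation would probably be needed.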



+2




For strings of 8 to 15 characters, the quicksort comparison function can compare the first 8 characters as a single 64-bit chunk. The same idea extends to 16 to 31 characters, and so on. So you can write as many specialized comparison functions as you feel is worthwhile. Unless you have a very large number of strings with long common prefixes, just using what you know about the string lengths may do the trick.
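As a rough illustration of the chunked comparison, here is a sketch under a few assumptions: both strings have the same length, and the names pack8, less_same_length, quicksort and sort_bucket are invented for this example. Instead of one function per length range it uses a single loop over 8-byte chunks, and it packs the bytes with shifts so that byte order is not an issue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Pack 8 characters starting at p into one integer, first character in the
// most significant byte, so integer order matches lexicographic byte order.
inline std::uint64_t pack8(const char* p) {
    std::uint64_t key = 0;
    for (int i = 0; i < 8; ++i)
        key = (key << 8) | static_cast<unsigned char>(p[i]);
    return key;
}

// Hypothetical comparator for two strings of the same length: compares whole
// 8-byte chunks first, then any remaining tail bytes one at a time.
bool less_same_length(const std::string& a, const std::string& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        const std::uint64_t ka = pack8(a.data() + i);
        const std::uint64_t kb = pack8(b.data() + i);
        if (ka != kb) return ka < kb;
    }
    for (; i < n; ++i) {
        const unsigned char ca = static_cast<unsigned char>(a[i]);
        const unsigned char cb = static_cast<unsigned char>(b[i]);
        if (ca != cb) return ca < cb;
    }
    return false;  // the strings are equal
}

// Plain recursive quicksort (middle pivot) over v[lo..hi], inclusive.
void quicksort(std::vector<std::string>& v, long lo, long hi) {
    if (lo >= hi) return;
    const std::string pivot = v[lo + (hi - lo) / 2];  // copy, since v changes
    long i = lo, j = hi;
    while (i <= j) {
        while (less_same_length(v[i], pivot)) ++i;
        while (less_same_length(pivot, v[j])) --j;
        if (i <= j) std::swap(v[i++], v[j--]);
    }
    quicksort(v, lo, j);
    quicksort(v, i, hi);
}

// Sort one bucket of equal-length strings without calling std::sort.
void sort_bucket(std::vector<std::string>& bucket) {
    if (!bucket.empty())
        quicksort(bucket, 0, static_cast<long>(bucket.size()) - 1);
}
```

Calling sort_bucket on each inner vector of the length-indexed 2D vector would replace the per-bucket quicksort described in the question. Whether it is actually faster than a byte-by-byte comparison depends on the compiler and the data, so it is worth measuring.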

For completeness, you need to worry about alignment and byte order. Loading 8 bytes at a time into a uint64_t like this:

  uint64_t u ;

  memcpy(&u, pv, 8) ;
  ...convert to big-endian if required...

will do the trick. I can tell you that with gcc and -O2 on x86_64, the memcpy() compiles into a single instruction, as if it were u = *(uint64_t*)pv :-) For processors with alignment restrictions, I would hope the compiler does something appropriate.

Unfortunately, memcmp(foo, bar, 8) doesn't get the same treatment (at least on gcc 4.8, not even with -O3) :-(
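A possible way to fill in the "convert to big-endian" step is sketched below. The helper names is_little_endian, byte_swap64 and load_be64 are made up for this example, and the byte swap is written with portable shifts rather than any compiler builtin. The point of the conversion is that the first character must end up in the most significant byte, so that comparing two loaded integers gives the same answer as comparing the 8 characters in order.

```cpp
#include <cstdint>
#include <cstring>

// Detect byte order at run time by inspecting the first byte of a known value.
inline bool is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Reverse the byte order of a 64-bit value using portable shifts and masks.
inline std::uint64_t byte_swap64(std::uint64_t x) {
    x = ((x & 0x00FF00FF00FF00FFULL) << 8)  | ((x >> 8)  & 0x00FF00FF00FF00FFULL);
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFULL);
    return (x << 32) | (x >> 32);
}

// Load 8 bytes from pv (any alignment) as a big-endian integer, so that the
// first byte in memory becomes the most significant byte of the result.
inline std::uint64_t load_be64(const void* pv) {
    std::uint64_t u;
    std::memcpy(&u, pv, 8);  // safe for unaligned addresses
    return is_little_endian() ? byte_swap64(u) : u;
}
```

With this, a comparison of the first 8 characters of two strings a and b reduces to load_be64(a) < load_be64(b), provided both have at least 8 characters.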

+1

