I've come up with an algorithm for sorting an array of large objects; can anyone tell me what this algorithm is called? (I couldn't find it on Google.)

I needed to sort a large array of objects and I thought: could there be a way to minimize the number of swaps?

So I used quicksort (though any other sort should work here too) to sort indices into the array rather than the elements themselves; indices are cheap to swap. I then used those indices to swap the actual objects into place. Unfortunately, this uses O(n) extra space to store the indices. The code below illustrates the algorithm (which I call IndexSort), and in my tests it seems to be faster than a plain sort for arrays of large objects.

#include <algorithm> // sort, swap
#include <numeric>   // iota
#include <vector>

using namespace std;

template <class Itr>
void IndexSort(Itr begin, Itr end)
{
    const size_t count = end - begin;

    // Create indices 0, 1, ..., count-1
    vector<size_t> ind(count);
    iota(ind.begin(), ind.end(), 0);

    // Sort indices
    sort(ind.begin(), ind.end(), [&begin] (const size_t i, const size_t j)
    {
        return begin[i] < begin[j];
    });

    // Create indices to indices (the inverse permutation): ind[i] is
    // the original position of the element that belongs at sorted
    // position i, so ind2[k] is the sorted position of the element
    // currently at position k. This provides constant time lookup
    // in the next step.
    vector<size_t> ind2(count);
    for(size_t i = 0; i < count; ++i)
        ind2[ind[i]] = i;

    // Swap the objects into their final places. Positions before i
    // are already correct when the loop reaches i.
    for(size_t i = 0; i < count; ++i)
    {
        if( ind[i] == i )
            continue;

        // Bring the element that belongs at position i into place;
        // the element that was at i moves to position ind[i].
        swap(begin[i], begin[ind[i]]);

        const size_t j = ind[i];

        // Update both permutations to reflect the object swap above.
        swap(ind[i], ind[ind2[i]]);
        swap(ind2[i], ind2[j]);
    }
}


I have now measured the swaps (of large objects) done by both quicksort and IndexSort, and quicksort does a lot more of them. So I know why IndexSort can be faster.
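For anyone who wants to reproduce that measurement, here is a minimal self-contained sketch (my own illustration, not the code I used for the numbers above): it counts how many times a large object's payload is written by instrumenting the assignment operator of a stand-in type.

#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <vector>

static std::size_t g_objectWrites = 0; // illustrative global counter

// Stand-in for the large-object type; every assignment bumps the counter.
struct Counted
{
    int id;
    char data[1024];

    bool operator < (const Counted &other) const
    {
        return id < other.id;
    }

    Counted & operator = (const Counted &other)
    {
        ++g_objectWrites; // one full payload write
        std::memcpy(this, &other, sizeof(Counted));
        return *this;
    }
};

int main()
{
    std::vector<Counted> v(100000);
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i].id = std::rand();

    std::sort(v.begin(), v.end());

    // std::sort also creates temporaries via the copy constructor,
    // which this counter does not see, so the figure is a lower bound.
    std::cout << "large-object writes: " << g_objectWrites << std::endl;
    return 0;
}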

But can someone with more of an academic background explain why/how this algorithm actually works? (It is not intuitive to me, even though I somehow came up with it.)

Thanks!

Edit: The following code was used to check the IndexSort results

#include <algorithm>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <vector>

using namespace std;

// A class whose objects will be large
struct A
{
    int id;
    char data[1024];

    // Use the id for less-than ordering (for simplicity)
    bool operator < (const A &other) const
    {
        return id < other.id;
    }

    // Copy-assign all data from another object
    // (the members are plain data, so memcpy works here)
    A & operator = (const A &other)
    {
        memcpy(this, &other, sizeof(A));
        return *this;
    }
};

int main()
{
    const size_t arrSize = 1000000;

    // Create an array of objects to be sorted
    vector<A> randArray(arrSize);
    for( auto &item: randArray )
        item.id = rand();

    // arr1 will be sorted using quicksort
    vector<A> arr1(arrSize);
    copy(randArray.begin(), randArray.end(), arr1.begin());

    // arr2 will be sorted using IndexSort
    vector<A> arr2(arrSize);
    copy(randArray.begin(), randArray.end(), arr2.begin());

    {
        // Measure time for this
        sort(arr1.begin(), arr1.end());
    }

    {
        // Measure time for this
        IndexSort(arr2.begin(), arr2.end());
    }

    // Check if IndexSort yielded the same result as quicksort
    if( memcmp(arr1.data(), arr2.data(), sizeof(A) * arr1.size()) != 0 )
        cout << "sort failed" << endl;

    return 0;
}


Edit: Made the test less pathological: reduced the size of the large-object class to 1,024 bytes (plus one int) and increased the number of objects to be sorted to a million. IndexSort is still significantly faster than quicksort.

Edit: This requires more thorough testing. But it seems to me that std::sort could check the size of the object type at compile time and, depending on a size threshold, choose either the existing quicksort implementation or this IndexSort implementation.
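A sketch of what that dispatch could look like (a hypothetical wrapper of my own, not an existing std::sort feature; it assumes the IndexSort template from above is in scope, and the 256-byte threshold is an arbitrary placeholder that would need real measurements):

#include <algorithm>
#include <iterator>

template <class Itr>
void SizeAwareSort(Itr begin, Itr end)
{
    typedef typename std::iterator_traits<Itr>::value_type T;

    // sizeof(T) is a compile-time constant, so for any given T one
    // branch is trivially dead code.
    if (sizeof(T) > 256)
        IndexSort(begin, end);  // large elements: sort indices instead
    else
        std::sort(begin, end);  // small elements: plain in-place sort
}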

Also, IndexSort can be described as an in-place "tag sort" (see samgak's answer and my comments below).

2 answers


This looks like a tag sort:

For example, the popular recursive quicksort algorithm provides quite reasonable performance with adequate RAM, but because of the recursive way it copies portions of the array, it becomes much less practical when the array does not fit in RAM, as it may cause a number of slow copy or move operations to and from disk. In that scenario, another algorithm may be preferable even if it requires more total comparisons.

One way to work around this problem, which works well when complex records (such as in a relational database) are being sorted by a relatively small key field, is to create an index into the array and then sort the index, rather than the entire array. (A sorted version of the entire array can then be produced in a single pass, reading from the index, but often even that is unnecessary, as having the sorted index is adequate.) Because the index is much smaller than the entire array, it may fit easily in memory where the entire array would not, effectively eliminating the disk-swapping problem. This procedure is sometimes called a "tag sort".

As described above, a tag sort can be used to sort a large array of data that cannot fit in memory. However, even when everything does fit in memory, it still requires far fewer memory reads and writes for arrays of large objects, as your solution shows, because the objects themselves are not copied on every swap.
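To make the "single pass" from the quote concrete, here is a minimal sketch (my own, assuming the index array has already been sorted exactly as in the question's IndexSort):

#include <cstddef>
#include <vector>

// Produce a sorted copy in one pass over a sorted index.
// ind[k] holds the original position of the k-th smallest element.
template <class T>
std::vector<T> GatherSorted(const std::vector<T> &arr,
                            const std::vector<std::size_t> &ind)
{
    std::vector<T> out;
    out.reserve(arr.size());
    for (std::size_t k = 0; k < ind.size(); ++k)
        out.push_back(arr[ind[k]]);  // each object is copied exactly once
    return out;
}

This trades O(n) extra object storage for the simplest possible data movement, whereas your in-place permutation avoids the extra object buffer.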



Implementation details: while your implementation only sorts the indices and reaches back into the original array of objects through the index when doing comparisons, another way to implement it is to store index/sort-key pairs in the sort buffer and use the sort keys for the comparisons. That way you can perform the sort without having the entire array of objects in memory at once.
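A minimal sketch of that variant (my own; it is self-contained, so it redefines a minimal stand-in for the question's A and assumes the sort key is its id field; it returns the same kind of index array that IndexSort builds):

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Minimal stand-in for the question's large type.
struct A { int id; char data[1024]; };

// Sort (key, original index) pairs; the sort buffer never holds the
// 1 KB payload, only these small tags, so the big objects need not
// even be resident during the sort itself.
std::vector<std::size_t> SortByKey(const std::vector<A> &arr)
{
    std::vector<std::pair<int, std::size_t> > tags(arr.size());
    for (std::size_t i = 0; i < arr.size(); ++i)
        tags[i] = std::make_pair(arr[i].id, i);

    // Pairs compare lexicographically, so this orders by key first.
    std::sort(tags.begin(), tags.end());

    std::vector<std::size_t> ind(arr.size());
    for (std::size_t i = 0; i < arr.size(); ++i)
        ind[i] = tags[i].second;
    return ind;  // same shape as the question's sorted "ind" array
}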

One example of tag sorting is the LINQ to Objects sorting algorithm in .NET:

Sorting is somewhat flexible in that it allows you to provide a comparison delegate. It does not, however, allow you to provide a swap delegate. That is fine in many cases. But if you are sorting large structures (value types), or want to do an indirect sort (often referred to as a tag sort), a swap delegate is a very useful thing. The LINQ to Objects sorting algorithm, for example, uses its own tag type internally. You can verify this by examining the source, which is available in the .NET Reference Source. Allowing a swap delegate would make it a lot more flexible.

I would definitely not call this algorithm anything other than an indirect sort.

The reason you do fewer swaps of the large objects is that the indices are sorted first, so by the time the objects move there are no redundant intermediate swaps: each object travels (almost) directly to its final place. If you count the index swaps in addition to the object swaps, the index sort actually performs more swaps in total.

However, you are not strictly bound by raw operation counts. Spending the expensive sorting phase swapping cheap little indices saves more time than it costs.

So the index sort performs more swaps overall, but the bulk of them are cheap index swaps, and it performs far fewer of the costly swaps of the original objects.

The reason this is faster is that the original objects are much larger than the indices, yet may not benefit from a move constructor (if, as here, the data is stored inline rather than dynamically allocated, a move is as expensive as a copy).

At this level, the cost of a swap is dominated by the size of the items being sorted, so this is about practical efficiency rather than theoretical algorithmic complexity. Down at the hardware level it boils down to things like how many items fit in a cache line.



When sorting, the amount of computation performed over the same data set is substantial: an optimal comparison sort performs O(N log N) comparisons and a similar number of swaps. By using indices, you make the swaps and potentially the comparisons cheaper (in your case just the swaps, since you still use the comparison predicate on the original objects).

In other words, std::sort is O(N log N), and the index sort is O(N + N log N), which is the same asymptotically. The difference is that a large share of that N log N work is now done on cheap indices through a level of indirection, rather than on the big objects.
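A rough back-of-envelope illustration (my numbers, not from the question, assuming on the order of N log2 N element moves for the direct sort): with N = 1,000,000 objects of about 1 KB each, log2 N ≈ 20, so a direct sort shuffles something like 20 GB of object data, while the index sort shuffles roughly the same number of 8-byte indices (about 160 MB) plus at most a few GB of object data in the final pass, where each object is moved only a small, constant number of times.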

In your original test case you used a very pathological setup with huge objects, so sorting indices pays off enormously. More often than not, you don't have objects of type T where sizeof(T) spans 100 kilobytes. Usually, if an object stores data of that size, it stores a pointer to it elsewhere, and the move constructor simply copies the pointers shallowly (making the object about as cheap to swap as an int). So most of the time you won't gain that much by sorting indirectly in this way, but if you do have such huge objects, this kind of index or pointer sort can be a big optimization.
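A small illustration of that contrast (my own example type, B, showing why indirection buys little once the payload already lives behind a pointer):

#include <vector>

// If the payload is heap-allocated, the object is already cheap to
// swap, and an index sort buys much less.
struct B
{
    int id;
    std::vector<char> data;  // payload lives behind a pointer

    bool operator < (const B &other) const { return id < other.id; }
};
// std::swap on B just exchanges the vectors' internal pointers, so a
// direct std::sort over B is already close to an indirect sort.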

Edit: This requires more thorough testing. But it seems to me that std::sort could check the size of the object type at compile time and, depending on a size threshold, choose either the existing quicksort implementation or this IndexSort implementation.

I think this is not a bad idea; at least it would be a good start. However, I would advise against making it automatic. The reason I think it would be better to leave it as an optimization the developer can opt into when appropriate is that there are cases where memory is more valuable than processing time. The indices look trivial next to 1 KB objects, but there are plenty of iffy edge cases where you are dealing with something more like 32-64 bytes (e.g. a list of 32-byte, 4-component double-precision mathematical vectors). In those edge cases the index sort may still be faster, but the extra temporary memory for two indices per item can actually become a factor (and can sometimes even slow things down at runtime, depending on the physical state of the environment). Consider the attempt to special-case vector<bool>: it often does more harm than good. Treating vector<bool> like a bitset seemed like a good idea at the time; now it mostly gets in the way. So I suggest leaving this as something people can choose explicitly, but having it available could be a welcome addition.
