Why use array sorting. how is it better in some cases

I am trying to learn about sorting arrays. It seems pretty straightforward. But on the mozilla site, I came across a section that discussed sorting cards (about three-quarters down the page).

The compareFunction can be called multiple times for each element within an array. Depending on the compareFunction property, this can lead to high overhead. The more the compareFunction works, the more elements that need to be sorted, the wiser you might be to consider using a map for sorting.

Given example:

// the array to be sorted
var list = ["Delta", "alpha", "CHARLIE", "bravo"];
// temporary holder of position and sort-value
var map = [];
// container for the resulting order
var result = [];

// walk original array to map values and positions
for (var i=0, length = list.length; i < length; i++) {
  map.push({    
    // remember the index within the original array
    index: i, 
    // evaluate the value to sort
    value: list[i].toLowerCase()
  });
}

// sorting the map containing the reduced values
map.sort(function(a, b) {
  return a.value > b.value ? 1 : -1;
});

// copy values in right order
for (var i=0, length = map.length; i < length; i++) {
  result.push(list[map[i].index]);
}

// print sorted list
print(result);

      

I don't understand a few things. To wit: What does it mean, " compareFunction

can be called multiple times per element in an array"? Can anyone show me an example of this. Second, I understand what the example is doing, but I don't understand the potential "high [er] overhead" compareFunction

. The example shown here seems to be very simple and mapping an array to an object, sorting its value, and then returning it to an array would require a lot more overhead, which I think at first glance. I realize this is a simple example and is probably not intended for anything other than showing the procedure. But can anyone give an example of when there will be fewer overheads to apply such a pattern? It seems like a lot more work.

Thank!

+3


source to share


5 answers


When sorting a list, an item is not simply compared to one other item, it may need to be compared to several other items. Some elements can even be compared to all other elements.

See how many comparisons actually happen when sorting an array:

var list = ["Delta", "alpha", "CHARLIE", "bravo", "orch", "worm", "tower"];

var o = [];
for (var i = 0; i < list.length; i++) {
    o.push({
        value: list[i],
        cnt: 0
    });
}

o.sort(function(x, y){
    x.cnt++;
    y.cnt++;
    return x.value == y.value ? 0 : x.value < y.value ? -1 : 1;
});

console.log(o);

      

Result:



[
 { value="CHARLIE",  cnt=3},
 { value="Delta",  cnt=3},
 { value="alpha",  cnt=4},
 { value="bravo",  cnt=3},
 { value="orch",  cnt=3},
 { value="tower",  cnt=7},
 { value="worm",  cnt=3}
]

      

(Fiddle: http://jsfiddle.net/Guffa/hC6rV/ )

As you can see, each item was compared to other items. The string "tower"

even had more comparisons than other strings, which means it was compared to at least one other string at least twice.

If the comparison requires some calculation before the values โ€‹โ€‹are compared (for example, the method toLowerCase

in the example), then this calculation will be performed several times. By caching the values โ€‹โ€‹after this calculation, this will only be done once for each item.

+3


source


The primary time saving in this example is obtained by eliminating calls to toLowerCase()

in the compare function. The comparison function is called by the sorting code every time a pair of elements needs to be compared to save a lot of function calls. The cost of building and decompressing a map is worth it to use large arrays.

The fact that the comparison function can be called more than once per item is a natural consequence of how sort works. If only one comparison per item is required, this will be a linear time process.



edit - the number of comparisons to be made will be roughly proportional to the length of the array when the length of the base-2 log is equal to the length. For an array of 1000 elements, that's proportional to 10,000 comparisons (probably closer to 15,000, depending on the actual sorting algorithm). Saving 20,000 unnecessary function calls costs the 2,000 operations required to rebuild and unbuild a sort map.

+3


source


This is called the "decorate-sort - undecorate" pattern (you can find a good explanation on Wikipedia ).

The idea is that comparison-based sort should call the comparison function at least n

times (where n

is the number of elements in the list), since that's the amount of comparison you just need to check that the array is already sorted. Usually the comparison number will be larger ( O(n ln n)

if you are using a good algorithm) and according to the ping-end principle there is at least one value to be passed twice to the comparison function.

If your comparison function does some expensive processing before comparing the two values, you can reduce the cost by doing the expensive part first and storing the result for each value (since you know that, even in the best case scenario, you will have to do that processing). Then, when sorting, you use a cheaper compare function that only compares those cached outputs.

In this example, the "expensive" part converts the string to lowercase.

+2


source


Think of it like caching. It just goes to show that you don't have to compute much in the comparison function, because you will be evaluating the same value over and over.

What does it mean: "The compareFunction can be called multiple times per element in the array"?

This means exactly what he says. Allows you to have three elements: A, B, and C. They must be sorted by the result of the comparison function. The comparison can be done as follows:

compare(A) to compare(B)
compare(A) to compare(C)
compare(B) to compare(C)

      

So, here we have 3 values, but the function was compare()

executed 6 times. Using a temporary array for caching ensures that we only do the calculation once per item and can compare these results.

Second, I understand what is being done in this example, but I do not understand the potential "high [er] overhead" of compareFunction.

What if it compare()

fetches the database (comparing the number of matching rows)? Or a complex math calculation (factorial, recursive fibbins, or iterating over a large number of items) Things you don't want to do more than once.

I would say that most of the time it would be nice to leave very simple / fast inline calculations. Don't over-optimize. But if you want something complex or slow in comparison, you have to be smarter about it.

+1


source


To answer your first question, why compareFunction

is it called multiple times for each element of the array?

Sorting an array almost always requires more than N passes, where N is the size of the array (if the array is not already sorted). This way, for each element in your array, it can be compared to another element in your array up to N times (it takes no more than N ^ 2 comparisons to create bubbles). The one provided compareFunction

will be used each time to determine if two elements are less than / equal / greater and therefore will be called multiple times per element in the array.

The simple answer for you is the second question, why are there potentially higher overheads for compareFunction

?

Say yours compareFunction

is doing a lot of unnecessary work comparing two array elements. This can slow down the sort, and thus using compareFunction could potentially cause higher overhead.

0


source







All Articles