Removing from array, mirror (strange) behavior

Question

Removing from array, mirror (strange) behavior

The title might sound a little strange because I have no idea how to describe it in one sentence.

For the algorithms of the course, we have to micro-optimize some things, figure out how deletion from an array works. The purpose is to remove something from the array and re-align the content so that there are no spaces, I think this is very similar to how std :: vector :: erase works with C ++.

Because I like the idea of understanding everything at a low level, I went a little further and tried to evaluate my decisions. This has shown some strange results.

First, here's the little code I used:

class Test {

    Stopwatch sw;
    Obj[] objs;

    public Test() {
        this.sw = new Stopwatch();
        this.objs = new Obj[1000000];

        // Fill objs
        for (int i = 0; i < objs.Length; i++) {
            objs[i] = new Obj(i);
        }
    }

    public void test() {

        // Time deletion
        sw.Restart();
        deleteValue(400000, objs);
        sw.Stop();

        // Show timings
        Console.WriteLine(sw.Elapsed);
    }

    // Delete function
    // value is the to-search-for item in the list of objects
    private static void deleteValue(int value, Obj[] list) {

        for (int i = 0; i < list.Length; i++) {

            if (list[i].Value == value) {
                for (int j = i; j < list.Length - 1; j++) {
                    list[j] = list[j + 1];

                    //if (list[j + 1] == null) {
                    //    break;
                    //}
                }
                list[list.Length - 1] = null;
                break;
            }
        }
    }
}

I would just create this class and call the test () method. I did this in a loop 25 times.

My findings:

The first round takes a lot longer than the other 24, I think this is due to caching but I'm not sure.
When I use the value at the beginning of the list, it should move more items in memory than when I use the value at the end, although it still seems to take less time.
The desktop versions are slightly different.
When I turn on the comment, if the performance improves (10-20%), even if the value I'm looking for is almost at the end of the list (which means if it turns off many times without actually being useful).

I have no idea why this is happening, is there anyone who can explain (some of them)? And maybe if someone sees who a professional is, where can I find more information to do this in the most efficient way?

Edit after testing:

I did some testing and found interesting results. I am running a test on an array of a million items filled with a million objects. I am running this 25 times and reporting the cumulative time in milliseconds. I do this 10 times and take the average of this as the final value.

When I run the test using my function described above, I get the score: 362.1

When I run it with dbc response I get the score: 846.4

So mine was faster, but then I started experimenting with an empty empty array and things got weird. To get rid of the inevitable nullPointerExceptions, I added an additional if check (thinking it would mess up a little more performance):

if (fromItem != null && fromItem.Value != value)
    list[to++] = fromItem;

It not only seemed to work, but it also dramatically improved performance! Now I get the score: 247.9

The weird thing is that the scores seem low to be true, but sometimes spiked, this is the set I chose avg: 94, 26, 966, 36, 632, 95, 47, 35, 109, 439

Thus, additional grading improves my performance despite additional validation. How is this possible?

+3

c # algorithm

Gideon 03 Sep 14 at 12:54

source to share

1 answer

dbc · Accepted Answer · 2014-09-03T14:28:46+0000

You are using Stopwatch

for your method. This calculates the total time taken during a method call, which may include the time it takes for .Net to start JITing your method , interrupts for garbage collection, or slowdowns caused by system loading from other processes. Noise from these sources is likely to dominate over noise due to cache flaws.

This answer provides some guidance on how you can minimize some noise from garbage collection or other processes. To eliminate JIT noise, you must call your method once without synchronizing it, or show the time taken by the first call in a separate column in the results table as it will be so different. You can also consider using a proper profiler that will accurately report how long your code has been using, excluding "noise" from other threads or processes.

Finally, I want to point out that your algorithm to remove matched elements from an array and move everything else down, uses a nested loop, which is optional and will access the elements in the array after a matching match. The standard algorithm looks like this:

    public static void RemoveFromArray(this Obj[] array, int value)
    {
        int to = 0;
        for (int from = 0; from < array.Length; from++)
        {
            var fromItem = array[from];
            if (fromItem.Value != value)
                array[to++] = fromItem;
        }
        for (; to < array.Length; to++)
        {
            array[to] = default(Obj);
        }
    }

However, instead of using the standard algorithm, you can experiment with the help Array.RemoveAt()

with your version, since (I believe) internally it does delete in unmanaged code.

Removing from array, mirror (strange) behavior

More articles: