What makes this GetHashCode implementation 20x slower than the .NET implementation?

I got the idea for a substring struct

from this post and this one. The second post has a .NET String.GetHashCode() implementation. (I'm not sure which .NET version it is from.)

Here is the implementation. (GetHashCode comes from the second source above.)

public struct Substring
{
    private string String;
    private int Offset;
    public int Length { get; private set; }
    public char this[int index] { get { return String[Offset + index]; } }

    public Substring(string str, int offset, int len) : this()
    {
        String = str;
        Offset = offset;
        Length = len;
    }

    /// <summary>
    /// See http://www.dotnetperls.com/gethashcode
    /// </summary>
    /// <returns></returns>
    public unsafe override int GetHashCode()
    {
        fixed (char* str = String) // pin the parent string; "String + Offset" would concatenate a new string per call
        {
            char* chPtr = str + Offset; // advance to the substring's first char
            int num = 352654597;
            int num2 = num;
            int* numPtr = (int*)chPtr;
            for (int i = Length; i > 0; i -= 4)
            {
                num = (((num << 5) + num) + (num >> 27)) ^ numPtr[0];
                if (i <= 2)
                {
                    break;
                }
                num2 = (((num2 << 5) + num2) + (num2 >> 27)) ^ numPtr[1];
                numPtr += 2;
            }
            return (num + (num2 * 1566083941));
        }
    }
}
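As a cross-check on what the pointer loop actually reads, the same 32-bit rolling hash can be replicated in safe managed code. This is a sketch, not the BCL implementation: `SubstringHashSafe` and `Pack` are hypothetical names, and it assumes out-of-range characters should read as zero (the null terminator a whole string would provide).

```csharp
static class SubstringHashSafe
{
    // Managed (no unsafe/fixed) sketch of the same 32-bit rolling hash.
    // Out-of-range reads are replaced by 0, mimicking the null terminator
    // the pointer version reads at the end of a full string.
    public static int Compute(string s, int offset, int length)
    {
        int num = 352654597;
        int num2 = num;
        for (int i = 0; i < length; i += 4)
        {
            num = ((num << 5) + num + (num >> 27)) ^ Pack(s, offset, length, i);
            if (length - i <= 2)
                break; // 2 or fewer chars remained, so num2 is not updated
            num2 = ((num2 << 5) + num2 + (num2 >> 27)) ^ Pack(s, offset, length, i + 2);
        }
        return num + (num2 * 1566083941);
    }

    // Packs chars i and i+1 of the substring into one little-endian int,
    // like reading numPtr[0] / numPtr[1] in the pointer version.
    private static int Pack(string s, int offset, int length, int i)
    {
        int lo = i < length ? s[offset + i] : 0;
        int hi = i + 1 < length ? s[offset + i + 1] : 0;
        return lo | (hi << 16);
    }
}
```

This will of course be slower than the `fixed` version, but it is handy for verifying what the unsafe loop reads at each step.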


Here's a unit test:

    [Test]
    public void GetHashCode_IsAsFastAsString()
    {
        var s = "The quick brown fox";
        var sub = new Substring(s, 1, 5);
        var t = "quick";
        var sum = 0;

        sum += sub.GetHashCode(); // make sure GetHashCode is jitted 

        var count = 100000000;
        var sw = Stopwatch.StartNew();
        for (var i = 0; i < count; ++i)
            sum += t.GetHashCode();
        var t1 = sw.Elapsed;
        sw = Stopwatch.StartNew();
        for (var i = 0; i < count; ++i)
            sum += sub.GetHashCode();
        var t2 = sw.Elapsed;

        Debug.WriteLine(sum.ToString()); // make sure we use the return value
        var m1 = t1.TotalMilliseconds; // TotalMilliseconds, not the Milliseconds component
        var m2 = t2.TotalMilliseconds;
        Assert.IsTrue(m2 <= m1); // fat chance
    }
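For what it's worth, the timing loop can be factored into a reusable helper. This is a sketch, not code from the post; `MeasureNsPerCall` is a hypothetical name. It warms up the JIT first and uses `Elapsed.TotalMilliseconds` rather than `Milliseconds`, which is only the millisecond *component* of the `TimeSpan` and wraps at 1000.

```csharp
using System;
using System.Diagnostics;

static class Bench
{
    // Hypothetical helper: runs the action `count` times and returns
    // the average cost per call in nanoseconds.
    public static double MeasureNsPerCall(Action action, int count)
    {
        action(); // warm up so the JIT compiles this code path first
        var sw = Stopwatch.StartNew();
        for (var i = 0; i < count; ++i)
            action();
        sw.Stop();
        // Convert milliseconds for `count` calls into nanoseconds per call.
        return sw.Elapsed.TotalMilliseconds * 1000000.0 / count;
    }
}
```

Usage: `var ns = Bench.MeasureNsPerCall(() => t.GetHashCode(), 1000000);` — note the delegate call adds its own overhead, so compare the two measurements rather than treating either as absolute.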


The problem is that m1 is 10 milliseconds and m2 is 190 milliseconds. (Note: this run used 1,000,000 iterations.) FYI, I ran this on .NET 4.5, 64-bit, in a Release build with optimizations enabled.



1 answer


  • As suggested in the comments, I double-checked that the optimized code was actually running. It turns out a hidden debugger option was turning optimizations off, so I unchecked Tools → Options → Debugging → General → "Suppress JIT optimization on module load (Managed only)". After that, the optimized code was loaded correctly.
  • Even with optimizations enabled, there is still a 3x to 6x difference. That may be because the code above is the 32-bit version of the algorithm while I am running on the 64-bit .NET runtime. Porting the 64-bit string.GetHashCode implementation to a substring is not easy, because it relies on the string's null terminator (relying on which is actually a bug for substrings).
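The null-terminator issue in the last bullet can be made concrete. A minimal sketch (not code from the post): for odd lengths, the hashing loop's last 4-char block is only partially covered, so the pointer version also reads the character just past the substring.

```csharp
using System;

class NullTerminatorDemo
{
    static void Main()
    {
        string parent = "The quick brown fox";
        // "quick" occupies parent[4..9). Length 5 is odd, so the pointer
        // loop also reads the char immediately after the substring.
        char afterSubstring = parent[4 + 5];   // the space after "quick"
        char afterWholeString = '\0';          // what a standalone "quick" provides
        Console.WriteLine((int)afterSubstring);   // 32
        Console.WriteLine((int)afterWholeString); // 0
        // The trailing values differ, so the substring "quick" and the
        // standalone string "quick" can produce different hash codes.
    }
}
```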


I'm still disappointed that I didn't get performance parity, but it was a good use of my time: I learned about some of the perils and pitfalls of C# optimization.
