C # .NET Shared Collections Performance and Optimization Reference

I'm trying to optimize a piece of .NET 2.0 C # code that looks like this:

Dictionary<myType, string> myDictionary = new Dictionary<myType, string>();
// some other stuff
// inside a loop check if key is there and if not add element
if(!myDictionary.ContainsKey(currentKey))
{
   myDictionary.Add(currentKey, "");
}

      

It looks like the Dictionary was used by whoever wrote this piece of code, even if it is not needed (only the key is used to store a list of unique values) as it is faster than a list of myType objects to look up. This seems clearly wrong once the dictionary key is in place, but I'm trying to figure out what is the best way to fix it.

Questions:

1) I seem to understand that I am getting a nice performance boost even when using the .NET 3.5 HashSet. Is it correct?

2) What would be the best way to optimize the code above in .NET 2.0 and why?

EDIT : This is existing code that I am trying to optimize, it iterates over tens of thousands of items and calls ContainsKey for each one. There must be a better way to do this (even in .NET 2.0)! :)

+2


source to share


5 answers


I think you need to break this down into 2 questions.

It is Dictionary<myType,string>

the best available for this type of scenario

Not. Based on your breakdown, HashSet<myType>

by far the best choice because the usage pattern more closely matches the scenario

Will the move to HashSet<myType>

improve performance?



This is really subjective and only a profiler can give you the answer to this question. You will likely see a slight improvement in memory size for each item in the collection. But in terms of processing power, I doubt you will see a huge difference. Only a profiler can tell you if there is one.

Before you make performance-related changes to your code, remember the golden rule.

Don't make any performance-related changes until the profiler tells you exactly what is wrong with your code.

Changes that violate this rule are guesswork. The profiler is only a way to measure the success of a performance patch.

+10


source


1) No. The dictionary makes a hash on the key, so your search should be O (1). However, the Hashset should have less memory. But to be honest, it's not that much that you will actually see a performance boost.



2) Give us more details on what you are trying to accomplish. The code you posted is pretty simple. Have you already measured? Can you see that this method is slow? Don't forget: "We have to forget about a little efficiency, say about 97% of the time: premature optimization is the root of all evil." - Donald Knuth

+2


source


  • Depending on the size of your keys, you may see performance degradation.

  • One way in 2.0 would be to try to insert it and catch the exception (of course, this depends on how many duplicate keys you plan on having:

foreach(string key in keysToAdd)
{
  try
  {
    dictionary.Add(key, "myvalue");
  }
  catch(ArgumentException) 
  {
    // do something about extra key
  }
}

      

+2


source


Obvious error (if we're discussing performance), I see double work being done when calling ContainsKey and then adding a key-value pair. When a pair is added using the Add method, the key is checked internally for presence again. The whole if block can be safely replaced with the following:

... myDictionary [currentKey] = ""; ...

If the key already exists, the value will simply be overridden and no exception will be thrown. Moreover, if the value is not used at all, I personally would use null values ​​to fill it. I see no reason to use any string constant.

+1


source


The possible performance degradation mentioned by scottm is not for simple searches. It is designed to calculate the intersection between two sets. HashSet has a slightly faster lookup than a dictionary. The performance difference will indeed be very small, although as everyone says - lookups take most of the time and the creation of the KeyValuePair takes very little.

For 2.0, you can make the Value object one of the following:

public struct Empty {}

      

He might be a little better than "".

Or you can try making a reference to System.Core.dll in your 2.0 project so you can use the HashSet.

Also, make sure GetHashCode and Equals are as efficient as possible for MyType. I was bitten by using a dictionary for something with a very slow GetHashCode (I believe we were trying to use a delegate as a key or something.)

+1


source







All Articles