How do I make sure every item in a list is unique?

Let's say I have a list of 1 to 10,000,000 items. Its type is List<Person>, and Person looks like this:

class Person 
{
   public string Prename;
   public string Lastname;

   public Person(string pre, string last)
   {
      Prename = pre; 
      Lastname = last;
   }
}

      

I want to make sure every person in this list is unique. So if I try to add "Tim Stone" and there is already a "Tim Stone" in the list, the new one should either not be added or be filtered out afterwards.

I tried to do this using Distinct() to remove duplicates. Unfortunately it doesn't work well with custom objects out of the box, and I still get duplicates.

Could a HashSet be what I'm looking for? If so, what would the implementation look like?


+3




3 answers


Instead of adding them to a list first, you can add them to a HashSet<Person>, as you mentioned. Override the Equals and GetHashCode methods. For example, you can do this:

public class Person  
{
    public string Prename;
    public string Lastname;


    public Person(string pre, string last)
    {
        Prename = pre; Lastname = last;
    }

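    public override bool Equals(object obj)
    {
        // "as" yields null when obj is null or not a Person, so guard before dereferencing.
        Person p = obj as Person;
        if (p == null) return false;

        // To make this check case-insensitive, use the string.Equals overload that takes
        // a StringComparison (and compute the hash code the same way in GetHashCode).
        return (Prename + Lastname).Equals(p.Prename + p.Lastname);
    }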

    public override int GetHashCode()
    {
        return (Prename + Lastname).GetHashCode();
    }

}

      

This way, when you add them to a HashSet<Person>, no duplicates will be added. If you already have a list, you can use the HashSet<T> constructor overload that takes an existing collection, like this:



HashSet<Person> hsPerson = new HashSet<Person>(myExistingList);

      

As a result, you will get a HashSet of Person objects that has no duplicates.

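My implementation above treats two people as duplicates when their Prename and Lastname are the same once concatenated, but you can change that to whatever you prefer. Note that concatenating means ("ab", "c") and ("a", "bc") also count as equal; compare the two fields separately if that matters to you.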
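As a rough sketch of one such variation, the class below compares the two fields separately and ignores case. The names CaseInsensitivePerson and CaseInsensitiveDemo are made up for illustration, and the choice of StringComparer.OrdinalIgnoreCase is an assumption about what "case insensitive" should mean here; it is not the answer's original code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of the Person class above; the class name is illustrative.
public sealed class CaseInsensitivePerson
{
    public readonly string Prename;
    public readonly string Lastname;

    public CaseInsensitivePerson(string pre, string last)
    {
        Prename = pre;
        Lastname = last;
    }

    public override bool Equals(object obj)
    {
        var other = obj as CaseInsensitivePerson;
        if (other == null) return false;

        // Compare each field on its own so ("ab", "c") and ("a", "bc") stay distinct.
        return StringComparer.OrdinalIgnoreCase.Equals(Prename, other.Prename)
            && StringComparer.OrdinalIgnoreCase.Equals(Lastname, other.Lastname);
    }

    public override int GetHashCode()
    {
        // Use the same comparer as Equals so that equal objects produce equal hash codes.
        // The ?? "" guards against null, which StringComparer.GetHashCode rejects.
        unchecked
        {
            int hash = StringComparer.OrdinalIgnoreCase.GetHashCode(Prename ?? "");
            return hash * 31 + StringComparer.OrdinalIgnoreCase.GetHashCode(Lastname ?? "");
        }
    }
}

public static class CaseInsensitiveDemo
{
    public static int CountUnique()
    {
        var people = new HashSet<CaseInsensitivePerson>();
        people.Add(new CaseInsensitivePerson("Tim", "Stone"));  // Add returns true: newly added
        people.Add(new CaseInsensitivePerson("tim", "STONE"));  // Add returns false: duplicate ignoring case
        people.Add(new CaseInsensitivePerson("Ti", "mStone"));  // Add returns true: the fields differ
        return people.Count;
    }
}
```

HashSet<T>.Add returns a bool, so the caller can tell whether an item was actually inserted or rejected as a duplicate.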

+1




If you don't care about the order of the items in your collection, HashSet<T> is the way to go.

Many of its methods match those of List<T>, since both implement common interfaces such as ICollection<T> and IEnumerable<T>. Here's a sample:

HashSet<Person> people = new HashSet<Person>();
var heko = new Person("heko", "17");
people.Add(heko); // people now contains heko
people.Add(heko); // people still contains only heko since duplicates are not allowed
people.Add(new Person("Nikola", "Dimitroff")); // people contains heko and nikola

      



Several things are worth noting. First, since HashSet<T> doesn't keep its elements in order, you can't access elements by index, i.e. people[0] is an invalid operation. Use foreach to enumerate the people in the set.

Second, HashSet<T> uses the Equals and GetHashCode methods when comparing items (not the == operator). Be sure to override them if you want new Person("heko", "17") to be treated as equal to another new Person("heko", "17").
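The difference between == and Equals can be seen in a small sketch. NamedPerson and EqualityDemo are hypothetical names introduced only for this illustration; the class overrides Equals and GetHashCode but deliberately does not overload the == operator.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal type; NamedPerson is an illustrative name, not from the answer.
public sealed class NamedPerson
{
    public readonly string Prename;
    public readonly string Lastname;

    public NamedPerson(string pre, string last)
    {
        Prename = pre;
        Lastname = last;
    }

    public override bool Equals(object obj)
    {
        var other = obj as NamedPerson;
        return other != null && Prename == other.Prename && Lastname == other.Lastname;
    }

    public override int GetHashCode()
    {
        // Combine both fields; unchecked lets the multiplication overflow silently.
        unchecked
        {
            return (Prename ?? "").GetHashCode() * 31 + (Lastname ?? "").GetHashCode();
        }
    }
}

public static class EqualityDemo
{
    public static bool[] Run()
    {
        var a = new NamedPerson("heko", "17");
        var b = new NamedPerson("heko", "17");
        var people = new HashSet<NamedPerson> { a };

        return new[]
        {
            a == b,             // false: == was not overloaded, so it compares references
            a.Equals(b),        // true: the overridden Equals compares the names
            people.Contains(b), // true: the set consults GetHashCode and Equals
            people.Add(b)       // false: b is rejected as a duplicate of a
        };
    }
}
```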

+1




If you want to use HashSet<T> operations or Distinct on your custom objects, you can make your custom type implement IEquatable<T> (following all the guidance in its documentation, including overriding GetHashCode). Once this is done, the BCL collections and LINQ operations behave the way you want.
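A minimal sketch of that approach might look like the following. EquatablePerson, DistinctDemo and Deduplicate are hypothetical names, and the get-only properties assume C# 6 or later; treat it as one possible shape rather than the definitive implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical immutable type implementing IEquatable<T>; names are illustrative.
public sealed class EquatablePerson : IEquatable<EquatablePerson>
{
    public string Prename { get; }
    public string Lastname { get; }

    public EquatablePerson(string pre, string last)
    {
        Prename = pre;
        Lastname = last;
    }

    // Strongly typed equality, picked up by EqualityComparer<T>.Default.
    public bool Equals(EquatablePerson other)
    {
        return other != null && Prename == other.Prename && Lastname == other.Lastname;
    }

    // Keep the object-based overload consistent with the typed one.
    public override bool Equals(object obj)
    {
        return Equals(obj as EquatablePerson);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            return (Prename ?? "").GetHashCode() * 31 + (Lastname ?? "").GetHashCode();
        }
    }
}

public static class DistinctDemo
{
    // Distinct() uses the default equality comparer, which now calls the methods above.
    public static List<EquatablePerson> Deduplicate(IEnumerable<EquatablePerson> source)
    {
        return source.Distinct().ToList();
    }
}
```

With this in place, a call such as DistinctDemo.Deduplicate(myExistingList) should drop people whose Prename and Lastname both match an earlier entry.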

You should be aware that having GetHashCode depend on properties of a class that can change can lead to very bad things (for example, elements in a dictionary or set can get "lost"). If you cannot make the relevant properties immutable, you can satisfy your requirement by creating your own IList<T> implementation that wraps a standard List<T> and implements your collection type's Add method like this:

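// Requires "using System.Linq;" for Any(). _list is the wrapped List<Person>.
public void Add(Person person)
{
   // Linear scan: only add the person if no existing entry has the same names.
   if (!_list.Any(p => p.Prename == person.Prename && p.Lastname == person.Lastname))
   {
      _list.Add(person);
   }
}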

      

This solution is much less efficient, because every Add scans the whole list, but it can save you from some cryptic bugs.
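To illustrate the kind of cryptic bug meant here, the sketch below changes a field after the object has been stored in a HashSet<T>. MutablePerson and LostElementDemo are hypothetical names used only for this demonstration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type whose hash code depends on mutable fields; names are illustrative.
public class MutablePerson
{
    public string Prename;
    public string Lastname;

    public MutablePerson(string pre, string last)
    {
        Prename = pre;
        Lastname = last;
    }

    public override bool Equals(object obj)
    {
        var other = obj as MutablePerson;
        return other != null && Prename == other.Prename && Lastname == other.Lastname;
    }

    public override int GetHashCode()
    {
        unchecked
        {
            return (Prename ?? "").GetHashCode() * 31 + (Lastname ?? "").GetHashCode();
        }
    }
}

public static class LostElementDemo
{
    public static bool[] Run()
    {
        var tim = new MutablePerson("Tim", "Stone");
        var people = new HashSet<MutablePerson> { tim };

        bool foundBefore = people.Contains(tim); // true: hash matches the one used on insert

        tim.Lastname = "Rock";                   // the hash code changes while tim is in the set

        // Almost certainly false: the lookup uses the new hash code, which no longer
        // matches the one recorded when tim was added (barring a hash collision).
        bool foundAfter = people.Contains(tim);

        bool stillStored = people.Count == 1;    // true: the object is still in the set, just unreachable by lookup

        return new[] { foundBefore, foundAfter, stillStored };
    }
}
```

The set still holds the object, yet Contains can no longer find it, which is why immutable key fields are the safer choice for hash-based collections.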

0

