Make sure every item in the list is unique?
Let's say I have a list of 1 to 10,000,000 items, of type List<Person>. Person looks like this:
class Person
{
    public string Prename;
    public string Lastname;
    public Person(string pre, string last)
    {
        Prename = pre;
        Lastname = last;
    }
}
I want to make sure every person in this list is unique. So if I try to add "Tim Stone" and there is already a "Tim Stone" in the list, the new one should not be added, or should be filtered out afterwards.
I tried to do this using the Distinct() method to remove duplicates. Unfortunately it doesn't work well with custom objects, and I still get duplicates.
Could a HashSet be what I'm looking for? If so, what would the implementation look like?
Instead of adding them to a list first, you can add them to a HashSet<Person>, as you mentioned. Override the Equals and GetHashCode methods. For example, you can do this:
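public class Person
{
    public string Prename;
    public string Lastname;
    public Person(string pre, string last)
    {
        Prename = pre;
        Lastname = last;
    }
    public override bool Equals(object obj)
    {
        Person p = obj as Person;
        // null or a different type is never equal to a Person
        if (p == null) return false;
        // Compare the fields separately: concatenating them would treat
        // ("ab", "c") and ("a", "bc") as the same person.
        // For a case-insensitive check, use string.Equals with a StringComparison
        // (and hash with the matching StringComparer in GetHashCode).
        return Prename == p.Prename && Lastname == p.Lastname;
    }
    public override int GetHashCode()
    {
        // Combine the hashes of exactly the fields that Equals compares.
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + (Prename ?? string.Empty).GetHashCode();
            hash = hash * 23 + (Lastname ?? string.Empty).GetHashCode();
            return hash;
        }
    }
}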
This way, when you add them to a HashSet<Person>, no duplicates will be added. If you already have a list, you can pass it to the HashSet<T> constructor overload that accepts an IEnumerable<T>, like this:
HashSet<Person> hsPerson = new HashSet<Person>(myExistingList);
As a result, you get a HashSet<Person> that contains no duplicates.
My implementation above treats two people as duplicates when they have the same Prename and Lastname, but you can change that to whatever you prefer.
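If editing Person is not an option, or the definition of "duplicate" needs to vary, the same rule can live in a separate comparer that is passed to the HashSet<T> constructor. The following is only a sketch, assuming a case-insensitive match is wanted; PersonNameComparer and ComparerDemo are hypothetical names, and Person is repeated without overrides just to keep the snippet self-contained.

```csharp
using System;
using System.Collections.Generic;

public class Person
{
    public string Prename;
    public string Lastname;
    public Person(string pre, string last) { Prename = pre; Lastname = last; }
}

// Hypothetical comparer: two people are equal when both names match, ignoring case.
public sealed class PersonNameComparer : IEqualityComparer<Person>
{
    private static readonly StringComparer Names = StringComparer.OrdinalIgnoreCase;

    public bool Equals(Person x, Person y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return Names.Equals(x.Prename, y.Prename) && Names.Equals(x.Lastname, y.Lastname);
    }

    public int GetHashCode(Person p)
    {
        // Hash with the same comparer that Equals uses, so equal people hash alike.
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + Names.GetHashCode(p.Prename ?? string.Empty);
            hash = hash * 23 + Names.GetHashCode(p.Lastname ?? string.Empty);
            return hash;
        }
    }
}

public static class ComparerDemo
{
    public static int CountUnique()
    {
        var people = new HashSet<Person>(new PersonNameComparer());
        people.Add(new Person("Tim", "Stone"));
        people.Add(new Person("tim", "STONE")); // rejected: same names, different case
        people.Add(new Person("Tina", "Stone"));
        return people.Count;
    }
}
```

Here ComparerDemo.CountUnique() returns 2. Keeping the rule in a comparer also lets different sets apply different rules to the same class.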
If you don't care about the order of the items in your collection, HashSet<T> is the way to go.
Its methods are almost the same as those of List<T>, as both implement common interfaces such as ICollection<T> and IEnumerable<T>. Here's a sample:
HashSet<Person> people = new HashSet<Person>();
var heko = new Person("heko", "17");
people.Add(heko); // people now contains heko
people.Add(heko); // people still contains only heko since duplicates are not allowed
people.Add(new Person("Nikola", "Dimitroff")); // people contains heko and nikola
Several things can be noted. First, since HashSet<T> doesn't keep its elements in order, you can't get elements by index, i.e. people[0] is an invalid operation. Use foreach to enumerate the people in the set.
Second, HashSet<T> uses the Equals and GetHashCode methods (not the == operator) when comparing items. Be sure to override them if you want new Person("heko", "17") and another new Person("heko", "17") to count as the same person.
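A small sketch of these two points, assuming Person overrides Equals and GetHashCode (a compact version is included so the snippet compiles on its own; SetDemo and its method names are hypothetical). Add reports through its return value whether the item was actually inserted, and == still compares references unless it is overloaded separately.

```csharp
using System;
using System.Collections.Generic;

public class Person
{
    public string Prename;
    public string Lastname;
    public Person(string pre, string last) { Prename = pre; Lastname = last; }

    public override bool Equals(object obj)
    {
        Person p = obj as Person;
        return p != null && Prename == p.Prename && Lastname == p.Lastname;
    }

    public override int GetHashCode()
    {
        unchecked { return (Prename ?? "").GetHashCode() * 23 + (Lastname ?? "").GetHashCode(); }
    }
}

public static class SetDemo
{
    // Returns the result of adding an equal person a second time.
    public static bool SecondAddSucceeds()
    {
        var people = new HashSet<Person>();
        people.Add(new Person("heko", "17"));        // true: newly added
        return people.Add(new Person("heko", "17")); // false: an equal item already exists
    }

    // == is not overloaded on Person, so this compares references, not names.
    public static bool OperatorSeesThemAsEqual()
    {
        return new Person("heko", "17") == new Person("heko", "17");
    }

    // There is no indexer on HashSet<T>; enumerate the set instead.
    public static List<string> Names(HashSet<Person> people)
    {
        var names = new List<string>();
        foreach (Person p in people)
            names.Add(p.Prename + " " + p.Lastname);
        return names;
    }
}
```

Both SecondAddSucceeds() and OperatorSeesThemAsEqual() return false: the set rejects the duplicate through Equals, while == sees two distinct instances.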
If you want to use HashSet<T> operations or LINQ's Distinct on your custom objects, you can make your class implement IEquatable<T> (following all of the guidance in its documentation, including overriding GetHashCode). Once this is done, BCL collections and LINQ operations behave the way you want.
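As a rough illustration, here is one way Person could implement IEquatable<Person> so that Distinct removes duplicates. This is a sketch under assumptions: the fields are made readonly (not in the original class) so the hash cannot change after construction, and DistinctDemo.Deduplicate is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Person : IEquatable<Person>
{
    // Assumption: readonly, so the values feeding GetHashCode never change.
    public readonly string Prename;
    public readonly string Lastname;

    public Person(string pre, string last) { Prename = pre; Lastname = last; }

    // Strongly typed equality, used by EqualityComparer<Person>.Default.
    public bool Equals(Person other)
    {
        return other != null && Prename == other.Prename && Lastname == other.Lastname;
    }

    // Keep the object overload consistent with the typed one.
    public override bool Equals(object obj) { return Equals(obj as Person); }

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + (Prename ?? string.Empty).GetHashCode();
            hash = hash * 23 + (Lastname ?? string.Empty).GetHashCode();
            return hash;
        }
    }
}

public static class DistinctDemo
{
    // Hypothetical helper: Distinct uses the default equality comparer,
    // which picks up the IEquatable<Person> implementation above.
    public static List<Person> Deduplicate(IEnumerable<Person> source)
    {
        return source.Distinct().ToList();
    }
}
```

Passing two "Tim Stone" entries and one other person to Deduplicate yields a list of two people.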
Be aware that basing GetHashCode on properties of a class that can change can lead to very bad things (for example, elements in a dictionary or set can be "lost"). If you cannot make the relevant properties immutable, you can satisfy your requirement by creating your own IList<T> implementation that wraps a standard List<T> and implements its Add method like this:
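// Requires "using System.Linq;" for Any.
public void Add(Person person)
{
    // Linear scan: add the person only if no existing entry has the same names.
    if (!_list.Any(p => p.Prename == person.Prename && p.Lastname == person.Lastname))
    {
        _list.Add(person);
    }
}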
This solution is much less efficient (every Add scans the whole list, so filling a list of n items costs O(n²) comparisons), but it can save you from a few cryptic bugs.
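The "lost element" problem mentioned above can be reproduced in a few lines. This is a sketch with hypothetical names (MutablePerson, LostElementDemo), assuming a class whose hash code is derived from mutable fields.

```csharp
using System.Collections.Generic;

public class MutablePerson
{
    public string Prename;
    public string Lastname;
    public MutablePerson(string pre, string last) { Prename = pre; Lastname = last; }

    public override bool Equals(object obj)
    {
        MutablePerson p = obj as MutablePerson;
        return p != null && Prename == p.Prename && Lastname == p.Lastname;
    }

    public override int GetHashCode()
    {
        unchecked { return (Prename ?? "").GetHashCode() * 23 + (Lastname ?? "").GetHashCode(); }
    }
}

public static class LostElementDemo
{
    public static bool StillFound()
    {
        var tim = new MutablePerson("Tim", "Stone");
        var people = new HashSet<MutablePerson> { tim };

        // Change a field that feeds GetHashCode after the object was inserted.
        tim.Lastname = "Rock";

        // The set filed tim under the hash computed at insertion time, so a
        // lookup with the new hash almost certainly misses (barring a hash collision).
        return people.Contains(tim);
    }
}
```

StillFound() is expected to return false even though the very same instance is still stored in the set, which is the kind of bug that immutable key fields prevent.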