Efficient data structure for finding the intersection of two lists
I have two very large List<List<int>> instances, A and B. I need to find the intersection of each pair of corresponding elements (A[i] and B[i]) of these lists.
A[0] = { 1, 2, 3};
B[0] = {2, 3, 4};
Intersection = { 2, 3 };
My implementation:
List<int> intersection = A[0].Intersect(B[0]).ToList();
This solution takes a very long time. Is there a better way to do this, or a more efficient data structure I could use?
Thanks!
You should use a hash set for this; in C#, that is HashSet<T>. A lookup in a hash set is O(1) on average (given a decent hash function and an array-backed table), as opposed to O(n) for a list.
With LINQ in C# you basically get this built in: Intersect() uses a hash set internally, so it computes the intersection in O(n) rather than the O(n^2) of comparing two lists element by element.
var intersection = a.Intersect(b).ToList();
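Applied to the question's List<List<int>> inputs, the same call can be made once per pair of inner lists. The following is only a minimal sketch: the class name PairwiseIntersection and the method IntersectEach are hypothetical, and it assumes that A[i] should be intersected with B[i] and that any extra elements in the longer outer list can be ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairwiseIntersection
{
    // Hypothetical helper: intersects a[i] with b[i] for every index
    // that exists in both outer lists (Zip stops at the shorter one).
    public static List<List<int>> IntersectEach(List<List<int>> a, List<List<int>> b)
    {
        // Intersect builds a hash set from one sequence and streams the other,
        // so each pair costs roughly the sum of the two list lengths.
        return a.Zip(b, (x, y) => x.Intersect(y).ToList()).ToList();
    }
}
```

Note that Intersect returns distinct values, so duplicates inside an inner list do not appear twice in the result.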
Sample code using HashSet<T>.IntersectWith:
HashSet<string> lst1 = new HashSet<string>
{ "id1", "id2", "id3" };
HashSet<string> lst2 = new HashSet<string>
{ "id2", "id3", "id4" };
// lst1 is modified in place: only the items that are also in lst2 remain
lst1.IntersectWith(lst2);
P.S. I used strings in this sample, but the same approach works with your integer values.
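Because IntersectWith changes the set it is called on, it may be worth copying first when the original data must stay intact. Here is a rough sketch for int values; the class SetIntersection and the method IntersectCopy are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class SetIntersection
{
    // Hypothetical helper: returns the values common to both inputs
    // without modifying either of them.
    public static HashSet<int> IntersectCopy(IEnumerable<int> first, IEnumerable<int> second)
    {
        // Copy the first sequence into a new set, then trim that copy in place.
        var result = new HashSet<int>(first);
        result.IntersectWith(second);
        return result;
    }
}
```

For example, IntersectCopy(A[0], B[0]) with the lists from the question would leave A[0] and B[0] unchanged and return a set containing 2 and 3.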