Efficient data structure for finding the intersection of two lists

I have two very large List<List<int>> objects, A and B. I need to find the intersection between each element of these lists.

A[0] = { 1, 2, 3};
B[0] = {2, 3, 4};

Intersection = { 2, 3 };

      

My implementation:

List<int> intersection = A[0].Intersect(B[0]).ToList();

This solution takes a very long time. I am wondering whether there is a better way to do this, or a more efficient data structure that I could use.

Thanks!



2 answers


You should use a hash set for this; in C#, that is HashSet<T>. A lookup in a hash set is O(1) on average (given a decent hash function and an array-backed table), as opposed to O(n) for a list.

With LINQ in C# you essentially get this built in: Intersect() uses a hash set internally, so it computes the intersection in O(n) rather than the O(n^2) needed when every element of one list is compared against every element of the other.



var intersection = a.Intersect(b).ToList();
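For the List<List<int>> case in the question, the same call can be applied pair by pair. The following is only a minimal sketch, assuming "each element" means that A[i] is intersected with B[i]; the names PairwiseIntersection and IntersectPairwise are made up for illustration and are not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairwiseIntersection
{
    // Hypothetical helper: intersects a[i] with b[i] for every index
    // that exists in both outer lists (assumption: pairing is by index).
    public static List<List<int>> IntersectPairwise(
        List<List<int>> a, List<List<int>> b)
    {
        int count = Math.Min(a.Count, b.Count);
        var result = new List<List<int>>(count);

        for (int i = 0; i < count; i++)
        {
            // Intersect returns the distinct elements of a[i]
            // that also occur in b[i].
            result.Add(a[i].Intersect(b[i]).ToList());
        }

        return result;
    }
}
```

Called with the data from the question (A[0] = { 1, 2, 3 } and B[0] = { 2, 3, 4 }), the first inner list of the result should contain 2 and 3. Note that Intersect returns distinct values, so duplicates inside an inner list are dropped.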

      



Sample code using HashSet<T>.IntersectWith:

HashSet<string> lst1 = new HashSet<string> { "id1", "id2", "id3" };
HashSet<string> lst2 = new HashSet<string> { "id2", "id3", "id4" };

// IntersectWith modifies lst1 in place, leaving only the items that are also in lst2
lst1.IntersectWith(lst2);

      



PS: I used strings in this sample, but the same works with your integer values.
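Because IntersectWith changes the set it is called on, you may want to copy the first list into a new set if the original data has to stay untouched. A minimal sketch with integers, under that assumption; the helper name IntersectToNewSet is hypothetical, not a framework method.

```csharp
using System.Collections.Generic;

public static class HashSetIntersection
{
    // Hypothetical helper: returns a new set holding the common values
    // and leaves both input sequences unchanged.
    public static HashSet<int> IntersectToNewSet(
        IEnumerable<int> first, IEnumerable<int> second)
    {
        // Copy the first sequence so the in-place call below
        // only touches the copy.
        var result = new HashSet<int>(first);

        // Keep only the values that also appear in second.
        result.IntersectWith(second);
        return result;
    }
}
```

For the lists { 1, 2, 3 } and { 2, 3, 4 }, the returned set should hold 2 and 3, and the input lists keep all their elements.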
