New in C # 3: Should this be done with linq?
Assuming the length of list 1 is N and the length of list 2 is M:
I will filter first, as it is a cheap O (N + M) operation, and then intersection, a relatively expensive operation based on the current implementation. The cost of calling Intersect is complex and is mainly due to the behavior of the hash function:
- If the hash function is bad, performance can degrade to O (N * M) (since every row in one list is checked against every thread in the other.
- If the performance is good, then the cost is just a look in the hash, since it is O (1), so the cost M checks the hash and the cost M to construct the hash, so O (N + M) in time, but with an additional cost of O ( N) in space.
- Building a support kit will be a killer in terms of performance.
- If you knew that both lists were, say, sorted back then, you could write your own Intersects check with constant overhead in space and O (N + M) at worst, not to mention excellent memory locality.
This gives you:
int count = Enumerable.Intersect(
list1.Where(word => word.Length > 3),
list2.Where(word => word.Length > 3)).Count();
By the way, the performance behavior of the Enumerable.Intersect method can change significantly depending on the order of the arguments.
In most cases, making the smaller of the first two arguments will produce faster, more efficient memory code, and the first argument is used to create a temporary (hash set) backing. Of course, this is an encoding of a (hidden) implementation detail, so you should only consider this if it shows up as a problem after a performance analysis and, if so, stands out as a micro-optimization.
The second filter in list2 at the root is not needed for correctness (since intersections will remove such entries anyway)
If possible the next is faster
int count = Enumerable.Intersect(
list1.Where(word => word.Length > 3),
list2).Count();
However, filtering by length is very cheap for long strings compared to calculating their hash codes. The best one will only be found through benchmarking with inputs appropriate for your use.
source to share