New in C # 3: Should this be done with linq?

I have two word lists. I want to count words more than three characters that exist in both lists. Using C #, how would you resolve it?

+2


source to share


3 answers


I would use

var count = Enumerable.Intersect(listA, listB).Count(word => word.Length > 3);

      



Assuming listA

u listB

are of type IEnumerable<String>

.

+15


source


Assuming the length of list 1 is N and the length of list 2 is M:

I will filter first, as it is a cheap O (N + M) operation, and then intersection, a relatively expensive operation based on the current implementation. The cost of calling Intersect is complex and is mainly due to the behavior of the hash function:

  • If the hash function is bad, performance can degrade to O (N * M) (since every row in one list is checked against every thread in the other.
  • If the performance is good, then the cost is just a look in the hash, since it is O (1), so the cost M checks the hash and the cost M to construct the hash, so O (N + M) in time, but with an additional cost of O ( N) in space.
    • Building a support kit will be a killer in terms of performance.
    • If you knew that both lists were, say, sorted back then, you could write your own Intersects check with constant overhead in space and O (N + M) at worst, not to mention excellent memory locality.

This gives you:

int count = Enumerable.Intersect(
    list1.Where(word => word.Length > 3),
    list2.Where(word => word.Length > 3)).Count();

      

By the way, the performance behavior of the Enumerable.Intersect method can change significantly depending on the order of the arguments.



In most cases, making the smaller of the first two arguments will produce faster, more efficient memory code, and the first argument is used to create a temporary (hash set) backing. Of course, this is an encoding of a (hidden) implementation detail, so you should only consider this if it shows up as a problem after a performance analysis and, if so, stands out as a micro-optimization.

The second filter in list2 at the root is not needed for correctness (since intersections will remove such entries anyway)

If possible the next is faster

int count = Enumerable.Intersect(
    list1.Where(word => word.Length > 3),
    list2).Count();

      

However, filtering by length is very cheap for long strings compared to calculating their hash codes. The best one will only be found through benchmarking with inputs appropriate for your use.

+15


source


List<string> t = new List<string>();
List<string> b = new List<string>();    
...
Console.WriteLine(t.Count(x => x.Length > 3 && b.Contains(x)));

      

+2


source







All Articles