The best way to iterate over lists

So, I have several different lists that I am trying to process and combine into one list.

Below is a snippet of the code I want to see if there is a better way to do it. The reason I am asking is because some of these lists are quite large. I want to see if there is a more efficient way to do this.

As you can see, I am going through the list and the first thing I do is check if the CompanyId exists in the list. If this happens, I will find an item in the list that I will process.

pList is a list of handlers. I am adding values ​​from my different lists to this list.

I'm wondering if there is a "better way" to accomplish Exist and Find.

    boolean tstFind = false;
    foreach (parseAC item in pACList)
    {
        tstFind = pList.Exists(x => (x.CompanyId == item.key.ToString()));

        if (tstFind == true)
        {
            pItem = pList.Find(x => (x.CompanyId == item.key.ToString()));
            //Processing done here.  pItem gets updated here
            ...
         }

      

As a side note, I'm going to investigate a way to use unions to see if it's faster. But I haven't gotten there yet. The above code is my first solution in solving this problem and seems to work. However, since I have time, I want to see if there is an even better way.

Any input is greatly appreciated.

Time:

  • In my current code, Find and Exists takes 84 minutes to iterate over 5.5M items in pACList.

  • Using pList.firstOrDefault (x => x.CompanyId == item.key.ToString ()); takes 54 minutes to iterate over 5.5M items in pACList

+3


source to share


6 answers


You can get an element by using FirstOrDefault

instead of searching for an object two times (the first time to determine if the element exists and the second time to get an existing element):



var tstFind = pList.FirstOrDefault(x => x.CompanyId == item.key.ToString());

if (tstFind != null)
{            
   //Processing done here.  pItem gets updated here        
}

      

+3


source


Yes, use a hash table so your algorithm is O (n) instead of O (n * m) which it is right now.



var pListByCompanyId = pList.ToDictionary(x => x.CompanyId);
 foreach (parseAC item in pACList)
    {
        if (pListByCompanyId.ContainsKey(item.key.ToString()))
        {
            pItem = pListByCompanyId[item.key.ToString()];
            //Processing done here.  pItem gets updated here
            ...
         }

      

+3


source


You can iterate over a filtered list using linq

foreach (parseAC item in pACList.Where(i=>pList.Any(x => (x.CompanyId == i.key.ToString()))))
    {
            pItem = pList.Find(x => (x.CompanyId == item.key.ToString()));
            //Processing done here.  pItem gets updated here
            ...
    }

      

+2


source


Using lists for this type of operation is O (MxN) (M is pACList counter, N is pList counter). Also, you are looking for pACList twice. To avoid this issue use pList.FirstOrDefault

as @lazyberezovsky recommended.

However, if possible, I would avoid using lists. A Dictionary

with the index of the key you are looking for will greatly improve the search time.

+2


source


Performing a linear search in a list for every item in another list is ineffective for large datasets. Which is preferable is to put the keys in a table or dictionary, which can be searched more efficiently so that you can join the two tables. You don't even need to specify it yourself, what you need is an operation Join

. You want all pairs of items from each sequence to map each card to the same key.

Either pull out the implementation of the method below, or change Foo

it Bar

to the appropriate types and use it as a method.

public static IEnumerable<Tuple<Bar, Foo>> Merge(IEnumerable<Bar> pACList
    , IEnumerable<Foo> pList)
{
    return pACList.Join(pList, item => item.Key.ToString()
        , item => item.CompanyID.ToString()
            , (a, b) => Tuple.Create(a, b));
}

      

You can use the results of this call to concatenate the two items together, since they will have the same key.

Inside the method, a lookup table will be created that allows you to efficiently perform searches before actually performing the search.

+2


source


  • Convert pList to HashSet, then query pHashSet.Contains () file. Complexity O (N) + O (n)

  • Sort pList to CompanyId and do Array.BinarySearch () = O (N Log N) + O (n * Log N)

  • If the Max company id is not overly large, just create and create an array of them where the item with company id i exists at the i-th position. Nothing could be faster.

where N is the size of pList and n is the size of pACList

+1


source







All Articles