Filter large collection

I have a collection of MyClass objects and I need to filter it using a combination of 17 fields.

I have implemented a MyClassFilter object with 17 possible fields, a condition for each one, and a method:

bool PassFilter(MyClass ObjectToEvaluate)
{
  return PassFilterVal(this.Workstream, ObjectToEvaluate.WorkStream)
    && PassFilterVal(this.AssignedTo, ObjectToEvaluate.AssignedTo)
    && PassFilterVal(this.ProcessingGroup, ObjectToEvaluate.ProcessingGroup)
    && PassFilterVal(this.ScheduledStart, ObjectToEvaluate.ScheduledStart)
    && PassFilterVal(this.EntityName, ObjectToEvaluate.EntityName)
    && PassFilterVal(this.TaskIDs, ObjectToEvaluate.TaskID)
    && PassFilterVal(this.ElementIDs, ObjectToEvaluate.EntityID)
    && PassFilterVal(this.MappingIDs, ObjectToEvaluate.MappingID)
    && PassFilterVal(this.EntityStatus, ObjectToEvaluate.EntityStatus)
    && PassFilterVal(this.EntityType, ObjectToEvaluate.EntityType)
    && PassFilterVal(this.NumberOfSteps, ObjectToEvaluate.ListOfSteps.Count)
    && PassFilterVal(this.NumberOfDependancies, ObjectToEvaluate.ListOfParentDependancies.Count)
    && PassFilterVal(this.NumberOfOpenIssues, ObjectToEvaluate.ListOfAllIssues.CountOpen)
    && PassFilterVal(this.NumberOfRequirementsLinked, ObjectToEvaluate.RequierementsLinked)
    ;
}

and my collection has a method:

ListOfMyClass FilterList(MyClassFilter Filter)
{
    ListOfMyClass FilteredList = new ListOfMyClass();
    foreach (MyClass Task in this)
    {
        if (Filter.PassFilter(Task))
            FilteredList.Add(Task);
    }
    return FilteredList;
}


It works fine as long as the collection is small, but once I have 500 objects it becomes very slow. I searched the net, but all the examples I found do the same thing: go object by object through the collection and check each field.

Any suggestions for improving performance?

Thanks!



3 answers


It doesn't have to be slow if your comparisons aren't slow.

Scanning 500 objects should be very fast (you don't say what "slow" means, or what hardware you're running on, but still).

Your PassFilterVal calls will be somewhat more expensive than inline comparisons because of the method-call overhead, but since that cost is the same for all of them, I think you're stuck with what you have.

You can reorder the comparisons so that the most selective ones come first.



The goal is to exploit the short-circuit evaluation of && to reject an object as early as possible, limiting the number of comparisons actually performed.
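As a sketch (with invented field names; the right ordering depends on your data), putting the cheapest, most selective checks first lets the && chain bail out before the later comparisons ever run:

```csharp
using System;

class Item
{
    public string EntityStatus = "Open";
    public int TaskID = 1;
    public string AssignedTo = "alice";
}

class ItemFilter
{
    public string EntityStatus;  // suppose this rejects most items: test it first
    public int? TaskID;
    public string AssignedTo;

    public bool PassFilter(Item x)
    {
        // Most selective comparison first; && short-circuits on the
        // first false, so the later comparisons often never execute.
        return (EntityStatus == null || EntityStatus == x.EntityStatus)
            && (TaskID == null || TaskID == x.TaskID)
            && (AssignedTo == null || AssignedTo == x.AssignedTo);
    }
}
```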

Another thing you can do is optimize for the most common queries first.

Are all 17 criteria always used? If not, limit the comparisons to the ones actually in use. Evaluating everything is expensive in this case (17 method calls plus 17 comparisons of unknown complexity). If you have some kind of wildcard or "don't care" value, try skipping those fields rather than comparing them at all.
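One way to skip the "don't care" fields entirely is to build the filter as a list of only the active predicates, so unused criteria cost nothing per item. A sketch, with invented names (WorkItem and two sample criteria standing in for the real 17):

```csharp
using System;
using System.Collections.Generic;

class WorkItem
{
    public string EntityStatus = "Open";
    public int NumberOfSteps = 3;
}

class CompiledFilter
{
    readonly List<Predicate<WorkItem>> active = new List<Predicate<WorkItem>>();

    public CompiledFilter(string status = null, int? steps = null)
    {
        // Unset criteria never become predicates, so they cost nothing
        // per item instead of a method call plus a comparison.
        if (status != null) active.Add(w => w.EntityStatus == status);
        if (steps != null) active.Add(w => w.NumberOfSteps == steps.Value);
    }

    public bool PassFilter(WorkItem w)
    {
        foreach (var p in active)
            if (!p(w)) return false;
        return true;
    }
}
```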

Another idea is to sort the items by the 17 criteria. Then you binary-search for an item that matches all 17 fields and iterate over the neighbouring items until they stop matching. Of course, you then have to keep the list sorted, but once it is, each addition is a binary insertion, which is fast. If you read the list much more often than you add to it, this should be a net win.
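The sorted-list idea, reduced to a single key for illustration (note that List&lt;T&gt;.BinarySearch returns the bitwise complement of the insertion point when the key is absent):

```csharp
using System.Collections.Generic;

class SortedTaskIds
{
    readonly List<int> ids = new List<int>();

    public IReadOnlyList<int> Items => ids;

    // Binary insertion keeps the list sorted: O(log n) search, O(n) shift.
    public void Add(int id)
    {
        int i = ids.BinarySearch(id);
        if (i < 0) i = ~i;  // not found: complement is the insertion index
        ids.Insert(i, id);
    }

    // O(log n) membership test instead of a full scan.
    public bool Contains(int id) => ids.BinarySearch(id) >= 0;
}
```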



Here's a trick you might try: implement CompareTo (or the equivalent in whatever language you're using; I assume .NET) so that the "closer" matches sort to the top. Then you just use a collection that sorts on insertion, and it all happens for you.



That way you can simply iterate over the elements and stop as soon as you find one that doesn't match, because you know everything after it won't match either.
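A minimal sketch of the idea: score each item by how many criteria it matches, sort so the closest matches come first, and stop at the first item that no longer matches everything. The score function and tuple fields here are invented:

```csharp
using System;
using System.Linq;

static class ClosestFirst
{
    // Hypothetical "closeness": number of criteria matched (0..2 here).
    static int Score((string Status, int Id) t, string wantStatus, int wantId)
        => (t.Status == wantStatus ? 1 : 0) + (t.Id == wantId ? 1 : 0);

    public static (string, int)[] FullMatches((string, int)[] items,
                                              string wantStatus, int wantId)
    {
        return items
            .OrderByDescending(t => Score(t, wantStatus, wantId))  // closest first
            .TakeWhile(t => Score(t, wantStatus, wantId) == 2)     // stop at first partial match
            .ToArray();
    }
}
```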

I'd be interested to hear how this turns out. Maybe someone has a better suggestion (and I'm willing to look like a fool if this is stupid for some reason).



Without more context it's impossible to tell why performance is so poor, but I can offer some hints. If something slows down at around 500 items, there may be an O(N^2) algorithm (or worse) lurking somewhere. My guess is that one or more of your properties traverses a large data set on every comparison.

With properties in C# it's hard to see how they're implemented: something as innocent-looking as NumberOfDependancies could walk a very large graph every time it's called, or build a list just to count the dependencies. If possible, I would compute these values once and store them in the class. And looking at the context in which one of them is used, I see another problem:

PassFilterVal(this.NumberOfDependancies, ObjectToEvaluate.ListOfParentDependancies.Count)

If ListOfParentDependancies is an IEnumerable<T>, you iterate over the entire list of dependencies every time you call Count() on it.
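A sketch of computing the count once: materialize the sequence in the constructor so the count becomes an O(1) property read instead of a repeated traversal. TaskNode and its member names are invented here:

```csharp
using System.Collections.Generic;
using System.Linq;

class TaskNode
{
    // Materialized once in the constructor; Count is O(1) afterwards.
    public IReadOnlyList<string> ParentDependencies { get; }

    public TaskNode(IEnumerable<string> dependencies)
    {
        ParentDependencies = dependencies.ToList();
    }

    public int NumberOfDependencies => ParentDependencies.Count;
}
```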

Your filtering function itself is fine performance-wise. For elegance, and possibly a modest performance boost, I would implement it like this:

IEnumerable<MyClass> FilterList(MyClassFilter filter)
{
    foreach (MyClass task in this)
        if (filter.PassFilter(task))
            yield return task;
}
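The same deferred pipeline can also be written with LINQ's Where; both versions produce items lazily, so nothing is evaluated until the result is enumerated. A self-contained sketch, with int items standing in for MyClass:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LazyFilter
{
    public static IEnumerable<int> FilterList(IEnumerable<int> source,
                                              Func<int, bool> pass)
    {
        foreach (var x in source)
            if (pass(x))
                yield return x;  // produced on demand, no intermediate list
    }
}
```

Calling LazyFilter.FilterList(items, pass) is equivalent to items.Where(pass); callers that only take the first few matches never pay for filtering the rest.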


