Optimizing Array Iteration with Nested Where Clauses (LINQ)

I am creating a tool (C#) that has a search function. The search works like the "search everywhere" feature in ReSharper or VS2013.

The search context is a string array containing all the searchable elements, loaded up front:

private string[] context; // contains thousands of elements

      

The search is incremental and occurs with each new input (character) the user provides.

I have implemented the search using the LINQ Where extension method:

// User searched for "c"
var input = "c";
var results = context.Where(s => s.Contains(input));

      

When the user searches for "ca" I tried to use the previous results as the search context, but this triggers (I think?) a nested Where iteration and doesn't work very well. Think something like this code:

// Cache these results.
var results = context.Where(s => s.Contains(input));

// Next search uses the previous search results
var newResults = results.Where(s => s.Contains(input));

      

Is there a way to optimize this scenario?

Converting an IEnumerable to an array with each lookup causes large memory allocations and performs poorly.



3 answers


Presenting thousands of search results to the user is pretty useless. You should add a "top" clause (Take in LINQ) to your query before presenting the results to the user.

var results = context.Where(s => s.Contains(input)).Take(100);

      

And if you want to present the next 100 results to the user:



var results = context.Where(s => s.Contains(input)).Skip(100).Take(100);

      

Also, just use the original array for all searches, with no nested Where calls, as nesting has no benefit unless you materialize the query.
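A minimal sketch of the difference: Where is deferred, so chaining a second Where still re-scans the whole original array every time the query is enumerated; only materializing the intermediate result (for example with ToList) makes the follow-up search run over the smaller set, at the cost of an allocation.

// Deferred: enumerating this re-runs the first predicate over the whole
// context array again, so the nested Where saves nothing.
var narrowed = context.Where(s => s.Contains("c"))
                      .Where(s => s.Contains("ca"));

// Materialized: the first pass is stored in a list, so the second Where
// only scans the already-filtered items (at the cost of an allocation).
var firstPass = context.Where(s => s.Contains("c")).ToList();
var narrowedFast = firstPass.Where(s => s.Contains("ca"));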



I have a few useful points to add, too many for a comment.

First of all, I agree with the other answers: start with .Take(100) to reduce load times. Better yet, fetch the results one at a time:

var results = context.Where(s => s.Contains(input));
var resultEnumerator = results.GetEnumerator();

      

Iterate resultEnumerator to display the results one at a time, and stop when the screen is full or a new search is started.
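A minimal sketch of that loop, using resultEnumerator from the snippet above (screenSize is just an illustrative page size):

// Pull results one at a time until the screen is full; keep the enumerator
// around so a later "show more" can continue where this loop stopped.
var shown = 0;
var screenSize = 100; // illustrative page size
while (shown < screenSize && resultEnumerator.MoveNext())
{
    Console.WriteLine(resultEnumerator.Current); // stand-in for adding a row to the UI
    shown++;
}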

Second, throttle your input. If the user types Hello, you do not want to fire five searches for H, He, Hel, Hell and Hello; you want to search only for Hello. When the user later appends world, it would be wise to take your old result and add Hello world to the where clause:



results = results.Where(s => s.Contains(input));
resultEnumerator = results.GetEnumerator();

      

And of course, cancel the currently running enumeration when the user types new text.
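One rough sketch of how that cancellation could look (the ResultPresenter class and its names are illustrative, not an established API): give each display loop a CancellationToken and cancel it whenever a new search starts.

class ResultPresenter
{
    private CancellationTokenSource cts = new CancellationTokenSource();

    public void Show(IEnumerable<string> results, int pageSize)
    {
        cts.Cancel();                        // abandon the enumeration that is still running
        cts = new CancellationTokenSource();
        var token = cts.Token;

        ThreadPool.QueueUserWorkItem(_ =>
        {
            var shown = 0;
            foreach (var item in results)
            {
                if (token.IsCancellationRequested)
                    return;                  // the user typed again
                Console.WriteLine(item);     // stand-in for adding a row to the UI (marshal to the UI thread in a real app)
                if (++shown >= pageSize)
                    return;                  // screen is full
            }
        });
    }
}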

Using Rx, the throttle part is simple; you end up with something like this:

var result = context.AsEnumerable();
var oldStr = "";
var resultEnumerator = result.GetEnumerator();
Observable.FromEventPattern(h => txt.TextChanged += h, h => txt.TextChanged -= h)
         .Select(s => txt.Text)
         .DistinctUntilChanged().Throttle(TimeSpan.FromMilliseconds(300))
         .Subscribe(s =>
         {
             if (s.Contains(oldStr))
                 result = result.Where(t => t.Contains(s));
             else
                 result = context.Where(t => t.Contains(s));
             resultEnumerator = result.GetEnumerator();
             oldStr = s;
             // and probably start iterating resultEnumerator again,
             // but perhaps not on this thread.
         });

      



If you are worried about allocations and you do not want to write a trie implementation or use third-party code, you should get away with partitioning your context array in place so that the matching entries end up at the front. Not very LINQ-ish, but fast and with zero memory cost.

A Partition extension method modelled on C++'s std::partition:

/// <summary>
/// All elements for which predicate is true are moved to the front of the array.
/// </summary>
/// <param name="start">Index to start with</param>
/// <param name="end">Index to end with</param>
/// <param name="predicate"></param>
/// <returns>Index of the first element for which predicate returns false</returns>
static int Partition<T>(this T[] array, int start, int end, Predicate<T> predicate)
{
    while (start != end)
    {
        // move start to the first not-matching element
        while ( predicate(array[start]) )
        {
            if ( ++start == end )
            {
                return start;
            }
        }

        // move end to the last matching element
        do
        {
            if (--end == start)
            {
                return start;
            }
        }
        while (!predicate(array[end]));

        // swap the two
        var temp = array[start];
        array[start] = array[end];
        array[end] = temp;

        ++start;
    }
    return start;
}

      

Now you need to store the end index of the matching partition, initialized to the length of context:

private int resultsCount = context.Length;

      

Then, for every incremental input change, you can run:

resultsCount = context.Partition(0, resultsCount, s => s.Contains(input));

      

Each time, this will only check the items that have not already been filtered out, which is exactly what you need.

For every non-incremental change, you will need to reset resultsCount to the original value.
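As an illustration, a hypothetical input handler tying this together; the StartsWith check for detecting an incremental change is an assumption, not something prescribed by the answer:

private string previousInput = "";

private void OnInputChanged(string input)
{
    // Incremental change (e.g. "ca" typed after "c"): keep narrowing the same partition.
    // Anything else (text deleted or replaced): start over from the full array.
    if (!input.StartsWith(previousInput))
        resultsCount = context.Length;

    resultsCount = context.Partition(0, resultsCount, s => s.Contains(input));
    previousInput = input;

    // context.Take(resultsCount) now yields only the entries that survived the filter.
}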

You can expose the results in a convenient, debugger- and LINQ-friendly way:

public IEnumerable<string> Matches
{
    get { return context.Take(resultsCount); }
}

      
