Optimizing Array Iteration with Nested Where Clauses (LINQ)
I am creating a tool (C #) that has a search function. Searching is like searching anywhere (like ReSharper has or VS2013).
The search context is a string array containing all the elements in front:
private string[] context; // contains thousands of elements
The search is incremental and occurs with each new input (character) the user provides.
I have implemented search using LINQ Where extension method:
// User searched for "c"
var input = "c";
var results = context.Where(s => s.Contains(input));
When the user searches for "ca" I tried to use the previous results as the search context, however this triggers (I think?) A nested Where iteration and doesn't work very well. Think something like this code:
// Cache these results.
var results = var results = context.Where(s => s.Contains(input));
// Next search uses the previous search results
var newResults = results.Where(s => s.Contains(input));
Is there a way to optimize this scenario?
Converting an IEnumerable to an array with each lookup causes large memory allocations and performs poorly.
source to share
Presenting thousands of search results to the user is pretty useless. You must add the "top" ( Take
in linq) statement to your query before presenting the result to the user.
var results = context.Where(s => s.Contains(input)).Take(100);
And if you want to present the following 100 results to the user:
var results = context.Where(s => s.Contains(input)).Skip(100).Take(100);
Also just use the original array for all searches, no nested ones Where
, as it has no benefit unless you materialize the query.
source to share
I have a few useful points to add, too many to comment.
First of all, I agree with the other comments to start with .take(100)
, reducing load times. Better yet, add one result:
var results = context.Where(s => s.Contains(input));
var resultEnumerator = result.GetEnumerator()
Flip the resultEnumerator file to display the results one at a time, stop when the screen is full or a new search is started.
Second, throttle your input. If the user says Hello
, you do not want to shoot 5 searches H
, He
, Hel
, Hell
and Hello
you want to find only Hello
. When the user later adds world
, it would be wise to take your old result and add it Hello world
to the where clause.
results = results.Where(s => s.Contains(input));
resultEnumerator = result.GetEnumerator()
And of course, undo the current running result when the user adds new text.
Using Rx the throttle part is simple, you end up with something like this:
var result = context.AsEnumerable();
var oldStr = "";
var resultEnumerator = result.GetEnumerator();
Observable.FromEventPattern(h => txt.TextChanged += h, h => txt.TextChanged -= h)
.Select(s => txt.Text)
.DistinctUntilChanged().Throttle(TimeSpan.FromMilliseconds(300))
.Subscribe(s =>
{
if (s.Contains(oldStr))
result = result.Where(t => t.Contains(s));
else
result = context.Where(t => t.Contains(s));
resultEnumerator = result.GetEnumerator();
oldStr = s;
// and probably start iterating resultEnumerator again,
// but perhaps not on this thread.
});
source to share
If you are worried about problems with allocs and you do not want to write a trie implementation or use third-party code, you should get away with splitting your context array sequentially in order to link the matching entries in the front. Not very LINQ-ish, but fast and has zero memory cost.
C ++ based partition extension method std :: partition
/// <summary>
/// All elements for which predicate is true are moved to the front of the array.
/// </summary>
/// <param name="start">Index to start with</param>
/// <param name="end">Index to end with</param>
/// <param name="predicate"></param>
/// <returns>Index of the first element for which predicate returns false</returns>
static int Partition<T>(this T[] array, int start, int end, Predicate<T> predicate)
{
while (start != end)
{
// move start to the first not-matching element
while ( predicate(array[start]) )
{
if ( ++start == end )
{
return start;
}
}
// move end to the last matching element
do
{
if (--end == start)
{
return start;
}
}
while (!predicate(array[end]));
// swap the two
var temp = array[start];
array[start] = array[end];
array[end] = temp;
++start;
}
return start;
}
So now you need to store the last index of the section to be initialized with context
length:
private int resultsCount = context.Length;
Then, for every input change that is incremental, you can run:
resultsCount = context.Partition(0, resultsCount, s => s.Contains(input));
Each time this will only perform checks on items that have not been previously filtered, this is exactly what you need.
For every non-incremental change, you will need to reset resultsCount
to the original value.
You can provide results in a convenient, debugger, and LINQ-friendly way:
public IEnumerable<string> Matches
{
get { return context.Take(resultsCount); }
}
source to share