How to extract phrases and then words from a string of text?
I have a search method that takes a user-entered string, splits it at each whitespace, and then proceeds to find matches based on the resulting list of terms:
string[] terms = searchTerms.ToLower().Trim().Split( ' ' );
Now I have been given one more requirement: to be able to search for phrases using double-quote delimiters, a la Google. So, if the search terms provided are:
"a line of" text
the search should match occurrences of "a line of" and "text" rather than the four separate terms (the opening and closing double quotes must also be removed before searching).
How can I achieve this in C#? I suspect regex is the way to go, but I haven't used it much, so I don't know whether it is the best solution.
If you need more information, please ask. Thanks in advance for your help.
Regular expressions are definitely the way to go.
You should check this MSDN link for information on the Regex class: http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx
and here is a great link for learning the regex syntax: http://www.radsoftware.com.au/articles/regexlearnsyntax.aspx
As for a code example, you can do something along these lines:
string searchString = "a line of";
Match m = Regex.Match(textToSearch, searchString);
or, if you just want to know whether the text contains the search string:
bool success = Regex.Match(textToSearch, searchString).Success;
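One caveat with the snippets above: Regex.Match treats searchString as a pattern, so characters such as '.' or '(' typed by a user would be read as regex syntax. Below is a minimal sketch, assuming the term should be matched literally and case-insensitively. The LiteralSearch class and ContainsTerm method are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralSearch
{
    // Hypothetical helper: true if text contains term, with term taken literally.
    public static bool ContainsTerm(string text, string term)
    {
        // Regex.Escape turns metacharacters such as '.' or '(' into literals.
        string pattern = Regex.Escape(term);
        return Regex.IsMatch(text, pattern, RegexOptions.IgnoreCase);
    }
}
```

With this, ContainsTerm("Price: 3.50", "3.50") is true, while ContainsTerm("Price: 3x50", "3.50") is false because the dot no longer matches any character.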
Use regular expressions ...
string textToSearchIn = "\"a line of\" text";
string result = Regex.Match(textToSearchIn, "(?<=\").*?(?=\")").Value;
or, if there is more than one phrase, put them in a match collection:
MatchCollection allPhrases = Regex.Matches(textToSearchIn, "(?<=\").*?(?=\")");
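Note that the lookaround pattern does not consume the quote characters, so when the input contains more than one phrase, the text between a closing quote and the next opening quote is also returned as a match. A possible alternative, sketched here under the assumption that only the quoted phrases are wanted, is to consume both quotes and read the phrase from a capture group. PhraseExtractor and ExtractPhrases are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PhraseExtractor
{
    // Hypothetical helper: returns only the text inside each pair of double quotes.
    public static List<string> ExtractPhrases(string input)
    {
        var phrases = new List<string>();
        // "([^"]*)" consumes both quotes, so the gap between two phrases
        // cannot be mistaken for a phrase of its own.
        foreach (Match m in Regex.Matches(input, "\"([^\"]*)\""))
        {
            phrases.Add(m.Groups[1].Value);
        }
        return phrases;
    }
}
```

For the input "a line of" text "notepad" this yields the two phrases a line of and notepad, and leaves the unquoted word text out.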
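The Knuth-Morris-Pratt (KMP) algorithm is a well-known fast algorithm for finding substrings in strings: after preprocessing the search term, it scans the text once, so a search takes time proportional to the combined length of the text and the term (well, technically it works on any sequence, such as byte arrays, not just strings).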
using System.Collections.Generic;
namespace KMPSearch
{
    public class KMPSearch
    {
        public const int NORESULT = -1;
        private string _needle;
        private string _haystack;
        private int[] _jumpTable;
        public KMPSearch(string haystack, string needle)
        {
            Haystack = haystack;
            Needle = needle;
        }
        public void ComputeJumpTable()
        {
            //JumpTable[i] is the length of the longest proper prefix of
            //Needle[0..i] that is also a suffix of it (the KMP failure function)
            int needleLength = Needle.Length;
            JumpTable = new int[needleLength];
            int k = 0;
            for (int i = 1; i < needleLength; i++)
            {
                //Fall back to shorter prefixes until one can be extended
                while (k > 0 && Needle[i] != Needle[k])
                    k = JumpTable[k - 1];
                if (Needle[i] == Needle[k])
                    k++;
                JumpTable[i] = k;
            }
        }
        public int[] MatchAll()
        {
            List<int> matches = new List<int>();
            int offset = 0;
            int needleLength = Needle.Length;
            int m = Match(offset);
            while (m != NORESULT)
            {
                matches.Add(m);
                //Continue after the match, so reported matches never overlap
                offset = m + needleLength;
                m = Match(offset);
            }
            return matches.ToArray();
        }
        public int Match()
        {
            return Match(0);
        }
        public int Match(int offset)
        {
            int haystackLength = Haystack.Length;
            int needleLength = Needle.Length;
            //An empty needle, a bad offset or too little remaining text cannot match
            if ((needleLength == 0) || (offset < 0) || (offset >= haystackLength) || (needleLength > (haystackLength - offset)))
                return NORESULT;
            ComputeJumpTable();
            //Number of needle characters matched so far
            int needleIndex = 0;
            for (int haystackIndex = offset; haystackIndex < haystackLength; haystackIndex++)
            {
                //On a mismatch, reuse the longest prefix that still matches
                while (needleIndex > 0 && Haystack[haystackIndex] != Needle[needleIndex])
                    needleIndex = JumpTable[needleIndex - 1];
                if (Haystack[haystackIndex] == Needle[needleIndex])
                    needleIndex++;
                //Whole needle matched: return the index where the match starts
                if (needleIndex == needleLength)
                    return haystackIndex - needleLength + 1;
            }
            return NORESULT;
        }
        public string Needle
        {
            get { return _needle; }
            set { _needle = value; }
        }
        public string Haystack
        {
            get { return _haystack; }
            set { _haystack = value; }
        }
        public int[] JumpTable
        {
            get { return _jumpTable; }
            set { _jumpTable = value; }
        }
    }
}
Usage:
using System;
namespace KMPSearch
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length != 2)
            {
                Console.WriteLine("Usage: " + Environment.GetCommandLineArgs()[0] + " haystack needle");
            }
            else
            {
                KMPSearch search = new KMPSearch(args[0], args[1]);
                int[] matches = search.MatchAll();
                foreach (int i in matches)
                    //Parentheses matter: "+ i+1" would append i and then "1" to the string
                    Console.WriteLine("Match found at position " + (i + 1));
            }
        }
    }
}
Try this; it returns an array of the matches in the text, e.g. for "a line of" text " notepad":
string textToSearch = "\"a line of\" text \" notepad\"";
MatchCollection allPhrases = Regex.Matches(textToSearch, "(?<=\").*?(?=\")");
// Needs using System.Linq; Value gives the matched text, Trim drops the spaces around it.
var RegArray = allPhrases.Cast<Match>().Select(m => m.Value.Trim()).ToArray();
output: {"a line of", "text", "notepad"}
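The pattern above only returns text that sits between two quote characters, so an unquoted word at the very start or end of the input is not returned. A minimal sketch of a tokenizer that handles both cases, assuming quoted phrases should be kept whole and everything else split on whitespace, could look like this. SearchTermParser and Parse are hypothetical names, and the lowercasing mirrors the ToLower() call in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SearchTermParser
{
    // Hypothetical helper: splits a query into quoted phrases and single words.
    public static List<string> Parse(string searchTerms)
    {
        var terms = new List<string>();
        // Alternative 1: a quoted phrase (the quotes are dropped via group 1).
        // Alternative 2: a run of characters that are neither whitespace nor quotes.
        var pattern = new Regex("\"([^\"]*)\"|([^\\s\"]+)");
        foreach (Match m in pattern.Matches(searchTerms.ToLower()))
        {
            string term = m.Groups[1].Success
                ? m.Groups[1].Value.Trim()
                : m.Groups[2].Value;
            // Skip empty phrases such as "" or "   ".
            if (term.Length > 0)
                terms.Add(term);
        }
        return terms;
    }
}
```

For the query "A line of" text this gives the two terms a line of and text, which can then be fed to the existing matching code in place of the Split( ' ' ) result. An unbalanced quote is simply ignored and the words after it are treated as ordinary terms.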