Count occurrences of words in a textbox with LINQ

How can I count the occurrences of a word in a database text field using LINQ in C#?

Example Keyword Token: ASP.NET

EDIT 4:

Database records:

Entry 1: [TextField] = "Blah blah blah ASP.NET bli bli bli ASP.NET blu ASP.NET yop yop ASP.NET "

Entry 2: [TextField] = "Blah blah blah bli bli bli blu ASP.NET yop yop ASP.NET "

Entry 3: [TextField] = "Blah ASP.NET blah ASP.NET blah ASP.NET bli ASP.NET bli bli ASP.NET blu ASP.NET yop yop ASP.NET "

So,

Entry 1 contains 4 occurrences of the keyword "ASP.NET"

Entry 2 contains 2 occurrences of the keyword "ASP.NET"

Entry 3 contains 7 occurrences of the keyword "ASP.NET"

I want to retrieve an IList<RecordModel> collection sorted by occurrence count in descending order:

  • Entry 3
  • Entry 1
  • Entry 2

LINQ to SQL would be best, but LINQ to Objects would work too :)

NB: The "." in the ASP.NET keyword is not the point of this question.

+2




5 answers


A regular expression handles this nicely. You can use the \b metacharacter to anchor a word boundary, and escape the keyword so that none of its characters are treated as regex special characters. This also handles keywords followed by periods, commas, and so on.

using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] records =
{
    "foo ASP.NET bar", "foo bar",
    "foo ASP.NET? bar ASP.NET",
    "ASP.NET foo ASP.NET! bar ASP.NET",
    "ASP.NET, ASP.NET ASP.NET, ASP.NET"
};
string keyword = "ASP.NET";
string pattern = @"\b" + Regex.Escape(keyword) + @"\b";
var query = records.Select((t, i) => new
            {
                Index = i,
                Text = t,
                Count = Regex.Matches(t, pattern).Count
            })
            .OrderByDescending(item => item.Count);

foreach (var item in query)
{
    Console.WriteLine("Record {0}: {1} occurrences - {2}",
        item.Index, item.Count, item.Text);
}

      



Voila! :)
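The question asks for an IList<RecordModel> rather than an anonymous type, so the same pattern can be used to sort the records directly. This is only a sketch: the members of RecordModel (Id, TextField) and the KeywordSorter helper are assumptions, since the question never shows them. It runs as LINQ to Objects; with a LINQ to SQL source the rows would probably need to be pulled into memory first (for example with AsEnumerable()), because a Regex call cannot be translated to SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical model: the question names RecordModel but does not show its members.
public class RecordModel
{
    public int Id { get; set; }
    public string TextField { get; set; }
}

public static class KeywordSorter
{
    // Orders the records by how often the keyword occurs as a whole word, most first.
    public static IList<RecordModel> SortByKeywordCount(IEnumerable<RecordModel> records, string keyword)
    {
        string pattern = @"\b" + Regex.Escape(keyword) + @"\b";
        return records
            .OrderByDescending(r => Regex.Matches(r.TextField ?? "", pattern).Count)
            .ToList();
    }

    public static void Main()
    {
        var records = new List<RecordModel>
        {
            new RecordModel { Id = 1, TextField = "Blah blah blah ASP.NET bli bli bli ASP.NET blu ASP.NET yop yop ASP.NET " },
            new RecordModel { Id = 2, TextField = "Blah blah blah bli bli bli blu ASP.NET yop yop ASP.NET " },
            new RecordModel { Id = 3, TextField = "Blah ASP.NET blah ASP.NET blah ASP.NET bli ASP.NET bli bli ASP.NET blu ASP.NET yop yop ASP.NET " }
        };

        // Prints Entry 3, Entry 1, Entry 2 (7, 4 and 2 occurrences respectively).
        foreach (RecordModel r in SortByKeywordCount(records, "ASP.NET"))
        {
            Console.WriteLine("Entry {0}", r.Id);
        }
    }
}
```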

+3




Edit 2: I see that you updated the question and changed things a bit. A count per word, eh? Try the following:

string input = "some random text: how many times does each word appear in some random text, or not so random in this case";
char[] separators = new char[]{ ' ', ',', ':', ';', '?', '!', '\n', '\r', '\t' };

var query = from s in input.Split( separators )
            where s.Length > 0
            group s by s into g
            let count = g.Count()
            orderby count descending
            select new {
                Word = g.Key,
                Count = count
            };

      



Since you want words that can have "." in them (for example, "ASP.NET"), I removed it from the separator list. Unfortunately, that pollutes some words: a sentence like "blah blah blah. blah blah." will show "blah" with a count of 3 and "blah." with a count of 2. You need to think about what kind of cleaning strategy you want here; for example, if a "." has a letter on both sides it is considered part of the word, otherwise it is treated as whitespace. This logic is best done with a regex.
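One way to sketch the "a dot with a character on both sides is part of the word" rule is to match tokens with a regex instead of splitting on separators. The pattern and the WordCounter helper below are illustrative assumptions, not part of the answer above; \w also accepts digits and underscores, which may or may not be what you want.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordCounter
{
    // A token is a run of word characters, optionally followed by ".more" groups,
    // so "ASP.NET" stays whole while a sentence-ending "." is dropped.
    private const string TokenPattern = @"\w+(?:\.\w+)*";

    // Returns each distinct token with the number of times it appears (case-sensitive).
    public static Dictionary<string, int> CountWords(string input)
    {
        return Regex.Matches(input, TokenPattern)
            .Cast<Match>()
            .GroupBy(m => m.Value)
            .ToDictionary(g => g.Key, g => g.Count());
    }

    public static void Main()
    {
        var counts = CountWords("blah blah blah. blah blah. ASP.NET is ASP.NET.");
        foreach (var pair in counts.OrderByDescending(p => p.Value))
        {
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
        }
    }
}
```

With this tokenizer "blah." and "blah" are counted together, at the cost of no longer distinguishing abbreviations that end in a period.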

+4




Use String.Split() to turn the string into an array of words, then use LINQ to filter that array down to the word you want, and count the results, like this:

myDbText.Split(' ').Where(token => token.Equals(word)).Count();
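Splitting on a single space treats "ASP.NET," or "ASP.NET!" as different tokens from "ASP.NET". A possible variation, sketched here with a hypothetical SplitCounter helper and a separator list borrowed from another answer on this page, splits on several separators and uses the Count overload that takes a predicate:

```csharp
using System;
using System.Linq;

public static class SplitCounter
{
    // "." is deliberately not a separator, so "ASP.NET" survives as one token.
    private static readonly char[] Separators = { ' ', ',', ':', ';', '?', '!', '\n', '\r', '\t' };

    // Counts the tokens that are exactly equal to the given word (case-sensitive).
    public static int CountWord(string text, string word)
    {
        return text
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Count(token => token.Equals(word, StringComparison.Ordinal));
    }

    public static void Main()
    {
        Console.WriteLine(CountWord("ASP.NET, ASP.NET ASP.NET! foo", "ASP.NET"));
    }
}
```

A keyword followed directly by a period ("ASP.NET.") is still missed by this approach, because the period stays attached to the token.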

      

+1




You could use Regex.Matches(input, pattern).Count

or do the following:

int count = 0;
int startIndex = input.IndexOf(word);
while (startIndex != -1)
{
    ++count;
    startIndex = input.IndexOf(word, startIndex + 1);
}

      

Using LINQ here would be ugly.
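If the IndexOf loop is wrapped in a helper, the LINQ side stays a one-liner. The sketch below is an assumption about how that could look (SubstringCounter and CountOccurrences are made-up names, and the ordinal comparison is a choice made here). It counts substring matches, including overlapping ones and matches inside longer words.

```csharp
using System;
using System.Linq;

public static class SubstringCounter
{
    // Counts every position at which word occurs in input (overlaps included).
    public static int CountOccurrences(string input, string word)
    {
        if (string.IsNullOrEmpty(input) || string.IsNullOrEmpty(word))
        {
            return 0; // also avoids looping over an empty search string
        }

        int count = 0;
        int startIndex = input.IndexOf(word, StringComparison.Ordinal);
        while (startIndex != -1)
        {
            ++count;
            startIndex = input.IndexOf(word, startIndex + 1, StringComparison.Ordinal);
        }
        return count;
    }

    public static void Main()
    {
        string[] records = { "foo ASP.NET bar", "foo bar", "ASP.NET, ASP.NET ASP.NET" };

        // Order the records by occurrence count, highest first.
        foreach (string record in records.OrderByDescending(r => CountOccurrences(r, "ASP.NET")))
        {
            Console.WriteLine("{0} - {1}", CountOccurrences(record, "ASP.NET"), record);
        }
    }
}
```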

0




I know this goes beyond the original question, but it is still on topic, and I am including it for others who find this question later. It does not require whole-word matches in the searched strings; however, it can easily be modified to do so using the code from Ahmad's post.

//use this method to order objects and keep the existing type
class Program
{
  static void Main(string[] args)
  {
    List<TwoFields> tfList = new List<TwoFields>();
    tfList.Add(new TwoFields { one = "foo ASP.NET barfoo bar", two = "bar" });
    tfList.Add(new TwoFields { one = "foo bar foo", two = "bar" });
    tfList.Add(new TwoFields { one = "", two = "barbarbarbarbar" });

    string keyword = "bar";
    string pattern = Regex.Escape(keyword);
    tfList = tfList.OrderByDescending(t => Regex.Matches(string.Format("{0}{1}", t.one, t.two), pattern).Count).ToList();

    foreach (TwoFields tf in tfList)
    {
      Console.WriteLine(string.Format("{0} : {1}", tf.one, tf.two));
    }

    Console.Read();
  }
}


//a class with two string fields to be searched on
public class TwoFields
{
  public string one { get; set; }
  public string two { get; set; }
}

      

...

//same as above, but uses multiple keywords
class Program
{
  static void Main(string[] args)
  {
    List<TwoFields> tfList = new List<TwoFields>();
    tfList.Add(new TwoFields { one = "one one, two; three four five", two = "bar" });
    tfList.Add(new TwoFields { one = "one one two three", two = "bar" });
    tfList.Add(new TwoFields { one = "one two three four five five", two = "bar" });

    string keywords = " five one    ";
    string keywordsClean = Regex.Replace(keywords, @"\s+", " ").Trim(); //replace multiple spaces with one space

    string pattern = Regex.Escape(keywordsClean).Replace("\\ ","|"); //escape special chars and replace spaces with "or"
    tfList = tfList.OrderByDescending(t => Regex.Matches(string.Format("{0}{1}", t.one, t.two), pattern).Count).ToList();

    foreach (TwoFields tf in tfList)
    {
      Console.WriteLine(string.Format("{0} : {1}", tf.one, tf.two));
    }

    Console.Read();
  }
}

public class TwoFields
{
  public string one { get; set; }
  public string two { get; set; }
}
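The multi-keyword pattern above matches substrings, so "five" would also match inside "fivebar". A minimal sketch of a whole-word variant, assuming the \b approach from the regex answer on this page, wraps the alternation in word boundaries. KeywordPattern and BuildWholeWordPattern are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordPattern
{
    // Builds \b(?:kw1|kw2|...)\b from a whitespace-separated keyword string.
    public static string BuildWholeWordPattern(string keywords)
    {
        string[] parts = keywords.Split(
            new[] { ' ', '\t', '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries);

        if (parts.Length == 0)
        {
            throw new ArgumentException("At least one keyword is required.", "keywords");
        }

        // Escape each keyword separately so special characters are matched literally.
        return @"\b(?:" + string.Join("|", parts.Select(k => Regex.Escape(k))) + @")\b";
    }

    public static void Main()
    {
        string pattern = BuildWholeWordPattern(" five one    ");
        Console.WriteLine(pattern);
        Console.WriteLine(Regex.Matches("one one, two; three four fivebar", pattern).Count);
    }
}
```

Because \b only fires between a word character and a non-word character, keywords that begin or end with punctuation would need a different boundary rule.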

      

0

