Counting occurrences of each unique string in a List<string[]> into a dictionary

My input is a List<string[]>.

The output should be a dictionary whose keys are the unique strings found across all the arrays, and whose values are float arrays in which each position holds the number of times the key occurs in the corresponding string[] of the List<string[]>.

So far I have tried:

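using System;
using System.Collections.Generic;
using System.Linq;

static class CT
{
    // Counts all terms in each array
    public static Dictionary<string, float[]> Termfreq(List<string[]> text)
    {
        List<string> unique = new List<string>();

        // Gather the distinct words of every array
        foreach (string[] s in text)
        {
            List<string> groups = s.Distinct().ToList();
            unique.AddRange(groups);
        }

        // Distinct words across the whole list; these become the keys
        string[] index = unique.Distinct().ToArray();

        // Stuck here: the dictionary is created but never filled
        Dictionary<string, float[]> countset = new Dictionary<string, float[]>();

        return countset;
    }
}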



class Program
{
    static void Main()
    {
        /* local variable definition */
        List<string[]> doc = new List<string[]>();
        string[] a = { "That", "is", "a", "cat" };
        string[] b = { "That", "bat", "flew", "over", "the", "cat" };
        doc.Add(a);
        doc.Add(b);

        Dictionary<string, float[]> ret = CT.Termfreq(doc);

        foreach (KeyValuePair<string, float[]> kvp in ret)
        {
            // Join the counts; printing the array directly only shows its type name
            Console.WriteLine("Key = {0}, Value = {1}", kvp.Key, string.Join(", ", kvp.Value));
        }

        Console.ReadLine();
    }
}

      

I am stuck on the dictionary part. What's the most efficient way to do this?



1 answer


It looks like you could use something like:

var dictionary = doc
    .SelectMany(array => array)
    .Distinct()
    .ToDictionary(word => word,
                  word => doc.Select(array => array.Count(x => x == word))
                             .ToArray());

      

In other words, first find the distinct set of words, then create a mapping for each word.



To create the mapping, look at each array in the original document and count the occurrences of the word in that array. (Thus, each array is mapped to an int.) LINQ performs this mapping across the whole document, with ToArray producing an int[] for a particular word, and that array is the value of the word's dictionary entry.
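As a quick check, the query can be run against the two arrays from the question. This is only a minimal sketch: the class name `Demo`, the method name `CountPerArray`, and the printing loop are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Demo
{
    // Hypothetical wrapper around the query shown above.
    // For every distinct word, builds an int[] holding one count per input array.
    public static Dictionary<string, int[]> CountPerArray(List<string[]> doc)
    {
        return doc
            .SelectMany(array => array)   // flatten all arrays into one word sequence
            .Distinct()                   // each word becomes exactly one key
            .ToDictionary(word => word,
                          word => doc.Select(array => array.Count(x => x == word))
                                     .ToArray());
    }

    static void Main()
    {
        var doc = new List<string[]>
        {
            new[] { "That", "is", "a", "cat" },
            new[] { "That", "bat", "flew", "over", "the", "cat" }
        };

        foreach (var kvp in CountPerArray(doc))
        {
            // string.Join renders the counts; an array printed directly shows only its type name
            Console.WriteLine("{0}: [{1}]", kvp.Key, string.Join(", ", kvp.Value));
        }
    }
}
```

With this input, "That" and "cat" each map to [1, 1], "is" and "a" map to [1, 0], and "bat", "flew", "over" and "the" map to [0, 1].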

Note that this creates a Dictionary<string, int[]>, not a Dictionary<string, float[]>. That seems more sensible to me, but you can always cast the result of Count to float if you really want to.
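If the float[] values from the question are really required, the cast could look roughly like the sketch below. The class name `TermCounts` and method name `TermFreq` are hypothetical, chosen to mirror the signature in the question; only the `(float)` cast differs from the query above.

```csharp
using System.Collections.Generic;
using System.Linq;

static class TermCounts
{
    // Hypothetical helper matching the question's signature.
    // Same query as the int[] version, with each count cast to float.
    public static Dictionary<string, float[]> TermFreq(List<string[]> text)
    {
        return text
            .SelectMany(array => array)
            .Distinct()
            .ToDictionary(word => word,
                          word => text.Select(array => (float)array.Count(x => x == word))
                                      .ToArray());
    }
}
```

Calling `TermCounts.TermFreq(doc)` with the question's `doc` would then give a `Dictionary<string, float[]>` with one entry per distinct word and one float per input array.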
