Algorithm for finding the maximum value of a substring

Given string S, what's the best algorithm to find a substring that repeats the maximum number of times.

For example, in "assdssfssd" it is "ss", which is repeated the maximum number of times.

+1


source to share


5 answers


I can see how to build a tree to solve this specific problem.

There is a conditional root node. The first character is the first child. The second character is the child of the first character a -> s in your case. It also starts a new leaf of the root node. If, when adding a node, you visit an existing node, you increase its count (initial value 1).



Once you're done, you visit each node of the tree to find the one with the highest score at the deepest level (because if "asdf" happens 5 times then "a", "as" and "asd" happen at least 5 times , a-priory).

+2


source


The substring that repeats the most will be one letter, so you will find the letter that occurs the most. It's pretty simple:



>>> str = 'Can Berk Gรผder'
>>> letters = [l for l in str]
>>> uniq_letters = set(letters)
>>> counts = [(letters.count(l), l) for l in uniq_letters]
>>> counts
[(1, 'B'), (1, 'C'), (1, 'G'), (1, 'a'), (1, 'd'), (1, 'k'), (1, 'n'), (1, 'รผ'), (2, ' '), (2, 'e'), (2, 'r')]

      

+1


source


It looks like you are looking for something close to a compression algorithm. Compression works by looking for redundant (repetitive) information and replacing it with a pointer to the first occurrence. Here are some code examples for this:

http://www.developerfusion.com/code/1642/string-compression/

http://www.howtodothings.com/computers/a1223-simple-string-compression.html

+1


source


// C# code, finds one of the most occurred non-empty substrings in O(n), proof by the reader!
int[] x = new int[65536];
foreach (char c in myString)
     x[(int)c]++;
int max = 0;
for (int i = 0; i < x.Length; ++i)
    if (x[max] < x[i])
        max = i;
return ((char)max).ToString();

      

This is probably not what you want. You might have to look at something like Huffman coding ...

0


source


In the "N" lnghted Strings,

   No Of "1" character will be "N" which requires comparision of N * (N-1) / 2

   No of "2" characters will be "N-1" which requires comparision of (N-1) * (N-2) / 2


   No of "3" characters will be "N-2"  which requires comparision of (N-2) * (N-3) / 2

      

.............

and the number of characters "N" is "1" for which comparison is required (1 * 0/2)

Hence, No Of Max Substrings = "N" + "N-1" + .... "1" = (N * (N + 1) / 2) and the required comparisons are (N + 1) * (N) * ( N-1) / 6

If you are doing Bucket placement (not sorting) for each character of the same size, then

   No Of "1" character will be "N" which requires comparision of N -1 with buckets of N

   No of "2" characters will be "N-1" which requires comparision of (N-2) with Buckets of N-1

   No of "3" characters will be "N-2"  which requires comparision of (N-3) with Buckets of N-2

      

.............

and the number of characters "N" will be "1", which requires comparison of 0 with bucket 1

Here he reduces general comparisons to "N * (N-1) / 2"

Finally, after placing the bucket, take the highest numbered bucket for your answer.

0


source







All Articles