Algorithm implementation z

Since 4 days I have been reading about strings and some algorithms for pattern matching and for that I got a KMP algo lookup and that was good, but I also realized that there is another method for string matching which is the same as KMP in space and complexity time, but has an easy solution.

The algorithm was the Z-algorithm.

So, I searched google for this, but I didn't find a good explanation for algo. Could you please explain how to create an array of templates and how to apply the search procedure? It would be nice if you provide the code in C ++.

+3


source to share


2 answers


After a long time, I figured out how to construct an array Z. I will explain here in light words.

Let's first understand what a prefix is:

Example:

  • In apple, the prefix can be apple (or) app (or) app (or) ap (or) a.

  • In the word "banana" the prefix can be banana (or) banana (or) banana (or) prohibition (or) ba (or) b.

Explanation: Any substring S of string T that matches the beginning of string T to the end of string T or to the end is called as a prefix.

I hope you understand what the prefix is ​​here.

Let's see how to build a Z array.

Let's take this example text: aab $ baabaa

Index: 0 1 2 3 4 5 6 7 8 9

Text: aab $ baabaa

Z-value: x 1 0 0 0 3 1 0 2 1

Note:

and. The substring must start at the i-th position.

b. substring must be of maximum length, which is also a prefix

  • At index 0:

Search for a substring from i (0th index) to the end, which is also the prefix of the given text.

aab $ baabaa => of length 10 is the longest substring, which is also a prefix of the text. But that won't help with pattern matching, so we do it as x in array Z.

  1. Index 1:

Finds the longest substring from position 1 to the end, which is also the prefix of the text.

such a substring:

and. prefix "a" => text "aab $ baaaa a" and the length is 1.

b. "a b" => Not a prefix



from. "ab $" => Not a prefix

e. "ab $ b" => Not prefix

e. "ab $ b a" => Not a prefix

so...

Here the single longest substring, which is also a prefix, is "a", and its length is 1. It is stored in array Z.

  1. In index 2:

Substrings:

and. "b" => Not prefix

b. "b $" => Not a prefix

from. "b $ b" => Not a prefix

so...

There is no substring here, which is also a prefix of the text T. So we store zero at index 2 in array Z.

  1. At index 5: Substrings

:

and. prefix "a" => text "aab $ baaaa a" and the length is 1.

b. "a" => the text prefix is ​​"aab $ baaaa a" and the length is 2.

from. "aa b" => the text prefix is ​​"aab $ baaaa a", and the length is 3 ..

etc. "aab a" => Not a prefix

so...

Here, the longest substring, which is also a prefix of the text T, is "a a" of length 3. So we store 3 at index 5 in the Z array.

Finally, if any value in the Z array matches the length of the pattern, then that pattern is present in the text T.

+5


source


In Z-algo, we build an array Z.

What is a Z-array? For string str [0..n-1], array Z has the same length as the string. Element Z [i] of array Z stores the length of the longest substring, starting with str [i], which is also prefixed with str [0..n-1]. The first entry in the Z array means a lower value because the full string is always prefixed.

> Example: Index            0   1   2   3   4   5   6   7   8   9  10  11
> Text                      a   a   b   c   a   a   b   x   a   a   a   z   
> values         X          1   0   0   3   1   0   0   2   2   1   0  More
> Examples: str  = "aaaaaa" Z[]  = {x, 5, 4, 3, 2, 1}
> 
> str = "aabaacd" Z[] = {x, 1, 0, 2, 1, 0, 0}
> 
> str = "abababab" Z[] = {x, 0, 6, 0, 4, 0, 2, 0}

      



The idea is to concatenate the pattern and text and create the string "P $ T", where P is the pattern, $ is a special character that should not be present in the pattern and text, and T is the text. Build a Z array for the concatenated string. In an array Z, if the Z value at any point is equal to the length of the pattern, then the pattern is present at that point.

Example:
Pattern P = "aab",  Text T = "baabaa"

The concatenated string is = "aab$baabaa"

Z array for above concatenated string is {x, 1, 0, 0, 0, 
                                          3, 1, 0, 2, 1}.
Since length of pattern is 3, the value 3 in Z array 
indicates presence of pattern. 

      

Detailed explanation and implementation you can find here

+1


source







All Articles