Word decomposition algorithm

Let A be a finite alphabet and A * the set of all finite strixes that you can form from that alphabet. Then you are assigned a set F of some strings from A * (typically much less than A *), which contains the alphabet (A) itself and the string S carrying A *.

Is there a polynomial time algorithm to find the minimum decomposition of S in terms of the strings belonging to F?

As a result of the decomposition, I mean the notation S = F 1... F k concatenated strings F jthat belong to F and you are trying to minimize the number k.

+3


source to share


1 answer


We can solve this in polynomial time using dynamic programming as follows.

Let D [i] be the minimal decomposition of the first letters i of your string S.

If n is the length of S and f is the total length of all strings in F, we can compute D in O (nf) time by iterating over i.

For each value of i, we try to use all the options in F for the end of the line and choose the one that results in the smallest expansion.

Once we do that, the answer to the original question is in D [n].



Python code

F=['abcd','de','a','b','c','d']
S='abcde'

D = [ [] ]
for i in range(1,len(S)+1):
    s = S[:i]
    best = None
    for f in F:
        if s.endswith(f):
            a = D[i-len(f)]
            if a is None:
                continue
            if best is None or len(a) < len(best):
                best = a + [f]
    D.append(best)

print D[len(S)]

      

prints:

['a', 'b', 'c', 'de']

      

+2


source







All Articles