How to check if all substrings can be found in a dictionary

I have a problem that I want to solve as efficiently as possible. As an example, I am given a string of words: A B C D

and I have a "dictionary" with 5 entries:

A
B C
B D
D
E

      

The dictionary tells me what substrings can be in my input string. And I want to check as efficiently as possible if the entire input string can be split into substrings so that they are all found in the dictionary.

In the example, the input string can be found by dividing it by A

, B C

andD

I would like to know if there is a better way than just iterating over all possible substrings. The less I check to see if a substring is in the dictionary, the better.

It is not necessary to know which substrings cannot be found if there are no possible solutions.

Thank.

+3


source to share


3 answers


I would use a tree instead of a dictionary. This will improve search speed and eliminate search subtrees.



+1


source


If you can use the same substring multiple times, there is a natural dynamic programming solution for that.

Let n be the size of your string. Let v be a vector of size n such that v [i] = true if and only if the substring consisting of (ni) the last character of your original string can be split into your dictionary. Then you can fill the vector v backward, starting at the last index decreasing i at each step.



In pseudocode:

Let D be your dictionnary
Let s be your string
Let n be the size of s
(Let s[i:j] denote the substring of s made by characters between i and j (inclusive))
Let v be a vector of size n+1 filled with zeros
Let v[n] = 1
For int i = n-1; i>=0; i--
    For int j = i; j <=n-1; j++
        If (s[i:j] is in D) and (v[j+1] is equal to 1)
            v[i] = 1
            Exit the for loop
Return v[0]

      

0


source


You can run it in O(N^2)

with the following method.

First, store your entire line in a trie.

Second, use dynamic programming technique to solve your problem. For each position, i

we will calculate whether the substring of the first characters can be split i

into words from the dictionary (trie). We will use a forward-looking approach to dynamic programming for the sake of simplicity.

So, first we establish that the substring of the first characters 0

can be split. Then we repeat from 0

to N-1

. When we arrive at a position i

, we assume that we already know the answer for that position. If a split is possible, then we can go from this position and see which lines starting from this position are in three. For each such line, mark its end position. Using a trie, we can do this in O(N)

one iteration of the outer loop.

t = trie of given words
ans = {false}
ans[0] = true
for i=0..N-1
    if ans[i]   // if substring s[0]..s[i-1] can be split to given words
       cur = t.root
       for j=i to N-1    // go along all strings starting from position i
           cur=cur.child(s[j])   // move to the child in trie
            // therefore, the position cur corresponds to string
            // s[i]...s[j]
           if cur.isWordEnd    // if there is a word in trie that ends in cur
               ans[j+1] = true  // the string s[0]..s[j] can be split
your answer is in ans[N]

      

Total time O(N^2)

.

0


source







All Articles