How can I find a faster algorithm for this particular case of the longest common subsequence (LCS)?

I know that the standard LCS algorithm runs in ~ O(mn) time, where m and n are the lengths of the two sequences X and Y respectively. But my problem is a little more restricted, so I expect an algorithm faster than ~ O(mn).

Here's my problem:

Input:

a positive integer Q and two sequences X = x1, x2, x3, ..., xn and Y = y1, y2, y3, ..., yn, both of length n.

Output:

True, if the length of the LCS of X and Y is not less than n - Q;

False otherwise.

The well-known algorithm costs O(n^2) here, but we can actually do better: whenever we have to remove more than Q elements from either sequence before the rest can be matched, the result is False. Someone said there must be an O(Q * n) algorithm, but I can't figure it out.

UPDATE: Found the answer already!

I was told that I can just calculate the diagonal band of the table c[i, j], because if |i - j| > Q, more than Q elements of one of the sequences are already left unmatched. Therefore, we only need to calculate c[i, j] when |i - j| <= Q.
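
For completeness, here is a minimal sketch of that banded DP in Python (my own illustration, not from the original thread; the name lcs_at_least is mine): it stores only the cells of c with |i - j| <= Q, treats everything outside the band as unreachable, and so runs in O(n * Q) time and O(Q) space.

def lcs_at_least(X, Y, Q):
    # Sketch: return True iff LCS(X, Y) >= n - Q, filling only the band |i - j| <= Q.
    # Cells outside the band are treated as unreachable (-infinity), because an
    # alignment that leaves the band already skips more than Q elements.
    n = len(X)
    NEG = float("-inf")
    prev = {j: 0 for j in range(0, min(Q, n) + 1)}   # row i = 0 of c
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - Q), min(n, i + Q) + 1):
            if j == 0:
                cur[j] = 0
                continue
            best = max(prev.get(j, NEG), cur.get(j - 1, NEG))
            if X[i - 1] == Y[j - 1]:
                best = max(best, prev.get(j - 1, NEG) + 1)
            cur[j] = best
        prev = cur
    return prev.get(n, NEG) >= n - Q

# Example: lcs_at_least("abcde", "axcye", 2) -> True (LCS "ace" has length 3 >= 5 - 2)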





2 answers


Here is one possible way to do this:

1. Suppose that f(prefix_len, deleted_cnt) is the leftmost position in Y such that the first prefix_len elements of X have already been processed and exactly deleted_cnt of them have been removed. Obviously, only O(N * Q) states exist, because deleted_cnt cannot exceed Q.
2. Base case: f(0, 0) = 0 (nothing has been processed, so nothing has been removed).
3. Transitions:
a) Delete the current element of X: f(i + 1, j + 1) = min(f(i + 1, j + 1), f(i, j)).
b) Match the current element of X with the leftmost element of Y that is equal to it and located after position f(i, j) (assume it has index pos): f(i + 1, j) = min(f(i + 1, j), pos).
4. The only remaining question is how to get the leftmost matching element to the right of a given position. Let's precompute the following pairs: (position in Y, element of X) → the leftmost occurrence in Y, to the right of that position, of an element equal to that element of X, and put them into a hash table. This looks like O(n^2), but it is not: for a fixed position in Y, we never need to look more than Q + 1 positions to its right. Why? Because going any further would mean skipping more than Q elements! Therefore we only need to examine O(N * Q) pairs, which gives the desired time complexity. With this hash table, looking up pos in step 3 is a single hash table lookup. Here is the pseudocode for this step:

map = EmptyHashMap()
for i = 0 ... n - 1:
    for j = i + 1 ... min(n - 1, i + q + 1):
        map[(i, Y[j])] = min(map[(i, Y[j])], j)


Unfortunately this solution uses hash tables, so its O(N * Q) complexity holds on average rather than in the worst case, but it should be feasible.
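
Below is a rough Python translation of the whole approach (my own sketch, so names such as lcs_at_least_q and nxt are not from the original text). It builds the hash table from step 4 and then runs the f(prefix_len, deleted_cnt) recurrence, keeping only the current row of f, so both phases stay within O(n * Q).

def lcs_at_least_q(X, Y, Q):
    # Sketch of the DP above: decide whether LCS(X, Y) >= n - Q.
    # f[j] = smallest number of leading elements of Y consumed after processing
    # the current prefix of X with exactly j of its elements deleted.
    n = len(X)
    INF = n + 1                       # marks an unreachable state

    # Step 4: nxt[(p, ch)] = leftmost index pos in [p, p + Q] with Y[pos] == ch.
    # Only O(n * Q) pairs are stored; a longer jump would skip more than Q
    # elements of Y and can never lead to an accepting state.
    nxt = {}
    for p in range(n + 1):
        for pos in range(p, min(n, p + Q + 1)):
            nxt.setdefault((p, Y[pos]), pos)   # first hit is the leftmost one

    f = [0] + [INF] * Q               # base case: f(0, 0) = 0
    for i in range(n):                # process X[i]
        g = [INF] * (Q + 1)
        for j in range(Q + 1):
            p = f[j]
            if p > n:
                continue
            if j + 1 <= Q:            # transition a): delete X[i]
                g[j + 1] = min(g[j + 1], p)
            pos = nxt.get((p, X[i]))  # transition b): match X[i] at index pos
            if pos is not None:
                g[j] = min(g[j], pos + 1)
        f = g

    # LCS >= n - Q iff some state with at most Q deletions from X is reachable.
    return any(v <= n for v in f)

The dictionary plays the role of the hash table from the text; using setdefault mirrors the min(...) in the pseudocode, since the first (leftmost) position found for each key wins.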





You can also observe that the cost of making the two strings equal (the edit distance) must not be greater than Q; if it is greater than Q, the answer must be False. (EDIT DISTANCE PROBLEM)

Suppose string x has size m and string y has size n; then we create a two-dimensional array d[0..m][0..n], where d[i][j] denotes the edit distance between the i-length prefix of x and the j-length prefix of y.

The array d is computed using dynamic programming with the following recurrence:



d[i][0] = i , for i <= m
d[0][j] = j , for j <= n

d[i][j] = d[i - 1][j - 1], if x[i] == y[j],
d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + 1), otherwise.

The LCS answer, if m > n, is m - dp[m][m - n].
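
As a sketch, the recurrence above translates directly into Python (0-indexed strings, 1-indexed prefixes); following this answer's claim, the original problem would then be decided by checking whether the result is at most Q.

def edit_distance(x, y):
    # Direct implementation of the recurrence above (Levenshtein distance).
    m, n = len(x), len(y)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                   # delete the whole i-length prefix of x
    for j in range(n + 1):
        d[0][j] = j                   # insert the whole j-length prefix of y
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                d[i][j] = d[i - 1][j - 1]
            else:
                d[i][j] = min(d[i - 1][j] + 1,       # deletion
                              d[i][j - 1] + 1,       # insertion
                              d[i - 1][j - 1] + 1)   # substitution
    return d[m][n]

# Per this answer's claim, the check would be: edit_distance(X, Y) <= Q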









