How can I find a faster algorithm for this special case of the longest common subsequence (LCS) problem?
I know that the standard LCS algorithm runs in O(mn) time, where m and n are the lengths of the two sequences X and Y. But my problem is a little more constrained, so I am hoping for something faster than O(mn).
Here's my problem:
Input:
a positive integer Q and two sequences X = x1, x2, x3, ..., xn and Y = y1, y2, y3, ..., yn, both of length n.
Output:
True, if the length of the LCS of X and Y is at least n - Q;
False otherwise.
The well-known algorithm costs O(n^2) here, but in fact we can do better: if removing up to Q elements from each sequence still cannot make the remainders match, the result must be False. Someone told me there should be an O(Q * n) algorithm, but I can't figure it out.
UPDATE: Found the answer already!
I was told that I only need to compute the diagonal band of the table c[i, j], because if |i - j| > Q, then one of the sequences already has more than Q unmatched elements. Therefore, we only need to compute c[i, j] when |i - j| <= Q.
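To make the band idea concrete, here is a minimal sketch of my own (function name and representation are mine, not from the answer): it computes only the cells of c[i, j] with |i - j| <= Q, treating everything outside the band as unreachable.

```python
def lcs_at_least(x, y, q):
    """Return True iff LCS(x, y) >= n - q.

    Only the diagonal band |i - j| <= q of the LCS table is computed:
    any alignment that leaves the band already skips more than q
    elements of one of the sequences.
    """
    n = len(x)
    assert len(y) == n
    if q >= n:
        return True
    NEG = float("-inf")            # marks cells outside the band
    width = 2 * q + 1              # band column d = j - i + q, 0 <= d < width
    prev = [NEG] * width           # row i = 0: c[0][j] = 0 for 0 <= j <= q
    for d in range(q, width):
        prev[d] = 0
    for i in range(1, n + 1):
        cur = [NEG] * width
        for d in range(width):
            j = i - q + d
            if j < 0 or j > n:
                continue
            if j == 0:             # first column: c[i][0] = 0
                cur[d] = 0
                continue
            best = NEG
            if x[i - 1] == y[j - 1] and prev[d] != NEG:
                best = prev[d] + 1           # diagonal cell c[i-1][j-1]
            if d + 1 < width and prev[d + 1] != NEG:
                best = max(best, prev[d + 1])  # cell above, c[i-1][j]
            if d - 1 >= 0 and cur[d - 1] != NEG:
                best = max(best, cur[d - 1])   # cell to the left, c[i][j-1]
            cur[d] = best
        prev = cur
    return prev[q] >= n - q        # prev[q] holds c[n][n]
```

Each row touches only 2Q + 1 cells, so the whole thing is O(n * Q) time and O(Q) memory.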
Here is one possible way to do this:
1. Let f(prefix_len, deleted_cnt) be the leftmost position in Y such that the first prefix_len elements of X have been processed and exactly deleted_cnt of them have been deleted. There are only O(N * Q) states, because deleted_cnt cannot exceed Q.
2. Base case: f(0, 0) = 0 (nothing has been processed, so nothing has been deleted).
3. Transitions:
a) Delete the current element: f(i + 1, j + 1) = min(f(i + 1, j + 1), f(i, j)).
b) Match the current element of X with the leftmost equal element of Y located after position f(i, j) (say it has index pos): f(i + 1, j) = min(f(i + 1, j), pos).
4. The only remaining question is how to find the leftmost matching element to the right of a given position. Let's precompute, for each pair (position in Y, element of X), the leftmost occurrence in Y of an equal element to the right of that position, and put these pairs into a hash table. That looks like O(n^2) pairs, but it isn't: for a fixed position in Y we never need to look more than Q + 1 positions to its right, because going further would mean skipping more than Q elements! Therefore we only need to store O(N * Q) pairs, which gives the desired time complexity. Once we have this hash table, finding pos in step 3 takes a single lookup. Here is the pseudo code for this step:
map = EmptyHashMap()
for i = 0 ... n - 1:
    for j = i + 1 ... min(n - 1, i + q + 1):
        map[(i, Y[j])] = min(map[(i, Y[j])], j)
Unfortunately this solution uses hash tables, so the O(N * Q) complexity holds on average rather than in the worst case, but it should be fast enough in practice.
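Putting steps 1-4 together, here is one possible runnable version in Python. This is my own sketch of the answer above, with names of my choosing; the lookup window of Q skips is safe because skipping more than Q elements of Y in a single step already makes the answer False.

```python
def lcs_at_least_q(x, y, q):
    """True iff LCS(x, y) >= n - q, via the f(prefix_len, deleted_cnt) DP.

    f[i][j] = leftmost number of elements of y consumed after processing
    the first i elements of x with exactly j of them deleted.
    """
    n = len(x)
    assert len(y) == n
    if q >= n:
        return True
    # nxt[(p, v)] = leftmost index t in [p, p + q] with y[t] == v.
    # Iterating t in increasing order, setdefault keeps the leftmost t.
    nxt = {}
    for t in range(n):
        for p in range(max(0, t - q), t + 1):
            nxt.setdefault((p, y[t]), t)
    INF = float("inf")
    f = [[INF] * (q + 1) for _ in range(n + 1)]
    f[0][0] = 0
    for i in range(n):
        for j in range(q + 1):
            if f[i][j] == INF:
                continue
            # a) delete x[i]
            if j + 1 <= q:
                f[i + 1][j + 1] = min(f[i + 1][j + 1], f[i][j])
            # b) match x[i] with the leftmost equal element of y at or
            #    after position f[i][j] (one hash-table lookup)
            p = nxt.get((f[i][j], x[i]))
            if p is not None:
                f[i + 1][j] = min(f[i + 1][j], p + 1)
    # LCS >= n - q iff we can process all of x with at most q deletions
    return any(f[n][j] < INF for j in range(q + 1))
```

Both the precomputation and the DP touch O(N * Q) states, so the total expected time is O(N * Q).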
You can also view this as an edit distance problem: with only insertions and deletions allowed, making the two strings equal must not cost more than 2Q operations (delete the at most Q unmatched elements from each string); if it costs more than 2Q, the answer must be False. (EDIT DISTANCE PROBLEM)
Suppose string x has size m and string y has size n. We create a two-dimensional array d[0..m][0..n], where d[i][j] denotes the edit distance between the i-length prefix of x and the j-length prefix of y.
The array d is filled with dynamic programming, using the following recurrence:
d[i][0] = i , for i <= m
d[0][j] = j , for j <= n
d[i][j] = d[i - 1][j - 1], if x[i] == y[j],
d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1), otherwise (no substitution term here, since only deletions and insertions correspond to LCS operations).
If only insertions and deletions are allowed, this distance satisfies d[m][n] = m + n - 2 * LCS(x, y), so the LCS length is (m + n - d[m][n]) / 2; for the original problem (m = n), the answer is True iff d[n][n] <= 2Q.
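A minimal sketch of this insert/delete-only edit distance and the final check (my own code; the O(mn) table could also be restricted to a diagonal band of width 2Q, as in the first answer):

```python
def lcs_via_indel_distance(x, y):
    """Insert/delete-only edit distance; LCS(x, y) = (m + n - d[m][n]) // 2."""
    m, n = len(x), len(y)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # delete the whole i-prefix of x
    for j in range(n + 1):
        d[0][j] = j                      # insert the whole j-prefix of y
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                d[i][j] = d[i - 1][j - 1]          # match, no cost
            else:
                d[i][j] = min(d[i - 1][j] + 1,     # delete x[i]
                              d[i][j - 1] + 1)     # insert y[j]
    return (m + n - d[m][n]) // 2

def answer(x, y, q):
    # True iff LCS(x, y) >= n - q, i.e. iff d[n][n] <= 2q
    return lcs_via_indel_distance(x, y) >= len(x) - q
```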