How can I find a faster algorithm for this particular case of the longest common subsequence (LCS)?
I know that the standard algorithm for the LCS problem takes ~O(mn) time, where m and n are the lengths of the two sequences X and Y respectively. But my problem is a little simpler, so I expect a faster algorithm than ~O(mn).
Here's my problem:
Input:
a positive integer Q and two sequences X = x1, x2, x3, ..., xn and Y = y1, y2, y3, ..., yn, both of length n.
Output:
True if the length of the LCS of X and Y is at least n - Q;
False otherwise.
The well-known algorithm costs O(n^2) here, but in fact we can do better: whenever removing up to Q elements from either sequence still cannot make the rest a common subsequence, the result is False. Someone said there should be an O(Q * n) algorithm, but I can't figure it out.
UPDATE: Found the answer already!
I was told that I only need to compute a diagonal band of the table c[i, j], because |i - j| > Q means the two sequences already have more than Q unmatched elements. Therefore we only need to compute c[i, j] when |i - j| <= Q.
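For illustration, here is a minimal Python sketch of that banded DP (the function name lcs_band and the full-table allocation are my own choices, not from the original hint). Only cells with |i - j| <= Q are filled, so the work is O(n * Q):

def lcs_band(X, Y, Q):
    """Return True iff LCS(X, Y) >= n - Q, filling only the band |i - j| <= Q."""
    n = len(X)
    # An alignment with at least n - Q matches never needs a cell with
    # |i - j| > Q, so leaving those cells at 0 is safe.
    c = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(max(1, i - Q), min(n, i + Q) + 1):
            if X[i - 1] == Y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c[n][n] >= n - Q

Memory could also be reduced to O(n * Q) by storing just the band of width 2Q + 1, but a full table keeps the sketch short.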
Here is one possible way to do this:
1. Suppose that f(prefix_len, deleted_cnt) is the leftmost position in Y such that the first prefix_len elements of X have already been processed and exactly deleted_cnt of them have been deleted. Only O(N * Q) states exist, because deleted_cnt cannot exceed Q.
2. Base case: f(0, 0) = 0 (nothing was processed, so nothing was deleted).
3. Transitions:
a) Delete the current element: f(i + 1, j + 1) = min(f(i + 1, j + 1), f(i, j)).
b) Match the current element of X with the leftmost equal element of Y located after position f(i, j) (assume it has index pos): f(i + 1, j) = min(f(i + 1, j), pos).
4. The only remaining question is how to find the leftmost matching element to the right of a given position. Let's precompute, for each pair (position in Y, element of X), the leftmost occurrence in Y of an element equal to that element of X and located to the right of that position, and put these pairs into a hash table. That looks like O(n^2) pairs, but it isn't: for a fixed position in Y we never need to look more than Q + 1 positions to its right, because going further would mean skipping more than Q elements. So we only need to store O(N * Q) pairs, which gives the desired time complexity. Once we have this hash table, finding pos in step 3 is a single hash table lookup. Here is the pseudocode for this step:
map = EmptyHashMap()
for i = 0 ... n - 1:
    for j = i + 1 ... min(n - 1, i + Q + 1):
        map[(i, Y[j])] = min(map[(i, Y[j])], j)
Unfortunately this solution uses hash tables, so the O(N * Q) complexity holds on average rather than in the worst case, but it should be fast enough in practice. A sketch of the whole procedure in code is given below.
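To make the above concrete, here is a rough Python sketch of the whole DP under my own naming (lcs_with_deletions, nxt, INF) and with the convention that f[i][j] counts how many leading elements of Y have been consumed, so a match at index pos advances the pointer to pos + 1. Treat it as an illustration of the states and transitions described above, not as tested production code.

def lcs_with_deletions(X, Y, Q):
    """Sketch of the f(prefix_len, deleted_cnt) idea above.
    Returns True iff X and Y (both of length n) have an LCS of length >= n - Q."""
    n = len(X)
    INF = n + 1  # sentinel meaning "state unreachable"

    # nxt[(p, v)] = smallest index q >= p with Y[q] == v, looking at most
    # Q positions past p -- looking further would skip more than Q elements of Y.
    nxt = {}
    for p in range(n + 1):
        for q in range(p, min(n, p + Q + 1)):
            if (p, Y[q]) not in nxt:
                nxt[(p, Y[q])] = q

    # f[i][j] = least number of leading Y elements consumed after processing
    # the first i elements of X with exactly j of them deleted.
    f = [[INF] * (Q + 1) for _ in range(n + 1)]
    f[0][0] = 0
    for i in range(n):
        for j in range(Q + 1):
            p = f[i][j]
            if p > n:
                continue
            if j + 1 <= Q:                      # a) delete X[i]
                f[i + 1][j + 1] = min(f[i + 1][j + 1], p)
            pos = nxt.get((p, X[i]))            # b) match X[i] at index pos >= p
            if pos is not None:
                f[i + 1][j] = min(f[i + 1][j], pos + 1)

    # All of X processed with at most Q deletions => common subsequence >= n - Q.
    return any(f[n][j] <= n for j in range(Q + 1))

For example, lcs_with_deletions("abcde", "axcye", 2) should return True, since "ace" is a common subsequence of length 3 >= 5 - 2.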
You can also say that the cost of editing one string into the other must be no greater than Q; if it is greater than Q, the answer must be False. (This is the edit distance problem.)
Suppose string x has size m and string y has size n. We create a two-dimensional array d[0..m][0..n], where d[i][j] denotes the edit distance between the i-character prefix of x and the j-character prefix of y.
The array d is computed with dynamic programming using the following recurrence:
d[i][0] = i, for i <= m
d[0][j] = j, for j <= n
d[i][j] = d[i - 1][j - 1], if x[i] == y[j],
d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + 1), otherwise.
LCS answer: if m > n, it is m - d[m][m - n].
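For reference, here is a direct Python transcription of the recurrence above (the function name edit_distance is my own; the idea in this answer is then to compare the returned distance with Q):

def edit_distance(x, y):
    """Standard edit-distance DP following the recurrence above."""
    m, n = len(x), len(y)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete i characters of x
    for j in range(n + 1):
        d[0][j] = j          # insert j characters of y
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                d[i][j] = d[i - 1][j - 1]
            else:
                d[i][j] = min(d[i - 1][j] + 1,      # delete x[i]
                              d[i][j - 1] + 1,      # insert y[j]
                              d[i - 1][j - 1] + 1)  # substitute
    return d[m][n]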