Finding the longest common subsequence in O (NlogN) time

Is there a way to find the longest common subsequence of two sequences in O (NlogN) time?

I read somewhere that there is a way to achieve this using binary search.

I know of a dp approach that takes O (N 2 ) time.

+3


source to share


4 answers


In general, an O (N ^ 2) dynamic programming algorithm is the best you can do. However, in some special cases, better algorithms exist.

  • Alphabet size limited

This is a very common situation. This category contains sequences consisting of letters from some alphabet (for example, English). In this case, the O (N * M) algorithm can be optimized to get O (N ^ 2 / logN) with the four Russians method.I don't know how exactly, you might be looking for it.



  1. Both sequences are composed of separate elements

Example problem: "Given two permutations of numbers from 1 to N, find their LCS." This can be solved in O (N * logN).
Let the sequences be A and B.
Define the sequence C. C [i] is the index of B [i] in A. (A [C [i]] = B [i])
LCS A and B are the longest increasing subsequence of C.

+5


source


The dynamic programming approach , which is O (n 2 ), is completely general. In some other cases, there are algorithms with a lower degree of complexity:



  • For a fixed alphabet size (which does not grow with n), there is the Four Russians Method which brings the time down to O (n 2 / log n) (see here ).

  • See here for further optimized case.

+3


source


The longest common subsequence between two sequences is essentially n-square.

Masek and Patterson (1980) made a slight improvement in n-squared / log n using the so-called " Four Russians ".

In most cases, the added complexity resulting from such convoluted approaches is not justified by the small gains. For practical purposes, you might consider the n-square approach as a reasonable optimum in typical applications.

+1


source


Assuming the exponential time hypothesis (which is stricter than P, is not NP, but is still believed to be true), it cannot be done in O (N ^ {2 - eps}) time for any positive constant eps. see "Quadratic Conditional Lower Bounds for String Problems and Dynamic Time Warping" by Karl Bringmann and Marvin Kunneman (preprint available on arXiv).

Roughly speaking, this means that the general case of this problem cannot be solved in time better than something like O (N ^ 2 / log N), so if you want faster algorithms you should consider additional constraints (some properties of the string ) or look for an approximate solution.

+1


source







All Articles