# Number of longest common subsequences

I need to find the number of distinct longest common subsequences between two strings A and B. I am currently using a normal dynamic programming approach and then generate all the different substrings using a backtracking array and then doing the first depth search from the starting index.

However, since there are so many possible answers, my code is too slow. Is there a way to count the number of such individual longest common subsequences without generating them at all?

My code:

``````import java.io.BufferedReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Stack;

class Node
{
String res = "";
int i;
int j;

public Node( int _i, int _j, String s )
{
i = _i;
j = _j;
res = s;
}
}

public class LCSRevisited
{
static String a;
static String b;
static int m,n;
static int[][] memo;
static int[][] bt; // 1 means [i+1][j], 2 means [i][j+1], 3 means [i+1][j+1]
// 4  - means both

static HashSet <String> filter;

static void printAllStrings( )
{
Iterator i = filter.iterator();

while( i.hasNext())
{
System.out.println( i.next() );
}
}

static void printSol()
{
System.out.print( memo[ 0 ][ 0 ]);

// check how many UNIQUE such strings exist

filter = new HashSet();
Stack<Node> s = new Stack();
Node start = new Node( 0, 0, "" );
s.push( start );
Node curr;
String res;

// use backtrack array to do a DFS

while( !s.isEmpty() )
{
curr = s.pop();
res = curr.res;

if( ( curr.i>=m) || ( curr.j >=n ) )
{
continue;
}

// check backtrack value
int i = curr.i;
int j = curr.j;
int back = bt[ i ][ j];

if( back == 1 )
{
s.push( new Node( i+1, j, res ));
}
if( back == 2 )
{
s.push( new Node( i, j+1, res ));
}
if( back == 3 )
{
s.push( new Node( i+1, j+1, res+a.charAt(i) ));
}
if( back == 4 )
{
s.push( new Node( i, j+1, res ));
s.push( new Node( i+1, j, res ));
}
}
//printAllStrings();
System.out.println(" " + filter.size() );
}

static void solve()
{
// fill base cases
m = a.length();
n = b.length();
memo = new int[ m+1 ][ n+1 ];
Arrays.fill( memo[m], 0 );

bt = new int[ m+1 ][ n+1 ];

for( int i=0; i<m; i++ )
{
memo[ i ][ n ] = 0;
}

// Now compute memo values
for( int i=m-1; i>=0; i-- )
{
for( int j=n-1; j>=0; j-- )
{
if( a.charAt(i) == b.charAt(j))
{
memo[ i ][ j ] = 1 + memo[ i+1 ][ j+1 ];
bt[ i ][ j ] = 3;
}
else
{
int r1 = memo[ i+1 ][ j ];
int r2 = memo[ i ][ j+1 ];

if( r1==r2 )
{
memo[ i ][ j ] = r1;
bt[ i ][ j ] = 4;
}
else if( r1 > r2 )
{
memo[ i ][ j ] = r1;
bt[ i ][ j ] = 1;
}
else
{
memo[ i ][ j ] = r2;
bt[ i ][ j ] = 2;
}
}
}
}

printSol();
}

public static void main( String[] args ) throws IOException
{

while( T--> 0 )
{

if( T>=1 )

solve();
// printArr( bt );
}
}
}
```

```
+3

source to share

I think you can use a hash roll function like Rabin Karp's. This way, you can calculate new hash values ​​of longer common subsequences without having to re-create the entire string and hash.

In fact, I think you can use pure DP to find your answer. Let's `memo[][]`

say you have already calculated the DP table values ​​for LCS ( in your code, I think). Then you can calculate the number of different LCS instances like

``````for j ← 0 to n do
for i ← 0 to m do
if i = 0 or j = 0 then
D[i, j] ← 1
else
D[i, j] ← 0
if ai  = bj  then
D[i, j] ← D[i − 1, j − 1]
else if L[i − 1, j] = L[i, j] then
D[i, j] ← D[i, j] + D[i − 1, j]
endif
if L[i, j − 1] = L[i, j] then
D[i, j] ← D[i, j] + D[i, j − 1]
endif
if L[i − 1, j − 1] = L[i, j] then
D[i, j] ← D[i, j] − D[i − 1, j − 1]
endif
end if
endfor
endfor
```

```

+1

source

Perhaps using a kind of trie will help you generate the actual sequences when calculating the length and after calculating the total with a traversal in trie (linear time).

Memo [i] [j] now represents the length of the common subsequence of A [i ... m] and B [j..n]. I suggest that you also have lp [i] [j] representing a list of pointers to a node in a trie, so that the path from that node to the root of the trie gives you one of the longest common subseqs for A [i ... m] and B [j..n]. To build this lp, for cases 1 and 2, you just copy the lists from lp [i + 1] [j] or lp [i] [j + 1] as for case 4, and for 3 you add a new node in the tree with the value A [i] = B [j], and all the nodes pointed to by lp [i + 1] [j + 1] are specified in the tree as sons of the new node. These operations would be linear (or maybe even faster with some advanced data structures for working with sets). Note that what I described is not actually a trie / tree (a node can have multiple parents).

Finally, for counting, I think a workaround would be good, with some extra handling - spreading counts: count [node] [level] = sum (count [sons (node)] [level- 1]) or if node is count leaves [node]  = 1, count [node] [l! = 1] = 0. And your answer will sum the counts for those "root" nodes (to the power of 0) that satisfy the "longest" constraint (\ sum_x {count [x] [l_max]}).

I'm not 100% sure of my solution, but it might be a good start to improve the answer to your problem.

0

source

All Articles