Complexity of computing the similarity between two sequences
What is the computational complexity of the best-known algorithms for computing the similarity between two sequences (as in DNA or protein alignment, or approximate string matching)?
The similarity is based on:
- scoring the alignment with substitution matrices (global or position-specific substitution scores over the 20-letter protein alphabet or the 4-letter DNA alphabet)
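For reference, the classic dynamic-programming approach (Needleman-Wunsch for global alignment, Smith-Waterman for local alignment) fills an n × m table, so it takes O(nm) time for sequences of lengths n and m. Below is a minimal sketch of a global alignment score driven by a substitution matrix. The toy DNA matrix (+2 match, -1 mismatch) and the linear gap penalty of -2 are assumptions for illustration, not published BLOSUM/PAM values, and the function name is hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical toy DNA substitution matrix: +2 for a match, -1 for a mismatch.
# Protein work would plug in a published 20x20 matrix (e.g. BLOSUM62) instead.
DNA = "ACGT"
SUBST: Dict[Tuple[str, str], int] = {
    (a, b): (2 if a == b else -1) for a in DNA for b in DNA
}
GAP = -2  # assumed linear gap penalty


def global_alignment_score(x: str, y: str,
                           subst: Dict[Tuple[str, str], int] = SUBST,
                           gap: int = GAP) -> int:
    """Needleman-Wunsch global alignment score in O(len(x) * len(y)) time.

    Only two rows of the DP table are kept, so memory is O(len(y)).
    """
    prev = [j * gap for j in range(len(y) + 1)]  # row 0: y prefix vs. gaps
    for i in range(1, len(x) + 1):
        curr = [i * gap] + [0] * len(y)  # column 0: x prefix vs. gaps
        for j in range(1, len(y) + 1):
            curr[j] = max(
                prev[j - 1] + subst[(x[i - 1], y[j - 1])],  # match / substitution
                prev[j] + gap,       # x[i-1] aligned to a gap
                curr[j - 1] + gap,   # y[j-1] aligned to a gap
            )
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))  # 4: three matches, one gap
```

The local (Smith-Waterman) variant has the same O(nm) cost: it clamps every cell at 0 and reports the maximum over the whole table instead of the bottom-right cell.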
Is the linear-time Burrows-Wheeler transform used by short-read aligners such as Bowtie and BWA the current state of the art, or are there sublinear algorithms for the same problem?
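To make the BWT part concrete: building the transform can be done in linear time via a suffix array, and once the FM-index exists, backward search counts exact occurrences of a pattern of length m in O(m) steps, independent of the reference length. The following is a minimal sketch, assuming '$' as the sentinel and naive rotation sorting (acceptable for toy inputs only); `bwt`, `FMIndex` and `count` are hypothetical names, not the Bowtie or BWA API.

```python
from typing import Dict, List


def bwt(text: str) -> str:
    """Burrows-Wheeler transform by sorting all rotations (illustration only).

    Real tools derive it from a suffix array in (near) linear time.
    Assumes '$' does not occur in text and sorts before every other symbol.
    """
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


class FMIndex:
    def __init__(self, text: str) -> None:
        self.last = bwt(text)
        # C[c]: number of symbols in the text (incl. '$') smaller than c.
        self.C: Dict[str, int] = {}
        total = 0
        for c in sorted(set(self.last)):
            self.C[c] = total
            total += self.last.count(c)
        # occ[c][i]: occurrences of c in last[:i]. Kept in full here;
        # production indexes store only sampled checkpoints.
        self.occ: Dict[str, List[int]] = {c: [0] for c in self.C}
        for ch in self.last:
            for c in self.occ:
                self.occ[c].append(self.occ[c][-1] + (ch == c))

    def count(self, pattern: str) -> int:
        """Backward search: exact occurrences of pattern in O(len(pattern)) steps."""
        lo, hi = 0, len(self.last)  # half-open range of sorted rotations
        for c in reversed(pattern):
            if c not in self.C:
                return 0
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return 0
        return hi - lo


print(bwt("banana"))                # annb$aa
print(FMIndex("banana").count("ana"))  # 2
```

This only covers exact matching; aligners built on the FM-index layer mismatch- and gap-tolerant search on top of it, which is where the cost for inexact reads grows.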
[Edit]: I'm thinking about applying LSH (locality-sensitive hashing) for approximate matching, which would be sublinear, assuming the reference dataset is preprocessed / indexed.
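One common LSH family for sequences is MinHash over k-mer sets, which approximates the Jaccard similarity of the k-mer sets rather than an alignment score. The sketch below indexes reference sequences with banded MinHash signatures so that a query only touches the buckets its own bands hash to, instead of scanning every reference. The parameters (k=5, 16 bands of 4 rows), the use of seeded `blake2b` as the hash family, and all names are assumptions for illustration; returned names are candidates that would still need verification with a real alignment.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct length-k substrings of seq (assumes len(seq) >= k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _h(seed: int, item: str) -> int:
    """Seeded 64-bit hash; each seed acts as one independent hash function."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items: Set[str], num_hashes: int) -> List[int]:
    """For each hash function, keep the minimum hash value over the set."""
    return [min(_h(seed, x) for x in items) for seed in range(num_hashes)]


class MinHashLSH:
    """Banded MinHash index: sequences sharing any full band become candidates."""

    def __init__(self, k: int = 5, bands: int = 16, rows: int = 4) -> None:
        self.k, self.bands, self.rows = k, bands, rows
        self.buckets: Dict[Tuple[int, Tuple[int, ...]], Set[str]] = defaultdict(set)

    def _bands(self, seq: str) -> Iterator[Tuple[int, Tuple[int, ...]]]:
        sig = minhash_signature(kmers(seq, self.k), self.bands * self.rows)
        for b in range(self.bands):
            yield b, tuple(sig[b * self.rows:(b + 1) * self.rows])

    def add(self, name: str, seq: str) -> None:
        for key in self._bands(seq):
            self.buckets[key].add(name)

    def query(self, seq: str) -> Set[str]:
        """Candidate reference names whose signature shares at least one band."""
        hits: Set[str] = set()
        for key in self._bands(seq):
            hits |= self.buckets.get(key, set())
        return hits


index = MinHashLSH()
index.add("ref1", "ACGTACGTTAGCCGATAGGCTA")
index.add("ref2", "TTTTGGGGCCCCAAAATTTTGG")
print(index.query("ACGTACGTTAGCCGATAGGCTA"))  # {'ref1'}
```

With b bands of r rows, two sequences whose k-mer sets have Jaccard similarity s collide in at least one band with probability 1 - (1 - s^r)^b, so `bands` and `rows` trade recall against the number of false candidates.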