Finding DNA in the prologue

I am trying to learn basic Prolog. I've read some basic tutorials on basic list structures, variables and if /s and logic. The project I am trying to do to help learn about this is to match the DNA sequences.

Essentially, I want it to match the reverse compliments of the DNA sequences.

Examples of outputs can be seen below:

?- dnamatch([t, t, a, c],[g, t, a, a]).
true

      

While this is most likely relatively straightforward, being newer to Prolog, I am currently figuring it out.

I started by defining basic matching rules for DNA pairs:

pair(a,t).
pair(g,c).
etc...

      

Then I tried to implement this in lists somehow, but I'm not sure how to make this logic apply to longer sequence lists. I'm not sure if my attempts to start are even the correct approach. Any help would be appreciated.

+3


source to share


3 answers


Since your relationship describes lists, you can choose to use DCG. You can describe complementary nucleobases, for example:

complementary(t) -->    % thymine is complementary to
  [a].                  % adenine
complementary(a) -->    % adenine is complementary to
  [t].                  % thymine
complementary(g) -->    % guanine is complementary to
  [c].                  % cytosine
complementary(c) -->    % cytosine is complementary to
  [g].                  % guanine

      

This matches your / 2 predicate pair. To reverse the binding sequence, you can proceed as follows:

bond([]) -->            % the empty sequence
  [].                   % doesn't bond
bond([A|As]) -->        % the sequence [A|As] bonds with
  bond(As),             % a bonding sequence to As (in reverse order)
  complementary(A).     % followed by the complementary nucleobase of A

      

The reverse order is achieved by first writing a recursive target and then a target that describes the complementary nuclease base up to the one at the beginning of the list. You can request this with the / 2 phrase like this:

   ?- phrase(bond([t,t,a,c]),S).
S = [g,t,a,a]

      



Or, you can use a single-target wrapper predicate containing the / 2 phrase:

seq_complseq(D,M) :-
  phrase(bond(D),M).

      

And then request it:

   ?- seq_complseq([t,t,a,c],C).
C = [g,t,a,a]

      

I find that the description of lists with DCG is easier to read than the corresponding predicate version. Of course, describing the additional sequence in reverse order is relatively straightforward. But as soon as you want to describe more complex structures, for example, the cloverleaf tRNA DCG structure comes into reality.

+6


source


Solution with maplist / 3 and reverse / 2:



dnamatch(A,B) :- reverse(B,C), maplist(pairmatch,A,C).

      

+1


source


If you want to avoid double pass, you can also do it like this:

rev_comp(DNA, RC) :-
    rev_comp(DNA, [], RC).

rev_comp([], RC, RC).
rev_comp([X|Xs], RC0, RC) :-
    pair(X, Y),
    rev_comp(Xs, [Y|RC0], RC).

      

Then:

?- rev_comp([t,c,g,a], RC).
RC = [t, c, g, a].

      

This is just a combination of symbols reverse

and maplist

. Is it worth it? Maybe, maybe not. Probably not.

Now that I've thought about it a bit, you can also do it with foldl

that changes direction, but now you really want to reverse it, so it's more useful than annoying.

rev_comp([], []).
rev_comp([X|Xs], Ys) :-
    pair(X, Y),
    foldl(rc, Xs, [Y], Ys).

rc(X, Ys, [Y|Ys]) :- pair(X, Y).

      

But this is even less obvious than the solution above, and the above solution is even less obvious than the solution from @Capellic , so maybe you can look at the code I wrote, but please don't write code like that unless of course you're not answering Stackoverflow questions and want to look smarter or impress a girl who asks for your help for university studies.

+1


source







All Articles